Re: net: GPF in __netlink_ns_capable

From: Eric W. Biederman
Date: Mon Jan 18 2016 - 15:36:07 EST



Apparently we have missed entirely the folks who added this chunk
of this code to the kernel on this thread, so adding them now.

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:

> Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> writes:
>
>> On Sat, Jan 16, 2016 at 01:08:33AM +0100, Richard Weinberger wrote:
>>> On Fri, Jan 15, 2016 at 11:31 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>>> > Call Trace:
>>> > [< inline >] netlink_ns_capable net/netlink/af_netlink.c:1417
>>> > [<ffffffff8529c0a5>] netlink_capable+0x25/0x30 net/netlink/af_netlink.c:1432
>>>
>>> Hmm, we're crashing because NETLINK_CB(skb).sk is NULL.
>>> netlink_dump() creates a new skb without a netlink control block,
>>> but infiniband's dump functions use netlink_capable() which needs a valid
>>> NETLINK_CB(skb).sk.
>>>
>>> What about something like that?
>>
>> No you can't do it here as netlink_unicast also calls this and for
>> that case you'd be overwriting the original sending user-space
>> socket with the kernel socket.
>>
>> I'm adding Eric Bierderman as he wrote some of the code in question.
>
> *Scratches my head*
>
> I think I am just going to recommend removing that bit of code from
> the infiniband stack.
>
> There are several things very wrong here.
>
> First netlinnk_capable and it's ilk are for the very specific purpose of
> handling backwards compatibility as a truly clean solution of checking
> at open or connect time would break existing applications.
> ib_nl_handle_resolv_resp is new code. So it can almost certainly do
> something cleaner.
>
> netlink_capable is very much not for filtering netlink dumps, but for
> filtering the queries themselves. So it appears the capability check is
> very much in the wrong place.
>
> All of this is newish code and apparently never even tested as this code
> should have failed this way for everyone. So since the code does not
> work not apparently has never worked, it is probably easiest just to
> remove the problematic code (AKA revert), and start fresh and not
> something that requires backwards compatibility hacks from day one.
>
> By new I mean code that came in through the commit below.
>
> commit 2ca546b92a024d07adedd15b4c262b1c2c0786ec
> Author: Kaike Wan <kaike.wan@xxxxxxxxx>
> Date: Fri Aug 14 08:52:09 2015 -0400
>
> IB/sa: Route SA pathrecord query through netlink
>
> This patch routes a SA pathrecord query to netlink first and processes the
> response appropriately. If a failure is returned, the request will be sent
> through IB. The decision whether to route the request to netlink first is
> determined by the presence of a listener for the local service netlink
> multicast group. If the user-space local service netlink multicast group
> listener is not present, the request will be sent through IB, just like
> what is currently being done.
>
> Signed-off-by: Kaike Wan <kaike.wan@xxxxxxxxx>
> Signed-off-by: John Fleck <john.fleck@xxxxxxxxx>
> Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx>

What was this code trying to do with netlink_capable besides oops the
kernel?

Eric