Re: [PATCH] SUNRPC: Fix UAF in svc_tcp_listen_data_ready()

From: Chuck Lever III
Date: Fri May 19 2023 - 10:54:35 EST




> On May 15, 2023, at 10:55 AM, Ding Hui <dinghui@xxxxxxxxxxxxxx> wrote:
>
> On 2023/5/15 20:01, Chuck Lever III wrote:
>>> On May 14, 2023, at 10:13 PM, Ding Hui <dinghui@xxxxxxxxxxxxxx> wrote:
>>>
>>> After the listener svc_sock be freed, and before invoking svc_tcp_accept()
>>> for the established child sock, there is a window that the newsock
>>> retaining a freed listener svc_sock in sk_user_data which cloning from
>>> parent.
>> Thank you, I will apply this (after testing it).
>> The next step is to figure out why SUNRPC is trying to accept
>> on a dead listener. Any thoughts about that?
>
> A child sock is cloned from the listener sock, it inherits sk_data_ready
> and sk_user_data from its parent sock, which is svc_tcp_listen_data_ready()
> and listener svc_sock, the sk_state of the child becomes ESTABLISHED once
> after TCP handshake in protocol stack.
>
> Case 1:
>
> listener sock | child sock | nfsd thread
> =>sk_data_ready | =>sk_data_ready |
> -------------------+-----------------------+----------------------
> svc_tcp_listen_data_ready
> svsk is listener svc_sock
> set_bit(XPT_CONN)
> svc_recv
> svc_tcp_accept(listener)
> kernel_accept get child sock as newsock
> svc_setup_socket(newsock)
> newsock->sk_data_ready=svc_data_ready
> newsock->sk_user_data=newsvsk
> svc_data_ready
> svsk is newsvsk
>
>
> Case 2:
>
> listener sock | child sock | nfsd thread
> =>sk_data_ready | =>sk_data_ready |
> -------------------+-----------------------+----------------------
> svc_tcp_listen_data_ready
> svsk is listener svc_sock
> set_bit(XPT_CONN)
> svc_tcp_listen_data_ready
> svsk is listener svc_sock
> svc_recv
> svc_tcp_accept(listener)
> kernel_accept get the child sock as newsock
> svc_setup_socket(newsock)
> newsock->sk_data_ready=svc_data_ready
> newsock->sk_user_data=newsvsk
> svc_data_ready
> svsk is newsvsk
>
>
> The UAF case:
>
> listener sock | child sock | rpc.nfsd 0
> =>sk_data_ready | =>sk_data_ready |
> -------------------+-----------------------+----------------------
> svc_tcp_listen_data_ready
> svsk is listener svc_sock
> set_bit(XPT_CONN)
> svc_xprt_destroy_all
> svc_xprt_free
> kfree listener svc_sock
> // the child sock has not yet been accepted,
> // so it is not managed by SUNRPC for now.
> svc_tcp_listen_data_ready
> svsk is listener svc_sock
> svsk->sk_odata // UAF!

Thanks for the ladder diagrams, that helps.


>>> In the race windows if data is received on the newsock, we will
>>> observe use-after-free report in svc_tcp_listen_data_ready().
>>>
>>> Reproduce by two tasks:
>>>
>>> 1. while :; do rpc.nfsd 0 ; rpc.nfsd; done
>>> 2. while :; do echo "" | ncat -4 127.0.0.1 2049 ; done
>> I will continue attempting to reproduce, as I would like a
>> root cause for this issue.

svc_xprt shutdown seems to be unordered. I would think that
it should unregister with the portmapper, close the listener
so no new connections can be established, then close the
children last. That doesn't appear to be how it works, though.

But if the listener happens to be closed first, that frees
the svc_sock that might be relied upon by children sockets.
The shutdown ordering seems to be the crux of the problem,
and pinning the svc_xprt won't help, as you pointed out.

Making the children not rely on the listener does address
the crash, but doesn't fix the design. But the design
problem doesn't seem to be urgent. So I'm going to file a
low priority bug to document the design issue.


>>> KASAN report:
>>>
>>> ==================================================================
>>> BUG: KASAN: slab-use-after-free in svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc]
>>> Read of size 8 at addr ffff888139d96228 by task nc/102553
>>> CPU: 7 PID: 102553 Comm: nc Not tainted 6.3.0+ #18
>>> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>>> Call Trace:
>>> <IRQ>
>>> dump_stack_lvl+0x33/0x50
>>> print_address_description.constprop.0+0x27/0x310
>>> print_report+0x3e/0x70
>>> kasan_report+0xae/0xe0
>>> svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc]
>>> tcp_data_queue+0x9f4/0x20e0
>>> tcp_rcv_established+0x666/0x1f60
>>> tcp_v4_do_rcv+0x51c/0x850
>>> tcp_v4_rcv+0x23fc/0x2e80
>>> ip_protocol_deliver_rcu+0x62/0x300
>>> ip_local_deliver_finish+0x267/0x350
>>> ip_local_deliver+0x18b/0x2d0
>>> ip_rcv+0x2fb/0x370
>>> __netif_receive_skb_one_core+0x166/0x1b0
>>> process_backlog+0x24c/0x5e0
>>> __napi_poll+0xa2/0x500
>>> net_rx_action+0x854/0xc90
>>> __do_softirq+0x1bb/0x5de
>>> do_softirq+0xcb/0x100
>>> </IRQ>
>>> <TASK>
>>> ...
>>> </TASK>
>>>
>>> Allocated by task 102371:
>>> kasan_save_stack+0x1e/0x40
>>> kasan_set_track+0x21/0x30
>>> __kasan_kmalloc+0x7b/0x90
>>> svc_setup_socket+0x52/0x4f0 [sunrpc]
>>> svc_addsock+0x20d/0x400 [sunrpc]
>>> __write_ports_addfd+0x209/0x390 [nfsd]
>>> write_ports+0x239/0x2c0 [nfsd]
>>> nfsctl_transaction_write+0xac/0x110 [nfsd]
>>> vfs_write+0x1c3/0xae0
>>> ksys_write+0xed/0x1c0
>>> do_syscall_64+0x38/0x90
>>> entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>
>>> Freed by task 102551:
>>> kasan_save_stack+0x1e/0x40
>>> kasan_set_track+0x21/0x30
>>> kasan_save_free_info+0x2a/0x50
>>> __kasan_slab_free+0x106/0x190
>>> __kmem_cache_free+0x133/0x270
>>> svc_xprt_free+0x1e2/0x350 [sunrpc]
>>> svc_xprt_destroy_all+0x25a/0x440 [sunrpc]
>>> nfsd_put+0x125/0x240 [nfsd]
>>> nfsd_svc+0x2cb/0x3c0 [nfsd]
>>> write_threads+0x1ac/0x2a0 [nfsd]
>>> nfsctl_transaction_write+0xac/0x110 [nfsd]
>>> vfs_write+0x1c3/0xae0
>>> ksys_write+0xed/0x1c0
>>> do_syscall_64+0x38/0x90
>>> entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>
>>> Fix the UAF by simply doing nothing in svc_tcp_listen_data_ready()
>>> if state != TCP_LISTEN, that will avoid dereferencing svsk for all
>>> child socket.
>>>
>>> Link: https://lore.kernel.org/lkml/20230507091131.23540-1-dinghui@xxxxxxxxxxxxxx/
>>> Fixes: fa9251afc33c ("SUNRPC: Call the default socket callbacks instead of open coding")
>>> Signed-off-by: Ding Hui <dinghui@xxxxxxxxxxxxxx>
>>> Cc: <stable@xxxxxxxxxxxxxxx>
>>> ---
>>> net/sunrpc/svcsock.c | 23 +++++++++++------------
>>> 1 file changed, 11 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>> index a51c9b989d58..9aca6e1e78e4 100644
>>> --- a/net/sunrpc/svcsock.c
>>> +++ b/net/sunrpc/svcsock.c
>>> @@ -825,12 +825,6 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>>>
>>> trace_sk_data_ready(sk);
>>>
>>> - if (svsk) {
>>> - /* Refer to svc_setup_socket() for details. */
>>> - rmb();
>>> - svsk->sk_odata(sk);
>>> - }
>>> -
>>> /*
>>> * This callback may called twice when a new connection
>>> * is established as a child socket inherits everything
>>> @@ -839,13 +833,18 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>>> * when one of child sockets become ESTABLISHED.
>>> * 2) data_ready method of the child socket may be called
>>> * when it receives data before the socket is accepted.
>>> - * In case of 2, we should ignore it silently.
>>> + * In case of 2, we should ignore it silently and DO NOT
>>> + * dereference svsk.
>>> */
>>> - if (sk->sk_state == TCP_LISTEN) {
>>> - if (svsk) {
>>> - set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
>>> - svc_xprt_enqueue(&svsk->sk_xprt);
>>> - }
>>> + if (sk->sk_state != TCP_LISTEN)
>>> + return;
>>> +
>>> + if (svsk) {
>>> + /* Refer to svc_setup_socket() for details. */
>>> + rmb();
>>> + svsk->sk_odata(sk);
>>> + set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
>>> + svc_xprt_enqueue(&svsk->sk_xprt);
>>> }
>>> }
>>>
>>> --
>>> 2.17.1
>>>
>> --
>> Chuck Lever
>
> --
> Thanks,
> -dinghui


--
Chuck Lever