Re: [PATCH] xfrm: policy: Restructure RCU-read locking in xfrm_sk_policy_lookup

From: Varad Gautam
Date: Mon Jun 21 2021 - 05:11:29 EST


On 6/21/21 10:29 AM, Steffen Klassert wrote:
> On Fri, Jun 18, 2021 at 04:11:01PM +0200, Varad Gautam wrote:
>> Commit "xfrm: policy: Read seqcount outside of rcu-read side in
>> xfrm_policy_lookup_bytype" [Linked] resolved a locking bug in
>> xfrm_policy_lookup_bytype that causes an RCU reader-writer deadlock on
>> the mutex wrapped by xfrm_policy_hash_generation on PREEMPT_RT since
>> 77cc278f7b20 ("xfrm: policy: Use sequence counters with associated
>> lock").
>>
>> However, xfrm_sk_policy_lookup can still reach xfrm_policy_lookup_bytype
>> while holding rcu_read_lock(), as:
>> xfrm_sk_policy_lookup()
>> rcu_read_lock()
>> security_xfrm_policy_lookup()
>> xfrm_policy_lookup()
>
> Hm, I don't see that call chain. security_xfrm_policy_lookup() calls
> a hook with the name xfrm_policy_lookup. The only LSM that has
> registered a function to that hook is selinux. It registers
> selinux_xfrm_policy_lookup() and I don't see how we can call
> xfrm_policy_lookup() from there.
>
> Did you actually trigger that bug?
>

Right, I misread the call chain: security_xfrm_policy_lookup does not reach
xfrm_policy_lookup, so this patch is unnecessary. The bug I am actually seeing is:

T1, holding hash_resize_mutex and sleeping inside synchronize_rcu:

__schedule
schedule
schedule_timeout
wait_for_completion
__wait_rcu_gp
synchronize_rcu
xfrm_hash_resize

And T2, producing RCU stalls because it is blocked on that mutex:

__schedule
schedule
__rt_mutex_slowlock
rt_mutex_slowlock_locked
rt_mutex_slowlock
xfrm_policy_lookup_bytype.constprop.77
__xfrm_policy_check
udpv6_queue_rcv_one_skb
__udp6_lib_rcv
ip6_protocol_deliver_rcu
ip6_input_finish
ip6_input
ip6_mc_input
ipv6_rcv
__netif_receive_skb_one_core
process_backlog
net_rx_action
__softirqentry_text_start
__local_bh_enable_ip
ip6_finish_output2
ip6_output
ip6_send_skb
udp_v6_send_skb
udpv6_sendmsg
sock_sendmsg
____sys_sendmsg
___sys_sendmsg
__sys_sendmsg
do_syscall_64

So, despite the patch here [1], there is another way to reach
xfrm_policy_lookup_bytype within an RCU read-side critical section, which on
PREEMPT_RT will deadlock with xfrm_hash_resize. Does softirq processing on RT
happen within rcu_read_lock()/rcu_read_unlock()? That would explain the stalls.
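
To spell out the suspected cycle, here is a toy userspace model in C. It is
not kernel code: the node names, has_cycle() and model_deadlocks() are made up
for illustration, and whether T2 really sits inside an RCU read-side section
when it blocks is exactly the open question above, so that is a parameter.

```c
#include <stdbool.h>

/* Hypothetical nodes of a wait-for graph; none of these are kernel symbols. */
enum { T1_RESIZER, T2_READER, RCU_GP, NR_NODES };

/*
 * waits_for[n] is the node that n is blocked on, or -1 if n can make
 * progress. Each node waits on at most one other node, so following the
 * chain for NR_NODES hops is enough to detect a cycle.
 */
static bool has_cycle(const int waits_for[NR_NODES])
{
	for (int start = 0; start < NR_NODES; start++) {
		int cur = start;

		for (int hops = 0; hops < NR_NODES; hops++) {
			cur = waits_for[cur];
			if (cur < 0)
				break;		/* chain ends, no cycle from here */
			if (cur == start)
				return true;	/* came back to where we began */
		}
	}
	return false;
}

/*
 * Assumption under test: reader_in_rcu_section says whether T2 is inside
 * rcu_read_lock() at the moment it blocks on the mutex.
 */
static bool model_deadlocks(bool reader_in_rcu_section)
{
	int waits_for[NR_NODES];

	/* T1 holds the mutex and sleeps in synchronize_rcu(). */
	waits_for[T1_RESIZER] = RCU_GP;
	/* T2 is blocked on the mutex that T1 holds. */
	waits_for[T2_READER] = T1_RESIZER;
	/* The grace period can only end once every reader has left. */
	waits_for[RCU_GP] = reader_in_rcu_section ? T2_READER : -1;

	return has_cycle(waits_for);
}
```

In this model, model_deadlocks(true) finds the T1 -> grace period -> T2 -> T1
cycle, while model_deadlocks(false) does not, since the grace period can then
complete and T1 can drop the mutex.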

[1] https://lore.kernel.org/r/20210528160407.32127-1-varad.gautam@xxxxxxxx/

Regards,
Varad

--
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany

HRB 36809, AG Nürnberg
Geschäftsführer: Felix Imendörffer