[PATCH] netlink: Fix the netlink socket malfunction due to concurrency

From: Qingjie Xing
Date: Fri Aug 18 2023 - 14:43:12 EST


Concurrent invocation of netlink_attachskb() and netlink_recvmsg()
on different CPUs can cause the netlink socket to malfunction.

The race between netlink_recvmsg() and netlink_attachskb() proceeds
as follows:

CPU A                              CPU B
========                           ========
netlink_recvmsg()                  netlink_attachskb()
                                   [1]bit NETLINK_S_CONGESTED is set
                                      netlink_overrun()
netlink_rcv_wake()
[2]sk_receive_queue is empty
   clear bit NETLINK_S_CONGESTED
                                   [3]NETLINK_F_RECV_NO_ENOBUFS not set
                                      set bit NETLINK_S_CONGESTED

After this sequence, the socket's receive queue is empty but the
NETLINK_S_CONGESTED bit is set, so all packets subsequently sent to
this socket are discarded.
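
The interleaving can be replayed deterministically in a small userspace
model. This is only a sketch: struct model_sock and the model_* helpers
are hypothetical stand-ins for the kernel state and functions, and the
steps run sequentially on one thread in the order [1], [2], [3] instead
of on two CPUs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the socket state involved; not kernel code. */
struct model_sock {
	bool congested;       /* models the NETLINK_S_CONGESTED bit */
	bool recv_no_enobufs; /* models NETLINK_F_RECV_NO_ENOBUFS */
	int queue_len;        /* models the length of sk_receive_queue */
	int err;              /* models sk->sk_err */
};

/* CPU A, step [2]: clear the congested bit once the queue is empty. */
static void model_rcv_wake(struct model_sock *sk)
{
	if (sk->queue_len == 0)
		sk->congested = false;
}

/* CPU B, step [3]: unpatched overrun sets the bit without looking at the buffer. */
static void model_overrun_unpatched(struct model_sock *sk)
{
	if (!sk->recv_no_enobufs && !sk->congested) {
		sk->congested = true;
		sk->err = ENOBUFS;
	}
}

/* Replays [1], [2], [3] in order; returns true if the socket ends up stuck. */
static bool replay_race(void)
{
	/* Start: congested socket with one message left to read. */
	struct model_sock sk = { .congested = true, .queue_len = 1 };

	bool sender_saw_congestion = sk.congested; /* [1] on CPU B */
	assert(sender_saw_congestion);

	sk.queue_len--;                            /* CPU A reads the last message */
	model_rcv_wake(&sk);                       /* [2] on CPU A */
	assert(!sk.congested);

	model_overrun_unpatched(&sk);              /* [3] on CPU B */

	/* Stuck: nothing queued, yet the congested bit is set again. */
	return sk.queue_len == 0 && sk.congested;
}
```

In this model replay_race() returns true: the queue is empty and the
modelled congested bit is set again.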

To prevent this, check whether the socket receive buffer is full
before setting the NETLINK_S_CONGESTED bit in netlink_overrun().
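
The effect of the added check can be sketched in userspace as follows.
The struct and function names are hypothetical, the buffer sizes are
arbitrary illustration values, and the plain test-then-set stands in
for the kernel's atomic test_and_set_bit(); the model is
single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the socket state involved; not kernel code. */
struct model_sock {
	bool congested;       /* models the NETLINK_S_CONGESTED bit */
	bool recv_no_enobufs; /* models NETLINK_F_RECV_NO_ENOBUFS */
	int rmem_alloc;       /* models sk_rmem_alloc */
	int rcvbuf;           /* models sk_rcvbuf */
	int err;              /* models sk->sk_err */
};

/* Models the patched check; returns true if this call set the bit. */
static bool model_overrun_patched(struct model_sock *sk)
{
	if (sk->recv_no_enobufs)
		return false;

	/* Only mark the socket congested while the buffer is really over its limit. */
	if (sk->rmem_alloc > sk->rcvbuf && !sk->congested) {
		sk->congested = true;
		sk->err = ENOBUFS;
		return true;
	}
	return false;
}

/* Exercises both outcomes of the patched check. */
static void check_patched_overrun(void)
{
	/* The reader has drained the queue: a late overrun must not re-set the bit. */
	struct model_sock drained = { .rmem_alloc = 0, .rcvbuf = 4096 };
	assert(!model_overrun_patched(&drained));
	assert(!drained.congested);

	/* The buffer is over its limit: the overrun is still reported. */
	struct model_sock full = { .rmem_alloc = 8192, .rcvbuf = 4096 };
	assert(model_overrun_patched(&full));
	assert(full.congested && full.err == ENOBUFS);
}
```

With the check in place, the late overrun from step [3] leaves the
drained socket uncongested, while a socket whose buffer is over its
limit is still flagged with ENOBUFS.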

Signed-off-by: Qingjie Xing <xqjcool@xxxxxxxxx>
---
net/netlink/af_netlink.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 383631873748..80bcce9acbfc 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -352,7 +352,8 @@ static void netlink_overrun(struct sock *sk)
 	struct netlink_sock *nlk = nlk_sk(sk);
 
 	if (!(nlk->flags & NETLINK_F_RECV_NO_ENOBUFS)) {
-		if (!test_and_set_bit(NETLINK_S_CONGESTED,
+		if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf
+		    && !test_and_set_bit(NETLINK_S_CONGESTED,
 				      &nlk_sk(sk)->state)) {
 			sk->sk_err = ENOBUFS;
 			sk_error_report(sk);
--
2.41.0