[PATCH v1 1/1] poll: fix the data race in the use of pwq->triggered in poll_schedule_timeout()

From: Mirsad Goran Todorovac
Date: Mon Sep 18 2023 - 00:50:08 EST


KCSAN has discovered the following data race:

[ 139.315774] ==================================================================
[ 139.315798] BUG: KCSAN: data-race in poll_schedule_timeout.constprop.0 / pollwake

[ 139.315830] write to 0xffffc90003f3fb60 of 4 bytes by task 1848 on cpu 6:
[ 139.315843] pollwake+0xc0/0x110
[ 139.315860] __wake_up_common+0x7a/0x150
[ 139.315877] __wake_up_common_lock+0x7f/0xd0
[ 139.315893] __wake_up_sync_key+0x20/0x50
[ 139.315905] sock_def_readable+0x67/0x160
[ 139.315917] unix_stream_sendmsg+0x35f/0x990
[ 139.315932] sock_sendmsg+0x15d/0x170
[ 139.315947] ____sys_sendmsg+0x3d5/0x500
[ 139.315962] ___sys_sendmsg+0x9e/0x100
[ 139.315976] __sys_sendmsg+0x6f/0x100
[ 139.315990] __x64_sys_sendmsg+0x47/0x60
[ 139.316005] do_syscall_64+0x5d/0xa0
[ 139.316022] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

[ 139.316043] read to 0xffffc90003f3fb60 of 4 bytes by task 1877 on cpu 18:
[ 139.316055] poll_schedule_timeout.constprop.0+0x4e/0xc0
[ 139.316071] do_sys_poll+0x50d/0x760
[ 139.316081] __x64_sys_poll+0x5f/0x210
[ 139.316091] do_syscall_64+0x5d/0xa0
[ 139.316105] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

[ 139.316125] value changed: 0x00000000 -> 0x00000001

[ 139.316143] Reported by Kernel Concurrency Sanitizer on:
[ 139.316153] CPU: 18 PID: 1877 Comm: gdbus Tainted: G L 6.6.0-rc1-kcsan-00269-ge789286468a9-dirty #3
[ 139.316167] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 139.316177] ==================================================================

The data race appears to be here in poll_schedule_timeout():

fs/select.c:
static int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
			  ktime_t *expires, unsigned long slack)
{
	int rc = -EINTR;

	set_current_state(state);
→	if (!pwq->triggered)
		rc = schedule_hrtimeout_range(expires, slack, HRTIMER_MODE_ABS);
	__set_current_state(TASK_RUNNING);

	/*
	 * Prepare for the next iteration.
	 *
	 * The following smp_store_mb() serves two purposes. First, it's
	 * the counterpart rmb of the wmb in pollwake() such that data
	 * written before wake up is always visible after wake up.
	 * Second, the full barrier guarantees that triggered clearing
	 * doesn't pass event check of the next iteration. Note that
	 * this problem doesn't exist for the first iteration as
	 * add_wait_queue() has full barrier semantics.
	 */
	smp_store_mb(pwq->triggered, 0);

	return rc;
}

The problem seems to be fixed by using READ_ONCE() around pwq->triggered, which
silences the KCSAN warning:

→	if (!READ_ONCE(pwq->triggered))
		rc = schedule_hrtimeout_range(expires, slack, HRTIMER_MODE_ABS);

This is a quick fix that removes the symptom, but more issues probably need to
be looked at around the use of pwq->triggered.

If the value of pwq->triggered changes under the reader while it is accessed
with a plain load, the "if" statement can take the wrong branch and
schedule_hrtimeout_range() can be invoked when it should not be.

Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@xxxxxxxxxxxx>
Fixes: 5f820f648c92a ("poll: allow f_op->poll to sleep")
Cc: Tejun Heo <htejun@xxxxxxxxx>
Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Christian Brauner <brauner@xxxxxxxxxx>
Cc: linux-fsdevel@xxxxxxxxxxxxxxx
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@xxxxxxxxxxxx>
---
v1:
the proposed fix (RFC)

fs/select.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/select.c b/fs/select.c
index 0ee55af1a55c..38e12084daf1 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -240,7 +240,7 @@ static int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
 	int rc = -EINTR;
 
 	set_current_state(state);
-	if (!pwq->triggered)
+	if (!READ_ONCE(pwq->triggered))
 		rc = schedule_hrtimeout_range(expires, slack, HRTIMER_MODE_ABS);
 	__set_current_state(TASK_RUNNING);

--
2.34.1