Re: [PATCH v1 1/1] xarray: fix the data-race in xas_find_chunk() by using READ_ONCE()

From: Mirsad Todorovac
Date: Mon Sep 18 2023 - 06:21:06 EST


On 9/18/23 11:41, Jan Kara wrote:
On Mon 18-09-23 06:47:40, Mirsad Goran Todorovac wrote:
KCSAN has discovered the following data-race:

[ 206.510010] ==================================================================
[ 206.510035] BUG: KCSAN: data-race in xas_clear_mark / xas_find_marked

[ 206.510067] write to 0xffff963df6a90fe0 of 8 bytes by interrupt on cpu 22:
[ 206.510081] xas_clear_mark (./arch/x86/include/asm/bitops.h:178 ./include/asm-generic/bitops/instrumented-non-atomic.h:115 lib/xarray.c:102 lib/xarray.c:914)
[ 206.510097] __xa_clear_mark (lib/xarray.c:1923)
[ 206.510114] __folio_end_writeback (mm/page-writeback.c:2981)
[ 206.510128] folio_end_writeback (mm/filemap.c:1616)
[ 206.510143] end_page_writeback (mm/folio-compat.c:28)
[ 206.510155] btrfs_page_clear_writeback (fs/btrfs/subpage.c:646) btrfs
[ 206.510994] end_bio_extent_writepage (./include/linux/bio.h:84 fs/btrfs/extent_io.c:542) btrfs
[ 206.511817] __btrfs_bio_end_io (fs/btrfs/bio.c:117 fs/btrfs/bio.c:112) btrfs
[ 206.512640] btrfs_orig_bbio_end_io (fs/btrfs/bio.c:164) btrfs
[ 206.513497] btrfs_simple_end_io (fs/btrfs/bio.c:380) btrfs
[ 206.514350] bio_endio (block/bio.c:1617)
[ 206.514362] blk_mq_end_request_batch (block/blk-mq.c:837 block/blk-mq.c:1073)
[ 206.514377] nvme_pci_complete_batch (drivers/nvme/host/pci.c:986) nvme
[ 206.514437] nvme_irq (drivers/nvme/host/pci.c:1086) nvme
[ 206.514500] __handle_irq_event_percpu (kernel/irq/handle.c:158)
[ 206.514517] handle_irq_event (kernel/irq/handle.c:195 kernel/irq/handle.c:210)
[ 206.514533] handle_edge_irq (kernel/irq/chip.c:836)
[ 206.514549] __common_interrupt (./include/linux/irqdesc.h:161 arch/x86/kernel/irq.c:238 arch/x86/kernel/irq.c:257)
[ 206.514563] common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
[ 206.514583] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:636)
[ 206.514599] kcsan_setup_watchpoint (kernel/kcsan/core.c:705 (discriminator 1))
[ 206.514612] __tsan_read8 (kernel/kcsan/core.c:1025)
[ 206.514626] steal_from_bitmap.part.0 (./include/linux/find.h:186 fs/btrfs/free-space-cache.c:2557 fs/btrfs/free-space-cache.c:2613) btrfs
[ 206.515491] __btrfs_add_free_space (fs/btrfs/free-space-cache.c:2689 fs/btrfs/free-space-cache.c:2667) btrfs
[ 206.516361] btrfs_add_free_space_async_trimmed (fs/btrfs/free-space-cache.c:2798) btrfs
[ 206.517231] add_new_free_space (fs/btrfs/block-group.c:550) btrfs
[ 206.518095] load_free_space_tree (fs/btrfs/free-space-tree.c:1595 fs/btrfs/free-space-tree.c:1658) btrfs
[ 206.518953] caching_thread (fs/btrfs/block-group.c:873) btrfs
[ 206.519800] btrfs_work_helper (fs/btrfs/async-thread.c:314) btrfs
[ 206.520643] process_one_work (kernel/workqueue.c:2600)
[ 206.520658] worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2752)
[ 206.520672] kthread (kernel/kthread.c:389)
[ 206.520684] ret_from_fork (arch/x86/kernel/process.c:145)
[ 206.520701] ret_from_fork_asm (arch/x86/entry/entry_64.S:312)

[ 206.520722] read to 0xffff963df6a90fe0 of 8 bytes by task 2793 on cpu 6:
[ 206.520735] xas_find_marked (./include/linux/xarray.h:1706 lib/xarray.c:1354)
[ 206.520750] filemap_get_folios_tag (mm/filemap.c:1975 mm/filemap.c:2273)
[ 206.520763] __filemap_fdatawait_range (mm/filemap.c:519)
[ 206.520777] filemap_fdatawait_range (mm/filemap.c:556)
[ 206.520790] btrfs_wait_ordered_range (fs/btrfs/ordered-data.c:839) btrfs
[ 206.521641] btrfs_sync_file (fs/btrfs/file.c:1859) btrfs
[ 206.522495] vfs_fsync_range (fs/sync.c:188)
[ 206.522509] __x64_sys_fsync (./include/linux/file.h:45 fs/sync.c:213 fs/sync.c:220 fs/sync.c:218 fs/sync.c:218)
[ 206.522522] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 206.522535] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

[ 206.522557] value changed: 0xfffffffffff80000 -> 0xfffffffffff00000

[ 206.522574] Reported by Kernel Concurrency Sanitizer on:
[ 206.522585] CPU: 6 PID: 2793 Comm: tracker-extract Tainted: G L 6.5.0-rc6+ #44
[ 206.522600] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 206.522608] ==================================================================

Thanks for working on this. I guess the full KCSAN warning isn't that
useful in the changelog. Rather I'd spend more time explaining the real
problem here ...

As Jan Kara explained, the problem is in the function xas_find_chuck():

/* Private */
static inline unsigned int xas_find_chunk(struct xa_state *xas, bool advance,
xa_mark_t mark)
{
unsigned long *addr = xas->xa_node->marks[(__force unsigned)mark];
unsigned int offset = xas->xa_offset;

if (advance)
offset++;
if (XA_CHUNK_SIZE == BITS_PER_LONG) {
if (offset < XA_CHUNK_SIZE) {
→ unsigned long data = *addr & (~0UL << offset);
if (data)
return __ffs(data);

... which is that xas_find_chunk() is called only under RCU protection and
thus the two uses of 'data' in the above code can yield different results.

}
return XA_CHUNK_SIZE;
}

return find_next_bit(addr, XA_CHUNK_SIZE, offset);
}

In particular, the line

unsigned long data = *addr & (~0UL << offset);

contains a data race that is best avoided using READ_ONCE(), which eliminated the KCSAN
data-race warning completely.

Yes, this improves the situation for xarray use on 64-bit architectures but
doesn't fix cases on 32-bit archs or if CONFIG_BASE_SMALL is set. As I
mentioned in my previous reply, I'd rather:

1) Fix find_next_bit(), find_first_bit() and related functions in
lib/find_bit.c to use READ_ONCE() - such as _find_first_bit() etc. It is
quite some churn but I don't see how else to make these functions safe when
the underlying contents can change.

Thank you for your review.

I assume you have the big picture, but just a stupid question:

if (XA_CHUNK_SIZE == BITS_PER_LONG) {
if (offset < XA_CHUNK_SIZE) {
unsigned long data = READ_ONCE(*addr) & (~0UL << offset);
if (data)
return __ffs(data);
}
return XA_CHUNK_SIZE;
}

I would hate to argue, but ...

Wouldn't BITS_PER_LONG simply change to 32 on 32-bit architectures?

Is there something I am missing?

From include/asm-generic/bitsperlong.h:
----------------------------------------
#ifdef CONFIG_64BIT
#define BITS_PER_LONG 64
#else
#define BITS_PER_LONG 32
#endif /* CONFIG_64BIT */

About the CONFIG_BASE_SMALL I cannot tell:
----------------------------------------
#ifndef XA_CHUNK_SHIFT
#define XA_CHUNK_SHIFT (CONFIG_BASE_SMALL ? 4 : 6)
#endif
#define XA_CHUNK_SIZE (1UL << XA_CHUNK_SHIFT)
#define XA_CHUNK_MASK (XA_CHUNK_SIZE - 1)
#define XA_MAX_MARKS 3
#define XA_MARK_LONGS DIV_ROUND_UP(XA_CHUNK_SIZE, BITS_PER_LONG)
----------------------------------------

I see why you would want find_next_bit() and find_first_bit() fixed, but I am not that deep
into those bitops, so I guess I cannot make this in one step ... Probably it would require
a lot of homework.

_find_*_bit() functions and/or macros cause quite a number of KCSAN BUG warnings:

95 _find_first_and_bit (lib/find_bit.c:114 (discriminator 10))
31 _find_first_zero_bit (lib/find_bit.c:125 (discriminator 10))
173 _find_next_and_bit (lib/find_bit.c:171 (discriminator 2))
655 _find_next_bit (lib/find_bit.c:133 (discriminator 2))
5 _find_next_zero_bit

... but I am simply not certain what is the right thing to do ATM about those and whether they
are false positives.

AFAICS, READ_ONCE() here solves the case of 64 and 32 architectures which is
an incremental step, and it works ... I am just not ready for an universal solution ATM.

2) Change xas_find_chunk() to unconditionally use find_next_bit() as the
special case XA_CHUNK_SIZE == BITS_PER_LONG seems pointless these days
because find_next_bit() is inline and does small_const_nbits(size) check.

I see your point. A generalised solution would of course be better. But from the report about
data-races in those functions it seems that they need a major rethink. It isn't that obvious
to me what should be READ_ONCE()-ed in a bit field ...

Those functions are extensively used throughout the kernel and I get the notion it is a job
for someone with more experience ...

Best regards,
Mirsad

Honza

Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@xxxxxxxxxxxx>
Suggested-by: Jan Kara <jack@xxxxxxx>
Fixes: b803b42823d0d ("xarray: Add XArray iterators")
Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Chris Mason <clm@xxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Josef Bacik <josef@xxxxxxxxxxxxxx>
Cc: David Sterba <dsterba@xxxxxxxx>
Cc: linux-btrfs@xxxxxxxxxxxxxxx
Cc: linux-fsdevel@xxxxxxxxxxxxxxx
Cc: linux-mm@xxxxxxxxx
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@xxxxxxxxxxxx>
---
v1: the proposed fix (RFC)

include/linux/xarray.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index cb571dfcf4b1..1715fd322d62 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1720,7 +1720,7 @@ static inline unsigned int xas_find_chunk(struct xa_state *xas, bool advance,
offset++;
if (XA_CHUNK_SIZE == BITS_PER_LONG) {
if (offset < XA_CHUNK_SIZE) {
- unsigned long data = *addr & (~0UL << offset);
+ unsigned long data = READ_ONCE(*addr) & (~0UL << offset);
if (data)
return __ffs(data);
}
--
2.34.1