Re: lockdep + kasan bug?

From: Peter Zijlstra
Date: Tue Nov 21 2023 - 06:41:51 EST


On Tue, Nov 21, 2023 at 11:14:37AM +0000, Mark Rutland wrote:

> > > 05117 The buggy address belongs to the variable:
> > > 05117 nr_large_chain_blocks+0x3c/0x40
> >
> > This is weird, nr_large_chain_blocks is a single variable; if the
> > compiler keeps the layout according to the source file, this would be
> > chain_block_buckets[14] or something weird like that.
>
> I think the size here is bogus; IIUC that's determined from the start of the
> next symbol, which happens to be 64 bytes away from the start of
> nr_large_chain_blocks.
>
> From the memory state dump, there's padding/redzone between two global objects,
> and I think we're accessing a negative offset from the next object. More on
> that below.
>
> > Perhaps figure out what it thinks the @size argument to
> > add_chain_block() would be?
> >
> > > 05117
> > > 05117 The buggy address belongs to the virtual mapping at
> > > 05117 [ffffffc081710000, ffffffc088861000) created by:
> > > 05117 paging_init+0x260/0x820
> > > 05117
> > > 05117 The buggy address belongs to the physical page:
> > > 05117 page:00000000ce625900 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x41d7a
> > > 05117 flags: 0x4000(reserved|zone=0)
> > > 05117 page_type: 0xffffffff()
> > > 05117 raw: 0000000000004000 fffffffe00075e88 fffffffe00075e88 0000000000000000
> > > 05117 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > > 05117 page dumped because: kasan: bad access detected
> > > 05117
> > > 05117 Memory state around the buggy address:
> > > 05117 ffffffc081b7a780: 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9
> > > 05117 ffffffc081b7a800: 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9
> > > 05117 >ffffffc081b7a880: 04 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
> > > 05117 ^
>
> In this dump:
>
> * '00' means all 8 bytes of an 8-byte region are accessible
> * '04' means the first 4 bytes of an 8-byte region are accessible
> * 'f9' means KASAN_GLOBAL_REDZONE / padding between objects
>
> So at 0xffffffc081b7a880 we have a 4-byte object, 60 bytes of padding, then a
> 64-byte object.
>
> I think the 4-byte object at 0xffffffc081b7a880 is nr_large_chain_blocks, and
> the later 64-byte object is chain_block_buckets[].

Oh! That's very helpful, thanks!

> I suspect the dodgy access is to chain_block_buckets[-1], which hits the last 4
> bytes of the redzone and gets (incorrectly/misleadingly) attributed to
> nr_large_chain_blocks.

That would mean @size == 0, at which point size_to_bucket() returns -1
and the above happens.

alloc_chain_hlocks() computes 'size - req' in two places. The first has
the precondition 'size >= req', which allows the 0.

The second is in a loop with the condition 'size > req', which does not
allow the 0 case.

So in the first case, IIRC, this is trying to split a block:
del_chain_block() takes what we need, and add_chain_block() puts back
the remainder. In the case above, though, the remainder is 0-sized and
things go sideways.

Does the below help?

---
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e85b5ad3e206..151bd3de5936 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3497,7 +3497,8 @@ static int alloc_chain_hlocks(int req)
 		size = chain_block_size(curr);
 		if (likely(size >= req)) {
 			del_chain_block(0, size, chain_block_next(curr));
-			add_chain_block(curr + req, size - req);
+			if (size > req)
+				add_chain_block(curr + req, size - req);
 			return curr;
 		}
 	}