Re: KASAN: slab-use-after-free Read in radix_tree_lookup in&after Linux Kernel 6.4-rc6

From: Qi Zheng
Date: Fri Oct 20 2023 - 11:19:29 EST


Hi Matthew,

On 2023/10/20 22:58, Matthew Wilcox wrote:
On Fri, Oct 20, 2023 at 09:51:18PM +0800, Qi Zheng wrote:
Hi all,

On 2023/10/20 20:34, Matthew Wilcox wrote:
On Fri, Oct 20, 2023 at 10:26:31AM +0200, Petr Mladek wrote:
Adding Matthew into Cc in the hope that he is still familiar with the
code. Also adding Andrew who accepts patches.

oh joy. i love dealing with cves.

I agree, this issue looks to be in kernel-core radix tree code in ./lib/radix-tree.c in two of any places.

the radix tree code is the victim here. maybe also the perpetrator, but
it's rather hard to say.

shrink_slab_memcg()
down_read_trylock(&shrinker_rwsem)
shrinker = idr_find(&shrinker_idr, i);

i assume is the path to this bug. the reporter didn't run the
stacktrace through scripts/decode_stacktrace.sh so it's less useful than
we might want.

prealloc_memcg_shrinker() calls idr_alloc and idr_remove under
shrinker_rwsem in write mode, so that should be fine.

unregister_memcg_shrinker() calls idr_remove after asserting &shrinker_rwsem
is held (although not asserting that it's held for write ... hmm ... but
both callers appear to hold it for write anyway)

so i don't see why we'd get a UAF here.

anyway, adding Qi Zheng to the cc since they're responsible for the
shrinker code.

Thanks for CC'ing me, I'd be happy to troubleshoot any issues that may
be shrinker related.

Between v6.4-rc1 and v6.4 versions, we briefly implemented lockless slab
shrink using the SRCU method. In these versions, we call idr_alloc and
idr_remove under shrinker_mutex, and idr_find under srcu_read_lock.

These are all legitimate uses of the IDR APIs and the shrinker_idr
will never be destroyed, so at a quick glance I didn't see why it would
cause UAF here.

I'm not an expert on how all the RCU flavours interact, but I don't
think that's safe. The IDR (radix tree) will RCU-free nodes, but I
don't think holding the srcu_read_lock is enough to prevent the nodes
being freed.

Oh, Indeed, I just saw the Documeentation/RCU/checklist.rst:

```
If the updater uses call_rcu() or synchronize_rcu(), then
the corresponding readers may use: (1) rcu_read_lock() and
rcu_read_unlock(), (2) any pair of primitives that disables
and re-enables softirq, for example, rcu_read_lock_bh() and
rcu_read_unlock_bh(), or (3) any pair of primitives that disables
and re-enables preemption, for example, rcu_read_lock_sched() and
rcu_read_unlock_sched(). If the updater uses synchronize_srcu()
or call_srcu(), then the corresponding readers must use
srcu_read_lock() and srcu_read_unlock(), and with the same
srcu_struct.
```

I think you'd need to take the rcu_read_lock() around
the call to idr_find().
For latest RCU+refcount method, we call idr_find() under
rcu_read_lock(), so it's safe.

Thanks,
Qi


Anyway I will keep working on this issue, and it would be nice if
there was a way to reproduce it.

So I think the CVE is inappropriately issued. The SRCU code was added in
v6.4-rc1 and removed before v6.4. I don't think CVEs are appropriate for
bugs which only existed in development kernels. How do we revoke CVEs?