Re: KASAN: slab-use-after-free Read in radix_tree_lookup in&after Linux Kernel 6.4-rc6

From: Matthew Wilcox
Date: Fri Oct 20 2023 - 10:58:18 EST


On Fri, Oct 20, 2023 at 09:51:18PM +0800, Qi Zheng wrote:
> Hi all,
>
> On 2023/10/20 20:34, Matthew Wilcox wrote:
> > On Fri, Oct 20, 2023 at 10:26:31AM +0200, Petr Mladek wrote:
> > > Adding Matthew into Cc in the hope that he is still familiar with the
> > > code. Also adding Andrew who accepts patches.
> >
> > oh joy. i love dealing with cves.
> >
> > > > > I agree, this issue looks to be in kernel-core radix tree code in ./lib/radix-tree.c in two of any places.
> >
> > the radix tree code is the victim here. maybe also the perpetrator, but
> > it's rather hard to say.
> >
> > shrink_slab_memcg()
> > down_read_trylock(&shrinker_rwsem)
> > shrinker = idr_find(&shrinker_idr, i);
> >
> > i assume is the path to this bug. the reporter didn't run the
> > stacktrace through scripts/decode_stacktrace.sh so it's less useful than
> > we might want.
> >
> > prealloc_memcg_shrinker() calls idr_alloc and idr_remove under
> > shrinker_rwsem in write mode, so that should be fine.
> >
> > unregister_memcg_shrinker() calls idr_remove after asserting &shrinker_rwsem
> > is held (although not asserting that it's held for write ... hmm ... but
> > both callers appear to hold it for write anyway)
> >
> > so i don't see why we'd get a UAF here.
> >
> > anyway, adding Qi Zheng to the cc since they're responsible for the
> > shrinker code.
>
> Thanks for CC'ing me, I'd be happy to troubleshoot any issues that may
> be shrinker related.
>
> Between v6.4-rc1 and v6.4 versions, we briefly implemented lockless slab
> shrink using the SRCU method. In these versions, we call idr_alloc and
> idr_remove under shrinker_mutex, and idr_find under srcu_read_lock.
>
> These are all legitimate uses of the IDR APIs and the shrinker_idr
> will never be destroyed, so at a quick glance I didn't see why it would
> cause UAF here.

I'm not an expert on how all the RCU flavours interact, but I don't
think that's safe. The IDR (radix tree) will RCU-free nodes, but I
don't think holding the srcu_read_lock is enough to prevent the nodes
being freed. I think you'd need to take the rcu_read_lock() around
the call to idr_find().

> Anyway I will keep working on this issue, and it would be nice if
> there was a way to reproduce it.

So I think the CVE is inappropriately issued. The SRCU code was added in
v6.4-rc1 and removed before v6.4. I don't think CVEs are appropriate for
bugs which only existed in development kernels. How do we revoke CVEs?