Re: PROBLEM: SLAB use-after-free with ceph(fs)

From: Jeff Layton
Date: Tue Jan 04 2022 - 07:29:12 EST


On Tue, 2022-01-04 at 13:20 +0100, Bastian Blank wrote:
> Hi
>
> On Tue, Jan 04, 2022 at 07:00:31AM -0500, Jeff Layton wrote:
> > On Tue, 2022-01-04 at 10:49 +0100, Bastian Blank wrote:
> > > > [152791.777458] cache_from_obj: Wrong slab cache. jbd2_journal_handle but object is from kmalloc-256
>
> > At first blush, this looks like the same problem as:
> > https://tracker.ceph.com/issues/52283
> > ...but that should have been fixed in v5.14.
>
> Nope, that does not make sense. The reported issue tried to free a
> "ceph_cap_flush", while mine tries to free a "jbd2_journal_handle",
> which is a completely different cache.
>

There was some ambiguity about how those objects were getting freed. What
you're seeing could just be a different manifestation of the same
problem, but it could be something else as well.

> > You may also want to try v5.16-rc8 if you're able to build your own
> > kernels. There were some patches that went in to improve how the client
> > handles inodes that become inaccessible.
>
> I'll try to get them to install 5.16-rc8 or newer, get a new crash
> dump, and report it to https://tracker.ceph.com/.
>

Sounds good. I suspect you have more than one problem.

The crash is clearly a kernel bug, but it's occurring while the client
is removing caps due to receiving a session message. It may be that the
MDS blocklisted the client in this case. You may want to check the MDS
logs or see if the kernel logged anything to that effect.
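A quick way to scan the kernel log for this is a simple grep; this is a
sketch, not from the thread, and the helper name is my own. Note the
message wording varies by kernel version ("blacklisted" on older
kernels, "blocklisted" on newer ones), so it matches both spellings:

```shell
# Scan kernel log text on stdin for ceph blocklist/blacklist messages.
# (Hypothetical helper for illustration; exact libceph message formats
# differ across kernel versions, hence the case-insensitive alternation.)
check_blocklist() {
    grep -iE 'blocklist|blacklist'
}

# Typical usage against the running kernel's log:
#   dmesg | check_blocklist
# Similarly, grep the MDS log on the server side for "evict" entries.
```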

v5.16 may help the client more sanely handle tearing down inodes in this
situation, but it may not do anything to help whatever is causing the
blocklisting in the first place.
--
Jeff Layton <jlayton@xxxxxxxxxx>