Re: [GIT PULL] bcachefs

From: Jens Axboe
Date: Wed Jun 28 2023 - 12:57:14 EST


On 6/28/23 8:58?AM, Jens Axboe wrote:
> On 6/27/23 10:01?PM, Kent Overstreet wrote:
>> On Tue, Jun 27, 2023 at 09:16:31PM -0600, Jens Axboe wrote:
>>> On 6/27/23 2:15?PM, Kent Overstreet wrote:
>>>>> to ktest/tests/xfstests/ and run it with -bcachefs, otherwise it kept
>>>>> failing because it assumed it was XFS.
>>>>>
>>>>> I suspected this was just a timing issue, and it looks like that's
>>>>> exactly what it is. Looking at the test case, it'll randomly kill -9
>>>>> fsstress, and if that happens while we have io_uring IO pending, then we
>>>>> process completions inline (for a PF_EXITING current). This means they
>>>>> get pushed to fallback work, which runs out of line. If we hit that case
>>>>> AND the timing is such that it hasn't been processed yet, we'll still be
>>>>> holding a file reference under the mount point and umount will -EBUSY
>>>>> fail.
>>>>>
>>>>> As far as I can tell, this can happen with aio as well, it's just harder
>>>>> to hit. If the fput happens while the task is exiting, then fput will
>>>>> end up being delayed through a workqueue as well. The test case assumes
>>>>> that once it's reaped the exit of the killed task that all files are
>>>>> released, which isn't necessarily true if they are done out-of-line.
>>>>
>>>> Yeah, I traced it through to the delayed fput code as well.
>>>>
>>>> I'm not sure delayed fput is responsible here; what I learned when I was
>>>> tracking this down has mostly fell out of my brain, so take anything I
>>>> say with a large grain of salt. But I believe I tested with delayed_fput
>>>> completely disabled, and found another thing in io_uring with the same
>>>> effect as delayed_fput that wasn't being flushed.
>>>
>>> I'm not saying it's delayed_fput(), I'm saying it's the delayed putting
>>> io_uring can end up doing. But yes, delayed_fput() is another candidate.
>>
>> Sorry - was just working through my recollections/initial thought
>> process out loud
>
> No worries, it might actually be a combination and this is why my
> io_uring side patch didn't fully resolve it. Wrote a simple reproducer
> and it seems to reliably trigger it, but is fixed with an flush of the
> delayed fput list on mount -EBUSY return. Still digging...

I discussed this with Christian offline. I have a patch that is pretty
simple, but it does mean that you'd wait for delayed fput flush off
umount. Which seems kind of iffy.

I think we need to back up a bit and consider if the kill && umount
really is sane. If you kill a task that has open files, then any fput
from that task will end up being delayed. This means that the umount may
very well fail.

It'd be handy if we could have umount wait for that to finish, but I'm
not at all confident this is a sane solution for all cases. And as
discussed, we have no way to even identify which files we'd need to
flush out of the delayed list.

Maybe the test case just needs fixing? Christian suggested lazy/detach
umount and wait for sb release. There's an fsnotify hook for that,
fsnotify_sb_delete(). Obviously this is a bit more involved, but seems
to me that this would be the way to make it more reliable when killing
of tasks with open files are involved.

--
Jens Axboe