Re: [GIT PULL] bcachefs

From: Jens Axboe
Date: Wed Jun 28 2023 - 16:44:34 EST


On 6/28/23 11:52?AM, Kent Overstreet wrote:
> On Wed, Jun 28, 2023 at 10:57:02AM -0600, Jens Axboe wrote:
>> I discussed this with Christian offline. I have a patch that is pretty
>> simple, but it does mean that you'd wait for delayed fput flush off
>> umount. Which seems kind of iffy.
>>
>> I think we need to back up a bit and consider if the kill && umount
>> really is sane. If you kill a task that has open files, then any fput
>> from that task will end up being delayed. This means that the umount may
>> very well fail.
>>
>> It'd be handy if we could have umount wait for that to finish, but I'm
>> not at all confident this is a sane solution for all cases. And as
>> discussed, we have no way to even identify which files we'd need to
>> flush out of the delayed list.
>>
>> Maybe the test case just needs fixing? Christian suggested lazy/detach
>> umount and wait for sb release. There's an fsnotify hook for that,
>> fsnotify_sb_delete(). Obviously this is a bit more involved, but seems
>> to me that this would be the way to make it more reliable when killing
>> of tasks with open files are involved.
>
> No, this is a real breakage. Any time we introduce unexpected
> asynchrony there's the potential for breakage: case in point, there was
> a filesystem that made rm asynchronous, then there were scripts out
> there that deleted until df showed under some threshold.. whoops...

This is nothing new - any fput done from an exiting task will end up
being deferred. The window may be a bit wider now or a bit different,
but it's the same window. If an application assumes it can kill && wait
on a task and be guaranteed that the files are released as soon as wait
returns, it is mistaken. That is NOT the case.

> this would break anyone that does fuser; umount; and making the umount
> lazy just moves the race to the next thing that uses the block device.
>
> I'd like to know how delayed_fput() avoids this.

What is "this" here? The delayed fput list is processed async, so it's
really down to timing for the size of the window. Either the 388 test is
fixed so that it monitors for sb release like Christian described, or we
can paper around it with a sync and sleep or something like that. The
former would obviously be a lot more reliable.

--
Jens Axboe