Re: [GIT PULL] bcachefs

From: Kent Overstreet
Date: Wed Jun 28 2023 - 18:14:20 EST


On Wed, Jun 28, 2023 at 03:17:43PM -0600, Jens Axboe wrote:
> Case in point, just changed my reproducer to use aio instead of
> io_uring. Here's the full script:
>
> #!/bin/bash
>
> DEV=/dev/nvme1n1
> MNT=/data
> ITER=0
>
> while true; do
> echo loop $ITER
> sudo mount $DEV $MNT
> fio --name=test --ioengine=aio --iodepth=2 --filename=$MNT/foo --size=1g --buffered=1 --overwrite=0 --numjobs=12 --minimal --rw=randread --output=/dev/null &
> Y=$(($RANDOM % 3))
> X=$(($RANDOM % 10))
> VAL="$Y.$X"
> sleep $VAL
> ps -e | grep fio > /dev/null 2>&1
> while [ $? -eq 0 ]; do
> killall -9 fio > /dev/null 2>&1
> echo will wait
> wait > /dev/null 2>&1
> echo done waiting
> ps -e | grep "fio " > /dev/null 2>&1
> done
> sudo umount /data
> if [ $? -ne 0 ]; then
> break
> fi
> ((ITER++))
> done
>
> and if I run that, fails on the first umount attempt in that loop:
>
> axboe@m1max-kvm ~> bash test2.sh
> loop 0
> will wait
> done waiting
> umount: /data: target is busy.
>
> So yeah, this is _nothing_ new. I really don't think trying to address
> this in the kernel is the right approach, it'd be a lot saner to harden
> the xfstest side to deal with the umount a bit more sanely. There are
> obviously tons of other ways that a mount could get pinned, which isn't
> too relevant here since the bdev and mount point are basically exclusive
> to the test being run. But the kill and delayed fput is enough to make
> that case imho.

Uh, count me very much not in favor of hacking around bugs elsewhere.

Al, do you know if this has been considered before? We've got fput()
being called from aio completion, which often runs out of a worqueue (if
not a workqueue, a bottom half of some sort - what happens then, I
wonder) - so the effect is that it goes on the global list, not the task
work list.

hence, kill -9ing a process doing aio (or io_uring io, for extra
reasons) causes umount to fail with -EBUSY.

and since there's no mechanism for userspace to deal with this besides
sleep and retry, this seems pretty gross.

I'd be willing to tackle this for aio since I know that code...