Re: INFO: task hung in fuse_reverse_inval_entry

From: Miklos Szeredi
Date: Mon Jul 23 2018 - 08:33:13 EST


On Mon, Jul 23, 2018 at 2:22 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Mon, Jul 23, 2018 at 2:12 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>> On Mon, Jul 23, 2018 at 10:11 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>>> On Mon, Jul 23, 2018 at 9:59 AM, syzbot
>>> <syzbot+bb6d800770577a083f8c@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>> Hello,
>>>>
>>>> syzbot found the following crash on:
>>>>
>>>> HEAD commit: d72e90f33aa4 Linux 4.18-rc6
>>>> git tree: upstream
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1324f794400000
>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=68af3495408deac5
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=bb6d800770577a083f8c
>>>> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>>>> syzkaller repro: https://syzkaller.appspot.com/x/repro.syz?x=11564d1c400000
>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16fc570c400000
>>>
>>>
>>> Hi fuse maintainers,
>>>
>>> We are seeing a bunch of such deadlocks in fuse on syzbot. As far as I
>>> understand, this is mostly working as intended (see the parts about
>>> deadlocks in Documentation/filesystems/fuse.txt). The intended way to
>>> resolve this is aborting connections via fusectl, right?
>>
>> Yes. Alternative is with "umount -f".
>>
>>> The doc says "Under
>>> the fuse control filesystem each connection has a directory named by a
>>> unique number". The question is: if I start a process and this process
>>> can mount fuse, how do I kill it? I mean: totally and certainly get
>>> rid of it right away? How do I find these unique numbers for the
>>> mounts it created?
>>
>> It is the device number found in st_dev for the mount. Besides
>> doing stat(2), it is possible to find out the device number by reading
>> /proc/$PID/mountinfo (third field, in major:minor form).
>
> Thanks. I will try to figure out fusectl connection numbers and see if
> it's possible to integrate aborting into syzkaller.
>
>>> Taking into account that there is usually no
>>> operator attached to each server, I wonder if the kernel could
>>> somehow auto-abort fuse on kill.
>>
>> Depends on what the fuse server is sleeping on. If it's trying to
>> acquire an inode lock (e.g. unlink(2)), which is the classical way to
>> deadlock a fuse filesystem, then it will go into an uninterruptible
>> sleep. There's no way in which that process can be killed except to
>> force a release of the offending lock, which can only be done by
>> aborting the request that is being performed while holding that lock.
>
> I understand that it is not killed today, but I am asking if we can
> make it killable. It's all code that we can change, and if a human
> operator can do it, it can be done purely programmatically on kill too,
> right?

Hmm, you mean if a process is in an uninterruptible sleep trying to
acquire a lock on a fuse filesystem and is killed, then the fuse
filesystem should be aborted?

Even if we managed to implement that, it would carry a large
backward-compatibility risk.

I don't dispute that it can be done, but I would definitely question
*whether* it should be done.

Thanks,
Miklos