Re: INFO: task hung in fuse_reverse_inval_entry

From: Dmitry Vyukov
Date: Mon Jul 23 2018 - 09:37:43 EST


On Mon, Jul 23, 2018 at 3:05 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>>>>>> <syzbot+bb6d800770577a083f8c@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> syzbot found the following crash on:
>>>>>>>
>>>>>>> HEAD commit: d72e90f33aa4 Linux 4.18-rc6
>>>>>>> git tree: upstream
>>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1324f794400000
>>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=68af3495408deac5
>>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=bb6d800770577a083f8c
>>>>>>> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>>>>>>> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=11564d1c400000
>>>>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16fc570c400000
>>>>>>
>>>>>>
>>>>>> Hi fuse maintainers,
>>>>>>
>>>>>> We are seeing a bunch of such deadlocks in fuse on syzbot. As far as I
>>>>>> understand this is mostly working-as-intended (parts about deadlocks
>>>>>> in Documentation/filesystems/fuse.txt). The intended way to resolve
>>>>>> this is aborting connections via fusectl, right?
>>>>>
>>>>> Yes. Alternative is with "umount -f".
>>>>>
>>>>>> The doc says "Under
>>>>>> the fuse control filesystem each connection has a directory named by a
>>>>>> unique number". The question is: if I start a process and this process
>>>>>> can mount fuse, how do I kill it? I mean: totally and certainly get
>>>>>> rid of it right away? How do I find these unique numbers for the
>>>>>> mounts it created?
>>>>>
>>>>> It is the device number found in st_dev for the mount. Other than
>>>>> doing stat(2) it is possible to find out the device number by reading
>>>>> /proc/$PID/mountinfo (third field).
>>>>
>>>> Thanks. I will try to figure out fusectl connection numbers and see if
>>>> it's possible to integrate aborting into syzkaller.
>>>>
>>>>>> Taking into account that there is usually no
>>>>>> operator attached to each server, I wonder if kernel could somehow
>>>>>> auto-abort fuse on kill?
>>>>>
>>>>> Depends on what the fuse server is sleeping on. If it's trying to
>>>>> acquire an inode lock (e.g. unlink(2)), which is classical way to
>>>>> deadlock a fuse filesystem, then it will go into an uninterruptible
>>>>> sleep. There's no way in which that process can be killed except to
>>>>> force a release of the offending lock, which can only be done by
>>>>> aborting the request that is being performed while holding that lock.
>>>>
>>>> I understand that it is not killed today, but I am asking if we can
>>>> make it killable. It's all code that we can change, and if a human
>>>> operator can do it, it can be done pure programmatically on kill too,
>>>> right?
>>>
>>> Hmm, you mean if a process is in an uninterruptible sleep trying to
>>> acquire a lock on a fuse filesystem and is killed, then the fuse
>>> filesystem should be aborted?
>>>
>>> Even if we'd manage to implement that, it's a large backward
>>> incompatibility risk.
>>>
>>> I don't argue that it can be done, but I would definitely argue *if*
>>> it should be done.
>>
>>
>> I understand that we should abort only if we are sure that it's
>> actually deadlocked and there is no other way.
>> So if fuse-user process is blocked on fuse lock, then we probably
>> should do nothing. However, if the fuse-server is killed, then perhaps
>> we could abort the connection at that point. Namely, if a process that
>> has a fuse fd open is killed and it is the only process that shared
>> this fd, then we could abort the connection on arrival of the kill
>> signal (rather than wait untill all it's threads finish and then start
>> closing all fd's, this is where we get the deadlock -- some of its
>> threads won't finish). I don't know if such synchronous kill hook is
>> available, though. If several processes shared the same fuse fd, then
>> we could close the fd in each process on SIGKILL arrival, then when
>> all of these processes are killed, fuse fd will be closed and we can
>> abort the connection, which will un-deadlock all of these processes.
>> Does this look any reasonable?
>
> Biggest conceptual problem: your definition of fuse-server is weak.
> Take the following example: process A is holding the fuse device fd
> and is forwarding requests and replies to/from process B via a pipe.
> So basically A is just a proxy that does nothing interesting, the
> "real" server is B. But according to your definition B is not a
> server, only A is.

I proposed to abort fuse conn when all fuse device fd's are "killed"
(all processes having the fd opened are killed). So if _only_ process
B is killed, then, yes, it will still hang. However if A is killed or
both A and B (say, process group, everything inside of pid namespace,
etc) then the deadlock will be autoresolved without human
intervention.

> And this is just a simple example, parts of the server might be on
> different machines, etc... It's impossible to automatically detect if
> a process is acting as a fuse server or not.

It does not seem we need the precise definition. If no one ever can
write anything into the fd, we can safely abort the connection (?). If
we don't, we can either get that the process exits normally and the
connection is doomed anyway, so no difference in behavior, or we can
get a deadlock.

> We could let the fuse server itself notify the kernel that it's a fuse
> server. That might help in the cases where the deadlock is
> accidental, but obviously not in the case when done by a malicious
> agent. I'm not sure it's worth the effort. Also I have no idea how
> the respective maintainers would take the idea of "kill hooks"... It
> would probably be a lot of work for little gain.

What looks wrong to me here is that fuse is only (?) subsystem in
kernel that stops SIGKILL from working and requires complex custom
dance performed by a human operator (which is not necessary there at
all). Say, if a process has opened a socket, whatever, I don't need to
locate and abort something in socketctl fs, just SIGKILL. If a
processes has opened a file, I don't need to locate the fd in /proc
and abort it, just SIGKILL. If a process has created an ipc object, I
don't need to do any special dance, just SIGKILL. fuse is somehow very
special, if we have more such cases, it definitely won't scale.
I understand that there can be implementation difficulties, but
fundamentally that's how things should work -- choose target
processes, kill, done, right?