Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race

From: Andy Lutomirski
Date: Mon Dec 07 2015 - 21:00:10 EST


On Mon, Dec 7, 2015 at 2:46 PM, Dominique Martinet
<dominique.martinet@xxxxxx> wrote:
> Andy Lutomirski wrote on Mon, Dec 07, 2015:
>> On Thu, Dec 3, 2015 at 9:52 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> > Sometimes udevadm trigger --action=add hangs the system, and the splat
>> > below happens. This seems to be timing dependent, and I haven't been
>> > able to trigger it yet with lockdep enabled, sadly.
>> >
>> > Any ideas? If not, I'll try to instrument it better tomorrow.
>>
>> More details: this is caused by a storm of /sbin/hotplug UMH calls
>> (yes, misconfigured kernel, but still). /sbin is a symlink to
>> /usr/sbin, /usr/sbin/hotplug doesn't exist, and all of the above is on
>> rootfs, which is 9p over virtio.
>>
>> Pointing uevent_helper at /usr/sbin/hotplug (which still doesn't
>> exist) seems to work around it.
>
> Can you reproduce it on a booted system with something like
> `seq 1 1000000 | xargs -P 1024 -I{} cat /sbin/foo >&/dev/null` ?

This doesn't reproduce it.

This doesn't either:

seq 1 1000000 | xargs -P 1024 -I{} bash -c 'exec /sbin/foo' &>/dev/null

>
> (trying execs might be closer to your workload, not sure how much this
> or using umh might change)
>
>
> Also, what qemu version please just to try to match your environment ?

$ qemu-system-x86_64 --version
QEMU emulator version 2.4.1 (qemu-2.4.1-1.fc23), Copyright (c) 2003-2008 Fabrice Bellard

My reproducer is:

$ virtme-run --kdir . --pwd

using this virtme version:

https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git/commit/?id=17363c2900e8b796c80c920c6fcdcc6747784ef7

Bad kernel config attached. This config with v4.4-rc3 (and no
additional patches) reproduces it reliably for me.

With the latest virtme, I don't reproduce it -- the latest virtme
turns off uevent_helper early in boot, which suppresses the bug for
me, at least most of the time.
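
For anyone trying to match this without changing virtme versions, a
minimal sketch of checking and clearing the helper by hand, assuming
the usual /sys/kernel/uevent_helper sysfs file is present. The
function names and the UEVENT_HELPER_FILE override are made up for
illustration, and this is not necessarily how virtme does it:

```shell
# Path is overridable so the functions can be tried against a scratch file.
UEVENT_HELPER_FILE="${UEVENT_HELPER_FILE:-/sys/kernel/uevent_helper}"

show_uevent_helper() {
    # Print the currently configured helper path (empty if none is set).
    cat "$UEVENT_HELPER_FILE"
}

disable_uevent_helper() {
    # Writing an empty line clears the helper, so the kernel stops
    # spawning a usermode helper per uevent. Needs root on a real system.
    echo "" > "$UEVENT_HELPER_FILE"
}
```

Running disable_uevent_helper before `udevadm trigger --action=add`
should avoid the UMH storm; leaving the helper set to /sbin/hotplug
keeps the failing configuration.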


If I dump all task states (see attached typescript), I see a bunch of
tasks blocked in 9p RPC. This makes me think it could be a QEMU bug,
not a kernel bug.
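
A rough sketch of one way to look for such tasks from a shell, in
case it helps with comparing environments. This is an assumption
about a workable approach, not how the attached typescript was
produced, and blocked_tasks is a hypothetical helper name:

```shell
# Keep only rows whose first column (process state) starts with D,
# i.e. tasks in uninterruptible sleep. Reads ps-style rows on stdin.
blocked_tasks() {
    awk '$1 ~ /^D/ { print }'
}

# On a live system (as root), the kernel can also log the stacks itself:
#   echo w > /proc/sysrq-trigger   # blocked (uninterruptible) tasks
#   echo t > /proc/sysrq-trigger   # all tasks
```

Typical use would be `ps -eo state,pid,wchan:32,comm | blocked_tasks`,
where the wchan column hints at which kernel function each task is
sleeping in.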

--Andy



--
Andy Lutomirski
AMA Capital Management, LLC

Attachment: bad-config.xz
Description: application/xz

Attachment: typescript.xz
Description: application/xz