Re: [RFC][PATCH 1/3] test-ww_mutex: Use prng instead of rng to avoid hangs at bootup

From: John Stultz
Date: Tue Aug 08 2023 - 16:03:49 EST


On Tue, Aug 8, 2023 at 7:05 AM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>
> Hi Peter, John,
>
> On Tue, Aug 8, 2023 at 12:36 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Aug 08, 2023 at 06:26:41AM +0000, John Stultz wrote:
> > > Booting w/ qemu without kvm, I noticed we'd sometimes seem to get
> > > stuck in get_random_u32_below(). This seems potentially to be
> > > entropy exhaustion (with the test module linked statically, it
> > > runs pretty early in the bootup).
> > >
> > > I'm not 100% sure on this, but this patch switches to use the
> > > prng instead since we don't need true randomness, just mixed up
> > > orders for testing ww_mutex lock acquisitions.
> > >
> > > With this patch, I no longer see hangs in get_random_u32_below()
> > >
> > > Feedback would be appreciated!
> >
> > Jason, I thought part of the 'recent' random rework was avoiding the
> > exhaustion problem, could you please give an opinion on the below?
>
> Thanks for looping me in. I actually can't reproduce this. I'm using a
> minimal config and using QEMU without KVM. The RNG doesn't initialize
> until much later on in the boot process, expectedly, yet
> get_random_u32_below() does _not_ hang in my trials. And indeed it's
> designed to never hang, since that would create boot deadlocks. So I'm
> not sure why you're seeing a hang.

Ok. My hesitancy here was in part due to my understanding the entropy
exhaustion hangs weren't supposed to be an issue anymore (thanks,
btw).

> It is worth noting that in those early boot test-case scenarios,
> before the RNG initializes, get_random_u32_below() will be somewhat
> slower than it normally is, and also slower than prandom_u32_state().
> (But only in this early boot scenario edge case; this isn't a general
> statement about speed.) It's possible that in your QEMU machine,
> things are slow enough that you're simply noticing the difference. On
> my system, however, I replaced `get_random_u32_below()` with `static
> u32 x; return ++x % ceil;` and I didn't see any difference running it
> under TCG -- it took about 7 seconds either way.
>
> So, from my perspective, you shouldn't see any hang. That function
> never blocks. I'm happy to look more into what's happening on your end
> though. Maybe share your .config and qemu command line and I'll see if
> I can repro?

Yeah, it may just be that the real RNG is slow enough that I'm hitting
the hung task watchdog?
(I'm running with 64 cpus, so the test is trying to use 128 threads
all hitting get_random_u32_below over and over to create their own
random order of 16 locks)

When the softlockup watchdog hits, the trace looks like:
[ 64.268881] _get_random_bytes+0x3b/0x140
[ 64.268881] ? __kmem_cache_alloc_node+0x17a/0x310
[ 64.268881] ? get_random_order+0x28/0xa0
[ 64.268881] get_random_u32+0x1c2/0x1e0
[ 64.268881] __get_random_u32_below+0xd/0x60
[ 64.268881] get_random_order+0x60/0xa0
[ 64.268881] stress_inorder_work+0x26/0x390
[ 64.268881] ? lock_acquire+0xd4/0x290
[ 64.268881] process_one_work+0x267/0x520
[ 64.268881] worker_thread+0x4a/0x390
[ 64.268881] ? __pfx_worker_thread+0x10/0x10
[ 64.268881] kthread+0xf5/0x130
[ 64.268881] ? __pfx_kthread+0x10/0x10
[ 64.268881] ret_from_fork+0x2b/0x40
[ 64.268881] ? __pfx_kthread+0x10/0x10
[ 64.268881] ret_from_fork_asm+0x1b/0x30


Attached is the defconfig I just used.

My script to run is below.

thanks
-john

#base boot args
kcmd="root=/dev/vda2 console=ttyS0 earlycon "
#debug helper
kcmd+="ftrace_dump_on_oops sysrq_always_enabled debug_boot_weak_hash "
#locktorture bits
kcmd+="torture.random_shuffle=1 locktorture.writer_fifo=1
locktorture.torture_type=mutex_lock locktorture.nested_locks=8
locktorture.rt_boost=1 locktorture.rt_boost_factor=50
locktorture.stutter=0 "

cpu_count=64
mem_size="6G"
#use_kvm="-enable-kvm"
device_config="-smp ${cpu_count} -m ${mem_size} ${use_kvm} \
-drive if=virtio,format=qcow2,file=ubuntu.img \
-device virtio-scsi-pci,id=scsi0 \
-object rng-random,filename=/dev/urandom,id=rng0\
-device virtio-rng-pci,rng=rng0 \
-device virtio-net-pci,netdev=net0 \
-netdev user,id=net0,hostfwd=tcp::8022-:22 \
-chardev stdio,id=char0,mux=on,logfile=serial.log,signal=off \
-serial chardev:char0 -mon chardev=char0 \
-nographic \
"
kbin="bzImage"

qemu-system-x86_64 \
-kernel "${kbin}" -initrd initrd.img -append "${kcmd}" \
${device_config}

Attachment: defconfig
Description: Binary data