Re: [RFC][PATCH 1/3] test-ww_mutex: Use prng instead of rng to avoid hangs at bootup

From: Jason A. Donenfeld
Date: Tue Aug 08 2023 - 15:09:49 EST


Hi Peter, John,

On Tue, Aug 8, 2023 at 12:36 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Aug 08, 2023 at 06:26:41AM +0000, John Stultz wrote:
> > Booting w/ qemu without kvm, I noticed we'd sometimes seem to get
> > stuck in get_random_u32_below(). This seems potentially to be
> > entropy exhaustion (with the test module linked statically, it
> > runs pretty early in the bootup).
> >
> > I'm not 100% sure on this, but this patch switches to use the
> > prng instead since we don't need true randomness, just mixed up
> > orders for testing ww_mutex lock acquisitions.
> >
> > With this patch, I no longer see hangs in get_random_u32_below()
> >
> > Feedback would be appreciated!
>
> Jason, I thought part of the 'recent' random rework was avoiding the
> exhaustion problem, could you please give an opinion on the below?

Thanks for looping me in. I actually can't reproduce this. I'm using a
minimal config and QEMU without KVM. As expected, the RNG doesn't
initialize until much later in the boot process, yet
get_random_u32_below() does _not_ hang in my trials. And indeed it's
designed to never hang, since that would create boot deadlocks. So I'm
not sure why you're seeing a hang.

It is worth noting that in those early boot test-case scenarios,
before the RNG initializes, get_random_u32_below() will be somewhat
slower than it normally is, and also slower than prandom_u32_state().
(But only in this early boot scenario edge case; this isn't a general
statement about speed.) It's possible that in your QEMU machine,
things are slow enough that you're simply noticing the difference. On
my system, however, I replaced `get_random_u32_below()` with `static
u32 x; return ++x % ceil;` and I didn't see any difference running it
under TCG -- it took about 7 seconds either way.
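
For anyone wanting to picture that stand-in, here is a minimal userspace
sketch of the same counter idea. The helper name counter_u32_below() is
hypothetical and chosen only for illustration; the experiment described
above replaced the body of the kernel function itself, and this is not that
exact code.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in (name invented for this sketch):
 * returns a value in [0, ceil) from a plain incrementing counter,
 * with no randomness at all. The caller must pass ceil > 0, since
 * the modulo below would otherwise divide by zero.
 */
static uint32_t counter_u32_below(uint32_t ceil)
{
	static uint32_t x;	/* zero-initialized, persists across calls */

	return ++x % ceil;
}
```

Successive calls with the same ceil simply cycle through the range, which
is enough to compare timing against the real function, though the
resulting orderings are obviously not mixed up in any meaningful way.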

So, from my perspective, you shouldn't see any hang. That function
never blocks. I'm happy to look more into what's happening on your end
though. Maybe share your .config and qemu command line and I'll see if
I can repro?

Jason