Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

From: André Przywara
Date: Wed Nov 11 2020 - 05:46:36 EST


On 11/11/2020 10:05, Ard Biesheuvel wrote:

Hi,

> On Wed, 11 Nov 2020 at 10:45, André Przywara <andre.przywara@xxxxxxx> wrote:
>>
>> On 11/11/2020 08:19, Ard Biesheuvel wrote:
>>
>> Hi,
>>
>>> (+ Eric)
>>>
>>> On Thu, 5 Nov 2020 at 16:29, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>>>>
>>>> When reseeding the CRNG periodically, arch_get_random_seed_long() is
>>>> called to obtain entropy from an architecture specific source if one
>>>> is implemented. In most cases, these are special instructions, but in
>>>> some cases, such as on ARM, we may want to back this using firmware
>>>> calls, which are considerably more expensive.
>>>>
>>>> Another call to arch_get_random_seed_long() exists in the CRNG driver,
>>>> in add_interrupt_randomness(), which collects entropy by capturing
>>>> inter-interrupt timing and relying on interrupt jitter to provide
>>>> random bits. This is done by keeping a per-CPU state, and mixing in
>>>> the IRQ number, the cycle counter and the return address every time an
>>>> interrupt is taken, and mixing this per-CPU state into the entropy pool
>>>> every 64 invocations, or at least once per second. The entropy that is
>>>> gathered this way is credited as 1 bit of entropy. Every time this
>>>> happens, arch_get_random_seed_long() is invoked, and the result is
>>>> mixed in as well, and also credited with 1 bit of entropy.
>>>>
>>>> This means that arch_get_random_seed_long() is called at least once
>>>> per second on every CPU, which seems excessive, and doesn't really
>>>> scale, especially in a virtualization scenario where CPUs may be
>>>> oversubscribed: in cases where arch_get_random_seed_long() is backed
>>>> by an instruction that actually goes back to a shared hardware entropy
>>>> source (such as RNDRRS on ARM), we will end up hitting it hundreds of
>>>> times per second.
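
A rough user-space sketch of the flush rule described above, to make the
call rate concrete. The struct and function names (fast_pool_model,
model_irq, model_flushes) are made up for illustration and this is not the
kernel code. It assumes a perfectly steady interrupt rate and millisecond
timestamps, and counts one flush as one arch_get_random_seed_long() call,
as in the pre-patch add_interrupt_randomness().

```c
#include <stdint.h>

/* Hypothetical user-space model of the per-CPU fast pool; not kernel code. */
struct fast_pool_model {
	unsigned int count;    /* interrupts seen since the last flush */
	uint64_t last_ms;      /* time of the last flush, in milliseconds */
	unsigned long flushes; /* flushes so far (== pre-patch arch seed calls) */
};

/* Account one interrupt at time now_ms; returns 1 if it triggered a flush. */
int model_irq(struct fast_pool_model *fp, uint64_t now_ms)
{
	fp->count++;

	/* Flush every 64 interrupts, or once more than a second has passed. */
	if (fp->count < 64 && now_ms <= fp->last_ms + 1000)
		return 0;

	fp->last_ms = now_ms;
	fp->count = 0;
	fp->flushes++;
	return 1;
}

/* Flushes on one CPU over duration_ms at a steady irq_per_sec rate. */
unsigned long model_flushes(unsigned int irq_per_sec, uint64_t duration_ms)
{
	struct fast_pool_model fp = { 0, 0, 0 };
	uint64_t total = (uint64_t)irq_per_sec * duration_ms / 1000;

	for (uint64_t i = 1; i <= total; i++)
		model_irq(&fp, i * 1000 / irq_per_sec);

	return fp.flushes;
}
```

Under these assumptions model_flushes(6400, 1000) yields 100 flushes in one
second on a single CPU, while a nearly idle CPU taking 10 interrupts per
second still flushes roughly once per second.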
>>
>> May I ask why this should be a particular problem? From what I gathered
>> on the web, it seems like most h/w RNGs have a capacity of multiple
>> Mbit/s. Wikipedia [1] suggests that the x86 CPU instructions generate at
>> least 20 Mbit/s (worst case: AMD's 2500 cycles @ 800 MHz), and I
>> measured around 78 Mbit/s with the raw entropy source on my Juno
>> (possibly even limited by slow MMIO).
>> So it seems unlikely that a few kbit/s drain the hardware entropy source.
>>
>> If we consider this interface comparatively cheap, should we then not try
>> to plug the Arm firmware interface into this?
>>
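
The arithmetic behind the 20 Mbit/s worst-case figure, as a small sketch.
The function names are hypothetical, and it assumes each invocation of the
instruction returns 64 bits. The 500 calls per second on the demand side is
only an illustrative stand-in for "hundreds of times per second".

```c
#include <stdint.h>

/*
 * Supply side: bits per second from an instruction that returns
 * bits_per_call bits every cycles_per_call cycles at clock_hz.
 */
uint64_t rng_supply_bps(uint64_t clock_hz, uint64_t cycles_per_call,
			uint64_t bits_per_call)
{
	return clock_hz * bits_per_call / cycles_per_call;
}

/* Demand side: bits per second requested by calls_per_sec seed calls. */
uint64_t seed_demand_bps(uint64_t calls_per_sec, uint64_t bits_per_call)
{
	return calls_per_sec * bits_per_call;
}
```

rng_supply_bps(800000000, 2500, 64) comes to 20480000 bit/s, whereas
seed_demand_bps(500, 64) is 32000 bit/s, about three orders of magnitude
below the worst-case supply.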
>
> I'm not sure I follow. Are you saying we should not wire up a
> comparatively expensive firmware interface to
> arch_get_random_seed_long() because we currently assume it is backed
> by something cheap?

Yes. I wanted to (ab)use this patch to clarify this. x86 and arm64 use
CPU instructions (so far), s390 copies from some buffer. PPC uses either
a CPU instruction or an MMIO access. All of these I would consider
comparatively cheap, especially when compared to a firmware call with
unknown costs. In fact the current Trusted Firmware implementation[1] is
not exactly short, and the generic SMC dispatcher calls a
platform-defined routine, which could do anything.
So to also guide the implementation in TF-A, it would be good to
establish what arch_get_random is expected to be. The current
implementations and the fact that it lives in a header file suggest
that it's meant as a slim wrapper around something cheap.

> Because doing so would add significantly to the cost. Also note that a
> firmware interface would permit other ways of gathering entropy that
> are not necessarily backed by a dedicated high bandwidth noise source
> (and we already have examples of this)

Yes, agreed.
So I have a hwrng driver for the Arm SMCCC TRNG interface ready. I would
post this, but would like to know if we should drop the proposed
arch_get_random implementation [2][3] of this interface.

>> I am not against this patch, and actually consider it a nice
>> cleanup that separates interrupt-generated entropy from other sources.
>> Especially since we call arch_get_random_seed_long() under a spinlock here.
>> But I am curious about the expectations from arch_get_random in general.
>>
>
> I think it is reasonable to clean this up a little bit. A random
> *seed* is not the same thing as a random number, and given that we
> expose both interfaces, it makes sense to permit the seed variant to
> be more costly, and only use it as intended (i.e., to seed a random
> number generator)

That's true, it seems we chickened out on the arm64 implementation
already, by not using the intended stronger instruction for seed
(RNDRRS), and not implementing arch_get_random_long() at all.
But I guess that's another story.

Cheers,
Andre.

[1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/5585/3
[2]
http://lists.infradead.org/pipermail/linux-arm-kernel/2020-November/615375.html
[3]
http://lists.infradead.org/pipermail/linux-arm-kernel/2020-November/615376.html

>>>> So let's drop the call to arch_get_random_seed_long() from
>>>> add_interrupt_randomness(), and instead, rely on crng_reseed() to call
>>>> the arch hook to get random seed material from the platform.
>>
>> So I tested this and it works as expected: I see some calls on
>> initialisation, then a handful of calls every few seconds from the
>> periodic reseeding. The large number of calls every second is gone.
>>
>
> Excellent, thanks for confirming.
>
>>>>
>>>> Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
>>
>> Since the above questions are unrelated to this particular patch:
>>
>> Reviewed-by: Andre Przywara <andre.przywara@xxxxxxx>
>> Tested-by: Andre Przywara <andre.przywara@xxxxxxx>
>>
>> Cheers,
>> Andre
>>
>> [1] https://en.wikipedia.org/wiki/RDRAND#Performance
>>
>>>> ---
>>>> drivers/char/random.c | 15 +--------------
>>>> 1 file changed, 1 insertion(+), 14 deletions(-)
>>>>
>>>> diff --git a/drivers/char/random.c b/drivers/char/random.c
>>>> index 2a41b21623ae..a9c393c1466d 100644
>>>> --- a/drivers/char/random.c
>>>> +++ b/drivers/char/random.c
>>>> @@ -1261,8 +1261,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
>>>>  	cycles_t cycles = random_get_entropy();
>>>>  	__u32 c_high, j_high;
>>>>  	__u64 ip;
>>>> -	unsigned long seed;
>>>> -	int credit = 0;
>>>>
>>>>  	if (cycles == 0)
>>>>  		cycles = get_reg(fast_pool, regs);
>>>> @@ -1298,23 +1296,12 @@ void add_interrupt_randomness(int irq, int irq_flags)
>>>>
>>>>  	fast_pool->last = now;
>>>>  	__mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
>>>> -
>>>> -	/*
>>>> -	 * If we have architectural seed generator, produce a seed and
>>>> -	 * add it to the pool. For the sake of paranoia don't let the
>>>> -	 * architectural seed generator dominate the input from the
>>>> -	 * interrupt noise.
>>>> -	 */
>>>> -	if (arch_get_random_seed_long(&seed)) {
>>>> -		__mix_pool_bytes(r, &seed, sizeof(seed));
>>>> -		credit = 1;
>>>> -	}
>>>>  	spin_unlock(&r->lock);
>>>>
>>>>  	fast_pool->count = 0;
>>>>
>>>>  	/* award one bit for the contents of the fast pool */
>>>> -	credit_entropy_bits(r, credit + 1);
>>>> +	credit_entropy_bits(r, 1);
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(add_interrupt_randomness);
>>>>
>>>> --
>>>> 2.17.1
>>>>
>>