Re: [DISCUSSION PATCH 00/41] random: possible ways towards NIST SP800-90B compliance

From: Stephan Mueller
Date: Mon Sep 21 2020 - 04:42:24 EST


Am Montag, 21. September 2020, 09:58:16 CEST schrieb Nicolai Stange:

Hi Nicolai,

> Hi all,
>
> first of all, my apologies for the patch bomb following up in reply to this
> mail here -- it's not meant to receive any serious review at all, but only
> to support the discussion I'm hoping to get going.

Thank you for this effort!
>
> As some of you might already be aware of, all new submissions for FIPS
> certification will be required to comply with NIST SP800-90B from Nov 7th
> on ([1], sec. 7.18 "Entropy Estimation and Compliance with SP 800-90B").
> For reference: broadly speaking, NIST SP800-90B is about noise sources,
> SP800-90A about the DRBG algorithms stacked on top and SP800-90C about how
> everything is supposed to be glued together. The main requirements from
> SP800-90B are
> - no correlations between different noise sources,
> - to continuously run certain health tests on a noise source's output and
> - to provide an interface enabling access to the raw noise samples for
> validation purposes.
>
> To my knowledge, all SP800-90B compliant noise sources available on Linux
> today are either based on the Jitter RNG one way or another or on
> architectural RNGs like e.g. x86's RDSEED or arm64's RNDRRS. Currently,
> there's an in-kernel Jitter RNG implementation getting registered (c.f.
> crypto/drbg.c, (*)) with the Crypto RNG API, which is also accessible from
> userspace via AF_ALG. The userspace haveged ([2]) or jitterentropy
> integrations ([3]) are worth mentioning in this context, too. So in
> summary, I think that for the in-kernel entropy consumers falling under the
> scope of FIPS, the currently only way to stay compliant would be to draw it
> from said Crypto API RNG. For userspace applications there's the additional
> option to invoke haveged and alike.
>
> OTOH, CPU jitter based techniques are not uncontroversial ([4]). In any
> case, it would certainly be a good idea to mix (xor or whatever) any jitter
> output with entropy obtained from /dev/random (**). If I'm not mistaken,
> the mentioned Crypto API RNG implementation (crypto/drbg.c) follows exactly
> this approach, but doesn't enforce it yet: there's no
> wait_for_random_bytes() and early DRBG invocations could in principle run
> on seeds dominated entirely by jitterentropy. However, this can probably
> get sorted quite easily and thus, one reasonable way towards maintaining
> FIPS resp. SP800-90 compliance would be to
> - make crypto/drbg.c invoke wait_for_random_bytes(),
> - make all relevant in-kernel consumers to draw their random numbers from
> the Crypto RNG API, if not already the case and
> - convert all relevant userspace to use a SP800-90B conforming Jitter RNG
> style noise source for compliance reasons, either by invoking the
> kernel's Crypto RNG API or by diffent means, and mix that with
> /dev/random.
>
> Even though this would probably be feasible, I'm not sure that giving up on
> /dev/random being the primary, well established source of randomness in
> favor of each and every userspace crypto library rolling its own entropy
> collection scheme is necessarily the best solution (it might very well be
> though).
>
> An obvious alternative would be to make /dev/random conform to SP800-90B.
> Stephan Müller posted his "LRNG" patchset ([5]), in which he proposed to
> introduce a second, independent implementation aiming at SP800-90[A-C]
> conformance. However, it's in the 35th iteration now and my impression is
> that there's hardly any discussion happening around this for quite a while
> now. I haven't followed the earlier development, but I can imagine several
> reasons for that:
> - people are not really interested in FIPS or even questioning the whole
> concept in the first place (c.f. Theodore Ts'o remarks on this topic
> at [6]),
> - potential reviewers got merely discouraged by the diffstat or

Maybe I followed the Linux principle a bit to much here? Release early,
release often.

But with the v35, all goals I tried to achieve are now in (namely the last was
to get rid of any non-cryptographic conditioning functions) and to have a very
clean data processing / entropy analysis. I do not expect big changes any
more.

> - people dislike the approach of having two competing implementations for
> what is basically the same functionality in the kernel.

Is this really so bad considering the security implications on this topic? We
also have multiple file systems, multiple memory allocators, etc...
>
> In either case, I figured it might perhaps help further discussion to
> provide at least a rough idea of how bad the existing /dev/random
> implementation would get cluttered when worked towards SP800-90B
> compliance. So I implemented the required health tests for the interrupt
> noise source -- the resulting patches can be found in reply to this mail.
> I'd like to stress(!) that this should really only be considered a first
> step and that there would still be a long way towards a complete solution;
> known open items are listed below. Also, I'm fully aware that making those
> continuous health tests block the best effort primary_crng reseeds upon
> failure is a ridiculous thing to do -- that's again meant for demonstration
> purposes only, c.f. the commit log from the next to last patch. Anyway,
> those of you who are interested in some more details beyond the mere
> diffstat can find them after the list of references below.
>
> In summary, I can imagine three feasible ways towards SP800-90 compliance:
> 1.) Put the burden on consumers. For in-kernel users this would mean
> conversion to the Jitter backed Crypto RNG API, in case that hasn't
> happened yet. Userspace is free to use any approved Jitter based
> mechanism for compliance reasons, but is encouraged to mix that with
> /dev/random.
> 2.) Merge Stephan's LRNG. Users/distros would have to decide between either
> of the two competing implementations at kernel config time.
> 3.) Develop the existing /dev/random towards compliance, ideally w/o
> affecting !fips_enabled users too much. This would likely require some
> redundancies as well as some atrocities imposed by the specs.
>
> I'm looking forward to hearing your opinions and suggestions! In case you
> happen to know of anybody who's not on CC but might potentially be
> interested in FIPS, I'd highly appreciate it if you could point him/her to
> this thread. The usual suspects are probably (enterprise?) distro folks,
> but there might be others I haven't thought of.
>
> Many thanks for your time!
>
> Nicolai
>
>
> (*) That's an oversimplification for the sake of brevity: actually
> SP800-90A DRBGs stacked on top of the SP800-90B conforming
> jitterentropy source get registered with the Crypto API.
> (**) "/dev/random" is used as a synonym for everything related to
> drivers/char/random.c throughout this mail.
>
> [1]
> https://csrc.nist.gov/csrc/media/projects/cryptographic-module-validation-p
> rogram/documents/fips140-2/fips1402ig.pdf [2]
> http://www.issihosts.com/haveged/
> [3] http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html
> c.f. appendices C-E
> [4] https://lwn.net/Articles/642166/
> [5] https://lkml.kernel.org/r/5667034.lOV4Wx5bFT@xxxxxxxxxxxxxxxxxxx
> [6] https://lkml.kernel.org/r/20170919133959.5fgtioyonlsdyjf5@xxxxxxxxx
> https://lkml.kernel.org/r/20170920011642.cczekznqebf2zq5u@xxxxxxxxx
> [7] https://lkml.kernel.org/r/aef70b42-763f-0697-f12e-1b8b1be13b07@xxxxxxxxx
>
>
> As promised above, some more details on the RFC series sent alongside
> follow. The primary goal was to implement that health test functionality as
> required by SP800-90B for the existing drivers/char/random.c without
> affecting !fips_enabled users in any way. As outlined below, I failed quite
> miserably as far as performance is concerned, but that shouldn't be
> something which cannot get rectified. Kernel version v5.9-rc4 had been used
> as a basis. The series can be logically subdivided into the following
> parts:
> - [1-5]: Preparatory cleanup.
> - [6-17]: Implement support for deferring entropy credit dispatch to the
> global balance to long after the corresponding pool mixing operation has
> taken place. Needed for "holding back" entropy until the health tests
> have finished on the latest pending batch of samples.
> - [18-21]: Move arch_get_random_{seed_,}long() out of the interrupt path.
> Needed to adhere to how SP800-90C expects multiple noise source to get
> combined, but is also worthwhile on its own from a performance POV.
> - [22-23]: Don't award entropy to non-SP800-90B conforming architectural
> RNGs if fips_enabled is set.
> - [24]: Move rand_initialize() to after time_init(). A "fix" for what is
> currently a non-issue, but it's a prerequisite for the subsequent patch.
> - [25]: Detect cycle counter resolution, subsequently needed for making a
> per-IRQ entropy assessment.
> - [26-28]: Follow Stephan's LRNG approach in how much entropy gets
> awarded to what: a lot more than before to add_interrupt_randomness(),
> none to add_{disk,input}_randomness() anymore.
> - [29-33]: Introduce empty health test stubs and wire them up to
> add_interrupt_randomness().
> - [34-36]: Implement the Adaptive Proportion Test (APT) as specified by
> SP800-90B and squeeze some more statistical power out of it.
> - [37]: Implement SP800-90B's Repetition Count Test (RCT).
> - [38-40]: Implement the startup tests, which are nothing but the
> continuous tests (APT + RCT) run on a specified amount of samples at
> boot time.
> - [41]: Attempt to keep the system going in case the entropy estimate
> had been too optimistic and the health tests keep failing.
>
> As the health tests are run from interrupt context on each sample, a
> performance measurement is due. To this end, I configured a Raspberry Pi 2B
> (ARMv7 Cortex A7) to disable all peripherals, gated a
> 19.2 MHz / 2048 ~= 9.3 kHz clock signal to some edge triggered GPIO and
> function_graph traced add_interrupt_randomness() for 10 min from a busybox
> initramfs. Unfortunately, the results had been a bit disappointing: with
> fips_enabled being unset there had been a runtime degradation of ~12.5% w/o
> SMP and ~5% w/ SMP resp. on average merely due to the application of the
> patches onto the v5.9-rc4 base. However, as the amount of work should not
> have changed much and given that struct fast_pool still fits into a single
> cacheline, I'm optimistic that this can get rectified by e.g. introducing
> a static_key for fips_enabled and perhaps shuffling branches a bit such
> that the !fips_enabled code becomes more linear. OTOH, the impact of
> enabling the health tests by means of setting fips_enabled had not been so
> dramatic: the observed increase in average add_interrupt_randomness()
> runtimes had been 6% w/o SMP and 5% w/ SMP respectively.
>
> Apart from those well controlled experiments on a RPi, I also did some
> lax benchmarking on my x86 desktop (which has some Intel i9, IIRC).
> More specifically, I simply didn't touch the system and ftraced
> add_interrupt_randomness() for 15 mins. The number of captured events had
> been about 2000 in each configuration. Here the add_interrupt_randomness()
> performance improved greatly: from 4.3 us on average w/o the patches down
> to 2.0 us with the patches applied and fips_enabled. However, I suppose
> this gain was due to the removal of RDSEED from add_interrupt_randomness().
> Indeed, when inspecting the distribution of add_interrupt_randomness()
> runtimes on plain v5.9-rc4 more closely, it can be seen that there's a
> good portion of events (about 1/4th) where add_interrupt_randomness() took
> about 10us. So I think that this comparison isn't really a fair one...
>
>
> To the best of my knowledge, these are the remaining open questions/items
> towards full SP800-90[A-C] compliance:
> - There's no (debugfs?) interface for accessing raw samples for validation
> purposes yet. That would be doable though.

I use the patch 12/13 from my patch series successfully for random.c too.

> - try_to_generate_entropy() should probably get wired up to the health
> tests as well. More or less straightfoward to implement, too.

Correct, but when assessing this function as part of [1], I found hardly any
entropy being provided with this component.

[1] https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Studies/
LinuxRNG/LinuxRNG_EN_V4_1.pdf?__blob=publicationFile&v=2

> - Diverting fast_pool contents into net_rand_state is not allowed (for a
> related discussion on this topic see [7]).

Totally agreed, this is a bad design.

> - I've been told that SP800-90A is not a hard requirement yet, but I
> suppose it will eventually become one. This would mean that the chacha20
> RNG would have to get replaced by something approved for fips_enabled.
> - The sequence of fast_pool -> input_pool -> extract_buf() operations
> is to be considered a "non-vetted conditioning component" in SP800-90B
> speak. It would follow that the output can't be estimated as having full
> entropy, but only 0.999 of its length at max. (c.f. sec. 3.1.5.2). This
> could be resolved by running a SP800-90A derivation function at CRNG
> reseeding for fips_enabled. extract_buf(), which is already SHA1 based,
> could perhaps be transformed into such one as well.

The core issue may very well be the SHA-1: using SHA-1 and folding its output
in half may be an artificial limit of the upper bound of entropy.

> - The only mention of combining different noise sources I was able to find
> had been in SP800-90C, sec. 5.3.4 ("Using Multiple Entropy Sources"):
> it clearly states that the outputs have to be combined by concatenation.
> add_hwgenerator_randomness() mixes into the same input_pool as
> add_interrupt_randomness() though and I would expect that this isn't
> allowed, independent of whether the noise source backing the former
> is SP800-90B compliant or not. IIUC, Stephan solved this for his LRNG
> by maintaing a separate pool for the hw generator.

See 90B section 3.1.6: combining both via vetted conditioning function is
accepted.

> - SP800-90A sets an upper bound on how many bits may be drawn from a
> DRBG/crng before a reseed *must* take place ("reseed_interval"). In
> principle that shouldn't matter much in practice, at least not with
> CONFIG_NUMA: with reseed_interval == 2^32 bits, a single CRNG instance
> would be allowed to hand out only 500MB worth of randomness before
> reseeding, but a (single) numa crng chained to the primary_crng may
> produce as much as 8PB before the latter must eventually get reseeded
> from the input_pool. But AFAICT, a SP800-90A conforming implementation
> would still have to provide provisions for a blocking extract_crng().

The hard limit is 2^48 generate operation with up to 2^16 bytes each. This is
a very large amount of data. If we start triggering reseeds early enough, a
hard blocking may not be needed.

> - It's entirely unclear to me whether support for "prediction resistance
> requests" is optional. It would be a pity if it weren't, because IIUC
> that would effectively imply a return to the former blocking_pool
> behaviour, which is obviously a no-no.

This is not required.


Ciao
Stephan