Re: [PATCH v7 1/2] drivers/misc: sysgenid: add system generation id driver

From: MacCarthaigh, Colm
Date: Wed Feb 24 2021 - 18:02:09 EST




On 2/24/21, 2:44 PM, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > The mmap mechanism allows the PRNG to reseed after a genid change. Because
> > we don't have an event mechanism for this code path, that can happen minutes
> > after the resume. But that's ok, we "just" have to ensure that nobody is
> > consuming secret data at the point of the snapshot.
>
> Something I am still not clear on is whether it's really important to
> skip the system call here. If not I think it's prudent to just stick
> to read for now, I think there's a slightly lower chance that
> it will get misused. mmap which gives you a laggy gen id value
> really seems like it would be hard to use correctly.

It's not uncommon for these user-space PRNGs to be used quite a lot in very performance-critical paths. If you negotiate a TLS session that uses an explicit IV, the RNG is being called for every TLS record sent. Same for IPsec, depending on the cipher suite. Every TLS hello message has 28-32 bytes of data from the RNG, and if you've got ECDSA as your signature algorithm, the RNG is inline again. Using RSA-PSS? Same again. Many post-quantum algorithms are even more voraciously entropy-hungry. We examine the compiled instructions for ours by hand to check it's all as tight as it can be.

To give more of an idea, several crypto libraries took out the getpid() guards they had for fork detection in their RNGs. vDSO could have helped there, and I'm not sure they would have needed to remove the guards if vDSO had been more widely used at the time. I don't think we'd get a patch into OpenSSL/libcrypto that involves a full syscall. vDSO might be ok, but even that's not going to have the speed that a single memory lookup can have with the mmap/madvise approach ... since we already have to use MADV_WIPEONFORK.
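
To make the "single memory lookup" point concrete, here is a rough sketch of what the hot path could look like in a user-space PRNG. Everything in it is illustrative: the struct, the function names and the toy xorshift generator are made up for this mail and are not from the patch or from any library. The gen pointer is assumed to point into the page the driver maps into the process; a real library would also reseed from the kernel rather than from a counter.

```c
#include <stdint.h>

/* Hypothetical per-process PRNG state; none of these names come from the patch. */
struct prng {
    const volatile uint32_t *gen;  /* assumed: points at the mapped generation counter */
    uint32_t seen_gen;             /* generation observed at the last reseed */
    uint64_t reseeds;              /* number of reseeds so far */
    uint64_t state;                /* toy generator state, NOT cryptographic */
};

/* Placeholder reseed: a real library would pull fresh entropy from the
 * kernel here instead of deriving state from counters. */
static void prng_reseed(struct prng *p)
{
    p->seen_gen = *p->gen;         /* record the generation before reseeding */
    p->reseeds++;
    p->state = 0x9e3779b97f4a7c15ULL * (p->reseeds + p->seen_gen);
    if (p->state == 0)             /* xorshift must not start from zero */
        p->state = 1;
}

static void prng_init(struct prng *p, const volatile uint32_t *gen)
{
    p->gen = gen;
    p->reseeds = 0;
    prng_reseed(p);
}

/* Hot path: one load and one compare, no syscall, before producing output. */
static uint64_t prng_next(struct prng *p)
{
    if (*p->gen != p->seen_gen)    /* generation changed: snapshot was restored */
        prng_reseed(p);

    /* xorshift64 step, standing in for the real DRBG */
    p->state ^= p->state << 13;
    p->state ^= p->state >> 7;
    p->state ^= p->state << 17;
    return p->state;
}
```

The check is the same shape as the old getpid() guard, but it costs a load from an already-mapped page instead of a syscall. Note that it only catches a generation change on the next call, which is the "laggy" behaviour discussed above.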

In practice I don't think it will be that hard to use correctly; snapshots and restores of this nature really have to happen only when the activity is quiescent. If operations are in-flight, it's not easy to reason about the potential multi-restore problems at all and it only makes sense to think about transactional correctness at the level of all transactions that may have been in-flight. The mmap solution is more about integrating with existing library APIs and semantics than it is about somehow solving that at the kernel level. That part has to be solved at the system level.
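
For what "quiescent" could mean in practice, here is one possible shape of a system-level gate, sketched with C11 atomics. This is an assumption about how an application might coordinate with whatever triggers the snapshot; the names are invented and nothing like this is part of the driver. Operations register themselves while in flight, and a snapshot is only allowed once new operations are being refused and the in-flight count has drained to zero.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical application-level gate; not part of the sysgenid patch. */
static atomic_int  in_flight = 0;      /* operations currently running */
static atomic_bool draining  = false;  /* set while a snapshot is pending */

/* Called before touching secret data. Returns false if a snapshot is
 * pending, in which case the caller must wait and retry later. */
static bool op_begin(void)
{
    if (atomic_load(&draining))
        return false;
    atomic_fetch_add(&in_flight, 1);
    if (atomic_load(&draining)) {      /* snapshot request raced with us: back out */
        atomic_fetch_sub(&in_flight, 1);
        return false;
    }
    return true;
}

/* Called once the operation has finished with its secret data. */
static void op_end(void)
{
    atomic_fetch_sub(&in_flight, 1);
}

/* Called by the snapshot agent. Refuses new operations, then reports
 * whether the process is quiescent; the agent retries until it is. */
static bool snapshot_try_begin(void)
{
    atomic_store(&draining, true);
    return atomic_load(&in_flight) == 0;
}

/* Called after the snapshot is taken (and again after each restore). */
static void snapshot_resume(void)
{
    atomic_store(&draining, false);
}
```

With sequentially consistent atomics, either the operation sees the draining flag and backs out, or the snapshot agent sees a non-zero count and waits. The generation check in the RNG then only has to cover what happens after resume, not work that was in flight across the snapshot.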

-
Colm