Re: [PATCH v2 1/2] misc: Add Nitro Secure Module driver

From: Alexander Graf
Date: Mon Oct 02 2023 - 08:28:42 EST


Hey Greg,

On 30.09.23 08:20, Greg Kroah-Hartman wrote:
On Fri, Sep 29, 2023 at 09:26:16PM +0200, Alexander Graf wrote:
Hi Arnd!

On 29.09.23 19:28, Arnd Bergmann wrote:
On Fri, Sep 29, 2023, at 09:33, Alexander Graf wrote:
When running Linux inside a Nitro Enclave, the hypervisor provides a
special virtio device called "NSM". This device has 2 main functions:

1) Provide attestation reports
2) Modify PCR state
3) Provide entropy

This patch adds the core NSM driver that exposes a /dev/nsm device node
which user space can use to request attestation documents and influence
PCR states. A follow up patch will add a hwrng driver to feed its entropy
into the kernel.

Originally-by: Petre Eftime <petre.eftime@xxxxxxxxx>
Signed-off-by: Alexander Graf <graf@xxxxxxxxxx>
Hi Alex,

I've taken a first look at this driver and have some minor comments.

Thanks a bunch!


The main point here is that I think we need to look at possible
alternatives for the user space interface, and (if possible) change
to a set of higher-level ioctl commands from the simple passthrough.

I'm slightly torn on that bit. I think in hindsight the NSM device probably
should have been a reserved vsock CID and the hwrng one should have just
been virtio-rng.

The problem is that Nitro Enclaves were launched in 2020 and since an
ecosystem developed in multiple languages to support building code inside:

https://github.com/aws/aws-nitro-enclaves-nsm-api/blob/main/src/driver/mod.rs#L66
https://github.com/donkersgoed/aws-nsm-interface/blob/main/aws_nsm_interface/__init__.py#L264-L274
https://github.com/hf/nsm/blob/main/nsm.go#L99-L129


All of these use the (downstream) ioctl that this patch also implements. We
could change it, but instead of making it easier for user space to adapt the
device node, it would probably hurt more.

I agree that this is not a great place to be in. This driver absolutely
should have been upstreamed 3 years ago. But I can't turn back time (yet)
:).
As you know, this is no excuse to put an api in the kernel that isn't
correct or good for the long-term. Just because people do foolish
things outside of the kernel tree never means we have to accept them in
our tree. Instead we can ask them to fix them properly as part of us
taking the code.

So please, work on doing this right.


Sorry if my message above came over as a push to put an "incorrect api" into the kernel.

In situations like this where you can either give user space full access to the device's command space through a generic API or you can create command awareness in the kernel and make it the kernel's task to learn about each command, IMHO it's never a clear cut on which one is better. Especially in virtual environments where the set of commands can change quickly over time.

So what I was trying to say above is that *if* we consider both paths equally viable, I'd err on the one that enables the existing ecosystem. However if there are good reasons to not do command pass-through, I'm all for abstracting it away :)

Looking at prior art, the most similar implementations to this are TPMs and virtio-vsock. With virtio-vsock, kernel space has no idea what it talks to on the other hand and makes it 100% user space's problem. With TPMs, you typically use /dev/tpm0 to gain raw command access to the target device. So while we could engineer something smarter here, I'm not convinced yet it's a net win.


Alex




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879