Re: [RFC v1 00/26] Add TDX Guest Support

From: Dave Hansen
Date: Fri Apr 02 2021 - 11:27:42 EST


On 4/1/21 7:48 PM, Andi Kleen wrote:
>> I've heard things like "we need to harden the drivers" or "we need to do
>> audits" and that drivers might be "whitelisted".
>
> The basic driver allow listing patches are already in the repository,
> but not currently posted or complete:
>
> https://github.com/intel/tdx/commits/guest

That lists exactly 8 IDs:

> { PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1000 }, /* Virtio NET */
> { PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1001 }, /* Virtio block */
> { PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1003 }, /* Virtio console */
> { PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1009 }, /* Virtio FS */
>
> { PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041 }, /* Virtio 1.0 NET */
> { PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1042 }, /* Virtio 1.0 block */
> { PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1043 }, /* Virtio 1.0 console */
> { PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1049 }, /* Virtio 1.0 FS */

How many places do those 8 drivers touch MMIO?

>> Are there any "real" hardware drivers
>> involved like how QEMU emulates an e1000 or rtl8139 device?
>
> Not currently (but some later hypervisor might rely on one of those)
>
>> What about
>> the APIC or HPET?
>
> No IO-APIC, but the local APIC. No HPET.

Sean seemed worried about other x86-specific oddities. Are there any
more, or is the local APIC the only non-driver MMIO?

>> Without something concrete, it's really hard to figure out if we should
>> go full-blown paravirtualized MMIO, or do something like the #VE
>> trapping that's in this series currently.
>
> As Sean says the concern about MMIO is less drivers (which should
> be generally ok if they work on other architectures which require MMIO
> magic), but other odd code that only ran on x86 before.
>
> I really don't understand your crusade against #VE. It really
> isn't that bad if we can avoid the few corner cases.

The problem isn't with #VE per se. It's with posting a series that
masquerades as a full solution while *NOT* covering or even enumerating
the corner cases. That's exactly what happened with #VE to start with:
it was implemented in a way that exposed the kernel to #VE during the
syscall gap (and the SWAPGS gap for that matter).

So, I'm pushing for a design that won't have corner cases. If MMIO
itself is disallowed, then we can scream about *any* detected MMIO.
Then, there's no worry about #VE nesting. No #VE, no #VE nesting. We
don't even have to consider if #VE needs NMI-like semantics.

> For me it would seem wrong to force all MMIO for all drivers to some
> complicated paravirt construct, blowing up code side everywhere
> and adding complicated self modifying code, when it's only needed for very
> few drivers. But we also don't want to patch every MMIO to be special cased
> even those few drivers.
>
> #VE based MMIO avoids all that cleanly while being nicely non intrusive.

But, we're not selling used cars here. Using #VE has downsides.
Let's not pretend that it doesn't.

If we go this route, what are the rules and restrictions? Do we have to
say "no MMIO in #VE"?

I'm really the most worried about the console. Consoles and NMIs have
been a nightmare, IIRC. Doesn't this just make it *WORSE* because now
the deepest reaches of the console driver are guaranteed to #VE?

Which brings up another related point: How do you debug TD guests? Does
earlyprintk work?