Re: [PATCH] docs: security: Confidential computing intro and threat model

From: Carlos Bilbao
Date: Wed Apr 26 2023 - 11:09:34 EST


Hello Sean,

On 4/26/23 8:32 AM, Reshetova, Elena wrote:
> Hi Sean,
>
> Thank you for your review! Please see my comments inline.
>
>> On Mon, Mar 27, 2023, Carlos Bilbao wrote:
>>> +Kernel developers working on confidential computing for the cloud operate
>>> +under a set of assumptions regarding the Linux kernel threat model that
>>> +differ from the traditional view. In order to effectively engage with the
>>> +linux-coco mailing list and contribute to its initiatives, one must have a
>>> +thorough familiarity with these concepts. This document provides a concise,
>>> +architecture-agnostic introduction to help developers gain a foundational
>>
>> Heh, vendor agnostic maybe, but certainly not architecture agnostic.
>
> I guess it depends where you draw a distinction between vendor and architecture.
> What was meant here is that we try to write down the overall threat model
> and high-level design that existing technologies use today.
> But I don’t mind change to vendor agnostic, if it seems more correct.
>
>>
>>> +understanding of the subject.
>>> +
>>> +Overview and terminology
>>> +========================
>>> +
>>> +Confidential Cloud Computing (CoCo) refers to a set of HW and SW
>>
>> As per Documentation/security/secrets/coco.rst and every discussion I've
>> observed,
>> CoCo is Confidential Computing. "Cloud" is not part of the definition. That's
>> true even if this discussion is restricted to CoCo VMs, e.g. see pKVM.
>
> Yes, I personally not sure we have a single good term to describe this particular
> angle of confidential computing. Generally Confidential Computing can mean
> any CoCo technology, including things that do not relate to virtualization (like SGX).
> This document doesn’t attempt to cover all CoCo, but only a subset of them that
> relates to virtualization. Academia researches have been using term "Confidential Cloud
> Computing" (quick search on google scholar gives relevant papers), so this was
> a reason to adapt this term. If you have a better proposal, please tell.
>
>>
>>> +virtualization technologies that allow Cloud Service Providers (CSPs) to
>>
>> Again, CoCo isn't just for cloud use cases.
>
> See above.
>
>>
>>> +provide stronger security guarantees to their clients (usually referred to
>>> +as tenants) by excluding all the CSP's infrastructure and SW out of the
>>> +tenant's Trusted Computing Base (TCB).
>>
>> This is inaccurate, the provider may still have software and/or hardware in the
>> TCB.
>
> Well, this is the end goal where we want to be, the practical deployment can
> differ of course. We can rephrase that it "allows to exclude all the CSP's
> infrastructure and SW out of tenant's TCB."
>
>>
>> And for the cloud use case, I very, very strongly object to implying that the goal
>> of CoCo is to exclude the CSP from the TCB. Getting out of the TCB is the goal for
>> _some_ CSPs, but it is not a fundamental tenant of CoCo. This viewpoint is
>> heavily
>> tainted by Intel's and AMD's current offerings, which effectively disallow third
>> party code for reasons that have nothing to do with security.
>>
>> https://lore.kernel.org/all/Y+aP8rHr6H3LIf%2Fc@xxxxxxxxxx
>
> I am not fully sure what you imply with this. Minimal TCB is always a good goal
> from security point of view (less hw/sw equals less bugs). From a tenant point
> of view of course it is question of risk evaluation: do they think that CSP stack
> has a higher chance to have a bug that can be exploited or SW provided by
> HW vendors? You seem to imply that some tenants might consider CSP stack to
> be more robust? If so, why would they use CoCo? In this case they are better off
> with just normal legacy VMs, no?
>
>
>>
>>> +While the concrete implementation details differ between technologies, all
>>> +of these mechanisms provide increased confidentiality and integrity of CoCo
>>> +guest memory and execution state (vCPU registers), more tightly controlled
>>> +guest interrupt injection,
>>
>> This is highly dependent on how "interrupt" is defined, and how "controlled" is
>> defined.
>
> As you know there are some limitations on what type of interrupt vectors can be
> injected into a TD guest. Vectors 0-30 are not injectable. This is what is meant by
> "more tightly controlled".
>
>>
>>> as well as some additional mechanisms to control guest-host page mapping.
>>
>> This is flat out wrong for SNP for any reasonable definition of "page mapping".
>> SNP has _zero_ "control" over page tables, which is most people think of when
>> they
>> see "page mapping".
>
> Leaving for AMD guys to comment.

In SNP, the guest controls the association of a guest physical address to a
host physical address, so that the host can't switch that through the nested
page tables [1]. We will be more specific to avoid interpretations.

>
>>
>>> More details on the x86-specific solutions can be
>>> +found in
>>> +:doc:`Intel Trust Domain Extensions (TDX) </x86/tdx>` and
>>> +:doc:`AMD Memory Encryption </x86/amd-memory-encryption>`.
>>
>> So by the above definition, vanilla SEV and SEV-ES can't be considered CoCo. SEV
>> doesn't provide anything besides increased confidentiality of guest memory, and
>> SEV-ES doesn't provide integrity or validation of physical page assignment.
>>
>
> Same
>

Personally, I think it's reasonable to mention SEV/SEV-ES in the context of
confidential computing and acknowledge their relevance in this area.

But there is no mention to SEV or SEV-ES in this draft. And the document we
reference there covers AMD-SNP, which provides integrity.


>>> +The basic CoCo layout includes the host, guest, the interfaces that
>>> +communicate guest and host, a platform capable of supporting CoCo,
>>
>> CoCo VMs...
>
> Will fix.
>
>>
>>> and an intermediary between the guest virtual machine (VM) and the
>>> underlying platform that acts as security manager::
>>
>> Having an intermediary is very much an implementation detail.
>
> True, but it is kind of big component, so completely omitting it doesn’t sound
> right to me either.
>
>>
>>> +Confidential Computing threat model and security objectives
>>> +===========================================================
>>> +
>>> +Confidential Cloud Computing adds a new type of attacker to the above list:
>>> +an untrusted and potentially malicious host.
>>
>> I object to splattering "malicious host" everywhere. Many people are going to
>> read this and interpret "host" as "the CSP", and then make assumptions like
>> "CoCo assumes the CSP is malicious!". AIUI, the vast majority of use cases aren't
>> concerned so much about "the CSP" being malicious, but rather they're
>> concerned
>> about new attack vectors that come with running code/VMs on a stack that is
>> managed by a third party, on hardware that doesn't reside in a secured facility,
>> etc.
>
> I see your point. I propose to add paragraph in the beginning that explains that
> CSPs do not intend to be malicious (at least we hope they dont), but since they
> have a big codebase to manage, bugs in that codebase are normal and CoCo
> helps to protect tenants against this situations. Also change "malicious host" to
> "unintentionally misbehaving host" or smth like this.
>
>>
>>> +While the traditional hypervisor has unlimited access to guest data and
>>> +can leverage this access to attack the guest, the CoCo systems mitigate
>>> +such attacks by adding security features like guest data confidentiality
>>> +and integrity protection. This threat model assumes that those features
>>> +are available and intact.
>>
>> Again, if you're claiming integrity is a key tenant, then SEV and SEV-ES can't be
>> considered CoCo.

Again, nobody mentioned SEV/SEV-ES here.

>>
>>> +The **Linux kernel CoCo security objectives** can be summarized as follows:
>>> +
>>> +1. Preserve the confidentiality and integrity of CoCo guest private memory.
>>
>> So, registers are fair game?
>
> No, you are right, needs to be augmented here. What we meant here is that
> the end goal of the attacker is the tenant secrets and they can also be in registers.
>
>>
>>> +2. Prevent privileged escalation from a host into a CoCo guest Linux kernel.
>>> +
>>> +The above security objectives result in two primary **Linux kernel CoCo
>>> +assets**:
>>> +
>>> +1. Guest kernel execution context.
>>> +2. Guest kernel private memory.
>>
>> ...
>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index 7f86d02cb427..4a16727bf7f9 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -5307,6 +5307,12 @@ S: Orphan
>>> W: http://accessrunner.sourceforge.net/
>>> F: drivers/usb/atm/cxacru.c
>>>
>>> +CONFIDENTIAL COMPUTING THREAT MODEL
>>
>> This is not generic CoCo documentation, it's specific to CoCo VMs. E.g. SGX is
>> most definitely considered a CoCo feature, and it has no dependencies
>> whatsoever
>> on virtualization.
>
> Yes, so how we call it? CoCo VM is a term for a running entity.
> That's why the academic term Confidential Cloud Computing was used in the
> beginning, but you didn’t like it either.
>
>>
>>> +M: Elena Reshetova <elena.reshetova@xxxxxxxxx>
>>> +M: Carlos Bilbao <carlos.bilbao@xxxxxxx>
>>
>> I would love to see an M: or R: entry for someone that is actually _using_ CoCo.
>
> Would be more than welcomed!
>
>>
>> IMO, this document is way too Intel/AMD centric.
>
> Anyone is free to comment/participate on writing this and help us to adjust to
> even further to the rest of vendors, because for us it is hard to know details and
> applicability for other hw vendors.
> Adding Rivos guys now explicitly to CC list. >

I'm sure we can find a common ground for this document.

>
> Best Regards,
> Elena.

Thanks,
Carlos

[1] https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf