Re: [PATCH] docs: security: Confidential computing intro and threat model

From: James Bottomley
Date: Thu Apr 27 2023 - 12:17:48 EST


On Thu, 2023-04-27 at 15:47 +0000, Reshetova, Elena wrote:
> > On Thu, 2023-04-27 at 12:43 +0000, Reshetova, Elena wrote:
> > >
> > > > On Wed, Apr 26, 2023, James Bottomley wrote:
> > > > > On Wed, 2023-04-26 at 13:32 +0000, Reshetova, Elena wrote:
> > [...]
> > > > > > the practical deployment can differ of course. We can
> > > > > > rephrase it as: it "allows excluding all of the CSP's
> > > > > > infrastructure and SW from the tenant's TCB."
> > > > >
> > > > > That's getting even more inaccurate.  To run in a Cloud with
> > > > > CoCo you usually have to insert some provided code, like OVMF
> > > > > and, for AMD, the SVSM.  These are often customized by the
> > > > > CSP to suit the cloud infrastructure, so you're running their
> > > > > code.  The goal, I think, is to make sure you only run code
> > > > > you trust (some of which may come from the CSP) in your TCB,
> > > > > which is very different from the statement above.
> > > >
> > > > Yes.  And taking things a step further, if we were to ask
> > > > security-conscious users what they would choose to have in their
> > > > TCB: (a) closed-source firmware written by a hardware vendor,
> > > > or (b) open-source software that is provided by CSPs, I am
> > > > betting the overwhelming majority would choose (b).
> > >
> > > As I already replied in my earlier message from yesterday, yes,
> > > this is a choice that anyone has, and everyone is free to make it.
> > > No questions asked. (Btw, please note that the above statement is
> > > not 100% accurate since the source code for the Intel TDX module
> > > is at least public). However, if, as you said, the majority choose
> > > (b), why do they need to enable confidential cloud computing
> > > technologies like TDX or SEV-SNP? If they choose (b), then the
> > > whole threat model described in this document simply does not
> > > apply to them and they can forget about anything that we try to
> > > describe here.
> >
> > I think the problem is that the tenor of the document is that the
> > CSP should be seen as the enemy of the tenant.
>
> We didn’t intend this interpretation and it can certainly be fixed
> if people see it this way.
>
> > Whereas all CSPs want to be
> > seen as the partner of the tenant (admittedly so they can upsell
> > services). In particular, even if you adopt (b) there are several
> > reasons why you'd use confidential computing:
> >
> >    1. Protection from other tenants who break containment in the
> > cloud. These tenants could exfiltrate data from non-CoCo VMs, but
> > likely would be detected before they had time to launch an attack
> > using vulnerabilities in the current Linux device drivers.
>
> Not sure how this "likely to be detected" is going to happen in
> practice.

How do you arrive at that conclusion? Detecting malicious tenant
behaviour is bread and butter for clouds ... especially as a nasty
cloud break-out is a potentially business-destroying event.

> If you have a known vulnerability against a CoCo VM (let's say in a
> device driver interface it exposes), is it so much more difficult for
> an attacker to break into a CoCo VM vs a non-CoCo VM before it is
> detected?

It's a question of practicality. Given that a tenant has broken
containment and potentially escalated to root, what, in addition, would
they have to do to exfiltrate data from a CoCo VM? The more they have
to do to launch the attack, the greater the chance of their being
detected.

> >    2. Legal data security.  There's a lot of value in a CSP being
> > able to make the legal statement that it does not have access to a
> > customer's data because of CoCo.
>
> Let's leave legal out of the technical discussion, not my area.

It *is* a technical argument. This is about compliance and Data
Sovereignty, which are both services most clouds are interested in
providing because they're a potentially huge and fast-growing market.

> >    3. Insider threats (bribe a CSP admin employee).  This one might
> > get as far as trying to launch an attack on a CoCo VM, but having
> > checks at the CSP to detect and defeat this would work
> > instead of every insider threat having to be defeated inside the
> > VM.
>
> Ok, this angle might be valid from the CSP's point of view, i.e.
> noticing such insider attacks might, I guess, be easier with CoCo VMs.
>
> >
> > In all of those cases (which are not exhaustive) you can regard the
> > CSP as a partner of the tenant when it comes to preventing and
> > detecting threats to the CoCo VM, so extreme device driver
> > hardening becomes far less relevant to these fairly considerable
> > use cases.
>
> I think the first case still holds, as well as one case that you have
> not listed: a remote attacker attacking the CSP stack using some
> discovered but not yet fixed vulnerability (the stack is big, bugs
> happen), getting control of the CSP stack and then going after the
> CoCo VMs to see what it can get there.

Well, that's not really any different from a containment break. Most
cloud security analysis is performed by outside entities who start with
"an attacker has gained root on your compute platform, what can they
do?". So they skip the how and move straight to what is the threat
potential.

> What you are saying is that you (as the CSP) maintain a good first
> level of defense to prevent an attacker from getting control of
> your/CSP stack to begin with. What we try to do is the next level of
> defense (very typical in security): we assume that the first line of
> defense has been broken for some reason and that there is now a
> second one in place to actually protect the customers' end data.

Well, that's where cloud security analyses also start. However, what
you've missed is that the cloud detecting the attack and usually
shutting down the node is a valid response. Clouds actually invest
significantly in intrusion detection and remediation systems for this
reason.

> > > Now, from a pure security point of view, the choice between (a)
> > > and (b) is not so easily made imo. Usually we take into account
> > > many factors that affect the risk/chances that a certain piece of
> > > SW has vulnerabilities. This includes the size of the codebase,
> > > its complexity, its attack surface exposure towards external
> > > interfaces, the level of testing, whether the code is public,
> > > code dependency chains, etc. A smaller codebase with no
> > > dependencies and a small set of exposed interfaces is usually
> > > easier to review from a security point of view, given that the
> > > code is public.
> >
> > This reads like an argument that, from a security point of view,
> > smaller proprietary code is better than larger open source code.
> > I really don't think we want to open this can of worms.
>
> I don’t think I have made this statement: the code *has to be public*
> for anyone to review and I did explicitly list this in the statement
> above as "given that the code is public".

Public but not open source is still a problem. The federal government
has walked into several cloud accounts demanding a source code security
review, which means the code was made public to them but not generally.
Without all customers or some third party being able to build the code
and verify it (or ideally supply it ... think something like Red Hat
built the OVMF code this cloud is using and you can prove it using
their build signatures), how do you know the source you're given
corresponds to the binary the signature verifies?
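
To make that concrete, the check a tenant (or an auditor acting for
them) would want to be able to run looks roughly like the sketch below.
It is only an illustration: the file formats and helper names are made
up, and a real SEV-SNP/TDX launch measurement covers the whole initial
guest state and is computed with a dedicated tool, not just a hash of
the firmware image.

# Illustrative sketch only: compare the measurement a CoCo guest
# attests to against a digest computed from a firmware build that we
# (or a trusted third party such as the distro) produced reproducibly.
import hashlib
import hmac
import json
import sys

def expected_measurement(fw_path):
    # Digest of the firmware image we can rebuild and audit ourselves.
    # (Simplification: a real launch measurement is not just
    # sha384(firmware image).)
    with open(fw_path, "rb") as f:
        return hashlib.sha384(f.read()).hexdigest()

def reported_measurement(report_path):
    # Assume the attestation report has already been parsed into JSON
    # with a hex "measurement" field (hypothetical format).
    with open(report_path) as f:
        return json.load(f)["measurement"].lower()

if __name__ == "__main__":
    ours = expected_measurement(sys.argv[1])
    theirs = reported_measurement(sys.argv[2])
    if hmac.compare_digest(ours, theirs):
        print("attested firmware matches the build we can audit")
    else:
        print("MISMATCH: guest launched firmware we cannot verify")
        sys.exit(1)

If nobody outside the CSP can produce the left-hand side of that
comparison, the signature on the binary tells you very little.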

>   The only thing I meant is that it is not so easy to make a call
> between (a) and (b) in all cases from a pure security point of view.

Proper governance is usually listed as a requirement for security.
Public but not open source usually exists because of governance or
control issues, which can be cited as a security risk. After all,
whoever does this must have some reason for not running an open source
project in a security-critical area.

James