RE: [PATCH v2] mlxbf-bootctl: correctly identify secure boot with development keys

From: David Thompson
Date: Thu Nov 30 2023 - 13:24:55 EST


> -----Original Message-----
> From: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
> Sent: Wednesday, November 22, 2023 5:49 AM
> To: David Thompson <davthompson@xxxxxxxxxx>
> Cc: Hans de Goede <hdegoede@xxxxxxxxxx>; markgross@xxxxxxxxxx; Vadim
> Pasternak <vadimp@xxxxxxxxxx>; platform-driver-x86@xxxxxxxxxxxxxxx; LKML
> <linux-kernel@xxxxxxxxxxxxxxx>; Khalil Blaiech <kblaiech@xxxxxxxxxx>
> Subject: Re: [PATCH v2] mlxbf-bootctl: correctly identify secure boot with
> development keys
>
> On Tue, 21 Nov 2023, David Thompson wrote:
>
> > The secure boot state of the BlueField SoC is represented by two bits:
> > 0 = production state
> > 1 = secure boot enabled
> > 2 = non-secure (secure boot disabled)
> > 3 = RMA state
> > There is also a single bit to indicate whether production keys or
> > development keys are being used when secure boot is enabled.
>
> Thanks for the extra details but there are more bits that come into play here and
> you don't mention anything about them. More about this below with the relevant code.
>
> > The current logic in "lifecycle_state_show()" does not handle the case
> > where the SoC is configured for secure boot and is using development
> > keys.
>
> This still doesn't state why the current state is a problem. That is, why "GA
> Secured" is a problem.
>

"GA secured" is when secure boot is enabled with official production keys.
"Secured (development)" is when secure boot is enabled with development keys.
Without this fix, "GA Secured" is displayed on development cards, which is misleading.
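
To make the distinction concrete, here is a minimal userspace sketch (not the
driver code) of the two secured cases, using the mask values from the patch.
The helper names sb_uses_dev_keys() and sb_secured_name() are made up for this
illustration.

```c
#include <stdbool.h>

/* Mask values as defined in the patch. */
#define SB_SECURE_MASK 0x03u        /* two lifecycle bits */
#define SB_DEV_MASK    (1u << 4)    /* BIT(4): development keys in use */

#define SB_LIFECYCLE_GA_SECURE 1u

/* Hypothetical helper: true when secure boot is enabled with development keys. */
static bool sb_uses_dev_keys(unsigned int status)
{
	return (status & SB_SECURE_MASK) == SB_LIFECYCLE_GA_SECURE &&
	       (status & SB_DEV_MASK) != 0;
}

/* Hypothetical helper: the string a card with secure boot enabled should report. */
static const char *sb_secured_name(unsigned int status)
{
	return sb_uses_dev_keys(status) ? "Secured (development)" : "GA Secured";
}
```

Under these assumptions, a status of 0x01 maps to "GA Secured" and 0x11 maps to
"Secured (development)".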

> > This patch updates the logic in "lifecycle_state_show()" to support
> > this combination and properly report this state.
> >
> > Fixes: 79e29cb8fbc5c ("platform/mellanox: Add bootctl driver for
> > Mellanox BlueField Soc")
> > Reviewed-by: Khalil Blaiech <kblaiech@xxxxxxxxxx>
> > Signed-off-by: David Thompson <davthompson@xxxxxxxxxx>
> > ---
> > v1->v2
> > a) commit message was expanded and re-worded for clarity
> > b) replaced use of hardcoded 0x10 with BIT(4) for
> > MLXBF_BOOTCTL_SB_DEV_MASK
> > ---
> > drivers/platform/mellanox/mlxbf-bootctl.c | 24 +++++++++++++++++------
> > 1 file changed, 18 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/platform/mellanox/mlxbf-bootctl.c
> > b/drivers/platform/mellanox/mlxbf-bootctl.c
> > index 1ac7dab22c63..13c62a97a6f7 100644
> > --- a/drivers/platform/mellanox/mlxbf-bootctl.c
> > +++ b/drivers/platform/mellanox/mlxbf-bootctl.c
> > @@ -20,6 +20,7 @@
> >
> > #define MLXBF_BOOTCTL_SB_SECURE_MASK 0x03
> > #define MLXBF_BOOTCTL_SB_TEST_MASK 0x0c
> > +#define MLXBF_BOOTCTL_SB_DEV_MASK BIT(4)
>
> You only covered MLXBF_BOOTCTL_SB_SECURE_MASK and
> MLXBF_BOOTCTL_SB_DEV_MASK in your description above, is that correct?
>

When the chip lifecycle is 0 (production), the test lifecycle bits can be used
to simulate secure boot without having to burn the fuses. These bits are OR-ed
with the real lifecycle bits; hence the extra mask, which indicates whether we
are in test mode.
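
As a rough illustration of that extra mask, the sketch below separates the
test bits from the lifecycle bits. It is userspace code, not the driver; the
helper names and the sample status values are hypothetical, and it assumes the
status word is laid out as the masks in the patch suggest.

```c
#include <stdbool.h>

/* Mask values as defined in the driver. */
#define SB_SECURE_MASK 0x03u  /* lifecycle state */
#define SB_TEST_MASK   0x0cu  /* test lifecycle bits */

/* Hypothetical helper: true when any test lifecycle bit is set. */
static bool sb_in_test_mode(unsigned int status)
{
	return (status & SB_TEST_MASK) != 0;
}

/* Hypothetical helper: the two-bit lifecycle state, always in 0..3. */
static unsigned int sb_lifecycle(unsigned int status)
{
	return status & SB_SECURE_MASK;
}
```

For a sample status of 0x05, sb_in_test_mode() is true and sb_lifecycle()
returns 1, which the existing test branch would print as "GA Secured(test)".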

> > #define MLXBF_SB_KEY_NUM 4
> >
> > @@ -40,11 +41,18 @@ static struct mlxbf_bootctl_name boot_names[] = {
> > { MLXBF_BOOTCTL_NONE, "none" },
> > };
> >
> > +enum {
> > + MLXBF_BOOTCTL_SB_LIFECYCLE_PRODUCTION = 0,
> > + MLXBF_BOOTCTL_SB_LIFECYCLE_GA_SECURE = 1,
> > + MLXBF_BOOTCTL_SB_LIFECYCLE_GA_NON_SECURE = 2,
> > + MLXBF_BOOTCTL_SB_LIFECYCLE_RMA = 3
> > +};
> > +
> > static const char * const mlxbf_bootctl_lifecycle_states[] = {
> > - [0] = "Production",
> > - [1] = "GA Secured",
> > - [2] = "GA Non-Secured",
> > - [3] = "RMA",
> > + [MLXBF_BOOTCTL_SB_LIFECYCLE_PRODUCTION] = "Production",
> > + [MLXBF_BOOTCTL_SB_LIFECYCLE_GA_SECURE] = "GA Secured",
> > + [MLXBF_BOOTCTL_SB_LIFECYCLE_GA_NON_SECURE] = "GA Non-Secured",
> > + [MLXBF_BOOTCTL_SB_LIFECYCLE_RMA] = "RMA",
> > };
> >
> > /* Log header format. */
> > @@ -254,8 +262,9 @@ static ssize_t lifecycle_state_show(struct device *dev,
> > if (lc_state < 0)
> > return lc_state;
> >
> > - lc_state &=
> > - MLXBF_BOOTCTL_SB_TEST_MASK | MLXBF_BOOTCTL_SB_SECURE_MASK;
> > + lc_state &= (MLXBF_BOOTCTL_SB_TEST_MASK |
> > + MLXBF_BOOTCTL_SB_SECURE_MASK |
> > + MLXBF_BOOTCTL_SB_DEV_MASK);
> >
> > @@ -266,6 +275,9 @@ static ssize_t lifecycle_state_show(struct device *dev,
>
> I'm quoting some extra code not fully visible in the contexts:
>
> /*
> * If the test bits are set, we specify that the current state may be
> * due to using the test bits.
> */
> if (lc_state & MLXBF_BOOTCTL_SB_TEST_MASK) {
> lc_state &= MLXBF_BOOTCTL_SB_SECURE_MASK;
>
> Here, what is output also depends on MLXBF_BOOTCTL_SB_TEST_MASK, right?
> And those bits even take precedence over the code you're adding in the else-if
> branch. So your description in the commit message seems quite inadequate to me.
>

Please see above for description of how the test bits are used.

> Note that you've also added an out-of-bounds access here since only
> MLXBF_BOOTCTL_SB_SECURE_MASK gets cleared from lc_state:
>

The next version of the patch (v3) will clarify this logic and will prevent
any out-of-bounds accesses.
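
One way to keep the table index in range is to split the status word into its
three fields up front, so that only the two lifecycle bits are ever used as an
index. The following is only a userspace sketch of that idea and not
necessarily what v3 will look like; format_lifecycle() and its buffer-based
interface are made up for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Mask values as defined in the patch. */
#define SB_SECURE_MASK 0x03u
#define SB_TEST_MASK   0x0cu
#define SB_DEV_MASK    (1u << 4)

enum {
	SB_LIFECYCLE_PRODUCTION = 0,
	SB_LIFECYCLE_GA_SECURE = 1,
	SB_LIFECYCLE_GA_NON_SECURE = 2,
	SB_LIFECYCLE_RMA = 3
};

static const char * const lifecycle_states[] = {
	[SB_LIFECYCLE_PRODUCTION] = "Production",
	[SB_LIFECYCLE_GA_SECURE] = "GA Secured",
	[SB_LIFECYCLE_GA_NON_SECURE] = "GA Non-Secured",
	[SB_LIFECYCLE_RMA] = "RMA",
};

/* Hypothetical helper: format the state for a raw status word into buf. */
static int format_lifecycle(unsigned int status, char *buf, size_t len)
{
	unsigned int use_dev_key = status & SB_DEV_MASK;
	unsigned int test_state = status & SB_TEST_MASK;
	unsigned int lc_state = status & SB_SECURE_MASK; /* always 0..3 */

	/* Test bits take precedence, as in the existing code. */
	if (test_state)
		return snprintf(buf, len, "%s(test)\n",
				lifecycle_states[lc_state]);

	if (use_dev_key && lc_state == SB_LIFECYCLE_GA_SECURE)
		return snprintf(buf, len, "Secured (development)\n");

	return snprintf(buf, len, "%s\n", lifecycle_states[lc_state]);
}
```

With the fields separated, a status such as 0x12 (development bit set,
non-secure, no test bits) prints "GA Non-Secured" instead of being used
directly as an index into the four-entry table.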

> >
> > return sprintf(buf, "%s(test)\n",
> > mlxbf_bootctl_lifecycle_states[lc_state]);
> > + } else if ((lc_state & MLXBF_BOOTCTL_SB_SECURE_MASK) == MLXBF_BOOTCTL_SB_LIFECYCLE_GA_SECURE
> > + && (lc_state & MLXBF_BOOTCTL_SB_DEV_MASK)) {
> > + return sprintf(buf, "Secured (development)\n");
> > }
> >
> > return sprintf(buf, "%s\n",
> > mlxbf_bootctl_lifecycle_states[lc_state]);
>
> Here's another potential out-of-bounds access if the holes in the above if logic
> align.
>
> --
> i.