Re: External USB disks not recognized with v6.1.8 when using Xen

From: Linus Torvalds
Date: Thu Feb 02 2023 - 15:25:03 EST


On Thu, Feb 2, 2023 at 3:38 AM Christian Kujau <lists@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 1 Feb 2023, Juergen Gross wrote:
> > On 31.01.23 23:50, Christian Kujau wrote:
> > > [Leaving the full quote below for reference and adding more appropriate
> > > people.]
> > >
> > > After a far too long round of git-bisect I narrowed it down to:
> > >
> > > c1c59538337ab6d45700cb4a1c9725e67f59bc6e is the first bad commit
> > >
> > > x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
> > > commit 90b926e68f500844dff16b5bcea178dc55cf580a upstream.
> > >
> > > And indeed, reverting this single commit from v6.1.8 (stable) makes the
> > > disks appear again.
>
> This also happens on mainline, not only in stable. Reverting this patch
> from 6.2-rc6 makes the disks appear again.

I think the patch is simply wrong and should be reverted.

The way hardware works, MTRR_TYPE_INVALID implies UC-, not WB.

So that commit 90b926e68f50 ("x86/pat: Fix pat_x_mtrr_type() for MTRR
disabled case") is simply wrong to say "disabled means the same as
WB",.

That said, I think mtrr_type_lookup() is wrong too. It has two bugs

(a) it basically returns the wrong mtrr type for the "not enabled" conditions

(b) it doesn't set the "uniform" bit for said conditions, which then
causes problems for callers - the hugepage case in particular only
checks for that MTRR_TYPE_INVALID case because of this.

(c) it sets is_uniform wrongly for the fixed mtrr case, but I guess
the only thing that cares is largepage, so it works

and I think the !CONFIG_MTRR case has the same issue.

I'm not convinced it *ever* makes sense for mtrr_type_lookup() to
return MTRR_TYPE_INVALID (it makes sense for the helper functions to
do so to let the code know to look at other mtrrs, but not the final
lookup).

And at a minimum, the !MTRR_STATE_MTRR_ENABLED case seems very wrong -
if mtrr is disabled, it should return 'def_type', no?>

So I think that commit should be reverted as broken, and then people
should *maybe* look at something like this (intentionally whitespace
damaged, and people should *really* think about what the
MTRR_TYPE_INVALID case should be - returning UC- is probably what is
closest to "this is what the hardware does", but maybe doesn't make
sense for the largepage case, which might as well just always use
largepages in that case?)

--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -53,7 +53,8 @@ static inline u8 mtrr_type_lookup(..
/*
* Return no-MTRRs:
*/
- return MTRR_TYPE_INVALID;
+ *uniform = 1;
+ return MTRR_TYPE_INVALID; /* ??? */
}
#define mtrr_save_fixed_ranges(arg) do {} while (0)
#define mtrr_save_state() do {} while (0)
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -261,11 +261,13 @@ u8 mtrr_type_lookup(..
/* Make end inclusive instead of exclusive */
end--;

+ type = MTRR_TYPE_INVALID; /* ??? */
if (!mtrr_state_set)
- return MTRR_TYPE_INVALID;
+ goto out;

+ type = mtrr_state.def_type;
if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
- return MTRR_TYPE_INVALID;
+ goto out;

/*
* Look up the fixed ranges first, which take priority over

But note how I think the !MTRR_STATE_MTRR_ENABLED case really should
return the default mtrr type, and that looks fairly unambiguous to me.

Hmm?

Linus