Re: Regression: Linux v5.15+ does not boot on Freescale P2020

From: Segher Boessenkool
Date: Tue Jul 26 2022 - 09:46:36 EST


On Tue, Jul 26, 2022 at 11:02:59AM +0200, Arnd Bergmann wrote:
> On Tue, Jul 26, 2022 at 10:34 AM Pali Rohár <pali@xxxxxxxxxx> wrote:
> > On Monday 25 July 2022 16:54:16 Segher Boessenkool wrote:
> > > The EH field in larx insns is new since ISA 2.05, and some ISA 1.x cpu
> > > implementations actually raise an illegal insn exception on EH=1. It
> > > appears P2020 is one of those.
> >
> > P2020 has e500 cores. e500 cores use ISA 2.03, so this may be the reason.
> > But the official Freescale/NXP documentation for e500 states that
> > lwarx also supports eh=1. Maybe it is not really supported.
> > https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf (page 562)

(page 6-186)

> > At least there is NOTE:
> > Some older processors may treat EH=1 as an illegal instruction.

And the architecture says
  Programming Note
  Warning: On some processors that comply with versions of the
  architecture that precede Version 2.00, executing a Load And Reserve
  instruction in which EH = 1 will cause the illegal instruction error
  handler to be invoked.

> In commit d6ccb1f55ddf ("powerpc/85xx: Make sure lwarx hint isn't set on ppc32")
> this was clarified to affect (all?) e500v1/v2,

  e500v1/v2 based chips will treat any reserved field being set in an
  opcode as illegal.

while the architecture says

  Reserved fields in instructions are ignored by the processor.

Whoops :-) We need fixes for processor implementation bugs all the
time of course, but this is a massive *design* bug. I'm surprised this
CPU still works as well as it does!

Even the venerable PEM (last updated in 1997) shows the EH field as
reserved, always treated as 0.
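
To make the disagreement concrete, here is a minimal sketch in C of where
the EH bit sits in an lwarx instruction word and how the two decode
policies differ. The encoding (primary opcode 31, extended opcode 20, EH
in the least significant bit) follows the X-form layout. The function
names and the strict_reserved flag are made up for illustration; they are
not kernel or hardware interfaces, and the "strict" model is only a toy
approximation of what the e500v1/v2 behaviour is reported to be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* lwarx with all operand fields zero: primary opcode 31, extended opcode 20. */
#define LWARX_BASE 0x7c000028u

/* Hypothetical helper: build an lwarx word from register numbers and EH. */
static uint32_t encode_lwarx(unsigned rt, unsigned ra, unsigned rb, unsigned eh)
{
    return LWARX_BASE
         | ((uint32_t)(rt & 0x1f) << 21)
         | ((uint32_t)(ra & 0x1f) << 16)
         | ((uint32_t)(rb & 0x1f) << 11)
         | (uint32_t)(eh & 0x1);   /* EH is the lowest bit of the word */
}

/* Match on the opcode fields only, ignoring RT, RA, RB and the EH bit. */
static bool is_lwarx(uint32_t insn)
{
    return (insn & 0xfc0007feu) == LWARX_BASE;
}

/*
 * Toy decode model (an assumption, not real hardware logic):
 *   strict_reserved == false: the low bit is ignored, as the architecture
 *                             says for reserved fields.
 *   strict_reserved == true:  a set low bit makes the instruction illegal.
 */
static bool lwarx_is_legal(uint32_t insn, bool strict_reserved)
{
    if (!is_lwarx(insn))
        return false;
    if (strict_reserved && (insn & 0x1u))
        return false;
    return true;
}

/* Small self-check of the model; call it from any test harness. */
static void lwarx_selfcheck(void)
{
    uint32_t hinted = encode_lwarx(9, 0, 3, 1);   /* lwarx r9,0,r3,1 */

    assert(is_lwarx(hinted));
    assert(lwarx_is_legal(hinted, false));
    assert(!lwarx_is_legal(hinted, true));
}
```

With these definitions, encode_lwarx(9, 0, 3, 1) gives 0x7d201829; the
lenient model accepts it and the strict one rejects it, while the EH=0
word 0x7d201828 is accepted by both.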

> this one apparently fixed it before, but Christophe's commit effectively
> reverted that change.
>
> I think only the simple_spinlock.h file actually uses EH=1

That's right afaics.

> and this is not
> included in non-SMP kernels, so presumably the only affected machines were
> the rare dual-core e500v2 ones (p2020, MPC8572, bsc9132), which would
> explain why nobody noticed for the past 9 months.

Also people using an SMP kernel on older cores should see the problem,
no? Or is that patched out? Or does this use case never happen :-)


Segher