Re: BUG : PowerPC RCU: torture test failed with __stack_chk_fail

From: Boqun Feng
Date: Mon Apr 24 2023 - 11:31:00 EST


On Mon, Apr 24, 2023 at 10:13:51AM -0500, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Apr 24, 2023 at 11:14:00PM +1000, Michael Ellerman wrote:
> > Boqun Feng <boqun.feng@xxxxxxxxx> writes:
> > > On Sat, Apr 22, 2023 at 09:28:39PM +0200, Joel Fernandes wrote:
> > >> On Sat, Apr 22, 2023 at 2:47 PM Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
> > >> > by debugging, I see the r10 is assigned with r13 on c000000000226eb4,
> > >> > but if there is a context-switch before c000000000226edc, a false
> > >> > positive will be reported.
>
> > I've never understood why the compiler wants to make a copy of a
> > register variable into another register!? >:#
>
> It is usually because a) you told it to (maybe via an earlyclobber), or
> b) it looked cheaper. I don't see either here :-(
>
> > > here I think that the compiler is using r10 as an alias to r13, since
> > > for userspace program, it's safe to assume the TLS pointer doesn't
> > > change. However this is not true for kernel percpu pointer.
>
> r13 is a "fixed" register, but that means it has a fixed purpose (so not
> available for allocation), it does not mean "unchanging".
>
> > > The real intention here is to compare 40(r1) vs 3192(r13) for stack
> > > guard checking, however since r13 is the percpu pointer in kernel, so
> > > the value of r13 can be changed if the thread gets scheduled to a
> > > different CPU after reading r13 for r10.
> >
> > Yeah that's not good.
>
> The GCC pattern here makes the four machine insns all stay together.
> That is to make sure to not leak any secret value, which is impossible
> to guarantee otherwise.
>
> What tells GCC r13 can randomly change behind its back? And, what then
> makes GCC ignore that fact?
>
> > > + asm volatile("" : : : "r13", "memory");
>
> Any asm without output is always volatile.
>
> > > Needless to say, the correct fix is to make ppc stack protector aware of
> > > r13 is volatile.
> >
> > I suspect the compiler developers will tell us to go jump :)
>
> Why would r13 change over the course of *this* function / this macro,
> why can this not happen anywhere else?
>
> > The problem of the compiler caching r13 has come up in the past, but I
> > only remember it being "a worry" rather than causing an actual bug.
>
> In most cases the compiler is smart enough to use r13 directly, instead
> of copying it to another reg and then using that one. But not here for
> some strange reason. That of course is a very minor generated machine
> code quality bug and nothing more :-(
>
> > We've had the DEBUG_PREEMPT checks in get_paca(), which have given us at
> > least some comfort that if the compiler is caching r13, it shouldn't be
> > doing it in preemptable regions.
> >
> > But obviously that doesn't help at all with the stack protector check.
> >
> > I don't see an easy fix.
> >
> > Adding "volatile" to the definition of local_paca seems to reduce but
> > not elimate the caching of r13, and the GCC docs explicitly say *not* to
> > use volatile. It also triggers lots of warnings about volatile being
> > discarded.
>
> The point here is to say some code clobbers r13, not the asm volatile?
>
> > Or something simple I haven't thought of? :)
>
> At what points can r13 change? Only when some particular functions are
> called?
>

r13 is the local paca:

register struct paca_struct *local_paca asm("r13");

, which is a pointer to percpu data.

So if a task schedule from one CPU to anotehr CPU, the value gets
changed.

Regards,
Boqun


>
> Segher