Re: BUG : PowerPC RCU: torture test failed with __stack_chk_fail

From: Zhouyi Zhou
Date: Tue Apr 25 2023 - 21:31:34 EST


On Wed, Apr 26, 2023 at 8:33 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Apr 25, 2023 at 9:50 AM Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
> >
> > Hi
> >
> > On Tue, Apr 25, 2023 at 9:40 PM Christophe Leroy
> > <christophe.leroy@xxxxxxxxxx> wrote:
> > >
> > >
> > >
> > > Le 25/04/2023 à 13:06, Joel Fernandes a écrit :
> > > > On Tue, Apr 25, 2023 at 6:58 AM Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
> > > >>
> > > >> hi
> > > >>
> > > >> On Tue, Apr 25, 2023 at 6:13 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > >>>
> > > >>> On Mon, Apr 24, 2023 at 02:55:11PM -0400, Joel Fernandes wrote:
> > > >>>> This is amazing debugging Boqun, like a boss! One comment below:
> > > >>>>
> > > >>>>>>> Or something simple I haven't thought of? :)
> > > >>>>>>
> > > >>>>>> At what points can r13 change? Only when some particular functions are
> > > >>>>>> called?
> > > >>>>>>
> > > >>>>>
> > > >>>>> r13 is the local paca:
> > > >>>>>
> > > >>>>> register struct paca_struct *local_paca asm("r13");
> > > >>>>>
> > > >>>>> , which is a pointer to percpu data.
> > > >>>>>
> > > > >>>>> So if a task is scheduled from one CPU to another CPU, the value
> > > > >>>>> changes.
> > > >>>>
> > > >>>> It appears the whole issue, per your analysis, is that the stack
> > > >>>> checking code in gcc should not cache or alias r13, and must read its
> > > >>>> most up-to-date value during stack checking, as its value may have
> > > >>>> changed during a migration to a new CPU.
> > > >>>>
> > > >>>> Did I get that right?
> > > >>>>
> > > >>>> IMO, even without a reproducer, gcc on PPC should just not do that,
> > > >>>> that feels terribly broken for the kernel. I wonder what clang does,
> > > >>>> I'll go poke around with compilerexplorer after lunch.
> > > >>>>
> > > >>>> Adding +Peter Zijlstra as well to join the party as I have a feeling
> > > >>>> he'll be interested. ;-)
> > > >>>
> > > >>> I'm a little confused; the way I understand the whole stack protector
> > > >>> thing to work is that we push a canary on the stack at call and on
> > > >>> return check it is still valid. Since in general tasks randomly migrate,
> > > >>> the per-cpu validation canary should be the same on all CPUs.
> > > >>>
> > > >>> Additionally, the 'new' __srcu_read_{,un}lock_nmisafe() functions use
> > > >>> raw_cpu_ptr() to get 'a' percpu sdp, preferably that of the local cpu,
> > > >>> but no guarantees.
> > > >>>
> > > >>> Both cases use r13 (paca) in a racy manner, and in both cases it should
> > > >>> be safe.
> > > > >> New test results today: both gcc built from git (git clone
> > > > >> git://gcc.gnu.org/git/gcc.git) and Ubuntu 22.04 gcc-12.1.0
> > > > >> are immune to the above issue. We can see the assembly code at
> > > > >> http://140.211.169.189/0425/srcu_gp_start_if_needed-gcc-12.txt
> > > > >>
> > > > >> Meanwhile, both the native gcc on the PPC VM (gcc version 9.4.0) and the
> > > > >> gcc cross compiler on my x86 laptop (gcc version 10.4.0) reproduce the bug.
> > > >
> > > > Do you know what fixes the issue? I would not declare victory yet. My
> > > > feeling is something changes in timing, or compiler codegen which
> > > > hides the issue. So the issue is still there but it is just a matter
> > > > of time before someone else reports it.
> > > >
> > > > Out of curiosity for PPC folks, why cannot 64-bit PPC use per-task
> > > > canary? Michael, is this an optimization? Adding Christophe as well
> > > > since it came in a few years ago via the following commit:
> > >
> > > It does use a per-task canary. But unlike PPC32, PPC64 doesn't have a fixed
> > > register pointing to 'current' at all times, so the canary is copied into
> > > a per-cpu struct during _switch().
> > >
> > > If GCC keeps an old value of the per-cpu struct pointer, it then gets
> > > the canary from the wrong CPU's struct, and so from a different task.
> > This is a fruitful learning process for me!
>
> Nice work Zhouyi..
Thanks, Joel, for your encouragement! It is very
important to me ;-)
>
> > Christophe:
> > Do you think there is still a need to bisect GCC? If so, I am very
> > glad to continue.
>
> my 2 cents: It would be good to write the reproducer that Segher
> suggested (but that might be hard since you depend on the compiler to
> cache r13 -- maybe some trial/error with CompilerExplorer will
> give you the magic recipe?).
I have reported a bug to the GCC Bugzilla once before ;-) [1]
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88348
I think we could provide a preprocessed .i file and give the command
line that invokes cc1.
The problem is that the newest GCC is immune to our issue ;-(
>
> If I understand Christophe correctly, the issue requires the following
> ingredients:
> 1. Task A is running on CPU 1, and the task's canary is copied into
> the CPU1's per-cpu area pointed to by r13.
> 2. r13 is now cached into r10 in the offending function due to the compiler.
> 3. Task A running on CPU 1 now gets preempted right in the middle of
> the offending SRCU function and gets migrated to CPU 2.
> 4. CPU 2's per-cpu canary is updated to that of task A since task A
> is the current task now.
> 5. Task B now runs on CPU 1 and the per-cpu canary on CPU 1 is now that of B.
> 6. Task A exits the function, but stack checking code reads r10 which
> contains CPU 1's canary which is that of task B!
> 7. Boom.
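The seven steps above can be replayed in a small user-space sketch. This
is only a model under stated assumptions, not kernel code: struct
fake_paca, struct fake_task, switch_to() and
check_passes_after_migration() are hypothetical names invented for the
illustration, and the canary values are arbitrary. The local variable
stack_slot stands for the copy of the canary saved in the stack frame on
entry, and the final read through r10 stands for the guard load through
the cached pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 3   /* index 0 unused so that indices match "CPU 1" and "CPU 2" */

/* Hypothetical stand-in for the per-cpu struct that r13 points to. */
struct fake_paca {
    uint64_t canary;            /* copy of the running task's canary */
};

/* Hypothetical stand-in for a task with its own canary. */
struct fake_task {
    uint64_t canary;
    int cpu;                    /* CPU the task currently runs on */
};

struct fake_paca pacas[NR_CPUS];

/* Models the context switch: the incoming task's canary is copied into
 * the per-cpu struct of the CPU it is about to run on. */
void switch_to(struct fake_task *t, int cpu)
{
    assert(cpu > 0 && cpu < NR_CPUS);
    t->cpu = cpu;
    pacas[cpu].canary = t->canary;
}

/* Replays steps 1-6 and returns true if the exit check passes.
 * cache_r13 == true models a compiler that keeps the old r13 in r10. */
bool check_passes_after_migration(bool cache_r13)
{
    struct fake_task a = { .canary = 0xAAAAu, .cpu = 0 };
    struct fake_task b = { .canary = 0xBBBBu, .cpu = 0 };

    switch_to(&a, 1);                           /* step 1: A runs on CPU 1 */
    struct fake_paca *r10 = &pacas[a.cpu];      /* step 2: r13 copied to r10 */
    uint64_t stack_slot = r10->canary;          /* entry: canary saved in frame */

    switch_to(&a, 2);                           /* steps 3-4: A migrates to CPU 2 */
    switch_to(&b, 1);                           /* step 5: B now runs on CPU 1 */

    /* step 6: the exit check reads the guard either through the stale
     * r10 or through the per-cpu struct of the CPU A is on now. */
    struct fake_paca *guard = cache_r13 ? r10 : &pacas[a.cpu];
    return stack_slot == guard->canary;
}
```

With cache_r13 set to true the comparison reads task B's canary from
CPU 1 and fails, which is step 7. With cache_r13 set to false the guard
is read from CPU 2, where task A's canary was copied during the switch,
and the check passes.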
Joel makes the learning process easier for me, indeed!
There is one question I have tried very hard to understand but am still
confused about. For now, I know that r13 is fixed but r1 is not. In
"r9,40(r1)", why can 40(r1) be assumed to be equal to 3192(r10)?
Thanks in advance.
>
> So the issue is precisely in #2. The issue is that the compiler
> does not treat r13 as volatile, as Boqun had initially mentioned.
Please do not hesitate to email me if there is anything I can do (for
example bisecting ;-)). I am very glad to be of help ;-)

Cheers
Zhouyi
>
> - Joel
>
>
>
> >
> > Cheers
> > Zhouyi
> > >
> > > Christophe
> > >
> > > >
> > > > commit 06ec27aea9fc84d9c6d879eb64b5bcf28a8a1eb7
> > > > Author: Christophe Leroy <christophe.leroy@xxxxxx>
> > > > Date: Thu Sep 27 07:05:55 2018 +0000
> > > >
> > > > powerpc/64: add stack protector support
> > > >
> > > > On PPC64, as register r13 points to the paca_struct at all time,
> > > > this patch adds a copy of the canary there, which is copied at
> > > > task_switch.
> > > > That new canary is then used by using the following GCC options:
> > > > -mstack-protector-guard=tls
> > > > -mstack-protector-guard-reg=r13
> > > > > -mstack-protector-guard-offset=offsetof(struct paca_struct, canary)
> > > >
> > > > Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxx>
> > > > Signed-off-by: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
> > > >
> > > > - Joel