Re: [PATCH linux-next] powerpc: disable sanitizer in irq_soft_mask_set

From: Christophe Leroy
Date: Tue Aug 23 2022 - 06:02:34 EST




On 23/08/2022 at 10:33, Michael Ellerman wrote:
> Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> writes:
>> On ppc, the compiler-based sanitizer generates instrumentation instructions
>> around the statement WRITE_ONCE(local_paca->irq_soft_mask, mask):
>>
>> 0xc000000000295cb0 <+0>: addis r2,r12,774
>> 0xc000000000295cb4 <+4>: addi r2,r2,16464
>> 0xc000000000295cb8 <+8>: mflr r0
>> 0xc000000000295cbc <+12>: bl 0xc00000000008bb4c <mcount>
>> 0xc000000000295cc0 <+16>: mflr r0
>> 0xc000000000295cc4 <+20>: std r31,-8(r1)
>> 0xc000000000295cc8 <+24>: addi r3,r13,2354
>> 0xc000000000295ccc <+28>: mr r31,r13
>> 0xc000000000295cd0 <+32>: std r0,16(r1)
>> 0xc000000000295cd4 <+36>: stdu r1,-48(r1)
>> 0xc000000000295cd8 <+40>: bl 0xc000000000609b98 <__asan_store1+8>
>> 0xc000000000295cdc <+44>: nop
>> 0xc000000000295ce0 <+48>: li r9,1
>> 0xc000000000295ce4 <+52>: stb r9,2354(r31)
>> 0xc000000000295ce8 <+56>: addi r1,r1,48
>> 0xc000000000295cec <+60>: ld r0,16(r1)
>> 0xc000000000295cf0 <+64>: ld r31,-8(r1)
>> 0xc000000000295cf4 <+68>: mtlr r0
>>
>> If there is a context switch before "stb r9,2354(r31)", r31 may
>> no longer equal r13; in that case, the irq soft mask will not work.
>>
>> This patch disables the sanitizer in irq_soft_mask_set.
>>
>> Signed-off-by: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx>
>> ---
>> Dear PPC developers
>>
>> I found this bug while running rcutorture tests in a ppc VM at the
>> Open Source Lab of Oregon State University, following Paul E. McKenney's guidance.
>>
>> console.log reports the following bug:
>>
>> [ 346.527467][ T100] BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100
>> [ 346.529416][ T100] caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0
>> [ 346.531157][ T100] CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G W 5.19.0-rc5-next-20220708-dirty #253
>> [ 346.533620][ T100] Call Trace:
>> [ 346.534449][ T100] [c0000000094876c0] [c000000000ce2b68] dump_stack_lvl+0xbc/0x108 (unreliable)
>> [ 346.536632][ T100] [c000000009487710] [c000000001712954] check_preemption_disabled+0x154/0x160
>> [ 346.538665][ T100] [c0000000094877a0] [c0000000002ce2d4] rcu_preempt_deferred_qs_irqrestore+0x74/0xed0
>> [ 346.540830][ T100] [c0000000094878b0] [c0000000002cf3c0] __rcu_read_unlock+0x290/0x3b0
>> [ 346.542746][ T100] [c000000009487910] [c0000000002bb330] rcu_torture_read_unlock+0x30/0xb0
>> [ 346.544779][ T100] [c000000009487930] [c0000000002b7ff8] rcutorture_one_extend+0x198/0x810
>> [ 346.546851][ T100] [c000000009487a10] [c0000000002b8bfc] rcu_torture_one_read+0x58c/0xc90
>> [ 346.548844][ T100] [c000000009487ca0] [c0000000002b942c] rcu_torture_reader+0x12c/0x360
>> [ 346.550784][ T100] [c000000009487db0] [c0000000001de978] kthread+0x1e8/0x220
>> [ 346.552555][ T100] [c000000009487e10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64
>>
>> After 12 days of debugging, I finally narrowed the problem down to irq_soft_mask_set.
>
> Thanks for spending 12 days debugging it! O_o
>
>> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
>> index 26ede09c521d..a5ae8d82cc9d 100644
>> --- a/arch/powerpc/include/asm/hw_irq.h
>> +++ b/arch/powerpc/include/asm/hw_irq.h
>> @@ -121,7 +121,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
>> * for the critical section and as a clobber because
>> * we changed paca->irq_soft_mask
>> */
>> -static inline notrace void irq_soft_mask_set(unsigned long mask)
>> +static inline notrace __no_kcsan __no_sanitize_address void irq_soft_mask_set(unsigned long mask)
>> {
>> /*
>> * The irq mask must always include the STD bit if any are set.
>
> My worry is that this will force irq_soft_mask_set() out of line, which
> we would rather avoid. It's meant to be a fast path.
>
> In fact with this applied I see nearly 300 out-of-line copies of the
> function when building a defconfig, and ~1700 calls to it.
>
> Normally it is inlined at every call site.
>
>
> So I think I'm inclined to revert ef5b570d3700 ("powerpc/irq: Don't open
> code irq_soft_mask helpers").

Could you revert it only partially? That is, revert the
READ/WRITE_ONCE and bring back the inline asm in irq_soft_mask_return()
and irq_soft_mask_set(), but keep the other changes.
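
To illustrate what I mean, here is a rough user-space sketch, not the
actual hw_irq.h code. The struct demo_paca and the demo_paca_mask_*()
names are made up for the example, and the base pointer is passed in a
register operand instead of using r13 directly (r13 is the thread
pointer in 64-bit user space, so it must not be touched there). The
point is only that the byte access sits inside an asm statement, so the
compiler sees no C-level load or store to instrument. On non-powerpc
builds the sketch falls back to a plain volatile access so that it still
compiles.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-CPU paca structure. */
struct demo_paca {
	unsigned long pad;
	unsigned char irq_soft_mask;
};

/* Read the mask byte; on ppc64 the load is hidden inside inline asm. */
static inline unsigned char demo_paca_mask_return(const struct demo_paca *p)
{
	unsigned char flags;

#if defined(__powerpc64__)
	__asm__ __volatile__("lbz %0,%1(%2)"
			     : "=r" (flags)
			     : "i" (offsetof(struct demo_paca, irq_soft_mask)),
			       "b" (p)
			     : "memory");
#else
	/* Fallback for other architectures: an ordinary volatile load. */
	flags = *(const volatile unsigned char *)&p->irq_soft_mask;
#endif
	return flags;
}

/* Write the mask byte; on ppc64 the store is hidden inside inline asm. */
static inline void demo_paca_mask_set(struct demo_paca *p, unsigned char mask)
{
#if defined(__powerpc64__)
	__asm__ __volatile__("stb %0,%1(%2)"
			     :
			     : "r" (mask),
			       "i" (offsetof(struct demo_paca, irq_soft_mask)),
			       "b" (p)
			     : "memory");
#else
	/* Fallback for other architectures: an ordinary volatile store. */
	*(volatile unsigned char *)&p->irq_soft_mask = mask;
#endif
}
```

The "memory" clobber keeps the compiler from moving other memory
accesses across the asm statement, and the function can stay inline
because no sanitizer attribute is needed on it.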

>
> It was a nice looking cleanup, but those loads must not be instrumented
> by KASAN, but we also want them inlined, and AFAICS the only way to
> achieve that is to go back to inline asm.
>

It's a pity.

Would it be acceptable to have it out of line when KASAN is selected and
inline otherwise? In that case there is __no_sanitize_or_inline.
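
As a rough illustration of that pattern, here is a user-space analogue.
It is only a sketch: the demo_* names are invented, and DEMO_ASAN stands
in for the kernel's CONFIG_KASAN check. With the sanitizer enabled, the
helper is marked no_sanitize_address and is explicitly kept out of line
(the explicit noinline is my addition to make the behaviour obvious);
without it, the helper is always inlined.

```c
#include <stddef.h>

/* Hypothetical detection of AddressSanitizer, for GCC and Clang. */
#if defined(__SANITIZE_ADDRESS__)
#define DEMO_ASAN 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define DEMO_ASAN 1
#endif
#endif

#ifdef DEMO_ASAN
/* Sanitized build: not instrumented, and kept out of line. */
#define demo_no_sanitize_or_inline \
	__attribute__((no_sanitize_address, noinline, unused))
#else
/* Normal build: inlined at every call site. */
#define demo_no_sanitize_or_inline inline __attribute__((always_inline))
#endif

/* Hypothetical stand-in for local_paca->irq_soft_mask. */
static unsigned char demo_soft_mask;

static demo_no_sanitize_or_inline void demo_soft_mask_set(unsigned char mask)
{
	/* Volatile store, roughly what WRITE_ONCE() does for a byte. */
	*(volatile unsigned char *)&demo_soft_mask = mask;
}

static demo_no_sanitize_or_inline unsigned char demo_soft_mask_return(void)
{
	/* Volatile load, roughly what READ_ONCE() does for a byte. */
	return *(volatile unsigned char *)&demo_soft_mask;
}
```

The cost is the one Michael points out: in a sanitized build every call
becomes a real function call, so the fast path is only preserved when
the sanitizer is off.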

Christophe