Re: Re: loop nesting in alignment exception and machine check

From: Wangshaobo (bobo)
Date: Thu Oct 31 2019 - 21:57:37 EST


Hi, Christophe

Sorry for not replying sooner; we ran into some unexpected problems while reproducing the issue.

I would also like to ask: does this behavior (having to use memcpy_toio() when copying to an ioremapped address) only occur on powerpc, or do other
architectures have the same problem? We are trying to understand why this happens. Our Linux kernel version is 4.4.

Thanks very much.

-----Original Message-----
From: Christophe Leroy [mailto:christophe.leroy@xxxxxx]
Sent: 2019-10-31 19:13
To: Wangshaobo (bobo) <bobo.shaobowang@xxxxxxxxxx>
Cc: chengjian (D) <cj.chengjian@xxxxxxxxxx>; Libin (Huawei) <huawei.libin@xxxxxxxxxx>; Xiexiuqi <xiexiuqi@xxxxxxxxxx>; zhangyi (F) <yi.zhang@xxxxxxxxxx>
Subject: Re: Re: loop nesting in alignment exception and machine check

Hi,

Did you try ? Does it work ?

Christophe

On 28/10/2019 at 06:57, Wangshaobo (bobo) wrote:
> Hi, Christophe
>
> Thank you for your quick reply. I will try to use memcpy_toio() instead of memcpy().
>
> -----Original Message-----
> From: Christophe Leroy [mailto:christophe.leroy@xxxxxx]
> Sent: 2019-10-26 19:20
> To: Wangshaobo (bobo) <bobo.shaobowang@xxxxxxxxxx>
> Cc: linux-arch@xxxxxxxxxxxxxxx; alistair@xxxxxxxxxxxx; chengjian (D)
> <cj.chengjian@xxxxxxxxxx>; Xiexiuqi <xiexiuqi@xxxxxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxx; oss@xxxxxxxxxxxx; paulus@xxxxxxxxx;
> Libin (Huawei) <huawei.libin@xxxxxxxxxx>; agust@xxxxxxx;
> linuxppc-dev@xxxxxxxxxxxxxxxx
> Subject: Re: loop nesting in alignment exception and machine check
>
> Hi,
>
> On 26/10/2019 at 09:23, Wangshaobo (bobo) wrote:
>> Hi,
>>
>> I encountered a problem where an alignment exception raised in
>> machine check context leads to a loop of nested exceptions. The trigger path is:
>>
>> problem:
>>
>> machine check or critical interrupt -> kbox_write() [to record the
>> last words] -> memcpy(irremap_addr, src, size): _GLOBAL(memcpy)
>>
>> When we enter memcpy, the instruction "dcbz r11,r6" raises an alignment
>> exception: in this situation r11 holds the ioremap address, and dcbz
>> on that address leads to the alignment exception.
>
> You can't use memcpy() on anything other than memory.
>
> For an ioremapped area, you have to use memcpy_toio()
>
> Christophe
>
>>
>> Then the instruction cannot be processed successfully, since we are still in
>> machine check context. In the end, a new machine check is triggered inside the
>> interrupt handler, and a nested loop begins.
>>
>> analysis:
>>
>> We have analysed this extensively but still cannot find a reasonable
>> explanation. Normally, an alignment exception triggered in machine check
>> context is handled by its handler, and the output can still be collected
>> into the kbox. So how can a machine check be triggered inside the
>> handler function? We printed the relevant registers on first entry to
>> the machine check handler and to the alignment exception handler:
>>
>>                  machine check    alignment exception
>>         MSR:     0x2              0x0
>>         SRR1:    0x2              0x21002
>>
>>         But the manual says SRR1 should be set to the MSR value (0x2). Why
>> did that happen?
>>
>>         Then a branch in the handler function copies SRR1 to MSR,
>> which enables MSR[ME] and MSR[CE], and the system collapses.
>>
>> Conclusion:
>>
>>         1) Why can't the alignment exception be handled in
>> machine check context?
>>
>>         2) Besides memcpy, can any other function cause the
>> alignment exception?
>>
>> We can still reproduce it, with the following path:
>>
>>         CPU deadlock -> watchdog -> trigger FIQ -> kbox_write -> memcpy
>> -> alignment exception -> print last words.
>>
>>         But in the cases below, what the kbox printed is empty.
>>
>> ------------------kbox restart:[   10.147594]----------------
>> kbox verify fs magic fail
>> kbox mem mabye destroyed, format it
>> kbox: load OK
>> lock-task: major[249] minor[0]
>> -----start show_destroyed_kbox_mem_head----
>> 00000000: 00000000 00000000 00000000 00000000 ................
>> 00000010: 00000000 00000000 00000000 00000000 ................
>> 00000020: 00000000 00000000 00000000 00000000 ................
>> 00000030: 00000000 00000000 00000000 00000000 ................
>> 00000040: 00000000 00000000 00000000 00000000 ................
>> 00000050: 00000000 00000000 00000000 00000000 ................
>> 00000060: 00000000 00000000 00000000 00000000 ................
>> 00000070: 00000000 00000000 00000000 00000000 ................
>> 00000080: 00000000 00000000 00000000 00000000 ................
>> 00000090: 00000000 00000000 00000000 00000000 ................
>>