Re: [PATCH V4 23/23] crypto: hisilicon/qm: Workaround to enable build with RISC-V clang

From: Arnd Bergmann
Date: Wed Apr 19 2023 - 10:35:38 EST


On Tue, Apr 11, 2023, at 13:42, Weili Qian wrote:
> On 2023/4/5 16:16, Arnd Bergmann wrote:
>> On Tue, Apr 4, 2023, at 20:20, Sunil V L wrote:
>>> With CONFIG_ACPI enabled for RISC-V, this driver gets enabled in
>>> allmodconfig build. The gcc tool chain builds this driver removing the
>>> inline arm64 assembly code. However, clang for RISC-V tries to build
>>> the arm64 assembly and below error is seen.
>>>
>>> drivers/crypto/hisilicon/qm.c:627:10: error: invalid output constraint
>>> '+Q' in asm
>>> "+Q" (*((char __iomem *)fun_base))
>>> ^
>>> It appears that RISC-V clang is not smart enough to detect
>>> IS_ENABLED(CONFIG_ARM64) and remove the dead code.
>>>
>>> As a workaround, move this check to preprocessing stage which works
>>> with the RISC-V clang tool chain.
>>>
>>> Signed-off-by: Sunil V L <sunilvl@xxxxxxxxxxxxxxxx>
>>
>> Your patch looks correct for this particular problem, but I
>> see that there are a couple of other issues in the same function:
>>
>>> - }
>>> +#if IS_ENABLED(CONFIG_ARM64)
>>> + unsigned long tmp0 = 0, tmp1 = 0;
>>>
>>> asm volatile("ldp %0, %1, %3\n"
>>> "stp %0, %1, %2\n"
>>> @@ -627,6 +623,11 @@ static void qm_mb_write(struct hisi_qm *qm, const
>>> void *src)
>>> "+Q" (*((char __iomem *)fun_base))
>>> : "Q" (*((char *)src))
>>> : "memory");
>>
>> For the arm64 version:
>>
>> - the "dmb oshst" barrier needs to come before the stp, not after
>> it, otherwise there is no guarantee that data written to memory
>> is visible by the device when the mailbox gets triggered
>> - The input/output arguments need to be pointers to 128-bit types,
>> either a struct or a __uint128_t
>> - this lacks a byteswap on big-endian kernels
> Sorry for the late reply.
>
> - the execution order relies on the data dependency between ldp and stp:
> load "src" to "tmp0" and "tmp1", then
> store "tmp0" and "tmp1" to "fun_base";

Not entirely sure how that data dependency would help serialize
the store into the DMA buffer against the device access. The problem
here is not the qm_mailbox structure but the data pointed to by the
'u64 base' (e.g. struct qm_eqc *eqc) which may still be in a store
buffer waiting to make it to physical memory at the time the mailbox
store triggers the DMA from the device.

> The "dmb oshst" is used to ensure that the stp instruction has been executed
> before CPU checking mailbox status. Whether the execution order
> cannot be guaranteed via data dependency?

There is no need to have barriers between MMIO operations, they
are implicitly serialized already. In this case specifically,
the read is even on the same address as the write. Note that the
"dmb oshst" does not actually guarantee that the store has made it
to the device, as (at least on PCIe semantics) it can be posted,
but the read from the same address does guarantee that the write
is completed first, and this may be required to ensure that it does
not complete after the mutex_unlock().

> - The input argument "src" is struct "struct qm_mailbox".
> - Before call this funcion, the data has been byteswapped.
>
> mailbox->w0 = cpu_to_le16((cmd) |
> ((op) ? 0x1 << QM_MB_OP_SHIFT : 0) |
> (0x1 << QM_MB_BUSY_SHIFT));
> mailbox->queue_num = cpu_to_le16(queue);
> mailbox->base_l = cpu_to_le32(lower_32_bits(base));
> mailbox->base_h = cpu_to_le32(upper_32_bits(base));
> mailbox->rsvd = 0;

Right, this bit does look correct.

Arnd