Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

From: tonywwang-oc
Date: Mon Dec 21 2020 - 22:02:32 EST


On December 22, 2020 3:27:33 AM GMT+08:00, hpa@xxxxxxxxx wrote:
>On December 20, 2020 6:46:25 PM PST, tonywwang-oc@xxxxxxxxxxx wrote:
>>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
>><ebiggers@xxxxxxxxxx> wrote:
>>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>>>
>>>> On 15/12/2020 04:41, Eric Biggers wrote:
>>>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>>>> >> On 12/12/2020 01:43, Eric Biggers wrote:
>>>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>>>> >>>> The crc32c-intel driver matches CPUs supporting
>>>> >>>> X86_FEATURE_XMM4_2. On platforms with Zhaoxin CPUs that support
>>>> >>>> this feature, when crc32c-intel and crc32c-generic are both
>>>> >>>> registered, the system uses crc32c-intel because its
>>>> >>>> .cra_priority is greater than that of crc32c-generic. Some
>>>> >>>> Zhaoxin CPUs perform better with the crc32c-generic driver, so
>>>> >>>> remove support for these CPUs from crc32c-intel.
>>>> >>>>
>>>> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@xxxxxxxxxxx>
>>>> >>>
>>>> >>> Does this mean that the performance of the crc32c instruction
>>>> >>> on those CPUs is actually slower than a regular C
>>>> >>> implementation? That's very weird.
>>>> >>>
>>>> >>
>>>> >> From the lmbench3 Create and Delete file test on those chips,
>>>> >> I think yes.
>>>> >>
>>>> >
>>>> > Did you try measuring the performance of the hashing itself,
>>>> > and not some higher-level filesystem operations?
>>>> >
>>>>
>>>> Yes. When tested on these Zhaoxin CPUs with the same input value,
>>>> the generic C implementation takes less time than the crc32c
>>>> instruction implementation.
>>>>
>>>
>>>And that is really "working as intended"?
>>
>>These CPUs' crc32c instruction is not working as intended.
>>
>>>Why do these CPUs even declare that
>>>they support the crc32c instruction, when it is so slow?
>>>
>>
>>Support for crc32c and some other instructions is enumerated by
>>CPUID.01:ECX[SSE4.2] = 1. The other instructions are fine; only
>>the crc32c instruction is slow.
>>
>>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX,
>>>etc.) that these CPUs similarly declare support for but they are
>>>uselessly slow?
>>
>>No.
>>
>>Sincerely
>>Tonyw
>>
>>>
>>>- Eric
>
>Then the right thing to do is to disable the CPUID bit in the
>vendor-specific startup code.

Doing that would make these CPUs lose support for every instruction
enumerated by CPUID.01:ECX[SSE4.2].
Only the crc32c instruction is slow, so we just want the crc32c-intel
driver not to match these CPUs.
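
To illustrate the difference between the two approaches, here is a
minimal user-space sketch in C. It is not the kernel patch: struct
cpu_model, the crc32c_slow flag, and all helper names are hypothetical
and exist only for this illustration. The only hard fact it relies on
is that SSE4.2 is reported in CPUID.01H:ECX bit 20.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 20 enumerates SSE4.2, which includes crc32c. */
#define CPUID_01_ECX_SSE4_2 (UINT32_C(1) << 20)

/* Hypothetical, simplified view of a CPU for this sketch only. */
struct cpu_model {
	uint32_t cpuid_01_ecx;	/* feature bits as reported by CPUID.01H:ECX */
	bool crc32c_slow;	/* assumed quirk flag for the affected models */
};

/* True if the (possibly modified) feature word still advertises SSE4.2. */
static inline bool has_sse4_2(const struct cpu_model *cpu)
{
	return (cpu->cpuid_01_ecx & CPUID_01_ECX_SSE4_2) != 0;
}

/*
 * Approach A: clear the feature bit at startup. Every user of SSE4.2
 * then sees the feature as absent, not only the crc32c driver.
 */
static inline void clear_sse4_2(struct cpu_model *cpu)
{
	cpu->cpuid_01_ecx &= ~CPUID_01_ECX_SSE4_2;
}

/*
 * Approach B: leave the feature bit alone and let only the hardware
 * crc32c driver decline to match CPUs flagged as slow.
 */
static inline bool crc32c_hw_driver_matches(const struct cpu_model *cpu)
{
	return has_sse4_2(cpu) && !cpu->crc32c_slow;
}
```

With approach B, an affected CPU still reports has_sse4_2() as true
while crc32c_hw_driver_matches() returns false, so the generic driver
is the one left to be selected. With approach A, both return false.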

Sincerely
Tonyw