Re: [PATCH] [RFT] crypto: aes-generic - turn off -ftree-pre and -ftree-sra

From: Ard Biesheuvel
Date: Thu Dec 21 2017 - 07:22:45 EST


On 21 December 2017 at 10:20, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Wed, Dec 20, 2017 at 10:46 PM, Jakub Jelinek <jakub@xxxxxxxxxx> wrote:
>> On Wed, Dec 20, 2017 at 09:52:05PM +0100, Arnd Bergmann wrote:
>>> diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c
>>> index ca554d57d01e..35f973ba9878 100644
>>> --- a/crypto/aes_generic.c
>>> +++ b/crypto/aes_generic.c
>>> @@ -1331,6 +1331,20 @@ EXPORT_SYMBOL_GPL(crypto_aes_set_key);
>>> f_rl(bo, bi, 3, k); \
>>> } while (0)
>>>
>>> +#if __GNUC__ >= 7
>>> +/*
>>> + * Newer compilers try to optimize integer arithmetic more aggressively,
>>> + * which generally improves code quality a lot, but in this specific case
>>> + * ends up hurting more than it helps, in some configurations drastically
>>> + * so. This turns off two optimization steps that have been shown to
>>> + * lead to rather badly optimized code with gcc-7.
>>> + *
>>> + * See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
>>> + */
>>> +#pragma GCC optimize("-fno-tree-pre")
>>> +#pragma GCC optimize("-fno-tree-sra")
>>
>> So do it only when UBSAN is enabled? GCC doesn't have a particular
>> predefined macro for those (only for asan and tsan), but either the kernel
>> does have something already, or could have something added in the
>> corresponding Makefile.
>
> My original interpretation of the resulting object code suggested that disabling
> those two optimizations produced better results for this particular
> file even without
> UBSAN, on both gcc-7 and gcc-8 (but not gcc-6), so my patch might have
> been better, but I did some measurements now as Ard suggested, showing
> cycles/byte for AES256/CBC with 8KB blocks:
>
>
> default ubsan patched patched+ubsan
> gcc-4.3.6 14.9 ---- 14.9 ----
> gcc-4.6.4 15.0 ---- 15.8 ----
> gcc-4.9.4 15.5 20.7 15.9 20.9
> gcc-5.5.0 15.6 47.3 86.4 48.8
> gcc-6.3.1 14.6 49.4 94.3 50.9
> gcc-7.1.1 13.5 54.6 15.2 52.0
> gcc-7.2.1 16.8 124.7 92.0 52.2
> gcc-8.0.0 15.0 no boot 15.3 no boot
>
> I checked that there are actually three significant digits on the measurements,
> detailed output is available at https://pastebin.com/eFsWYjQp
>
> It seems that I was wrong about the interpretation that disabling
> the optimization would be a win on gcc-7 and higher, it almost
> always makes things worse even with UBSAN. Making that
> check "#if __GNUC__ == 7 && IS_ENABLED(CONFIG_UBSAN_SANITIZE_ALL)"
> would help here, or we could list the file as an exception for
> UBSAN and never sanitize it.
>
> Looking at the 'default' column, I wonder if anyone would be interested
> in looking at why the throughput regressed with gcc-7.2 and gcc-8.
>

Thanks for the elaborate benchmarks. Looking at the bugzilla entry, it
appears the UBSAN code inserts runtime checks to ensure that certain
u8 variables don't assume values <0 or >255, which seems like a rather
pointless exercise to me. But even if it didn't, I think it is
justified to disable UBSAN on all of the low-level cipher
implementations, given that they are hardly ever modified, and
typically don't suffer from the issues UBSAN tries to identify.

So my vote is to disable UBSAN for all such cipher implementations:
aes_generic, but also aes_ti, which has a similar 256 byte lookup
table [although it does not seem to be affected by the same issue as
aes_generic], and possibly others as well.

Perhaps it makes sense to move core cipher code into a separate
sub-directory, and disable UBSAN at the directory level?

It would involve the following files

crypto/aes_generic.c
crypto/aes_ti.c
crypto/anubis.c
crypto/arc4.c
crypto/blowfish_generic.c
crypto/camellia_generic.c
crypto/cast5_generic.c
crypto/cast6_generic.c
crypto/des_generic.c
crypto/fcrypt.c
crypto/khazad.c
crypto/seed.c
crypto/serpent_generic.c
crypto/tea.c
crypto/twofish_generic.c