Re: [llvmlinux] percpu | bitmap issue? (Cannot boot on bare metal due to a kernel NULL pointer dereference)

From: Austin S Hemmelgarn
Date: Mon Sep 14 2015 - 13:50:23 EST


On 2015-09-14 03:49, Sedat Dilek wrote:
On Mon, Sep 14, 2015 at 9:12 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
On Sun, Sep 13, 2015 at 04:33:39AM +0200, Sedat Dilek wrote:
It looks like an inline-optimization bug in CLANG when the compiler's
optimization-level is higher than -O2.

[1] http://lists.linuxfoundation.org/pipermail/llvmlinux/2015-September/001355.html

After some discussion on #llvm it turned out to be a known issue in LLVMLinux!

Unfortunately, an existing patch [1] that is still required to build
x86_64 correctly got archived.

[1] http://git.linuxfoundation.org/?p=llvmlinux.git;a=blob_plain;f=arch/x86_64/patches/ARCHIVE/0029-Fix-ARCH_HWEIGHT-for-compilation-with-clang.patch;hb=HEAD

As long as LLVM cannot do things like that and requires full function
calls I cannot see it being a sensible compiler to use from a
performance POV.

There's a fairly large difference between an inline POPCNT instruction
and a full out-of-line function call.

/me goes back to ignoring LLVM for the time being.

Can you give an example or describe a test-case to check the performance?

I have several Linux v4.2 kernels here (all built with the same kernel config):

[ llvmlinux-patched ]

#1: Compiled with CLANG v3.7 from a self-built llvm-toolchain v3.7.0
#2: Compiled with GCC v4.9

[ unpatched ]

#3: Compiled with GCC v4.9

Can you also comment on the effects of CONFIG_CC_OPTIMIZE_FOR_SIZE in
case of performance?
Is it only to reduce binary size, or does it also do some "speed" optimization?

I can comment at least a little on the -Os aspect (although I'm no expert on this in particular). In general, for _most_ use cases, a kernel compiled with CONFIG_CC_OPTIMIZE_FOR_SIZE will run slower than one compiled without it. On rare occasions, though, it may actually run faster. The only cases I've seen where this happens are specialized workloads that are very sensitive to memory pressure and run almost entirely in userspace with almost no syscalls (for example, math-related code operating on _very, very big_ (as in, >1 trillion elements) multidimensional matrices with complex memory constraints). Even then, the improvement is usually minuscule (generally less than 1%, which can of course still be significant depending on how long the workload runs).
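To make Peter's point about an inline POPCNT instruction versus a full out-of-line function call a bit more concrete, here is a minimal C sketch. The function names are hypothetical. The software fallback is the standard SWAR bit-counting routine, similar in spirit to the kernel's generic hweight code but not copied from it. The sketch assumes an x86_64 GCC or Clang toolchain, where `__builtin_popcountll` can compile to a single POPCNT instruction when POPCNT support is enabled (e.g. with `-mpopcnt`).

```c
#include <stdint.h>

/* Hypothetical software fallback: classic SWAR population count.
 * Similar in spirit to a generic hweight64, not the kernel's code. */
static unsigned int sw_hweight64(uint64_t w)
{
    w -= (w >> 1) & 0x5555555555555555ULL;              /* 2-bit sums */
    w = (w & 0x3333333333333333ULL) +
        ((w >> 2) & 0x3333333333333333ULL);             /* 4-bit sums */
    w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;         /* 8-bit sums */
    return (unsigned int)((w * 0x0101010101010101ULL) >> 56); /* add bytes */
}

/* Out-of-line variant: noinline keeps a real call at every use site,
 * roughly the situation a full function call puts the caller in. */
__attribute__((noinline)) static unsigned int hweight64_call(uint64_t w)
{
    return sw_hweight64(w);
}

/* Inline variant: with POPCNT enabled, GCC and Clang can lower this
 * builtin to a single popcnt instruction at each call site. */
static inline unsigned int hweight64_inline(uint64_t w)
{
    return (unsigned int)__builtin_popcountll(w);
}
```

Compiling with `gcc -O2 -mpopcnt -S` or `clang -O2 -mpopcnt -S` and reading the generated assembly should show each use of `hweight64_inline()` reduced to a `popcnt`, while each use of `hweight64_call()` remains a `call` into the shift/mask/multiply sequence. Without POPCNT enabled, the builtin itself may fall back to a library call or an open-coded sequence. A loop summing either function over a large array would be one simple test-case of the kind Sedat asked about, although kernel-level effects (such as register pressure around call sites) would only show up in a real kernel benchmark.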
