objtool warnings in prerelease clang-9

From: Nick Desaulniers
Date: Tue Jul 02 2019 - 20:56:40 EST


TL;DR
LLVM currently has a bug when unrolling loops containing asm goto and
we have a fix in hand.

Following up from our thread last week (https://lkml.org/lkml/2019/6/29/14).

With an x86 defconfig (6fbc7275c7a9) +clang 9 (e1006259d84d), I see
the following 4 warnings from objtool:
arch/x86/events/intel/core.o: warning: objtool:
intel_pmu_nhm_workaround()+0xa8: unreachable instruction
arch/x86/kernel/cpu/mtrr/generic.o: warning: objtool:
get_fixed_ranges()+0x9b: unreachable instruction
arch/x86/kernel/platform-quirks.o: warning: objtool:
x86_early_init_platform_quirks()+0x84: unreachable instruction
drivers/gpu/drm/i915/i915_gem_execbuffer.o: warning: objtool:
.altinstr_replacement+0x88: redundant UACCESS disable

Peter mentioned seeing 6 cases:
https://lkml.org/lkml/2019/6/27/118

We have 21 open reports (mostly unverified, I think someone was
testing -O3 -march=native):
https://github.com/ClangBuiltLinux/linux/issues?q=is%3Aissue+is%3Aopen+label%3Aobjtool

Taking a look specifically at arch/x86/kernel/cpu/mtrr/generic.o (as
tglx and Peter did in https://lkml.org/lkml/2019/6/27/118:

Disassembly and jump table:
https://gist.github.com/nickdesaulniers/395129e75d9f7afe028db832145594d8
More concise snippet:
https://gist.github.com/nickdesaulniers/a3159782908cb6e50e26ea9563b14c20

Of interest are the disassembled __jump_table entries; in groups of
three, there is a group for which the second element is duplicated
with a previous group. This is bad because (as explained by Peter in
https://lkml.org/lkml/2019/6/27/118) the triples are in the form (code
location, jump target, pointer to key). Duplicate or repeated jump
targets are unexpected, and will lead to incorrect control flow after
patching such code locations. Also, the jump target should be 0x7
bytes ahead of the location, IIUC.

I've annotated the concise snippet:
https://gist.github.com/nickdesaulniers/c0cb8fc67720a5abbadf42fde5440a8e
we can see that two groups in the __jump_table share a similar jump
target, unexpectedly.

Now that we can identify what's bad (duplicate __jump_table entries),
we can try to creduce the input.
(https://nickdesaulniers.github.io/blog/2019/01/18/finding-compiler-bugs-with-c-reduce/)

The preprocessed input is 37497 lines long. We want to reduce it
down, but I haven't been able to write a correct "interestingness
test" for creduce where I was then able to make sense of the output.
Instead, I'm looking into program slicing:
https://en.wikipedia.org/wiki/Program_slicing. We'd want to do this
for C code, and it should be possible to implement with libclang, but
since it doesn't exist yet AFAICT and llvm-extract does, we can do the
program slicing on the LLVM IR level. This will also give us a test
case to add to the compiler's test suite.

$ clang <same c flags kernel uses via `make V=`> \
-emit-llvm -S -Xclang -disable-llvm- passes -o get_fixed_ranges generic.ll
$ llvm-extract -func get_fixed_ranges generic.ll -o
generic.get_fixed_ranges.ll -S --recursive

is a more manageable 541 lines of LLVM IR.
https://gist.github.com/nickdesaulniers/73aef95e9735a50765f654ebdcbb99dd

`callbr` is the LLVM IR instruction representing `asm goto`.
https://llvm.org/docs/LangRef.html#callbr-instruction
The gist of the instruction is:
callbr ... <list of block addresses from the label list> to
<fallthrough basic block [<list of block addresses from the label
list>]. IIUC, the repeated list of addresses better match.

We can then observe the middle end work its optimizations:
$ opt -O2 -print-before-all -print-after-all
generic.get_fixed_ranges.ll -S -o - 2>&1 | less
(27199 lines of output, omitted)
Here's the relevant snippet of the 27199 lines:
https://gist.github.com/nickdesaulniers/089bf56165675065bca830cd7da767d8

grepping through the output and looking for the first pass which
contains non-matching list of block addresses in the label list, we
come upon:

```
callbr void asm sideeffect "1:.byte
0x0f,0x1f,0x44,0x00,0\0A\09.pushsection __jump_table, \22aw\22 \0A\09
.balign 8 \0A\09.long 1b - ., ${2:l} - . \0A\09 .quad ${0:c} + ${1:c}
- .\0A\09.popsection \0A\09",
"i,i,X,~{dirflag},~{fpsr},~{flags}"(%struct.static_key* getelementptr
inbounds (%struct.tracepoint, %struct.tracepoint*
@__tracepoint_read_msr, i64 0, i32 1), i1 false, i8*
blockaddress(@get_fixed_ranges, %20)) #6
to label %native_read_msr.exit31.1 [label %49], !srcloc !4
```

(particularly I would have expected the `%20` and `%49` to match.)

which occurs before "MergedLoadStoreMotion," but due to what looks
like a bug in the pass printer
(https://lists.llvm.org/pipermail/cfe-dev/2019-July/062795.html)
doesn't print the results of loop unrolling, which runs just before
"MergedLoadStoreMotion."

I see that `get_fixed_ranges` is a loop that gets unrolled to have a
trip count of two.
$ opt -O2 -debug-only=loop-unroll generic.get_fixed_ranges.ll -S -o /dev/null
...
Loop Unroll: F[get_fixed_ranges] Loop %
Loop Size = 27
Trip Count = 2
Trip Multiple = 2
COMPLETELY UNROLLING loop % with trip count 2!

If I manually add some print statements to loop unrolling, I can see
that before we unroll the loop, things look good. After unrolling the
loop, things look bad. Indeed, if I kneecap loop unrolling the
problem goes away:

$ opt -O2 -debug-only=loop-unroll -unroll-full-max-count=0
generic.get_fixed_ranges.ll -S -o -

So this is a bug in our loop unroller when unrolling IR
representations of asm goto. Peter's guess that this was a bug in
inlining was very close! Function inlining and loop unrolling are
closely related and probably share a lot of machinery related to
cloning of basic blocks and instructions. In this case, if we have a
loop with asm goto and we unroll the loop, the referenced labels from
the asm goto statement need to reflect the newly duplicated labels.
In this case, the new duplicated asm goto is referencing the initial
asm goto's label, which would result in non-sensical control flow at
runtime.

In the loop unroller, I can see that we skip loop unrolling in the
event of "indirectbr" instructions.
https://github.com/llvm/llvm-project/blob/97316fff5d6a7c3d2c072301a118ac0cb5094ad8/llvm/lib/Transforms/Utils/LoopUnroll.cpp#L294
https://github.com/llvm/llvm-project/blob/97316fff5d6a7c3d2c072301a118ac0cb5094ad8/llvm/lib/Analysis/LoopInfo.cpp#L432

Conservatively, we can block loop unrolling when we see asm goto in a loop:
diff --git a/llvm/lib/Analysis/LoopInfo.cpp b/llvm/lib/Analysis/LoopInfo.cpp
index 00dbe30c2b3d..f800c890d1db 100644
--- a/llvm/lib/Analysis/LoopInfo.cpp
+++ b/llvm/lib/Analysis/LoopInfo.cpp
@@ -433,7 +433,8 @@ bool Loop::isSafeToClone() const {
// Return false if any loop blocks contain indirectbrs, or there
are any calls
// to noduplicate functions.
for (BasicBlock *BB : this->blocks()) {
- if (isa<IndirectBrInst>(BB->getTerminator()))
+ if (isa<IndirectBrInst>(BB->getTerminator()) ||
+ isa<CallBrInst>(BB->getTerminator()))
return false;

for (Instruction &I : *BB)

This causes objtool to not find any issues in
arch/x86/kernel/cpu/mtrr/generic.o. I don't observe any duplication
in the __jump_table section of the resulting .o file. It also cuts
down the objtool warnings I observe in a defconfig (listed at the
beginning of the email) from 4 to 2. (platform-quirks.o predates asm
goto, i915_gem_execbuffer.o is likely a separate bug).

Coincidentally, Alexander and Bill just found what I think is the same bug:
https://bugs.llvm.org/show_bug.cgi?id=42489

So for now we'll be conservative/correct and disable unrolling of
loops that contain asm goto; then we can use the same test case to
permit unrolling with the correct code transformations added.
--
Thanks,
~Nick Desaulniers