Re: bisected: 4.18-rc* regression: x86-32 troubles (with timers?)

From: Daniel Borkmann
Date: Mon Jul 23 2018 - 15:41:45 EST


Hello Meelis, Arnd,

On 07/23/2018 06:03 PM, Arnd Bergmann wrote:
> On Sat, Jul 21, 2018 at 1:01 AM, Meelis Roos <mroos@xxxxxxxx> wrote:
>> Added netdev and Daniel Borkmann - please see
>> https://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1724795.html
>> for the original report. It seems to be about BPF instead.
>>
>> Meanwhile I have found more machines with the trouble. Still no clear
>> indicator in the config - some x86-32 machines that have
>> CONFIG_BPF=y
>> CONFIG_BPF_SYSCALL=y
>> CONFIG_BPF_JIT_ALWAYS_ON=y
>> are working fine.
>>
>>> The new bisect seems to have also led me to a strange commit. This time
>>> I tried to be careful and tested most kernels over two reboots before
>>> classifying them as good.
>>>
>>> However, f4e3ec0d573e was suspicious - it failed to autoload e1000 but
>>> had no other errors. On both boots with this kernel, modprobe e1000 and
>>> ifup -a made the system work, so I assumed it was good, although it
>>> might not have been. Will try bisecting with f4e3ec0d573e marked bad.
>>
>> Now this seems more relevant:
>>
>> mroos@rx100s2:~/linux$ nice git bisect good
>> 24dea04767e6e5175f4750770281b0c17ac6a2fb is the first bad commit
>> commit 24dea04767e6e5175f4750770281b0c17ac6a2fb
>> Author: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
>> Date: Fri May 4 01:08:23 2018 +0200
>>
>> bpf, x32: remove ld_abs/ld_ind
>>
>> Since LD_ABS/LD_IND instructions are now removed from the core and
>> reimplemented through a combination of inlined BPF instructions and
>> a slow-path helper, we can get rid of the complexity from x32 JIT.
>
> This does seem much more likely than the previous bisection, given
> that you ended up in an x86-32 specific commit (the subject says x32,
> but that is a mistake). I also checked that systemd indeed does
> call into bpf in a number of places, possibly for the journald socket.
>
> OTOH, it's still hard to tell how that commit could have ended up
> corrupting the clock read function in systemd. To cross-check,
> could you try reverting that commit on the latest kernel and see
> if it still works?

I would also be curious whether a revert makes it work. What's the
value of sysctl net.core.bpf_jit_enable? Does anything change if you
set it to 0 (interpreter only) or 1 (JIT enabled)? It seems a bit
strange to me that the bisect ended at this commit, given the issue
you're seeing. FWIW, the x86-32 JIT itself was also new in this
window. In any case, more debug info would be great to have.

Thanks,
Daniel