Re: [cgroup_rmdir] BUG: unable to handle kernel paging request at ffff880210af6000

From: Fengguang Wu
Date: Tue Nov 07 2017 - 11:08:17 EST


On Tue, Nov 07, 2017 at 07:46:46AM -0800, Linus Torvalds wrote:
> On Tue, Nov 7, 2017 at 2:26 AM, Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:

>> FYI this happens in v4.14-rc8 -- it's not necessarily a new bug.

> .. in fact I don't think it's a bug at all. Not in the kernel, that is.

>> [ 186.238181] BUG: unable to handle kernel paging request at ffff880210af6000
>> [ 186.257107] IP: slob_free+0x1c4/0x276

> This looks like the same bug we saw earlier, which is due to a gcc bug.

> The trapping code disassembles to:

>    0:   8b 45 00                mov    0x0(%rbp),%eax
>    3:   41 be 01 00 00 00       mov    $0x1,%r14d
>    9:   48 89 ef                mov    %rbp,%rdi
>    c:   66 85 c0                test   %ax,%ax

> and the thing to note is the "test %ax,%ax".

> It's testing a 16-bit value, but it *loads* a 32-bit one.

> It is supposed to load a 16-bit value from the last two bytes of the page:

>    RBP: ffff880210af5ffe

> but because it has turned the 16-bit load into a 32-bit load, it
> faults when accessing the next page.
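The page arithmetic behind this fault can be checked directly. This is a minimal sketch that assumes 4 KiB pages (the x86-64 default), and the helper names are hypothetical. Using the RBP value from the oops, it shows that the intended 2-byte load stays inside one page. A 4-byte load from the same address spills two bytes into the next page, and the first of those bytes is exactly the faulting address in the BUG line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, the x86-64 default. */
#define EXAMPLE_PAGE_SIZE 4096ULL

/* Hypothetical helper: true if an access of `size` bytes starting at
 * `addr` touches two pages, i.e. its first and last bytes fall in
 * different pages. */
static bool access_crosses_page(uint64_t addr, uint64_t size)
{
    assert(size > 0);
    uint64_t first_page = addr / EXAMPLE_PAGE_SIZE;
    uint64_t last_page = (addr + size - 1) / EXAMPLE_PAGE_SIZE;
    return first_page != last_page;
}

/* Hypothetical helper: address of the first byte of the access that
 * lands in the next page, or 0 if the access stays inside one page. */
static uint64_t first_byte_in_next_page(uint64_t addr, uint64_t size)
{
    if (!access_crosses_page(addr, size))
        return 0;
    return (addr / EXAMPLE_PAGE_SIZE + 1) * EXAMPLE_PAGE_SIZE;
}
```

With addr = 0xffff880210af5ffe, a 2-byte access covers ...5ffe to ...5fff and does not cross. A 4-byte access covers ...5ffe to ...6001, so first_byte_in_next_page() returns 0xffff880210af6000, the address in the BUG line.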

That's too bad!

> It's hard to trigger, since you need to have the next page unmapped
> due to DEBUG_PAGEALLOC and have just the right allocations etc to make
> this happen, but clearly the 0day has gotten pretty good at triggering
> it.
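The triggering condition can also be reproduced in userspace. This is a hypothetical sketch that assumes Linux with POSIX mmap()/mprotect() and glibc-style feature macros. It does not reflect how DEBUG_PAGEALLOC is implemented. It only recreates the same layout: a readable page followed by an inaccessible one. A 2-byte read from the last two bytes of the first page succeeds, while a 4-byte footprint faults. The wider access is simulated byte by byte through a volatile pointer so the sketch itself avoids undefined behaviour. Recovering from SIGSEGV with siglongjmp() is a common testing trick that works on Linux, but it is not guaranteed to be portable. The function names are illustrative.

```c
#define _DEFAULT_SOURCE /* assumption: exposes MAP_ANONYMOUS, sigaction, sigsetjmp on glibc/musl */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_env;

/* SIGSEGV/SIGBUS handler: jump back into probe_read_faults(). */
static void probe_fault_handler(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Hypothetical helper: map two pages and make the second one
 * inaccessible, a rough stand-in for an unmapped neighbouring page.
 * Returns the start of the accessible page, or NULL on failure. */
static unsigned char *map_page_with_guard(size_t *page_size_out)
{
    assert(page_size_out != NULL);
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return NULL;
    size_t page_size = (size_t)ps;
    unsigned char *base = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base + page_size, page_size, PROT_NONE) != 0) {
        munmap(base, 2 * page_size);
        return NULL;
    }
    *page_size_out = page_size;
    return base;
}

/* Hypothetical helper: read `size` bytes starting at `p`, one byte at a
 * time, to mimic the footprint of a wider load.
 * Returns 1 if any byte faulted, 0 otherwise. */
static int probe_read_faults(const volatile unsigned char *p, size_t size)
{
    struct sigaction sa, old_segv, old_bus;
    volatile unsigned char sink;
    int faulted = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_fault_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* savesigs=1 restores the signal mask, so later probes still work. */
    if (sigsetjmp(probe_env, 1) == 0) {
        for (size_t i = 0; i < size; i++)
            sink = p[i]; /* volatile read: must actually be performed */
        (void)sink;
    } else {
        faulted = 1;
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

To use it, take page + page_size - 2 as the address of a 16-bit field at the end of the page. probe_read_faults() on that address with size 2 returns 0, and with size 4 it returns 1.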

0day hits a single occurrence by chance out of thousands of boots.
Such random noise has been troublesome for 0day maintenance.

It's good to know the caveats of old gcc -- now we can get rid of some
of our daily annoyances. :)

> Anyway, for now, I'd suggest 0day either:

> - upgrade the compiler (this is known to happen with 4.8 and 4.9 but
>   apparently not 5.1)

We cover gcc 4.4 all the way up to gcc 6 (gcc 7 coverage is yet to be added).
The old gcc versions are kept mainly for test coverage.

So would you suggest we stop testing gcc 4.x? Or do so selectively
for the known-broken combinations?

> - not use SLOB in the kernel configurations it tests

E.g., disable SLOB only for old gcc, or disable SLOB unconditionally?

> Honestly, I'd prefer the former, because apparently you use some
> ancient Debian gcc version 4.8.4, and gcc these days is on 7.2.

> Apparently the ancient gcc version is causing problems with KASAN too.

Yeah, I just happily disabled KASAN for kernels compiled with gcc < 4.9.
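The version gating discussed here can be written as a small hypothetical helper. It encodes only what the thread states: the SLOB miscompile is known with gcc 4.8 and 4.9 and apparently not with 5.1, and KASAN is disabled for gcc < 4.9. Treating every gcc 4.x as suspect for SLOB is an assumption, because the thread does not say whether 4.4 through 4.7 are affected. The struct and function names are illustrative and are not 0day's actual tooling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of the compiler a test kernel is built with. */
struct gcc_version {
    int major;
    int minor;
};

/* True if v is major.minor or newer. */
static bool gcc_at_least(struct gcc_version v, int major, int minor)
{
    assert(major >= 0 && minor >= 0);
    return v.major > major || (v.major == major && v.minor >= minor);
}

/* Assumption: treat all of gcc 4.x as suspect for the SLOB miscompile;
 * the thread confirms 4.8 and 4.9, and reports 5.1 as apparently fine. */
static bool should_skip_slob(struct gcc_version v)
{
    return !gcc_at_least(v, 5, 0);
}

/* Rule stated in the thread: no KASAN when gcc < 4.9. */
static bool should_skip_kasan(struct gcc_version v)
{
    return !gcc_at_least(v, 4, 9);
}
```

Under these rules, the Debian gcc 4.8.4 mentioned above skips both SLOB and KASAN, gcc 4.9 skips only SLOB, and gcc 5.1 or later skips neither. A narrower SLOB rule covering only 4.8 and 4.9 would keep more coverage but relies on the untested versions being safe.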

> Anyway, I will be ignoring the slob_free() reports for now, and you
> should too until the gcc version is fixed.

OK. Sorry for the noise, and glad to be rid of it!

Regards,
Fengguang