Re: v4.14-rc{4,7} null pointer dereference in event_sched_out()

From: Mark Rutland
Date: Fri Nov 24 2017 - 13:16:39 EST


On Fri, Nov 24, 2017 at 06:10:56PM +0000, Mark Rutland wrote:
> On Wed, Nov 15, 2017 at 06:00:20PM +0000, Will Deacon wrote:
> > On Mon, Oct 30, 2017 at 04:23:15PM +0000, Mark Rutland wrote:
> > > As a heads-up, while fuzzing arm64 v4.14-rc{4,7} with Syzkaller, I hit a
> > > KASAN splat in event_sched_out():
> > Did you get anywhere with this?
>
> I got a *bit* further, but I haven't figured out the underlying issue
> yet.

Forgot to mention, the above all applies to a vanilla v4.14 arm64
kernel; defconfig + KASAN_INLINE.

Thanks,
Mark.

>
> I minimized the reproducer down to the following:
>
> ----
> # {Threaded:true Collide:true Repeat:true Procs:1 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 EnableTun:true UseTmpDir:true HandleSegv:true WaitRepeat:true Debug:false Repro:false}
>
> r2 = gettid()
> mmap(&(0x7f0000000000/0xd3f000)=nil, 0xd3f000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
> r0 = perf_event_open(&(0x7f0000d15000-0x78)={0x1, 0x78, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x9, 0x30, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, r2, 0xffffffff, 0xffffffffffffffff, 0x0)
> mmap(&(0x7f0000d3f000/0x1000)=nil, 0x1000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
> r1 = perf_event_open(&(0x7f0000d15000-0x78)={0x1, 0x78, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x30, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, r2, 0xffffffff, r0, 0x0)
> dup3(0, 0, 0)
> perf_event_open(&(0x7f0000b13000-0x78)={0x0, 0x78, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x30, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, r2, 0xffffffff, r0, 0x0)
> ----
>
> Note: the dup3() is an expensive NOP (since oldfd == newfd), but I think
> it's triggering an interesting scheduling pattern, since thus far I
> haven't managed to trigger the bug without it.
>
> That creates a perf_cpu_clock event, adds a second perf_cpu_clock event
> to that group, and adds a HW event to that same group, with the calls
> racing in parallel.
>
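For anyone who wants to poke at this outside Syzkaller, here is a rough C
sketch of the same three perf_event_open() calls. It rests on assumptions:
I read type 0x1 / config 0x0 as PERF_TYPE_SOFTWARE / PERF_COUNT_SW_CPU_CLOCK
and type 0x0 / config 0x0 as PERF_TYPE_HARDWARE / PERF_COUNT_HW_CPU_CYCLES,
the 0x9 and 0x30 attribute bits from the reproducer are deliberately not
carried over, and init_attr(), sys_perf_event_open() and open_group() are
hypothetical helper names. It is single-threaded, so it only illustrates
the group layout and is not expected to trigger the bug by itself.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: zero the attr and set only size, type and config. */
static void init_attr(struct perf_event_attr *attr, __u32 type, __u64 config)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = type;
    attr->config = config;
}

/* glibc provides no perf_event_open() wrapper, so go through syscall(2). */
static int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                               int cpu, int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/*
 * Hypothetical helper: open a cpu-clock leader, a cpu-clock follower and a
 * HW follower on the calling thread, any CPU. fds[] receives the three
 * descriptors (each -1 on failure); the leader fd is returned.
 */
static int open_group(int fds[3])
{
    struct perf_event_attr attr;
    pid_t tid = (pid_t)syscall(SYS_gettid);

    /* Assumption: type 0x1, config 0x0 is the perf_cpu_clock SW event. */
    init_attr(&attr, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK);
    fds[0] = sys_perf_event_open(&attr, tid, -1, -1, 0);     /* leader */
    fds[1] = sys_perf_event_open(&attr, tid, -1, fds[0], 0); /* SW follower */

    /* Assumption: type 0x0, config 0x0 is the CPU cycles HW event. */
    init_attr(&attr, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES);
    fds[2] = sys_perf_event_open(&attr, tid, -1, fds[0], 0); /* HW follower */

    return fds[0];
}
```

A caller would declare `int fds[3];`, call `open_group(fds)`, and close any
descriptors that are not -1. The calls can legitimately fail, e.g. if
perf_event_paranoid forbids them or the machine has no usable HW PMU.
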
> Sometimes at the point the HW event is added, the leading SW event is in
> PERF_EVENT_STATE_INACTIVE, but the follower SW event is in
> PERF_EVENT_STATE_ACTIVE. The context both are held in is inactive, so
> the follower event's state makes no sense.
>
> I added a dump to event_sched_out() that catches this:
>
> [ 35.995144] Uh-oh:
> [ 35.995144] event ffff800039a1f880
> [ 35.995144] event->state 1
> [ 35.995144] event->cpu -1
> [ 35.995144] pmu ffff20000a3b2600 (perf_cpu_clock, AKA (null))
> [ 35.995144] leader ffff800039a1a480
> [ 35.995144] leader->state -1
> [ 35.995144] pmu ffff20000a3b2600 (perf_cpu_clock, AKA (null))
> [ 35.995144] ctx ffff80003932e180, pmu ffff20000a3b2600 (perf_cpu_clock AKA (null))
>
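To help read the raw state numbers in the dump, here is a small userspace
sketch that maps them back to names and encodes the sanity check described
above (an ACTIVE event in an inactive context is inconsistent). The enum
values are an assumption: they are what I believe v4.14 has in
include/linux/perf_event.h, copied by hand. state_name() and
state_is_consistent() are hypothetical helpers, not kernel functions. Under
that assumption the follower's state 1 decodes as ACTIVE and the leader's
state -1 decodes as OFF.

```c
#include <stdio.h>

/* Assumed copy of the v4.14 values; check against your tree. */
enum perf_event_active_state {
    PERF_EVENT_STATE_DEAD     = -4,
    PERF_EVENT_STATE_EXIT     = -3,
    PERF_EVENT_STATE_ERROR    = -2,
    PERF_EVENT_STATE_OFF      = -1,
    PERF_EVENT_STATE_INACTIVE =  0,
    PERF_EVENT_STATE_ACTIVE   =  1,
};

/* Hypothetical helper: turn a raw event->state value into its enum name. */
static const char *state_name(int state)
{
    switch (state) {
    case PERF_EVENT_STATE_DEAD:     return "PERF_EVENT_STATE_DEAD";
    case PERF_EVENT_STATE_EXIT:     return "PERF_EVENT_STATE_EXIT";
    case PERF_EVENT_STATE_ERROR:    return "PERF_EVENT_STATE_ERROR";
    case PERF_EVENT_STATE_OFF:      return "PERF_EVENT_STATE_OFF";
    case PERF_EVENT_STATE_INACTIVE: return "PERF_EVENT_STATE_INACTIVE";
    case PERF_EVENT_STATE_ACTIVE:   return "PERF_EVENT_STATE_ACTIVE";
    default:                        return "unknown";
    }
}

/*
 * Hypothetical check: an event should only be ACTIVE while the context
 * holding it is active. Returns 1 if the pair is plausible, 0 otherwise.
 */
static int state_is_consistent(int state, int ctx_is_active)
{
    if (state == PERF_EVENT_STATE_ACTIVE && !ctx_is_active)
        return 0;
    return 1;
}

/* Hypothetical helper: print one decoded line in the style of the dump. */
static void print_state(const char *label, int state, int ctx_is_active)
{
    printf("%s %d (%s)%s\n", label, state, state_name(state),
           state_is_consistent(state, ctx_is_active) ? "" : " <-- bogus");
}
```

Calling `print_state("event->state", 1, 0)` and
`print_state("leader->state", -1, 0)` decodes the two values from the dump
for an inactive context.
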
> I'll try to dig into this a bit more next week.
>
> I can't reproduce this with Syzkaller running in a single thread, nor
> with some multi-threaded tests I wrote in C, so I guess there's a subtle
> race I'm not managing to hit.
>
> Thanks,
> Mark.