Re: INFO: rcu detected stall in do_idle

From: Dmitry Vyukov
Date: Sat Oct 27 2018 - 07:17:18 EST


On Wed, Oct 24, 2018 at 1:03 PM, Juri Lelli <juri.lelli@xxxxxxxxxx> wrote:
>
> On 19/10/18 22:50, luca abeni wrote:
> > On Fri, 19 Oct 2018 13:39:42 +0200
> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > > On Thu, Oct 18, 2018 at 01:08:11PM +0200, luca abeni wrote:
> > > > Ok, I see the issue now: the problem is that the "while
> > > > (dl_se->runtime <= 0)" loop is executed at replenishment time, but
> > > > the deadline should be postponed at enforcement time.
> > > >
> > > > I mean: in update_curr_dl() we do:
> > > > dl_se->runtime -= scaled_delta_exec;
> > > > if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
> > > > ...
> > > > enqueue replenishment timer at dl_next_period(dl_se)
> > > > But dl_next_period() is based on a "wrong" deadline!
> > > >
> > > >
> > > > I think that inserting a
> > > > while (dl_se->runtime <= -pi_se->dl_runtime) {
> > > > dl_se->deadline += pi_se->dl_period;
> > > > dl_se->runtime += pi_se->dl_runtime;
> > > > }
> > > > immediately after "dl_se->runtime -= scaled_delta_exec;" would fix
> > > > the problem, no?
> > >
> > > That certainly makes sense to me.
> >
> > Good; I'll try to work on this idea in the weekend.
>
> So, we (me and Luca) managed to spend some more time on this and found a
> few more things worth sharing. I'll try to summarize what we have got so
> far (including what already discussed) because impression is that each
> point might deserve a fix or at least consideration (just amazing how a
> simple random fuzzer thing can highlight all that :).

1. Fuzzing finds bugs in any code. Always.
If a code wasn't fuzzed, there are bugs.

2. This fuzzer is not so simple ;)


> Apologies for the
> long email.
>
> Reproducer runs on a CONFIG_HZ=100, CONFIG_IRQ_TIME_ACCOUNTING kernel
> and does something like this (only the bits that seems to matter here)
>
> int main(void)
> {
> [...]
> [setup stuff at 0x2001d000]
> syscall(__NR_perf_event_open, 0x2001d000, 0, -1, -1, 0);
> *(uint32_t*)0x20000000 = 0;
> *(uint32_t*)0x20000004 = 6;
> *(uint64_t*)0x20000008 = 0;
> *(uint32_t*)0x20000010 = 0;
> *(uint32_t*)0x20000014 = 0;
> *(uint64_t*)0x20000018 = 0x9917; <-- ~40us
> *(uint64_t*)0x20000020 = 0xffff; <-- ~65us (~60% bandwidth)
> *(uint64_t*)0x20000028 = 0;
> syscall(__NR_sched_setattr, 0, 0x20000000, 0);
> [busy loop]
> return 0;
> }
>
> And this causes problems because the task is actually never throttled.
>
> Pain points:
>
> 1. Granularity of enforcement (at each tick) is huge compared with
> the task runtime. This makes starting the replenishment timer,
> when runtime is depleted, always to fail (because old deadline
> is way in the past). So, the task is fully replenished and put
> back to run.
>
> - Luca's proposal should help here, since the deadline is postponed
> at throttling time, and replenishment timer set to that (and it
> should be in the future)
>
> 1.1 Even if we fix 1. in a configuration like this, the task would
> still be able to run for ~10ms (worst case) and potentially starve
> other tasks. It doesn't seem a too big interval maybe, but there
> might be other very short activities that might miss an occasion
> to run "quickly".
>
> - Might be fixed by imposing (via sysctl) reasonable defaults for
> minimum runtime (w.r.t. HZ, like HZ/2) and maximum for period
> (as also a very small bandwidth task can have a big runtime if
> period is big as well)
>
> (1.2) When runtime becomes very negative (because delta_exec was big)
> we seem to spend lot of time inside the replenishment loop.
>
> - Not sure it's such a big problem, might need more profiling.
> Feeling is that once the other points will be addressed this
> won't matter anymore
>
> 2. This is related to perf_event_open syscall reproducer does before
> becoming DEADLINE and entering the busy loop. Enabling of perf
> swevents generates lot of hrtimers load that happens in the
> reproducer task context. Now, DEADLINE uses rq_clock() for setting
> deadlines, but rq_clock_task() for doing runtime enforcement.
> In a situation like this it seems that the amount of irq pressure
> becomes pretty big (I'm seeing this on kvm, real hw should maybe do
> better, pain point remains I guess), so rq_clock() and
> rq_clock_task() might become more a more skewed w.r.t. each other.
> Since rq_clock() is only used when setting absolute deadlines for
> the first time (or when resetting them in certain cases), after a
> bit the replenishment code will start to see postponed deadlines
> always in the past w.r.t. rq_clock(). And this brings us back to the
> fact that the task is never stopped, since it can't keep up with
> rq_clock().
>
> - Not sure yet how we want to address this [1]. We could use
> rq_clock() everywhere, but tasks might be penalized by irq
> pressure (theoretically this would mandate that irqs are
> explicitly accounted for I guess). I tried to use the skew between
> the two clocks to "fix" deadlines, but that puts us at risks of
> de-synchronizing userspace and kernel views of deadlines.
>
> 3. HRTICK is not started for new entities.
>
> - Already got a patch for it.
>
> This should be it, I hope. Luca (thanks a lot for your help) and please
> add or correct me if I was wrong.
>
> Thoughts?
>
> Best,
>
> - Juri
>
> 1 - https://elixir.bootlin.com/linux/latest/source/kernel/sched/deadline.c#L1162
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@xxxxxxxxxxxxxxxxx
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20181024120335.GE29272%40localhost.localdomain.
> For more options, visit https://groups.google.com/d/optout.