Re: RFC: futex_wait() can DoS the tick

From: Thomas Gleixner
Date: Wed Jun 10 2015 - 11:13:12 EST


On Wed, 10 Jun 2015, Mike Galbraith wrote:
> The above was handed to me by a colleague working on a Xen guest that
> livelocked. I at first thought Xen arch must have a weird problem, but
> when I tried proggy on my desktop box, while it didn't stop the tick
> completely as it did the Xen box, it slowed it to a crawl. I noticed
> that this did not happen with newer kernels, so a bisecting I did go,
> and found that...
>
> 279f14614 x86: apic: Use tsc deadline for oneshot when available
>
> ..is what fixed it up. Trouble is, while it fixes up my Haswell box, a

This does not make any sense at all. It does not matter whether the
box uses tscdeadline or local apic timer. We do not even program the
hardware because we see that the event is in the past already.

So we raise the hrtimer softirq, which then expires the timer. So all
that happens is that ksoftirqd accumulates runtime, but I cannot at
all see how that amounts to a DoS and brings the machine to a grinding
halt.
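
The decision described above can be sketched as a toy model. This is
not the kernel's hrtimer code; the names arm_action and arm_timer are
hypothetical and exist only to illustrate that an expiry time already
in the past never reaches the clock event hardware, regardless of
whether that hardware is tscdeadline or the local apic timer.

```c
#include <stdint.h>

/* Hypothetical names: a toy model of the arming decision, not kernel code. */
enum arm_action {
    ARM_PROGRAM_HARDWARE, /* expiry lies in the future: program the timer device */
    ARM_RAISE_SOFTIRQ     /* expiry already passed: expire the timer from softirq */
};

/* Decide what to do with a timer that should expire at expires_ns. */
static enum arm_action arm_timer(int64_t now_ns, int64_t expires_ns)
{
    if (expires_ns <= now_ns)
        return ARM_RAISE_SOFTIRQ;   /* the hardware is never touched */
    return ARM_PROGRAM_HARDWARE;
}
```

In this model the choice of timer hardware only matters on the
ARM_PROGRAM_HARDWARE path, which an already expired futex timeout does
not take.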

> Xen dom0 remains busted by that testcase whether that patch is applied
> to the host or not, even though the hypervisor supports deadline timer,
> and seemingly regardless of CPU type altogether.
>
> Of all the x86_64 bare metal boxen I've tested, only those with the TSC
> deadline timer have shown the issue, and there it goes away as of v3.8
> unless you boot lapic=notscdeadline.

I just booted a SNB with lapic=notscdeadline and ran that test
program. All that happens is - as expected - that ksoftirqd runs more
than we would like it to. I cannot observe any anomaly vs. local
timer interrupts at all. If I run this pinned on an otherwise idle
core, then I get ~ CONFIG_HZ interrupts per second, which is what you
expect when the cpu never reaches idle.
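
One way to check the interrupt rate on a single core is to sample the
"LOC:" (local timer interrupts) line of /proc/interrupts twice, one
second apart, and subtract the counts for that CPU. The helper below is
a minimal sketch of the parsing step only; the function name
irq_count_for_cpu is hypothetical, and it assumes the usual x86 layout
of a label followed by one decimal counter per CPU.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the counter in column `cpu` of one /proc/interrupts line that
 * starts with `label` (for example "LOC:"), or -1 if the label does not
 * match or the line has no such column.
 */
static long long irq_count_for_cpu(const char *line, const char *label, int cpu)
{
    size_t n = strlen(label);
    const char *p;

    while (isspace((unsigned char)*line))
        line++;
    if (strncmp(line, label, n) != 0)
        return -1;

    p = line + n;
    for (int col = 0; ; col++) {
        char *end;
        unsigned long long v;

        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            return -1;          /* ran out of per-CPU counters */
        v = strtoull(p, &end, 10);
        if (col == cpu)
            return (long long)v;
        p = end;
    }
}
```

The difference between two samples taken one second apart on the
pinned core should then be close to CONFIG_HZ if the cpu never goes
idle.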

With the changes pending in tip/timers/core we get more timer
interrupts instead of offloading crap to ksoftirqd, but they cannot
lead to a DoS either and we do not care whether the user spends its
cycles looping in user space or firing timer interrupts. It can only
do so as long as it is on the cpu.

These timers (futex, nanosleep, poll, ...) are oneshot and all timers
which are self rearming are rate limited by the fact that we only
rearm when the previous event has been consumed by the task which
scheduled it. So the scheduler controls how many of these events can
be created from user space.
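
The rate limiting argument can be illustrated with a small simulation.
This is a toy model under stated assumptions, not kernel code: time
advances in abstract units, the function name count_expirations is
hypothetical, and "consumed" is modelled as the owning task getting to
run once every sched_every units.

```c
/*
 * Count how often a self-rearming timer fires within `steps` time units.
 * The timer is rearmed only after the owning task has consumed the
 * previous event, and the task runs once every `sched_every` units.
 */
static int count_expirations(int steps, int period, int sched_every)
{
    int fired = 0;
    int armed = 1;          /* timer is armed at t = 0 */
    int pending = 0;        /* event delivered but not yet consumed */
    int expires = period;

    if (period <= 0 || sched_every <= 0)
        return 0;

    for (int t = 1; t <= steps; t++) {
        if (armed && t >= expires) {
            fired++;
            armed = 0;      /* no rearm until the event is consumed */
            pending = 1;
        }
        if (pending && t % sched_every == 0) {
            pending = 0;    /* task ran and consumed the event */
            armed = 1;
            expires = t + period;
        }
    }
    return fired;
}
```

With a period of 1 and the task scheduled every 10 units, the model
fires 10 times in 100 units rather than 100, which is the sense in
which the scheduler bounds the event rate.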

> However, given any x86_64 Intel box with TSC deadline timer (ivy, sandy,
> hasbeen) can be made to exhibit the symptom, there may be other arches
> that get seriously dinged up or maybe even as thoroughly b0rked as Xen
> does when hrtimer_interrupt() is pounded into the ground by userspace.
>
> Alternatively, should someone out there know that all bare metal is in
> fact fine post 279f14614, that person will likely also know what the Xen
> folks need to do to fix up their busted arch.
>
> The below targets the symptom, consider it hrtimer cluebat attractant.

By now I know to take your patches with a grain of salt :)

Some more information about your symptoms in the form of configuration,
extra patches, kernel traces etc. would be appreciated.

Thanks,

tglx