Re: [Fwd: Re: [patch] Real-Time Preemption, -RT-2.6.9-mm1-V0.4]

From: Lee Revell
Date: Mon Nov 01 2004 - 13:00:40 EST


On Mon, 2004-11-01 at 16:47 +0100, Thomas Gleixner wrote:
> The latencies are still there. I have the feeling it's worse than 0.6.2.
>
> It's definitely irq-off. I have a card with a controller, which produces
> an 2ms interrupt. The controller busy loops until the second level ack
> is done. The time is measured from raising the irq to the 2nd level ack.

This was my conclusion as well. I have a patch sitting around to add
this to the emu10k1 ALSA driver, it's quite useful. It would be nice if
there were a facility in the kernel to easily identify missed interrupts
like this or (even better) unbalanced irq disable/enable - AFAICT
userspace alone cannot reliably distinguish lost interrupts from
scheduling problems (though you can get a lot of hints). Paul mentioned
trying to debug the unbalanced irq disable in his talk at ZKM 2003, and
said it's hard because the hardware will enable/disable interrupts on
its own and he could not identify all those places. Ingo, is there an
easy way to trace this like we do for unbalanced preempt count?

I think there might even be a _very_ rare errant irq disable in T3 even.
On my 2-day test runs with 100s of millions of samples, I got 2 or 3
outliers in each graph. Jow Gwinn from LKML (thanks Joe) ran a
statistical analysis that showed these to be independent from the 4 or 5
underlying exponential distributions (each corresponding to a known or
suspected nonpreemptible section). Our theory was that these were very
rare code paths or race conditions that left IRQs off. Unfortunately
this seems impossible to debug unless you have a way to make the machine
crash dump immediately when it detects the situation.

Anyway, the clearest way to demonstrate the problem with the -V series
here is to run the version of Florian's tool that tells you how many
interrupts were missed. If I spin my (USB, not sharing irq with
soundcard) trackball as fast as I can I can get it up to 54 or 55 in a
row. If I move it just the right way I can see the humps appear - 2,
then 15, then 50 then 15 then 2 missed interrupts in a row. This is at
1024Hz - at 2048 I can get it to miss several hundred IRQs, but this
inevitably locks the machine.

I suspect the lockups and the latencies are same bug.

Lee



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/