William Lee Irwin III wrote:
There isn't anything left to explain. So if there's a question, be
specific about it.
On Sat, Nov 20, 2004 at 04:50:25PM +1100, Nick Piggin wrote:
Why am I very very wrong? Why won't touch_nmi_watchdog work from
the read loop?
And let's just be nice and try not to jump at the chance to point
out when people are very very wrong, and keep count of the times
they have been very very wrong. I'm trying to be constructive.
touch_nmi_watchdog() is only "protection" against local interrupt
disablement triggering the NMI oopser because alert_counter[]
increments are not atomic. Yet even supposing they were made so, the
net effect of "covering up" this gross deficiency is making the
user-observable problems it causes undiagnosable, as noted before.
William Lee Irwin III wrote:
This entire line of argument is bogus. A preexisting bug of a similar
nature is not grounds for deliberately introducing any bug.
On Sat, Nov 20, 2004 at 04:50:25PM +1100, Nick Piggin wrote:
Sure, if that is a bug and someone is just about to fix it then
yes you're right, we shouldn't introduce this. I didn't realise
it was a bug. Sounds like it would be causing you lots of problems
though - have you looked at how to fix it?
Kevin Marin was the first to report this issue to lkml. I had seen
instances of it in internal corporate bugreports and it was one of
the motivators for the work I did on pidhashing (one of the causes
of the timeouts was worst cases in pid allocation). Manfred Spraul
and myself wrote patches attempting to reduce read-side hold time
in /proc/ algorithms, Ingo Molnar wrote patches to hierarchically
subdivide the /proc/ iterations, and Dipankar Sarma and Maneesh
Soni wrote patches to carry out the long iterations in /proc/ locklessly.
The last several of these affecting /proc/ have not gained acceptance,
though the work has not been halted in any sense, as this problem
recurs quite regularly. A considerable amount of sustained effort has
gone toward mitigating and resolving rwlock starvation.
Aggravating the rwlock starvation destabilizes, not pessimizes,
and performance is secondary to stability.