Re: RT and Cascade interrupts

From: john cooper
Date: Sat May 28 2005 - 09:05:51 EST


Oleg Nesterov wrote:

[race scenario deleted]

This was not the race scenario of concern.

This is totally pointless to do:

if (timer_pending())
del_singleshot_timer_sync();

However this is the action the RPC code was attempting
to take originally based upon stale/incorrect data.
Again the failure mode was of the RPC_TASK_HAS_TIMER
bit not being set when the embedded timer actually was
still present in the timer cascade structure (eg:
timer->base != 0 or timer_pending()). This caused the
rpc_task to be recycled with the embedded timer struct
still linked in the timer cascade.

If you need to ensure that timer's handler is not running on any
cpu then timer_pending() can't help. If you don't need this, you
should use plain del_timer().

That's not the goal of the timer_pending() usage here.
Rather we're at a point in rpc_release_task where we
want to tear down an rpc_task. The call to timer_pending()
determines if the embedded timer is still linked in the
timer cascade structure.

The embedded timer may be running on another CPU. We
may even check timer->base during this race when it is
non-NULL and wind up deleting a timer which the path
running on another CPU has already deleted. That is
allowable. We simply want to idle the timer.

I don't understand why this race is rt specific. I think
that PREEMPT_SOFTIRQS just enlarges the window.

That is certainly possible. However this code has been
around for some time and this problem apparently hasn't
otherwise surfaced. I suspect it depends on the
combination of timer cascade structures existing
per-CPU and the execution context of callback(task)
invoked in rpc_run_timer() now being preemptive (ie:
preemption is required on the same CPU for this
problem to surface).

I believe
the right fix is just to call del_singleshot_timer_sync()
unconditionally and kill RPC_TASK_HAS_TIMER bit.

Ignoring RPC_TASK_HAS_TIMER and always calling
del_singleshot_timer_sync() will work but seems
excessive.

I have added Olaf Kirch to the CC: list.

Welcome.

The fix removes the attempt to replicate timer queue
status from the RPC code by removing the usage of the
pre-calculated RPC_TASK_HAS_TIMER.


No, RPC_TASK_HAS_TIMER doesn't replicate timer queue status.
I believe, it was intended as optimization, to avoid costly
del_timer_sync() call.

If true its chosen name seems a misnomer. In any case
the embedded timer in the rpc_task structure must be
quiesced before the rpc_task is made available for
reuse. The fix under discussion here does so. Note the
goal was not to rearchitecture the RPC code but rather to
address this very specific bug which surfaced during
stress testing.

If anyone with more ownership of the RPC code would like
to comment, any insight would be most welcome.

-john


--
john.cooper@xxxxxxxxxxx
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/