Re: [PATCH 1/1] Revert "genirq: Remove the second parameter from handle_irq_event_percpu()"

From: zhuyj
Date: Mon Jan 18 2016 - 02:59:57 EST


Hi, all

I have tested this patch. So far, I have not seen any similar problem.

Best Regards!
Zhu Yanjun

On 01/14/2016 09:29 AM, Huang Shijie wrote:
On Wed, Jan 13, 2016 at 02:07:25PM +0100, Thomas Gleixner wrote:
On Wed, 13 Jan 2016, zyjzyj2000@xxxxxxxxx wrote:

After this commit 71f64340fc0e ("genirq: Remove the second parameter
from handle_irq_event_percpu()") is applied, the variable action is
not protected by raw_spin_lock. The following calltrace will pop up.

Thanks for the report. I missed that detail when merging the patch!

Just for correctness' sake: you did not explain why this can happen.

It's not about the variable action, it's about desc->action not being
protected anymore. So the reason why this oopses is that the action is being
removed concurrently.

CPU 0                           CPU 1

free_irq()                      lock(desc)
  lock(desc)                    handle_edge_irq()
                                  handle_irq_event(desc)
                                    unlock(desc)
  desc->action = NULL               handle_irq_event_percpu(desc)
                                      action = desc->action

While the original code did:

free_irq()                      lock(desc)
  lock(desc)                    handle_edge_irq()
                                  handle_irq_event()
                                    action = desc->action
                                    unlock(desc)
  desc->action = NULL               handle_irq_event_percpu(desc, action)

So now the question is whether we revert that patch or simply change
handle_irq_event_percpu() to deal with that. Patch below.

That preserves the code size reduction of commit 71f64340fc0e. This is safe
because we either see a valid desc->action or NULL. If the action is about to
be removed it is still valid as free_irq() is blocked on synchronize_irq().

free_irq()                      lock(desc)
  lock(desc)                    handle_edge_irq()
                                  handle_irq_event(desc)
                                    set(INPROGRESS)
                                    unlock(desc)
                                    handle_irq_event_percpu(desc)
                                      action = desc->action
  desc->action = NULL
  synchronize_irq()
    while(INPROGRESS);              lock(desc)
                                    clr(INPROGRESS)
  free(action)
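
The ordering in the diagram above can be replayed in user space as a rough, single-threaded model. This is only a sketch: every name in it (model_desc, model_handler_begin(), model_try_free() and so on) is hypothetical and is not kernel API, the descriptor lock is left out, and the busy-wait in synchronize_irq() is reduced to a single poll that reports whether freeing would be allowed yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct irqaction and struct irq_desc. */
struct model_action {
	int calls;			/* how often the "handler" ran */
};

struct model_desc {
	bool inprogress;		/* models the INPROGRESS state bit */
	struct model_action *action;
};

/* Handler side: set(INPROGRESS), done under the lock in the real code. */
void model_handler_begin(struct model_desc *d)
{
	d->inprogress = true;
}

/* Handler side, lock dropped: may observe the action or NULL. */
struct model_action *model_handler_snapshot(struct model_desc *d)
{
	return d->action;
}

/* Handler side: run the snapshot, tolerating NULL. */
void model_handler_call(struct model_action *a)
{
	if (a)
		a->calls++;
}

/* Handler side: clr(INPROGRESS). */
void model_handler_end(struct model_desc *d)
{
	d->inprogress = false;
}

/* Free side: unlink the action (desc->action = NULL), keep the pointer. */
struct model_action *model_unlink(struct model_desc *d)
{
	struct model_action *old = d->action;

	d->action = NULL;
	return old;
}

/* Free side: one poll of the wait loop; frees only if no handler runs. */
bool model_try_free(struct model_desc *d, struct model_action *old)
{
	assert(d != NULL);
	if (d->inprogress)
		return false;
	free(old);
	return true;
}
```

Replaying the steps in the diagram's order, model_try_free() refuses to free while the handler is in flight, so the snapshot taken by the handler stays valid until model_handler_end() has run.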

That's basically the same mechanism as we have for shared
interrupts. action->next can become NULL while handle_irq_event_percpu()
runs. Either it sees the action or NULL. It does not matter, because action
itself cannot go away.
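
To illustrate why the loop shape matters, here is a minimal user-space sketch of the traversal. The types and names (sketch_action, sketch_desc, sketch_run_actions(), SKETCH_IRQ_*) are hypothetical stand-ins and not the kernel's definitions. The point is only that a pre-tested while loop tolerates a NULL head, whereas a do/while loop would dereference it on the first pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for IRQ_NONE / IRQ_HANDLED. */
enum { SKETCH_IRQ_NONE = 0, SKETCH_IRQ_HANDLED = 1 };

/* Hypothetical, simplified stand-in for struct irqaction. */
struct sketch_action {
	int (*handler)(void *dev_id);
	void *dev_id;
	struct sketch_action *next;
};

/* Hypothetical, simplified stand-in for struct irq_desc. */
struct sketch_desc {
	struct sketch_action *action;
};

/* Walk the action chain; a NULL head simply yields SKETCH_IRQ_NONE. */
int sketch_run_actions(struct sketch_desc *desc)
{
	int retval = SKETCH_IRQ_NONE;
	struct sketch_action *action;

	assert(desc != NULL);
	action = desc->action;		/* head is read exactly once */

	while (action) {
		retval |= action->handler(action->dev_id);
		action = action->next;
	}
	return retval;
}

/* Example handler: counts invocations through dev_id. */
int sketch_count_handler(void *dev_id)
{
	++*(int *)dev_id;
	return SKETCH_IRQ_HANDLED;
}
```

With an empty descriptor the function returns SKETCH_IRQ_NONE without calling anything; with a chain of two actions both handlers run and the results are OR-ed together.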

Thanks,

tglx

8<-------------

--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -136,9 +136,15 @@ irqreturn_t handle_irq_event_percpu(stru
 {
 	irqreturn_t retval = IRQ_NONE;
 	unsigned int flags = 0, irq = desc->irq_data.irq;
-	struct irqaction *action = desc->action;
+	struct irqaction *action;
-	do {
+	/*
+	 * READ_ONCE is not required here. The compiler cannot reload action
+	 * because it'll be action->next for the second iteration of the loop.
+	 */
+	action = desc->action;
+
+	while (action) {
 		irqreturn_t res;
 		trace_irq_handler_entry(irq, action);
@@ -173,7 +179,7 @@ irqreturn_t handle_irq_event_percpu(stru
 		retval |= res;
 		action = action->next;
-	} while (action);
+	}
 	add_interrupt_randomness(irq, flags);

I prefer this patch; reverting the old patch is not a good solution.

thanks
Huang Shijie