[V2 PATCH 1/1] genirq: fix desc->action become NULL error

From: zyjzyj2000
Date: Thu Jan 21 2016 - 02:52:13 EST


From: Zhu Yanjun <zyjzyj2000@xxxxxxxxx>

After this commit 71f64340fc0e ("genirq: Remove the second parameter
from handle_irq_event_percpu()") is applied, the variable desc->action is
not protected by raw_spin_lock. The following calltrace will pop up.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810a4991>] handle_irq_event_percpu+0x31/0x1c0
...
Call Trace:
<IRQ>
[<ffffffff810a4b5c>] handle_irq_event+0x3c/0x60
[<ffffffff810a7c9f>] handle_edge_irq+0xcf/0x160
[<ffffffff810067ba>] handle_irq+0x1a/0x30
[<ffffffff819b0d37>] do_IRQ+0x57/0xf0
[<ffffffff819af1ff>] common_interrupt+0x7f/0x7f
<EOI>
[<ffffffff819ae192>] ? _raw_write_unlock_irq+0x12/0x30
[<ffffffff819ae1be>] _raw_spin_unlock_irq+0xe/0x10
[<ffffffff8107703a>] finish_task_switch+0x9a/0x1f0
[<ffffffff819aa375>] __schedule+0x3c5/0xb60
[<ffffffff819aac8f>] schedule+0x3f/0x90
[<ffffffff819aaf18>] schedule_preempt_disabled+0x18/0x30
[<ffffffff8108f2ec>] cpu_startup_entry+0x13c/0x320
[<ffffffff810379b1>] start_secondary+0xf1/0x100
RIP [<ffffffff810a4991>] handle_irq_event_percpu+0x31/0x1c0
...
The reason is as below:

The variable desc->action is not protected anymore. So desc->action is
removed concurrently.

CPU 0 CPU 1

free_irq() lock(desc)
lock(desc) handle_edge_irq()
handle_irq_event(desc)
unlock(desc)
desc->action = NULL handle_irq_event_percpu(desc)
action = desc->action

Because we either see a valid desc->action or NULL. If the action is about to
be removed it is still valid as free_irq() is blocked on synchronize_irq().

free_irq() lock(desc)
lock(desc) handle_edge_irq()
handle_irq_event(desc)
set(INPROGRESS)
unlock(desc)
handle_irq_event_percpu(desc)
action = desc->action
desc->action = NULL
sychronize_irq()
while(INPROGRESS); lock(desc)
clr(INPROGRESS)
free(action)

That's basically the same mechanism as we have for shared
interrupts. The variable action->next can become NULL while
handle_irq_event_percpu() runs. Either it sees the action or
NULL. It does not matter, because action itself cannot go away.

Suggested-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Zhu Yanjun <zyjzyj2000@xxxxxxxxx>
---
kernel/irq/handle.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index a302cf9..7510b72 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -136,9 +136,14 @@ irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)
{
irqreturn_t retval = IRQ_NONE;
unsigned int flags = 0, irq = desc->irq_data.irq;
- struct irqaction *action = desc->action;
+ struct irqaction *action;

- do {
+ /*
+ * READ_ONCE is not required here. The compiler cannot reload action
+ * because it'll be action->next for the second iteration of the loop.
+ */
+ action = desc->action;
+ while (action) {
irqreturn_t res;

trace_irq_handler_entry(irq, action);
@@ -173,7 +179,7 @@ irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)

retval |= res;
action = action->next;
- } while (action);
+ }

add_interrupt_randomness(irq, flags);

--
1.7.9.5