Re: WARNING and PANIC in irq_matrix_free

From: Tariq Toukan
Date: Sun Feb 25 2018 - 04:51:09 EST




On 22/02/2018 11:38 PM, Thomas Gleixner wrote:
On Wed, 21 Feb 2018, Tariq Toukan wrote:
On 20/02/2018 8:18 PM, Thomas Gleixner wrote:
On Tue, 20 Feb 2018, Thomas Gleixner wrote:
On Tue, 20 Feb 2018, Tariq Toukan wrote:

Is there CPU hotplugging in play?

No.

Ok.


I'll come back to you tomorrow with a plan for how to debug that after
staring into the code some more.

Do you have a rough idea what the test case is doing?


It appears arbitrarily in different flows, such as sending traffic or
interface configuration changes.

Hmm. Looks like memory corruption, but I can't pinpoint it.

Find below a debug patch which should prevent the crash and might give us
some insight into the type of corruption.

Please enable the irq_matrix and vector allocation trace points.

echo 1 >/sys/kernel/debug/tracing/events/irq_matrix/enable
echo 1 >/sys/kernel/debug/tracing/events/irq_vectors/vector*/enable

When the problem triggers, the bogus vector is printed and the trace is
frozen. Please provide dmesg and the trace buffer output.


OK, I'm temporarily adding this to our internal regression branch. I'll let
you know once we have a repro.

Thanks,
Tariq

Thanks,

tglx

8<--------------
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -822,6 +822,12 @@ static void free_moved_vector(struct api
 	unsigned int cpu = apicd->prev_cpu;
 	bool managed = apicd->is_managed;
 
+	if (vector < FIRST_EXTERNAL_VECTOR || vector >= FIRST_SYSTEM_VECTOR) {
+		tracing_off();
+		pr_err("Trying to clear prev_vector: %u\n", vector);
+		goto out;
+	}
+
 	/*
 	 * This should never happen. Managed interrupts are not
 	 * migrated except on CPU down, which does not involve the
@@ -833,6 +839,7 @@ static void free_moved_vector(struct api
 	trace_vector_free_moved(apicd->irq, cpu, vector, managed);
 	irq_matrix_free(vector_matrix, cpu, vector, managed);
 	per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;
+out:
 	hlist_del_init(&apicd->clist);
 	apicd->prev_vector = 0;
 	apicd->move_in_progress = 0;
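
For readers following along, the range check added by the debug patch can be
tried out in user space. The following is only a minimal sketch, not kernel
code: the helper name vector_is_device_range() is hypothetical, and the value
used for FIRST_SYSTEM_VECTOR is an assumption made for illustration (the real
definitions are in arch/x86/include/asm/irq_vectors.h and depend on kernel
version and configuration).

```c
#include <stdbool.h>

/*
 * Illustrative values only. FIRST_EXTERNAL_VECTOR is 0x20 on x86;
 * the FIRST_SYSTEM_VECTOR value below is an assumption for this sketch
 * and varies with kernel version and configuration.
 */
#define FIRST_EXTERNAL_VECTOR	0x20u
#define FIRST_SYSTEM_VECTOR	0xefu

/*
 * Hypothetical helper: returns true when the vector lies in the range
 * that the debug patch treats as valid, i.e. the inverse of the
 * condition that triggers tracing_off() and pr_err() in the patch.
 */
static bool vector_is_device_range(unsigned int vector)
{
	return vector >= FIRST_EXTERNAL_VECTOR && vector < FIRST_SYSTEM_VECTOR;
}
```

With these assumed bounds, a prev_vector of 0 or 0x1f would be reported as
bogus, while 0x20 would pass the check and be freed as before.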