Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

From: Joao Correia
Date: Wed Jul 08 2009 - 17:45:25 EST


Hello again

On Tue, Jul 7, 2009 at 11:47 AM, Andres Freund<andres@xxxxxxxxxxx> wrote:
> On Tuesday 07 July 2009 12:40:16 Joao Correia wrote:
>> I am now running 2.6.31-rc2 for a couple of hours, no freeze.
>>
>> Let me know what/if i can help with tracking down the original source
>> of the problem.
> You dont see the problem anymore with the `echo 0 >
> /proc/sys/kernel/timer_migration` change (or equivalently with the patch from
> Jarek) or has the problem vanished completely?
>
> Andres
>
> On Tuesday 07 July 2009 13:03:50 Joao Correia wrote:
>> I dont see the problem with the patch from Jarek


I have to correct this information.
I had inserted `echo 0 >> /proc/sys/kernel/timer_migration` into
rc.local, and I left it there when I applied your first patch.

I'm talking about this patch:

diff --git a/kernel/timer.c b/kernel/timer.c
index 0b36b9e..011429c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -634,7 +634,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,

 	cpu = smp_processor_id();

-#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
+#if 0

After removing the line from rc.local and leaving only the patch, the
freeze still happens. The patch -does not- prevent the freeze. It was
my mistake to say it does; I completely forgot that I had added that
line to rc.local.

So again, the only thing that stops that freeze is `echo 0 >>
/proc/sys/kernel/timer_migration`. Apologies for pointing you in the
wrong direction.
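
For anyone retesting, here is a minimal sketch of how to check which
setting is actually in effect before judging a patch. It is only an
illustration: the function name is made up, and /etc/rc.local is an
assumed location for the startup file (adjust for your distribution).

```shell
#!/bin/sh
# Hypothetical helper (not part of any kernel tooling): report the current
# timer_migration value and whether a startup file still overrides it.
# Both paths are parameters so the check can be pointed at other files.
check_timer_migration() {
    sysctl_file="${1:-/proc/sys/kernel/timer_migration}"
    rc_file="${2:-/etc/rc.local}"   # assumed location of the startup file

    if [ -r "$sysctl_file" ]; then
        echo "timer_migration=$(cat "$sysctl_file")"
    else
        echo "timer_migration=unavailable"
    fi

    # A leftover line in the startup file would mask the effect of a patch.
    if [ -r "$rc_file" ] && grep -q 'timer_migration' "$rc_file"; then
        echo "override=present"
    else
        echo "override=absent"
    fi
}

check_timer_migration "$@"
```

If this reports `override=present`, any "the patch fixed it" result is
suspect, which is exactly the mistake I made above.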

I also tried the other patch provided:

kernel/timer.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 0b36b9e..61ba855 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -658,6 +658,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 			spin_unlock(&base->lock);
 			base = new_base;
 			spin_lock(&base->lock);
+			BUG_ON(tbase_get_base(timer->base));
 			timer_set_base(timer, base);
 		}
 	}

but the oops never triggers, either with your first patch or with the
echo 0 > proc[...]

I was under the impression that disabling the entry in /proc and
applying the first patch would give the same result, but alas, they
do not.

Joao Correia

[PS: I'm including the patches in this email for context, so that
people don't get lost.]