Re: [PATCH] softlockup: stop spurious softlockup messages due to overflow

From: Colin Ian King
Date: Thu Mar 18 2010 - 09:22:29 EST


On Tue, 2010-03-16 at 11:12 +0100, Ingo Molnar wrote:
> * Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
>
> > On Monday 15 March 2010 at 14:01 +0000, Colin Ian King wrote:
> > > Ensure additions on touch_ts do not overflow. This can occur when
> > > the top 32 bits of the TSC reach 0xffffffff causing additions to
> > > touch_ts to overflow and this in turn generates spurious softlockup
> > > warnings.
> > >
> > > Signed-off-by: Colin Ian King <colin.king@xxxxxxxxxxxxx>
> > > ---
> > > kernel/softlockup.c | 6 +++---
> > > 1 files changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/softlockup.c b/kernel/softlockup.c
> > > index 0d4c789..90d9aa0 100644
> > > --- a/kernel/softlockup.c
> > > +++ b/kernel/softlockup.c
> > > @@ -111,10 +111,10 @@ int proc_dosoftlockup_thresh(struct ctl_table *table, int write,
> > > void softlockup_tick(void)
> > > {
> > > int this_cpu = smp_processor_id();
> > > - unsigned long touch_ts = per_cpu(softlockup_touch_ts, this_cpu);
> > > + unsigned long long touch_ts = per_cpu(softlockup_touch_ts, this_cpu);
> > > unsigned long print_ts;
> > > struct pt_regs *regs = get_irq_regs();
> > > - unsigned long now;
> > > + unsigned long long now;
> > >
> > > /* Is detection switched off? */
> > > if (!per_cpu(softlockup_watchdog, this_cpu) || softlockup_thresh <= 0) {
> > > @@ -165,7 +165,7 @@ void softlockup_tick(void)
> > > per_cpu(softlockup_print_ts, this_cpu) = touch_ts;
> > >
> > > spin_lock(&print_lock);
> > > - printk(KERN_ERR "BUG: soft lockup - CPU#%d stuck for %lus! [%s:%d]\n",
> > > + printk(KERN_ERR "BUG: soft lockup - CPU#%d stuck for %llus! [%s:%d]\n",
> > > this_cpu, now - touch_ts,
> > > current->comm, task_pid_nr(current));
> > > print_modules();
> >
> > This looks wrong, touch_ts is a long, not a long long.
>
> Could be increased to long long - but that's probably overkill as the touch_ts
> is in seconds, so the scope of comparisons should never truly get even close
> to ~2^31.
>
> > You probably want to change the comparisons instead.
> >
> > 	if (now > touch_ts + softlockup_thresh/2)
> > 		wake_up_process(per_cpu(softlockup_watchdog, this_cpu));
> > 	if (now <= (touch_ts + softlockup_thresh))
> > 		return;
> >
> > ->
> >
> > 	if ((long)(now - touch_ts) > softlockup_thresh/2)
> > 		wake_up_process(per_cpu(softlockup_watchdog, this_cpu));
> > 	if ((long)(now - touch_ts) <= softlockup_thresh)
> > 		return;
> >
> > Or use standard time_after()/time_before() macros.
>
> Yeah, time_after/before would work better i suspect.
>
> Thanks,
>
> Ingo

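For anyone following along, here is a rough userspace sketch of why the
original comparison misfires and why the signed-difference form suggested
above does not. It is only an illustration, not kernel code: the helper
names (naive_within_thresh, wrapsafe_within_thresh, check_wrap_cases) are
made up for this example, and the threshold of 60 is just a sample value.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the original test
 * "now <= touch_ts + softlockup_thresh". The addition wraps around
 * when touch_ts is close to ULONG_MAX.
 */
bool naive_within_thresh(unsigned long now, unsigned long touch_ts, int thresh)
{
	return now <= touch_ts + (unsigned long)thresh;
}

/*
 * Hypothetical helper mirroring the suggested form
 * "(long)(now - touch_ts) <= softlockup_thresh". The unsigned
 * subtraction yields the elapsed time even across a wrap.
 */
bool wrapsafe_within_thresh(unsigned long now, unsigned long touch_ts, int thresh)
{
	return (long)(now - touch_ts) <= (long)thresh;
}

/* Sample cases with touch_ts just below the wrap point. */
void check_wrap_cases(void)
{
	unsigned long touch_ts = ULONG_MAX - 5;

	/* Only 2 seconds elapsed, but touch_ts + 60 wrapped to a small value. */
	assert(!naive_within_thresh(touch_ts + 2, touch_ts, 60));
	assert(wrapsafe_within_thresh(touch_ts + 2, touch_ts, 60));

	/* A real overrun of the threshold is still detected. */
	assert(!wrapsafe_within_thresh(touch_ts + 61, touch_ts, 60));
}
```

The naive test reports "over threshold" after only 2 seconds in the first
case, which is the spurious warning being discussed.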
Here is a reworked version of the patch using time_after()/time_before_eq():

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index 0d4c789..4b493f6 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -155,11 +155,11 @@ void softlockup_tick(void)
 	 * Wake up the high-prio watchdog task twice per
 	 * threshold timespan.
 	 */
-	if (now > touch_ts + softlockup_thresh/2)
+	if (time_after(now - softlockup_thresh/2, touch_ts))
 		wake_up_process(per_cpu(softlockup_watchdog, this_cpu));

 	/* Warn about unreasonable delays: */
-	if (now <= (touch_ts + softlockup_thresh))
+	if (time_before_eq(now - softlockup_thresh, touch_ts))
 		return;

 	per_cpu(softlockup_print_ts, this_cpu) = touch_ts;

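As a sanity check, the two new conditions can be exercised in userspace
with simplified stand-ins for the macros. This is a sketch under some
assumptions: the macros below drop the typecheck() that the kernel
versions carry, and the function names (should_wake_watchdog,
within_thresh) and the threshold of 60 are invented for the example.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Simplified userspace stand-ins for the kernel macros, without the
 * typecheck() on the arguments. Both compare via a signed difference,
 * so they tolerate wraparound of the unsigned counters.
 */
#define time_after(a, b)	((long)((b) - (a)) < 0)
#define time_before_eq(a, b)	((long)((b) - (a)) >= 0)

/* Hypothetical wrapper for the first condition in the patch. */
bool should_wake_watchdog(unsigned long now, unsigned long touch_ts, int thresh)
{
	return time_after(now - (unsigned long)(thresh / 2), touch_ts);
}

/* Hypothetical wrapper for the second condition in the patch. */
bool within_thresh(unsigned long now, unsigned long touch_ts, int thresh)
{
	return time_before_eq(now - (unsigned long)thresh, touch_ts);
}
```

With touch_ts = ULONG_MAX - 5 and a threshold of 60, within_thresh() stays
true up to and including 60 seconds after touch_ts and turns false at 61,
even though now has wrapped past zero by then.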