PROBLEM: CONFIG_NO_HZ could cause software timeouts

From: Norbert van Bolhuis
Date: Thu Sep 03 2009 - 06:50:23 EST



The problem occurs when, for example, a driver uses time_after(jiffies, timeout).

With CONFIG_NO_HZ, a single update can advance jiffies by more than 1.
This happens via: tick_nohz_update_jiffies -> tick_do_update_jiffies64 -> do_timer

If a driver uses a timeout value of jiffies + 1,
time_after(jiffies, timeout) becomes true after a single interrupt,
provided that interrupt advances jiffies by at least 2.
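
The effect can be reproduced in user space. The following is a minimal sketch, not kernel code: time_after_sim and interrupts_until_expired are hypothetical names, and the comparison is assumed to match the signed-difference test used by the kernel's time_after() macro. The step argument models by how much each interrupt advances the counter.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space stand-in for time_after(a, b): true when a is later than b.
 * The signed difference keeps the test correct across counter wraparound.
 */
bool time_after_sim(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Count how many interrupts pass before a timeout of "now + delta"
 * reports as expired, when every interrupt advances the counter by step.
 * A step greater than 1 models a NO_HZ catch-up after skipped ticks.
 */
unsigned int interrupts_until_expired(unsigned long delta, unsigned long step)
{
	unsigned long jiffies_sim = 1000;	/* arbitrary starting value */
	unsigned long timeout = jiffies_sim + delta;
	unsigned int interrupts = 0;

	assert(step > 0);	/* a zero step would never expire */

	while (!time_after_sim(jiffies_sim, timeout)) {
		jiffies_sim += step;
		interrupts++;
	}
	return interrupts;
}
```

With delta = 1, a step of 1 needs two interrupts before the timeout expires, while a step of 2 needs only one, so the loop can give up almost immediately.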

This is exactly what happens in cfi_cmdset_0002.c:do_write_buffer
in our case (PowerPC MPC8313, linux-2.6.28, CONFIG_HZ=250, CONFIG_NO_HZ=y).

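do_write_buffer does the following (with HZ=250, HZ / 1000 is 0 in
integer arithmetic, so uWriteTimeout is 1 jiffy):

	unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
	...
	timeo = jiffies + uWriteTimeout;
	...
	for (;;) {
		...
		/* give up once the timeout has passed and the chip is still busy */
		if (time_after(jiffies, timeo) && !chip_ready(map, adr))
			break;
		if (chip_ready(map, adr)) {
			xip_enable(map, chip, adr);
			goto op_done;
		}
		/* wait 1 us before polling again */
		UDELAY(map, chip, adr, 1);
	}
	/* software timeout */
	ret = -EIO;
 op_done:
	...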
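
The timeout arithmetic in the snippet above can be checked in user space. This is a small sketch under the assumption that HZ is passed in as a plain parameter; write_timeout_jiffies and jiffy_period_us are hypothetical helper names, not kernel functions.

```c
#include <assert.h>

/* Timeout in jiffies as computed in the quoted code: (HZ / 1000) + 1. */
unsigned long write_timeout_jiffies(unsigned long hz)
{
	return (hz / 1000) + 1;	/* integer division truncates */
}

/* Length of one jiffy in microseconds for a given HZ (hypothetical helper). */
unsigned long jiffy_period_us(unsigned long hz)
{
	assert(hz > 0);	/* avoid division by zero */
	return 1000000UL / hz;
}
```

For HZ=250 this gives a timeout of 1 jiffy, with one jiffy lasting 4000 us, so timeo ends up as jiffies + 1. Only for HZ >= 1000 does the HZ / 1000 term contribute anything.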

I have seen a few software timeouts where the for-loop
ran only 13 times (about 13 us of delay instead of the expected 1 ms).
Our NOR flash (S29GL01GP) may typically need up to ~200 us to become ready.

Disabling CONFIG_NO_HZ fixes the problem.
Replacing time_after with a loop counter that allows at most 1000 iterations
also fixes the problem.
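
The loop-counter workaround could look roughly like the sketch below. It is a user-space illustration only: struct fake_chip, fake_chip_ready, fake_udelay_1us and wait_ready_bounded are hypothetical stand-ins for the driver's chip_ready() and UDELAY() and are not taken from the actual change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_POLLS 1000	/* 1000 polls with a 1 us delay each, about 1 ms */

/* Hypothetical chip model: becomes ready after a given number of polls. */
struct fake_chip {
	unsigned int polls_until_ready;
	unsigned int polls_done;
};

/* Stand-in for chip_ready(map, adr). */
bool fake_chip_ready(const struct fake_chip *chip)
{
	return chip->polls_done >= chip->polls_until_ready;
}

/* Stand-in for UDELAY(map, chip, adr, 1): only counts the poll. */
void fake_udelay_1us(struct fake_chip *chip)
{
	chip->polls_done++;
}

/*
 * Poll until the chip is ready or MAX_POLLS iterations have passed.
 * Returns 0 on success and -1 on software timeout (the driver uses -EIO).
 */
int wait_ready_bounded(struct fake_chip *chip)
{
	unsigned int i;

	assert(chip != NULL);
	for (i = 0; i < MAX_POLLS; i++) {
		if (fake_chip_ready(chip))
			return 0;
		fake_udelay_1us(chip);
	}
	/* Final check so a chip that became ready on the last poll is not missed. */
	return fake_chip_ready(chip) ? 0 : -1;
}
```

Because the bound is expressed in iterations rather than jiffies, a jump in jiffies cannot shorten the wait. The real elapsed time depends on how long each delay and each flash access actually takes.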

The latest kernel seems to have the same problem.

Am I missing something here, or is this a known problem with CONFIG_NO_HZ?