Re: PROBLEM: CONFIG_NO_HZ could cause software timeouts

From: Marcin Slusarz
Date: Sun Sep 06 2009 - 06:18:37 EST


Pavel Machek wrote:
> On Sat 2009-09-05 20:19:46, Marcin Slusarz wrote:
>> Norbert van Bolhuis wrote:
>>> The problem occurs when, e.g., drivers use time_after(jiffies, timeout).
>>>
>>> CONFIG_NO_HZ can make jiffies advance by more than 1 in a single update.
>>> This is done by:
>>> tick_nohz_update_jiffies->tick_do_update_jiffies64->do_timer
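A simplified userspace model of that catch-up, loosely in the spirit of tick_do_update_jiffies64() but not its actual code: when the tick has been stopped, the next update converts all elapsed time into whole ticks at once. The names jiffies_model, model_update_jiffies, MODEL_HZ and MODEL_TICK_NSEC are hypothetical; HZ=250 (a 4 ms tick) is assumed to match the configuration reported below.

```c
#include <stdint.h>

/* Hypothetical model parameters: HZ=250 gives a 4 ms tick. */
#define MODEL_HZ 250
#define MODEL_TICK_NSEC (1000000000ULL / MODEL_HZ)

/* Hypothetical stand-in for the kernel's jiffies bookkeeping. */
struct jiffies_model {
    unsigned long jiffies;      /* tick counter seen by drivers */
    uint64_t last_update_ns;    /* time of the last tick boundary accounted for */
};

/*
 * Advance jiffies by however many whole ticks elapsed since the last
 * update. After an idle period with the tick stopped, this can be more
 * than one tick in a single call. Returns the number of ticks added.
 */
static unsigned long model_update_jiffies(struct jiffies_model *m, uint64_t now_ns)
{
    uint64_t delta, ticks;

    if (now_ns < m->last_update_ns + MODEL_TICK_NSEC)
        return 0;               /* less than one tick has elapsed */

    delta = now_ns - m->last_update_ns;
    ticks = delta / MODEL_TICK_NSEC;
    m->jiffies += (unsigned long)ticks;
    m->last_update_ns += ticks * MODEL_TICK_NSEC;
    return (unsigned long)ticks;
}
```

For example, starting from jiffies = 100 at time 0, a single update at 9 ms advances jiffies straight to 102.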
>>>
>>> If drivers use a timeout value of jiffies+1,
>>> "time_after(jiffies, timeout)" will be true after a single interrupt
>>> (provided that interrupt advances jiffies by at least 2).
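To make the arithmetic concrete, here is a small sketch that mirrors the comparison inside the kernel's time_after(a, b) macro (the real macro additionally type-checks its arguments) and counts how many simulated interrupts pass before a jiffies+1 deadline expires. time_after_model and irqs_until_timeout are hypothetical names used only for this illustration.

```c
/* Same wraparound-safe comparison as time_after(a, b): true if a is later than b. */
#define time_after_model(a, b) ((long)((b) - (a)) < 0)

/*
 * Count simulated timer interrupts until time_after_model(jiffies, timeo)
 * becomes true, where timeo = start + timeout and every interrupt
 * advances jiffies by 'step' ticks.
 */
static int irqs_until_timeout(unsigned long start, unsigned long timeout,
                              unsigned long step)
{
    unsigned long jiffies = start;
    unsigned long timeo = start + timeout;
    int irqs = 0;

    while (!time_after_model(jiffies, timeo)) {
        jiffies += step;        /* one interrupt */
        irqs++;
    }
    return irqs;
}
```

With one-tick steps, irqs_until_timeout(1000, 1, 1) returns 2, so at least one full tick period elapses between the two interrupts. With a two-tick jump, irqs_until_timeout(1000, 1, 2) returns 1, and that single interrupt may arrive almost immediately after the deadline was set.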
>>>
>>> This is exactly what happens in cfi_cmdset_0002.c:do_write_buffer()
>>> in our case (PowerPC MPC8313, linux-2.6.28, CONFIG_HZ=250,
>>> CONFIG_NO_HZ=y).
>>>
>>> do_write_buffer does the following:
>>> unsigned long uWriteTimeout = ( HZ / 1000 ) + 1;
>>> ...
>>> timeo = jiffies + uWriteTimeout;
>>> ...
>>> for (;;) {
>>> ...
>>> if (time_after(jiffies, timeo) && !chip_ready(map, adr))
>>> break;
>>> if (chip_ready(map, adr)) {
>>> xip_enable(map, chip, adr);
>>> goto op_done;
>>> }
>>> UDELAY(map, chip, adr, 1);
>>> }
>>> /* software timeout */
>>> ret = -EIO;
>>> op_done:
>>> ...
>>>
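A minimal model of the timeout check in that loop, assuming the chip never becomes ready: it takes the sequence of jiffies values the loop would observe and reports the iteration on which the software timeout fires. timeout_iteration is a hypothetical helper, and the jiffies sequences in the example are illustrative, not measured.

```c
#include <stddef.h>

/* Same wraparound-safe comparison as time_after(a, b). */
#define time_after_model(a, b) ((long)((b) - (a)) < 0)

/*
 * seen[0] is jiffies when the deadline is computed; seen[i] (i >= 1) is
 * jiffies as read in loop iteration i. The chip is assumed never ready.
 * Returns the iteration on which the timeout fires, or -1 if it never does.
 */
static int timeout_iteration(const unsigned long *seen, size_t n, unsigned long hz)
{
    unsigned long uWriteTimeout = (hz / 1000) + 1;  /* as in do_write_buffer() */
    unsigned long timeo;
    size_t i;

    if (n == 0)
        return -1;
    timeo = seen[0] + uWriteTimeout;
    for (i = 1; i < n; i++)
        if (time_after_model(seen[i], timeo))
            return (int)i;
    return -1;
}
```

At HZ=250, uWriteTimeout is (250 / 1000) + 1 = 1, so a sequence such as 1000, 1000, 1000, 1002 times out on the third iteration, while 1000, 1000, 1001, 1001, 1002 needs the fourth.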
>>> I've seen a few software timeouts after the for-loop
>>> looped only 13 times (= 13 us delay, instead of the expected 1 ms). Typically
>> Are you sure? UDELAY may call schedule(), which can return to this thread
>> after much longer than 13 us...
>
> A too-long wait is expected, but AFAICS he's complaining about a too-short
> delay, and that's a hard bug.

Yeah, I know. But the conclusion is a bit fishy: 13 iterations don't necessarily mean 13 us.
The bug might be elsewhere.
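One way to settle whether 13 iterations really meant 13 us is to timestamp the loop instead of counting it. The sketch below is a userspace analogue, assuming POSIX clock_gettime() and nanosleep() stand in for the driver's clock and UDELAY; inside the kernel, ktime_get() could serve as the timestamp source. now_ns and time_delay_loop are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Run 'iterations' nominal 1 us delays and return the elapsed wall time in ns. */
static uint64_t time_delay_loop(int iterations)
{
    const struct timespec one_us = { 0, 1000 };
    uint64_t start = now_ns();
    int i;

    for (i = 0; i < iterations; i++)
        nanosleep(&one_us, NULL);   /* may sleep much longer than 1 us */
    return now_ns() - start;
}
```

Because nanosleep() sleeps for at least the requested time (absent signals), time_delay_loop(13) returns at least 13000 ns, and on a loaded system it can return far more, which is the point about schedule() above.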

Marcin
