Jiffy wraparound problems.

David Benson (daveb@ffem.org)
Wed, 28 Apr 1999 16:50:55 -0700 (PDT)


We have quite a few machines with long uptimes here.

We had noticed a collection of problems with them, in particular
select(2) breaks, once the "jiffies" global variable reaches
2**31, and starts looking signed. The bad behaviour is that
the "timeval" structure (the fifth argument to select)
gets set withgarbage valuables, much larger
that those passed in. In the 2.0.* series of kernels this appears
to be a bug in the sys_select function in fs/select.c
and in 2.2.* it is in schedule_timeout.

The 2.2 problems are somewhat conjectural, but I wrote a test
kernel-module which sets `jiffies' to 0xf000000; this machine then
exhibits the select bug.

The 2**31 errors crop up that 248 days; errors that occur after
2**32 jiffies come later on.

Other visible problems:
times() returns -1; perhaps this is inevitable/correct...

The way we discovered the select problem, btw, was that `zsh's
interface depends on the select() returned time. It behaves
very strangely when the timeval keeps getting set to very large
numbers.

-----------------------------
At 4 billion jiffies, different stuff starts malfunctioning.
A collection of kernel panics in dmesg:

Aiee, killing interrupt handler
kfree of non-kmalloced memory: 001b65a8, next= 00000000, order=0
kfree of non-kmalloced memory: 001b6598, next= 00000000, order=0
kfree of non-kmalloced memory: 001b6aac, next= 00000000, order=0
idle task may not sleep

and the uptime on this machine now reports it has been up 5days, even
though it didn't reboot; all the processes show the same creation time..

Are there any plans to fix these issues? Are they known problems?

- Dave Benson
daveb@ffem.org

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/