Re: Linux-2.2.2-pre2..

Philippe Troin (phil@fifi.org)
06 Feb 1999 14:04:35 -0800


Linus Torvalds <torvalds@transmeta.com> writes:

> On 6 Feb 1999, Philippe Troin wrote:
> >
> > I'm going to try that and tell you the results, but I think there's
> > something deeper going on: if this was the only problem, the oops
> > would only happen in a 'crashme' process, but the oops I sent happens
> > in find (why would find get a hangup ?), but I also got it in cat (in
> > while true; do find / ! -fstype nfs -type f -print0 | xargs -n0 cat;
> > done), or in syslogd...
>
> No, the hangup is done asynchronously at the next scheduling point, so the
> hangup event can actually happen in a random process context. So it doesn't
> matter who closes the file, the actual hangup might even happen on another
> CPU..

Ok.
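
To make sure I have the deferred hangup straight, here is a minimal user-space sketch of the idea, not the kernel code. Every name in it (queue_hangup, run_pending, record_context, the fixed-size queue) is made up for illustration; the only thing it models is that the close merely queues the work and that the work later runs in whatever process happens to hit the next scheduling point.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

/* A deferred work item; running_pid is the process it ends up running in. */
typedef void (*work_fn)(void *arg, int running_pid);

struct pending_work {
    work_fn fn;
    void *arg;
};

static struct pending_work queue[MAX_PENDING];
static size_t queue_len;

/* Called by whoever closes the file: only records the request. */
static int queue_hangup(work_fn fn, void *arg)
{
    assert(fn != NULL);
    if (queue_len == MAX_PENDING)
        return -1;                      /* queue full */
    queue[queue_len].fn = fn;
    queue[queue_len].arg = arg;
    queue_len++;
    return 0;
}

/* Called at a scheduling point, on behalf of whatever process is current. */
static void run_pending(int current_pid)
{
    size_t i;

    for (i = 0; i < queue_len; i++)
        queue[i].fn(queue[i].arg, current_pid);
    queue_len = 0;
}

/* Example work item: remember which process context the hangup ran in. */
static void record_context(void *arg, int running_pid)
{
    *(int *)arg = running_pid;
}
```

If the closer queues record_context and some unrelated process (find, cat, syslogd) later calls run_pending, the recorded pid is the unrelated one, which would explain why the oops shows up in processes that never touched the pty.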

8< snip >8

> Quite frankly, I only looked very quickly at this, and Ted should probably
> see if maybe we could avoid the problem more prettily by just doing some
> initialization in a different order, but from the quick look it basically
> looks to me like the slave tty->termios structure just hasn't been
> allocated yet, so when the hangup code tries to re-initialize it it fails
> for obvious reasons.. Thus the simple test for NULL, while fairly ugly,
> should do the trick.
>
> (For example, maybe we should set "filp->private_data" to point to the tty
> only _after_ it has been initialized? Then the hangup code wouldn't ever
> see half-way initialized tty's at all. But in the meantime the NULL
> pointer test should be sufficient - with the caveat that there might be
> other places you'd need the same check too).
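
The "publish only after initialization" idea could look roughly like the sketch below. This is a single-threaded user-space model under my own assumptions: the struct and function names are hypothetical, and real kernel code would also have to deal with locking and ordering between CPUs, which the sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields needed for the ordering are kept. */
struct termios_sketch { unsigned int c_cflag; };

struct tty_sketch {
    struct termios_sketch init_termios;  /* driver defaults */
    struct termios_sketch *termios;      /* NULL until set up */
};

struct file_sketch { void *private_data; };

static void open_tty_sketch(struct file_sketch *filp, struct tty_sketch *tty,
                            struct termios_sketch *storage)
{
    assert(filp != NULL && tty != NULL && storage != NULL);

    /* Step 1: finish initializing the tty while nobody else can reach it. */
    *storage = tty->init_termios;
    tty->termios = storage;

    /* Step 2: only now make it reachable through the file. */
    filp->private_data = tty;
}

/* What hangup-like code would see via the file: NULL or a complete tty. */
static struct tty_sketch *tty_from_file(const struct file_sketch *filp)
{
    assert(filp != NULL);
    return (struct tty_sketch *)filp->private_data;
}
```

With this ordering, code that walks the open files finds either no tty at all or one whose termios pointer is already valid, so it never needs the NULL test.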

I applied the following patch:
--- /usr/local/src/linux/drivers/char/tty_io.c.linus Fri Jan 15 17:46:27 1999
+++ /usr/local/src/linux/drivers/char/tty_io.c Sat Feb 6 11:32:02 1999
@@ -436,7 +436,7 @@
          * Shutdown the current line discipline, and reset it to
          * N_TTY.
          */
-        if (tty->driver.flags & TTY_DRIVER_RESET_TERMIOS)
+        if ((tty->driver.flags & TTY_DRIVER_RESET_TERMIOS) && tty->termios)
                 *tty->termios = tty->driver.init_termios;
         if (tty->ldisc.num != ldiscs[N_TTY].num) {
                 if (tty->ldisc.close)
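
For anyone who wants to poke at the guard outside the kernel, here is a small user-space model of the patched condition. The struct layout, the flag value and the function name are all my own stand-ins, not the real tty_io.c definitions; only the shape of the test mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields the
 * patch touches are modelled. */
struct termios_model { unsigned int c_cflag; };

#define DRIVER_RESET_TERMIOS 0x1u   /* models TTY_DRIVER_RESET_TERMIOS */

struct tty_model {
    unsigned int driver_flags;
    struct termios_model init_termios;  /* driver defaults */
    struct termios_model *termios;      /* NULL until the tty is fully set up */
};

/* Returns 1 if the termios was reset, 0 if the reset was skipped. */
static int hangup_reset_termios(struct tty_model *tty)
{
    assert(tty != NULL);
    if ((tty->driver_flags & DRIVER_RESET_TERMIOS) && tty->termios) {
        *tty->termios = tty->init_termios;
        return 1;
    }
    return 0;   /* flag clear, or termios not allocated yet */
}
```

A half-initialized tty (termios still NULL) is simply skipped instead of being dereferenced, while a fully set up one gets the driver defaults copied in.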

After recompiling and rebooting, I ran the gcc testsuites (which use a
bunch of ptys) in X, in parallel on two gcc trees, and got a solid
lockup.
I rebooted and ran them on the console instead.
That gave a really bad oops; the log was garbled, and the only thing I
could note down was EIP, which translates to:

0xc01115d2 is in del_timer (sched.c:475).
470     static inline int detach_timer(struct timer_list *timer)
471     {
472             struct timer_list *prev = timer->prev;
473             if (prev) {
474                     struct timer_list *next = timer->next;
475                     prev->next = next;
476                     if (next)
477                             next->prev = prev;
478                     return 1;
479             }

...which is rather unusable.
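
For reference, here is a user-space model of what that listing does. The struct name and the final "return 0" are my assumptions (the listing above stops at line 479); the rest follows the lines shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct timer_list: just the list links. */
struct timer_model {
    struct timer_model *next;
    struct timer_model *prev;
};

/* Unlink timer from its doubly linked list.
 * Returns 1 if it was linked (prev non-NULL), 0 otherwise. */
static int detach_timer_model(struct timer_model *timer)
{
    struct timer_model *prev;

    assert(timer != NULL);
    prev = timer->prev;
    if (prev) {
        struct timer_model *next = timer->next;

        prev->next = next;      /* corresponds to line 475: writes through prev */
        if (next)
            next->prev = prev;
        return 1;
    }
    return 0;                   /* assumed: not shown in the listing */
}
```

If the fault really is at line 475, then timer->prev was non-NULL but did not point at valid memory, which is what one would expect if the timer_list lives inside a structure that was freed or never properly initialized.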

Methinks there's still a problem, and just checking for tty->termios
to be non-null fixes one symptom.

Are these 'dev (03:00) tty->count(0) != #fd's(2) in do_tty_hangup'
and 'null tty in fasync' messages really harmless?

Phil.
