Re: [PATCH] tty: vt: make do_con_write() no-op if IRQ is disabled

From: Fabio M. De Francesco
Date: Fri Dec 03 2021 - 09:52:18 EST


On Friday, December 3, 2021 1:32:21 PM CET Tetsuo Handa wrote:
> On 2021/12/03 20:00, Fabio M. De Francesco wrote:
> > On Thursday, December 2, 2021 7:35:16 PM CET Linus Torvalds wrote:
> >> [...]
> >> Example: net/nfc/nci/uart.c. It does that
> >>
> >> schedule_work(&nu->write_work);
> >>
> >> instead of actually trying to do a write from a wakeup routine
> >> (similar examples in ppp - "tasklet_schedule(&ap->tsk)" etc).
> >>
> >> I mean, it's called "wakeup", not "write". So I think the fundamental
> >> confusion here is in hdlc, not the tty layer.
> >>
> >> Linus
> >>
>
> OK.
>
> > This is what I understand from the above argument: do a schedule_work() to get
> > that n_hdlc_send_frames() started; in this way, n_hdlc_tty_wakeup() can
> > return to the caller and n_hdlc_send_frames() is executed asynchronously
> > (i.e., no longer in an atomic context).
>
> Yes.

Thanks for confirming :)

> [...]
>
> > [...]
> >
> > Commit f9e053dcfc02 ("tty: Serialize tty flow control changes with flow_lock")
> > has introduced spinlocks to serialize flow control changes and avoid the
> > concurrent executions of __start_tty() and __stop_tty().
> >
> > [...]
> >
> > This is the reason why we are dealing with this bug. Currently we have __start_tty()
> > called with an acquired spinlock and IRQs disabled and the calls chain leads to
> > console_lock() while in atomic context.
>
> If we hit a race window described in that commit
>
> CPU 0 | CPU 1
> stop_tty() |
> lock ctrl_lock |
> tty->stopped = 1 |
> unlock ctrl_lock |
> | start_tty()
> | lock ctrl_lock
> | tty->stopped = 0
> | unlock ctrl_lock
> | driver->start()
> driver->stop() |
>
> In this case, the flow control state now indicates the tty has
> been started, but the actual hardware state has actually been stopped.
>
> , the tty->stopped flag remains 0 despite driver->stop() is called after
> driver->start() finished. tty->stopped (the flow control state) says "not stopped"
> but the actual hardware state is "stopped".

This is clear and it is the reason why I cited that commit. I understand that
__stop_tty() and __start_tty() cannot run concurrently.

I'm sorry if my poor abilities to express complex thoughts in English may
have made you to think that I'm not aware of the real race condition issue
that commit f9e053dcfc02 is trying to address :(

Unfortunately, by fixing the SAC bug with a mere asynchronous execution of
n_hdlc_send_frames() in a work queue, we may still have different sources of
race conditions.

The point is that, when n_hdlc_tty_wakeup() returns to the IOCTL helper (the caller)
that acquired the spinlock, that same spinlock is immediately released without
even knowing whether or not n_hdlc_send_frames() has had the chance to run
and complete. Further __start_tty() or __stop_tty() calls are allowed to acquire that
spinlock again and we lose serialization because they still run concurrently.

Actually, in the final part of your email you say that more than one instance of
__start_tty() cannot run because of a variable that checks that another thread is
still running the same code. I haven't yet checked, however you showed other
steps that can lead to races and I agree that the problem is not yet fixed.

This is why I think that n_hdlc_send_frames(), when runs asynchronously with
the change that Linus suggested, should also signal completions or change a
a condition variable.

Still using locks locks where it is needed, we should somehow use completions
or condition variables to enforce an execution alternation between __stop_tty
and __start_tty.

Either:

1) wait_for_completion() and complete();

Or:

2) wait_event(tty_event, tty->flow.tco_stopped == true) and
wait_event(tty_event, tty->flow.tco_stopped == false) before entering respectively
__start_tty() and __stop_tty().

Regards,

Fabio M. De Francesco