Re: [PATCH] vt: Fix sleeping functions called from atomic context

From: Fabio M. De Francesco
Date: Thu Nov 18 2021 - 04:38:39 EST


On Thursday, November 18, 2021 9:31:06 AM CET Fabio M. De Francesco wrote:
> On Wednesday, November 17, 2021 11:51:13 AM CET Tetsuo Handa wrote:
> > On 2021/11/17 17:54, Greg Kroah-Hartman wrote:
> > > Great, you have a reproducer, so you should be able to duplicate this
> > > locally to figure out what is really happening here.
> >
> > Until commit ac751efa6a0d70f2 ("console: rename acquire/release_console_sem() to
> > console_lock/unlock()"), do_con_write() was surely designed to be able to sleep.
> >
> > > $ git blame ac751efa6a0d7~1 drivers/tty/vt/vt.c
> >
> > [...]
> >
> > Until that commit, n_hdlc_send_frames() was prepared for being interrupted by signal
> > while sleeping.
> >
> > $ git blame ac751efa6a0d7~1 drivers/tty/n_hdlc.c
> >
> > [...]
> >
> > But as of commit c545b66c6922b002 ("tty: Serialize tcflow() with other tty flow
> > control changes"), start_tty() was already holding spinlock.
>
> Hi Tetsuo,
>
> Actually, we don't care of start_tty(). It's not in the path that triggers sleeping in atomic bug.
> According to Syzbot report and to my ftrace analysis it's __start_tty() that is called by
> n_tty_ioctl_helper(), and it is this function that acquires a spinlock and disables interrupts.
>
> I must admit that I've never used git-blame and I'm not sure to understand what you did here :(

I've had a chance to look both at commit c545b66c6922 and f9e053dcfc02. They are so strictly
related (same code. same author, same date) that I'm not anymore sure of which is to blame.

However, at this moment I'm scarcely interested in figuring out which one actually is responsible
for this issue.

What I know is that this issue is _real_ and that it should be fixed some way.

> Have you had a chance to read my analysis?
>
> > $ git blame c545b66c6922b002~1 drivers/tty/tty_io.c
> >
> > [...]
> >
> > Actually, it is commit f9e053dcfc02b0ad ("tty: Serialize tty flow control changes
> > with flow_lock") that started calling tty->ops->start(tty) from atomic context.
> >
> > $ git blame f9e053dcfc02b~1 drivers/tty/tty_io.c
> >
> > [...]
> >
> > Therefore, I think that bisection will reach f9e053dcfc02b0ad, and I guess that
> > this bug was not noticed simply because little people tested n_hdlc driver.
> >
> > Well, how to fix? Introduce a new flag for indicating "starting" state (like drivers/block/loop.c uses Lo_* state) ?
>
> I think this is not the correct fix, but I might very well be wrong...

I've read (again) the code that you mentioned. I've changed my mind about it.
Now I think that a flag like that could be useful if there are no other means to
lock __start_tty() and __stop_tty() critical sections.

Thanks,

Fabio

> Can you please reply to my last email (the one with the ftrace analysis)?
> In the last lines I proposed two alternative solutions, what about them?
>
> Thanks,
>
> Fabio
>
>
>
>
>