Re: port lock: was: Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads

From: Petr Mladek
Date: Thu Apr 20 2023 - 05:55:59 EST


On Thu 2023-04-06 11:46:37, Petr Mladek wrote:
> On Thu 2023-03-02 21:02:11, John Ogness wrote:
> > From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> >
> > Add the infrastructure to create a printer thread per console along
> > with the required thread function, which is takeover/handover aware.
>
> > --- a/kernel/printk/printk_nobkl.c
> > +++ b/kernel/printk/printk_nobkl.c
> > +/**
> > + * cons_kthread_func - The printk thread function
> > + * @__console: Console to operate on
> > + */
> > +static int cons_kthread_func(void *__console)
> > +{
> > + struct console *con = __console;
> > + struct cons_write_context wctxt = {
> > + .ctxt.console = con,
> > + .ctxt.prio = CONS_PRIO_NORMAL,
> > + .ctxt.thread = 1,
> > + };
> > + struct cons_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
> > + unsigned long flags;
> > + short con_flags;
> > + bool backlog;
> > + int cookie;
> > + int ret;
> > +
> > + for (;;) {
> > + atomic_inc(&con->kthread_waiting);
> > +
> > + /*
> > + * Provides a full memory barrier vs. cons_kthread_wake().
> > + */
> > + ret = rcuwait_wait_event(&con->rcuwait,
> > + cons_kthread_should_wakeup(con, ctxt),
> > + TASK_INTERRUPTIBLE);
> > +
> > + atomic_dec(&con->kthread_waiting);
> > +
> > + if (kthread_should_stop())
> > + break;
> > +
> > + /* Wait was interrupted by a spurious signal, go back to sleep */
> > + if (ret)
> > + continue;
> > +
> > + for (;;) {
> > + cookie = console_srcu_read_lock();
> > +
> > + /*
> > + * Ensure this stays on the CPU to make handover and
> > + * takeover possible.
> > + */
> > + if (con->port_lock)
> > + con->port_lock(con, true, &flags);
>
> IMHO, we should use a more generic name. This should be a lock that
> provides full synchronization between con->write() and other
> operations on the device used by the console.
>
> "port_lock" is specific for the serial consoles. IMHO, other consoles
> might use another lock. IMHO, tty uses "console_lock" internally for
> this purpose. netconsole seems to has "target_list_lock" that might
> possible have this purpose, s390 consoles are using sclp_con_lock,
> sclp_vt220_lock, or get_ccwdev_lock(raw->cdev).
>
>
> Honestly, I expected that we could replace these locks by
> cons_acquire_lock(). I know that the new lock is special: sleeping,
> timeouting, allows hand-over by priorities.
>
> But I think that we might implement cons_acquire_lock() that would always
> busy wait without any timeout. And use some "priority" that would
> never handover the lock a voluntary way at least not with a voluntary
> one. The only difference would be that it is sleeping. But it might
> be acceptable in many cases.
>
> Using the new lock instead of port->lock would allow to remove
> the tricks with using spin_trylock() when oops_in_progress is set.
>
> That said, I am not sure if this is possible without major changes.
> For example, in case of serial consoles, it would require touching
> the layer using port->lock.
>
> Also it would requere 1:1 relation between struct console and the output
> device lock. I am not sure if it is always the case. On the other
> hand, adding some infrastructure for this 1:1 relationship would
> help to solve smooth transition from the boot to the real console
> driver.
>
>
> OK, let's first define what the two locks are supposed to synchronize.
> My understanding is that this patchset uses them the following way:
>
> + The new lock (atomic_state) is used to serialize emiting
> messages between different write contexts. It replaces
> the functionality of console_lock.
>
> It is a per-console sleeping lock, allows voluntary and hars
> hand-over using priorities and spinning with a timeout.
>
>
> + The port_lock is used to synchronize various operations
> of the console driver/device, like probe, init, exit,
> configuration update.
>
> It is typically a per-console driver/device spin lock.
>
>
> I guess that we would want to keep both locks:
>
> + it might help to do the rework manageable
>
> + the sleeping lock might complicate some operations;
> raw_spin_lock might be necessary at least on
> non-RT system.

I forgot to check how these two locks are supposed to be used
in write_atomic().

It seems that cons_atomic_flush_con() takes only the new lock
(atomic_state) and ignores the port_lock(). It should be safe
against write_kthread(). But it is not safe against other
operations with the console device that are synchronized
only by the port_lock().

This looks like a potential source of problems and regressions.

Do I miss something, please?
Is there any plan how to deal with this?

Best Regards,
Petr