Re: port lock: was: Re: [PATCH printk v1 11/18] printk: nobkl: Introduce printer threads

From: Petr Mladek
Date: Fri Apr 21 2023 - 12:15:33 EST


On Thu 2023-04-20 15:33:10, Petr Mladek wrote:
> On Thu 2023-04-20 12:39:31, John Ogness wrote:
> I know. And the hostile takeover is not my concern.
>
> My concern are races between write_atomic() in emergency context
> and other driver code serialized only by the port->lock.
>
> We need an API that will make sure that any code serialized
> by port->lock is properly serialized against write->atomic()
> when the console is registered.

I though more about it. My idea is the following:

A. The nbcon side might have basically four modes
for taking the new nbcon lock. It might have four interfaces:

nbcon_trylock(struct console *con,
enum cons_prio prio);
nbcon_trylock_emergency(struct console *con,
enum cons_prio prio);
nbcon_trylock_panic(struct console *con,
enum cons_prio prio);
nbcon_lock(struct console *con,
enum cons_prio prio);

, where

+ nbcon_trylock() would use the current approach for
the printk kthread. It means that it would try to get
the lock with a timeout. But it would never try to
steel the lock.

+ nbcon_trylock_emergency() would use the current approach
used in emergency context. It would busy wait and
then try to steel the lock. But it would take over the lock
only when it is in safe context.

+ nbcon_trylock_panic() would behave the same way as
nbcon_trylock_emergency(). But it would allow to
take over the lock even when it is unsafe. It might
still fail when it is not called on the CPU that
handles the panic().

+ nbcon_lock() would wait until the lock is really
available.

and

enum cons_prio would be one of the four priorities.

The API should disable cpu migration to make sure that
it will stay the same until the lock is released.

The caller should rememner the priority somewhere,
e,g. in struct cons_ctxt.


B. The port->lock side would switch to the new nbcon lock
when the console is registered. There are two big questions
that come to my mind:

1. The original code does not expect that it might lose
the lock.

It should be enough to mark the entire section .unsafe.
In that case, only the final panic() call might steel
the lock.


2. The console registration must be done a safe way
to make sure that all callers will use the same
real lock (port->lock or nbcon_lock).


IMHO, the uart_port part might look like:

void uart_port_lock_irqsafe(struct uart_port *port,
int *cookie,
unsigned long &flags)
{
struct console *con;

try_again:
/* Synchrnonization against console registration. */
*cookie = console_srcu_read_lock();

con = rcu_access_pointer(nbcon->cons);

if (!can_use_nbcon_lock(con)) {
/* Only the port lock is available. */
spin_lock_irqsafe(&port->lock, *flags);
port->locked = LOCKED_BY_PORT_LOCK;
return;
}


/*
* The nbcon lock is available. Take it instead of
* the port->lock. The only exception is when
* there is registration in progress. In this case,
* port->lock has to be taken as well.
*
* It will always be taken only with the normal priority.
* when called from the port lock side.
*/
nbcon_lock(con, CON_PRIO_NORMAL);
local_irq_save(*flags);

if (cons->registration_in_progress) {
spin_lock(&port->lock);
port->locked = LOCKED_BY_BOTH_LOCKS;
} else {
port->locked = LOCKED_BY_NBCON_LOCK;
}

/*
* Makes sure that only nbcon_lock_panic() would
* be able to steel this lock.
*/
if (!nbcon_enter_unsafe(con, CON_PRIO_NORMAL)) {
revert locks;
goto try_again;
}
}

void uart_port_unlock_irqrestore(struct uart_port *port,
int *cookie, unsigned long *flags)
{
struct console *con;

con = rcu_access_pointer(nbcon->cons);

switch (port->locked) {
LOCKED_BY_PORT_LOCK:
spin_unlock_irqrestore(&port->lock, *flags);
break;
LOCKED_BY_BOTH_LOCKS:
spin_unlock(&port->lock);
fallthrough;
LOCKED_BY_NBCON_LOCK:
nbcon_exit_unsafe(con, CON_PRIO_NORMAL);
local_irq_restore(*flags);
nbcon_unlock(con, CON_PRIO_NORMAL);
};

console_srcu_unlock(*cookie);
}


and finally the registration:

void register_console(struct console *newcon)
{
[...]
if (con->flags & CON_NBCON) {
nbcon_lock(con);
nbcon->regisration_in_progress = true;
nbcon_unlock(con);

/*
* Make sure that callers are locked by both
* nbcon_lock() and port->lock()
*/
synchronize_srcu();
}

/* Insert the console into console_list */

if (con->flags & CON_NBCON) {
nbcon_lock(con);
nbcon->regisration_in_progress = false;
nbcon_unlock(con);
}
[...]
}

and similar thing in uregister_console().

I am not sure if I should send this on Friday evening. But I reworked
it many times and I do not longer see any obvious reason why it
could not work.

My relief is that it builds on top of your code. It basically just
adds the port_lock interface. I hope that it would actually simplify
things a lot.

Well, a huge challenge might be to replace all spin_lock(port->lock)
calls with the new API. There is a lot of code shared between various
consoles and we wanted to migrate them one-by-one.

On the other hand, the new port_lock() API should behave as simple
spin lock when port->cons is a legacy console.

Best Regards,
Petr