Re: [PATCH printk v2 06/26] printk: nbcon: Ensure ownership release on failed emit

From: Petr Mladek
Date: Tue Feb 20 2024 - 10:17:09 EST


On Sun 2024-02-18 20:03:06, John Ogness wrote:
> Until now it was assumed that ownership has been lost when the
> write_atomic() callback fails. And nbcon_emit_next_record()
> directly returned false. However, if nbcon_emit_next_record()
> returns false, the context must no longer have ownership.
>
> The semantics for the callbacks could be specified such that
> if they return false, they must have released ownership. But
> in practice those semantics seem odd since the ownership was
> acquired outside of the callback.
>
> Ensure ownership has been released before reporting failure by
> explicitly attempting a release. If the current context is not
> the owner, the release has no effect.

Hmm, the new semantic is not ideal either. And I think that it is
even worse. The function still releases the owership even though
it has been acquired by the caller. In addition, it causes
a double unlock in a valid case. I know that the 2nd
nbcon_context_release() is a NOP but...

I would personally solve this by adding a comment into the code
and moving the check, see below.

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -891,17 +891,18 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
> nbcon_state_read(con, &cur);
> wctxt->unsafe_takeover = cur.unsafe_takeover;
>
> - if (con->write_atomic) {
> + if (con->write_atomic)
> done = con->write_atomic(con, wctxt);
> - } else {

This code path does not create a bad semantic. The semantic is
as it is because the context might lose the ownership in "any"
nested function.

Well, it might deserve a comment, something like:

/*
* nbcon_emit_next_record() should never be called for legacy
* consoles. Handle it as if write_atomic() have lost
* the ownership and try to continue.
*/
> - nbcon_context_release(ctxt);
> - WARN_ON_ONCE(1);
> - done = false;
> - }
>
> - /* If not done, the emit was aborted. */
> - if (!done)
> + if (!done) {
> + /*
> + * The emit was aborted, probably due to a loss of ownership.
> + * Ensure ownership was lost or released before reporting the
> + * loss.
> + */

Is there a valid reason when con->write_atomic() would return false
and still own the context?

If not, then this would hide bugs and cause double unlock in
the valid case.

> + nbcon_context_release(ctxt);
> return false;

Even better solution might be to do the check at the beginning of
the function. It might look like:

if (WARN_ON_ONCE(!con->write_atomic)) {
/*
* This function should never be called for legacy consoles.
* Handle it as if write_atomic() have lost the ownership
* and try to continue.
*/
nbcon_context_release(ctxt);
return false;
}


Best Regards,
Petr