Re: [PATCH printk v2 06/26] printk: nbcon: Ensure ownership release on failed emit

From: John Ogness
Date: Tue Feb 20 2024 - 11:29:41 EST


On 2024-02-20, Petr Mladek <pmladek@xxxxxxxx> wrote:
>> Until now it was assumed that ownership has been lost when the
>> write_atomic() callback fails. And nbcon_emit_next_record()
>> directly returned false. However, if nbcon_emit_next_record()
>> returns false, the context must no longer have ownership.
>>
>> The semantics for the callbacks could be specified such that
>> if they return false, they must have released ownership. But
>> in practice those semantics seem odd since the ownership was
>> acquired outside of the callback.
>>
>> Ensure ownership has been released before reporting failure by
>> explicitly attempting a release. If the current context is not
>> the owner, the release has no effect.
>
> Hmm, the new semantic is not ideal either. And I think that it is
> even worse. The function still releases the owership even though
> it has been acquired by the caller. In addition, it causes
> a double unlock in a valid case. I know that the 2nd
> nbcon_context_release() is a NOP but...
>
> I would personally solve this by adding a comment into the code
> and moving the check, see below.
>
>> --- a/kernel/printk/nbcon.c
>> +++ b/kernel/printk/nbcon.c
>> @@ -891,17 +891,18 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
>> nbcon_state_read(con, &cur);
>> wctxt->unsafe_takeover = cur.unsafe_takeover;
>>
>> - if (con->write_atomic) {
>> + if (con->write_atomic)
>> done = con->write_atomic(con, wctxt);
>> - } else {
>
> This code path does not create a bad semantic. The semantic is
> as it is because the context might lose the ownership in "any"
> nested function.
>
> Well, it might deserve a comment, something like:
>
> /*
> * nbcon_emit_next_record() should never be called for legacy
> * consoles. Handle it as if write_atomic() have lost
> * the ownership and try to continue.
> */
>> - nbcon_context_release(ctxt);
>> - WARN_ON_ONCE(1);
>> - done = false;
>> - }
>>
>> - /* If not done, the emit was aborted. */
>> - if (!done)
>> + if (!done) {
>> + /*
>> + * The emit was aborted, probably due to a loss of ownership.
>> + * Ensure ownership was lost or released before reporting the
>> + * loss.
>> + */
>
> Is there a valid reason when con->write_atomic() would return false
> and still own the context?

This is driver code, so you must use your imagination. But I thought
maybe there might be some reason why the driver cannot print the message
(due to other driver-internal reasons). In this case, it would return
false even though it never lost ownership.

> If not, then this would hide bugs and cause double unlock in
> the valid case.

Even if true is returned, that does not mean that there is still
ownership (because it can be lost at any time). And even if we hit the
WARN because there is no callback, ownership may have been lost. My
point is that there is _always_ a chance that nbcon_context_release()
will be called when ownership was already lost.

nbcon_context_release() was purposely implemented with the idea that it
may be called by a context that has lost ownership. So why not leverage
this here? It is _critical_ that if _this_ function returns false, the
context no longer has ownership.

We could add a nbcon_can_proceed() in front of the release, but
nbcon_context_release() already does that internally.

>> + nbcon_context_release(ctxt);
>> return false;
>
> Even better solution might be to do the check at the beginning of
> the function. It might look like:
>
> if (WARN_ON_ONCE(!con->write_atomic)) {
> /*
> * This function should never be called for legacy consoles.
> * Handle it as if write_atomic() have lost the ownership
> * and try to continue.
> */
> nbcon_context_release(ctxt);
> return false;
> }

In the future, con->write_thread() is added. So the missing callback
check will end up in a final else branch anyway.

John