Re: [RFC PATCH printk v1] printk: ringbuffer: Do not skip non-finalized with prb_next_seq()

From: John Ogness
Date: Fri Nov 03 2023 - 09:31:35 EST


On 2023-11-03, Petr Mladek <pmladek@xxxxxxxx> wrote:
>> IMHO we need 2 different functions:
>>
>> 1. A function that reports the last contiguous finalized record for a
>> reader. This is useful for syslog and kmsg_dump to know what is
>> available for them to read. We can use @last_finalized_seq for this,
>> optimizing it correctly this time.
>
> I would use this also for console_unlock() as well, see below.

OK.

>> 2. A function that reports the last reserved sequence number of a
>> writer. This is useful for pr_flush and console_unlock to know when they
>> are finished.
>
> I would personally use the @last_finalized_seq for
> console_unlock() and pr_flush() without a timeout. We could
> always call defer_console_output() when it is lower then
> the last reserved seq.
>
> Well, we actually do not even need to do this because
> the reserved records must be added by some printk().
> And this printk() will either flush the pending messages
> or it will call defer_console_output().

OK.

> The above paragraph describes a scenario which is not obvious.
> We should probably document it somewhere, probably in the description
> of prb_last_finalized_seq() or how it will be called.

OK.

>> This function can begin with @last_finalized_seq, looking
>> for the last finalized record (skipping over any non-finalized).
>
> I though about using desc_ring->head_id or looking for the
> last reserved sequence number.

The problem with @head_id is that the sequence number may not be
assigned yet. Really @last_finalized_seq is the newest sequence number
we have to search from.

>> Generally we have not concerned ourselves with readers. But I agree we
>> should make the optimization coherent with what a reader can actually
>> read. It might save some CPU cycles for polling tasks.
>
> I wanted to agree. But then I found this scenario:
>
> CPU0 CPU1
>
> console_unlock()
> console_flush_all()
>
> printk()
> vprintk_store()
> return;
> prb_final_commit;
>
> console_trylock(); # failed
>
> while (prb_read_valid());
>
> Now, the race:
>
> + console_flush_all() did not flush the message from CPU1 because
> it was not finalized in time.
>
> + CPU1 failed to get console_lock() => CPU0 is responsible for
> flushing
>
> + prb_read_valid() failed on CPU0 because it did not see
> the prb_desc finalized (missing barrier).

For semaphores, up() and down_trylock() successfully take and release a
raw spin lock. That provides the necessary barriers so that CPU0 sees
the record that CPU1 finalized.

>> Writing and reading of @last_finalized_seq will provide the necessary
>> boundaries to guarantee this:
>>
>> ...finalize record...
>> atomic_long_try_cmpxchg_release(&desc_ring->last_finalized_seq, ...);
>>
>> and
>>
>> atomic_long_read_acquire(&desc_ring->last_finalized_seq);
>> ...read record...
>
> Yup. something like this.
>
> Well, it is suspicious that there is no _release() counter part.

Take a closer look above. The cmpxchg (on the writer side) does the
release. I have the litmus tests to verify that is correct and
sufficient for what we want: to guarantee that for any read
@last_finalized_seq value, the CPU can also read the associated record.

I am finalizing a new version of the "fix console flushing on panic"
series [0] that will also include the prb_next_seq() fix. If needed, we
can continue this discussion based on the new code.

John

[0] https://lore.kernel.org/lkml/20231013204340.1112036-1-john.ogness@xxxxxxxxxxxxx