Re: [RFC PATCH printk v1] printk: ringbuffer: Do not skip non-finalized records with prb_next_seq()

From: Petr Mladek
Date: Fri Nov 03 2023 - 11:58:14 EST


On Fri 2023-11-03 14:37:23, John Ogness wrote:
> On 2023-11-03, Petr Mladek <pmladek@xxxxxxxx> wrote:
> >> Generally we have not concerned ourselves with readers. But I agree we
> >> should make the optimization coherent with what a reader can actually
> >> read. It might save some CPU cycles for polling tasks.
> >
> > I wanted to agree. But then I found this scenario:
> >
> > CPU0                                CPU1
> >
> > console_unlock()
> >   console_flush_all()
> >                                     printk()
> >                                       vprintk_store()
> >                                       return;
> >
> >                                     prb_final_commit;
> >
> >                                     console_trylock(); # failed

  __console_unlock();

  while (prb_read_valid() && console_trylock());

I added the __console_unlock() and console_trylock()
to have the full picture.
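For completeness, the tail of console_unlock() (the loop in the
scenario above) looks roughly like this; this is only a paraphrase of
kernel/printk/printk.c, so details may differ:

        do {
                flushed = console_flush_all(do_cond_resched, &next_seq, &handover);
                if (!handover)
                        __console_unlock();

                /* Stop when nothing more could be flushed or printing
                 * was handed over to another context. */
                if (!flushed)
                        break;

                /*
                 * Re-check for records added after console_flush_all()
                 * but before the unlock. If the trylock fails, another
                 * context is responsible for flushing them.
                 */
        } while (prb_read_valid(prb, next_seq, NULL) && console_trylock());

The race below is about CPU0 leaving this loop even though CPU1's
record is already finalized but not yet visible to it.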

> >
> > Now, the race:
> >
> > + console_flush_all() did not flush the message from CPU1 because
> > it was not finalized in time.
> >
> > + CPU1 failed to get console_lock() => CPU0 is responsible for
> > flushing
> >
> > + prb_read_valid() failed on CPU0 because it did not see
> > the prb_desc finalized (missing barrier).
>
> For semaphores, up() and down_trylock() successfully take and release a
> raw spin lock. That provides the necessary barriers so that CPU0 sees
> the record that CPU1 finalized.

I see. Hmm, we should document this. The spinlock is an implementation
detail.

IMHO, up()/down()/down_trylock() are quite special. I think
that the theory is that lock() and unlock() are one-way barriers,
and trylock() is a one-way barrier only when it succeeds.

In other words, a _trylock() on CPU1 normally would not guarantee
that CPU0 would see the finalized record after unlock(). [*]


Maybe we could rely on the existing behavior, but we should at least
document it.

Note that I have thought many times about using spin_trylock() in
down_trylock(). It would make more sense because trylock()
should not be a blocking operation. But I see now
that it might break users depending on the full barrier.
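
For reference, down_trylock() currently looks roughly like this; this
is only a paraphrase of kernel/locking/semaphore.c, so details may
differ:

        int down_trylock(struct semaphore *sem)
        {
                unsigned long flags;
                int count;

                /* The raw spinlock is taken and released even when the
                 * semaphore cannot be acquired, which is where the
                 * implicit ordering comes from. */
                raw_spin_lock_irqsave(&sem->lock, flags);
                count = sem->count - 1;
                if (count >= 0)
                        sem->count = count;     /* success: consume one count */
                raw_spin_unlock_irqrestore(&sem->lock, flags);

                return (count < 0);             /* 0 on success, 1 on failure */
        }

A raw_spin_trylock()-based variant would provide no ordering at all
when it fails, and with it would go whatever ordering callers might be
(implicitly) relying on.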


Note: It is possible that I got this wrong. It is so easy to get lost
in barriers.


> >> Writing and reading of @last_finalized_seq will provide the necessary
> >> boundaries to guarantee this:
> >>
> >> ...finalize record...
> >> atomic_long_try_cmpxchg_release(&desc_ring->last_finalized_seq, ...);
> >>
> >> and
> >>
> >> atomic_long_read_acquire(&desc_ring->last_finalized_seq);
> >> ...read record...
> >
> > Yup. something like this.
> >
> > Well, it is suspicious that there is no _release() counter part.
>
> Take a closer look above. The cmpxchg (on the writer side) does the
> release. I have the litmus tests to verify that is correct and
> sufficient for what we want: to guarantee that for any read
> @last_finalized_seq value, the CPU can also read the associated record.

The barrier is only in _prb_commit(), which sets the committed state.

desc_make_final() uses cmpxchg_relaxed(), so it is not guaranteed
that other CPUs would see the "finalized" state.
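
Just to spell out the pairing being discussed, here is a minimal
userspace sketch using C11 atomics in place of the kernel's
atomic_long_*() helpers. The names (records[], last_finalized_seq,
the two functions) are illustrative only, not the real ringbuffer
layout:

        #include <stdatomic.h>
        #include <stdbool.h>

        #define NUM_RECORDS 16

        int records[NUM_RECORDS];               /* record payload */
        atomic_long last_finalized_seq = -1;    /* highest finalized seq */

        /* Writer: fill the record, then publish @seq with RELEASE
         * semantics. Simplified: assumes records are finalized in order. */
        void finalize_and_publish(long seq, int data)
        {
                long expected = seq - 1;

                records[seq % NUM_RECORDS] = data;  /* plain payload write */

                /* The release orders the payload write before the update
                 * of last_finalized_seq. */
                atomic_compare_exchange_strong_explicit(&last_finalized_seq,
                                                        &expected, seq,
                                                        memory_order_release,
                                                        memory_order_relaxed);
        }

        /* Reader: the ACQUIRE pairs with the writer's RELEASE, so any
         * record up to the observed last_finalized_seq can also be read. */
        bool read_finalized(long seq, int *data)
        {
                long last = atomic_load_explicit(&last_finalized_seq,
                                                 memory_order_acquire);

                if (seq > last)
                        return false;           /* not (visibly) finalized yet */

                *data = records[seq % NUM_RECORDS];
                return true;
        }

The real code has to handle out-of-order finalization and uses
atomic_long_try_cmpxchg_release()/atomic_long_read_acquire(), but the
ordering argument is the same.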

> I am finalizing a new version of the "fix console flushing on panic"
> series [0] that will also include the prb_next_seq() fix. If needed, we
> can continue this discussion based on the new code.

OK, maybe it would look different in the new version. And it might
be better to continue the discussion on the new code.

Best Regards,
Petr