Re: [PATCH printk v1 01/13] printk: rename cpulock functions

From: Petr Mladek
Date: Tue Feb 15 2022 - 04:15:39 EST


On Fri 2022-02-11 15:48:08, John Ogness wrote:
> On 2022-02-11, Petr Mladek <pmladek@xxxxxxxx> wrote:
> > On Mon 2022-02-07 20:49:11, John Ogness wrote:
> >> Since the printk cpulock is CPU-reentrant and since it is used
> >> in all contexts, its usage must be carefully considered and
> >> most likely will require programming locklessly. To avoid
> >> mistaking the printk cpulock as a typical lock, rename it to
> >> cpu_sync. The main functions then become:
> >>
> >> printk_cpu_sync_get_irqsave(flags);
> >> printk_cpu_sync_put_irqrestore(flags);
> >
> > It is possible that I will understand the motivation later when
> > reading the entire patchset. But my initial reaction is confusion ;-)
>
> Actually, the motivation comes from a discussion we had during the RT
> Track at Plumbers 2021 [0]. It isn't a lock and so we didn't want to
> call it a lock. (More below.)

Thanks for the link. I have listened to the discussion. And I am still not
persuaded ;-)


> > From my POV, it is a lock. It tries to get exclusive access and
> > has to wait until the current owner releases it.
>
> It is only exclusive for a CPU. If another context on that CPU tries to
> get the "lock" it will succeed. For example:
>
> process context lock() -> success
> --- INTERRUPT ---
> irq context lock() -> success
> --- NMI ---
> nmi context lock() -> success
>
> None of these contexts can assume that they have synchronized access
> because clearly they have all interrupted each other. If an object does
> not provide synchronized access to data, then "lock" is probably not a
> good name for that object.

All _nested_ locks have these limits. In fact, _all_ locks have these
limits. This is why it is common to take many locks (chain of locks)
if you really want to serialize some things. This is why there are
ABBA problems.


> > As you say: "its usage must be carefully considered and most likely
> > will require programming locklessly." I guess that it is related to:
> >
> > + There is a risk of deadlocks that are typically associated with
> > locks. After all the word "lock" is part of "deadlock".
> >
> > + It requires lockless programming because it is supposed to be
> > a terminal lock. It means that no other locks should be taken
> > under it.
>
> It is because (as in the example above), taking this "lock" does not
> provide synchronization to data. It is only synchronizing between
> CPUs. It was Steven's suggestion to call the thing a cpu_sync object and
> nobody in the RT Track seemed to disagree.

IMHO, the main task of this API is to synchronize CPUs. It is normal that
a lock does not protect all objects that are accessed under the lock.


> > I have get() and put() associated with reference counting. But it has
> > an opposite meaning. It usually guards an object from freeing as long
> > as there is at least one user. And it allows many users.
>
> This _is_ reference counting. In fact, if you look at the implementation
> you see:
>
> atomic_inc(&printk_cpu_sync_nested);
>
> It is allowing multiple users (from the same CPU).

Yes. My point is that reference counting prevents releasing of an
object. It does not prevent parallel access. The parallel access
is prevented by locks.

From my POV, the main task of this API is to prevent parallel
printing from other CPUs. Even Steven Rostedt wrote in the chat
"This makes it a lock"; see the recording [0] around the time 2:46:14.
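
To make the two views concrete, here is a minimal user-space sketch of
a CPU-reentrant sync object. It is only a model, not the kernel
implementation: the names sync_owner, sync_nested, cpu_sync_try_get()
and cpu_sync_put() are hypothetical, the CPU id is passed in by hand,
and C11 atomics stand in for the kernel's atomic_t helpers. The owner
field is the lock-like part (other CPUs are kept out), the nesting
counter is the reference-counting part (the same CPU may re-enter).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; NOT the kernel implementation. */
static atomic_int sync_owner = -1;  /* CPU id of the owner, -1 if free */
static atomic_int sync_nested = 0;  /* extra nesting levels on the owner */

/* Try to take the sync object on behalf of @cpu; true on success. */
static bool cpu_sync_try_get(int cpu)
{
	int expected = -1;

	/* Free: claim ownership for this CPU (the lock-like part). */
	if (atomic_compare_exchange_strong(&sync_owner, &expected, cpu))
		return true;

	/* Already owned by this CPU: a nested context re-enters. */
	if (expected == cpu) {
		atomic_fetch_add(&sync_nested, 1);
		return true;
	}

	/* Owned by another CPU: a real caller would have to spin here. */
	return false;
}

/* Drop one level; only the outermost put releases ownership. */
static void cpu_sync_put(void)
{
	if (atomic_load(&sync_nested) > 0) {
		atomic_fetch_sub(&sync_nested, 1);
		return;
	}
	atomic_store(&sync_owner, -1);
}
```

In this model, three nested gets on CPU 0 (process, irq, nmi) all
succeed, while a get from CPU 1 fails until CPU 0 has done its third
put. None of the three contexts on CPU 0 is serialized against the
others, which is the reason for the "programming locklessly" warning.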


> > Regarding the reentrancy. It seems that "_nested" suffix is used for
> > this type of lock, for example, mutex_lock_nested(),
> > spin_lock_nested().
> >
> > It might be enough to add "_nested" suffix and explain why it has
> > to be used carefully (terminal lock) in a comment.
>
> The internal counter is called "_nested" to make it clear to us printk
> developers. IMO the common _get and _put semantics are appropriate
> here. The important thing is that the word "lock" is removed. It is not
> a lock.

Why is it so important to get rid of the word "lock", please?

Well, I probably understand it. The API must be used carefully.
This whole discussion is about how to make the risks more obvious.

My main fear is deadlocks caused when someone tries to get this
"cpu_sync" thing. This is why I would like to call it a lock.

I guess that you are more concerned about races between different
contexts when implementing atomic consoles. This is why you prefer
to avoid the word "lock".

OK, this discussion helped me to improve my mental model of
this API. So, the name is getting less important to me. I would
still slightly prefer to keep "lock". But I am fine with the renaming
to "put/get".

[0] https://youtu.be/cZUzc0U1jJ4?t=12946

Best Regards,
Petr