numlist_push() barriers Re: [RFC PATCH v4 1/9] printk-rb: add a new printk ringbuffer implementation

From: Petr Mladek
Date: Fri Aug 23 2019 - 05:21:14 EST


On Thu 2019-08-08 00:32:26, John Ogness wrote:
> --- /dev/null
> +++ b/kernel/printk/numlist.c
> +/**
> + * numlist_push() - Add a node to the list and assign it a sequence number.
> + *
> + * @nl: The numbered list to push to.
> + *
> + * @n: A node to push to the numbered list.
> + * The node must not already be part of a list.
> + *
> + * @id: The ID of the node.
> + *
> + * A node is added in two steps: The first step is to make this node the
> + * head, which causes a following push to add to this node. The second step is
> + * to update @next_id of the former head node to point to this one, which
> + * makes this node visible to any task that sees the former head node.
> + */
> +void numlist_push(struct numlist *nl, struct nl_node *n, unsigned long id)
> +{
> + unsigned long head_id;
> + unsigned long seq;
> + unsigned long r;
> +
> + /*
> + * bA:
> + *
> + * Setup the node to be a list terminator: next_id == id.
> + */
> + WRITE_ONCE(n->next_id, id);

Do we need WRITE_ONCE() here?
Both "n" and "id" are given as parameters and do not change.
The assigment must be done before "id" is set as nl->head_id.
The ordering is enforced by cmpxchg_release().

> +
> + /* bB: #1 */
> + head_id = atomic_long_read(&nl->head_id);
> +
> + for (;;) {
> + /* bC: */
> + while (!numlist_read(nl, head_id, &seq, NULL)) {
> + /*
> + * @head_id is invalid. Try again with an
> + * updated value.
> + */
> +
> + cpu_relax();

I have got very confused by this. cpu_relax() suggests that this
cycle is busy waiting until a particular node becomes valid.
My first though was that it must cause deadlock in NMI when
the interrupted code is supposed to make the node valid.

But it is the other way. The head is always valid when it is
added to the list. It might become invalid when another CPU
moves the head and the old one gets reused.

Anyway, I do not see any reason for cpu_relax() here.

Also the entire cycle would deserve a comment to avoid this mistake.
For example:

/*
* bC: Read seq from current head. Repeat with new
* head when it has changed and the old one got reused.
*/

> +
> + /* bB: #2 */
> + head_id = atomic_long_read(&nl->head_id);
> + }
> +
> + /*
> + * bD:
> + *
> + * Set @seq to +1 of @seq from the previous head.
> + *
> + * Memory barrier involvement:
> + *
> + * If bB reads from bE, then bC->aA reads from bD.
> + *
> + * Relies on:
> + *
> + * RELEASE from bD to bE
> + * matching
> + * ADDRESS DEP. from bB to bC->aA
> + */
> + WRITE_ONCE(n->seq, seq + 1);

Do we really need WRITE_ONCE() here?
It is the same problem as with setting n->next_id above.

> +
> + /*
> + * bE:
> + *
> + * This store_release() guarantees that @seq and @next are
> + * stored before the node with @id is visible to any popping
> + * writers. It pairs with the address dependency between @id
> + * and @seq/@next provided by numlist_read(). See bD and bF
> + * for details.
> + */
> + r = atomic_long_cmpxchg_release(&nl->head_id, head_id, id);
> + if (r == head_id)
> + break;
> +
> + /* bB: #3 */
> + head_id = r;
> + }
> +
> + n = nl->node(head_id, nl->node_arg);
> +
> + /*
> + * The old head (which is still the list terminator), cannot be
> + * removed because the list will always have at least one node.
> + * Therefore @n must be non-NULL.
> + */

Please, move this comment above the nl->node() call. Both locations
makes sense. I just see it as an important note for the call and thus
is should be above. Also it will be better separated from the below
comments for the _release() barrier.

> + /*
> + * bF: the STORE part for @next_id
> + *
> + * Set @next_id of the previous head to @id.
> + *
> + * Memory barrier involvement:
> + *
> + * If bB reads from bE, then bF overwrites bA.
> + *
> + * Relies on:
> + *
> + * RELEASE from bA to bE
> + * matching
> + * ADDRESS DEP. from bB to bF
> + */
> + /*
> + * bG: the RELEASE part for @next_id
> + *
> + * This _release() guarantees that a reader will see the updates to
> + * this node's @seq/@next_id if the reader saw the @next_id of the
> + * previous node in the list. It pairs with the address dependency
> + * between @id and @seq/@next provided by numlist_read().
> + *
> + * Memory barrier involvement:
> + *
> + * If aB reads from bG, then aA' reads from bD, where aA' is in
> + * numlist_read() to read the node ID from bG.
> + * If aB reads from bG, then aB' reads from bA, where aB' is in
> + * numlist_read() to read the node ID from bG.
> + *
> + * Relies on:
> + *
> + * RELEASE from bG to bD
> + * matching
> + * ADDRESS DEP. from aB to aA'
> + *
> + * RELEASE from bG to bA
> + * matching
> + * ADDRESS DEP. from aB to aB'
> + */
> + smp_store_release(&n->next_id, id);

Sigh, I see this line one screen below the previous command thanks
to the extensive comments. Well, bF comment looks redundant.

> +}

Best Regards,
Petr