Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()

From: Oleg Nesterov
Date: Wed Nov 01 2023 - 16:25:02 EST


On 11/01, David Howells wrote:
>
> Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> > read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> > after the lockless access failed.
>
> I think you're wrong.

I think you missed the point ;)

> write_seqlock() turns it odd.

It changes seqcount_t->sequence but not "seq" so this doesn't matter.

> For instance, if the read lock is taken first:
>
> sequence seq     CPU 1                           CPU 2
> =======  ======= =============================== ===============
> 0
> 0        0       seq = 0 MUST BE EVEN

This is correct,

>                  ACCORDING TO DOC

documentation is wrong, please see

[PATCH 1/2] seqlock: fix the wrong read_seqbegin_or_lock/need_seqretry documentation
https://lore.kernel.org/all/20231024120808.GA15382@xxxxxxxxxx/

> 0        0       read_seqbegin_or_lock() [lockless]
>                  ...
> 1        0                                       write_seqlock()
> 1        0       need_seqretry() [seq=even; sequence!=seq: retry]

Yes, if CPU_1 races with write_seqlock() need_seqretry() returns true,

> 1        1       read_seqbegin_or_lock() [exclusive]

No. "seq" is still even, so read_seqbegin_or_lock() won't do read_seqlock_excl(),
it will do

	seq = read_seqbegin(lock);

again.

> Note that it spins in __read_seqcount_begin() until we get an even seq,
> indicating that no write is currently in progress - at which point we can
> perform a lockless pass.

Exactly. And this means that "seq" is always even.

> > See thread_group_cputime() as an example, note that it does nextseq = 1 for
> > the 2nd round.
>
> That's not especially convincing.

See also the usage of read_seqbegin_or_lock() in fs/dcache.c and fs/d_path.c.
All other users are wrong.

Let's start from the very beginning. This code does

	int seq = 0;
	do {
		read_seqbegin_or_lock(service_conn_lock, &seq);

		do_something();

	} while (need_seqretry(service_conn_lock, seq));

	done_seqretry(service_conn_lock, seq);

Initially seq is even (it is zero), so read_seqbegin_or_lock(&seq) does

	*seq = read_seqbegin(lock);

and returns. Note that "seq" is still even.

Now. If need_seqretry(seq) detects the race with write_seqlock() it returns
true but it does NOT change this "seq", it is still even. So on the next
iteration read_seqbegin_or_lock() will do

	*seq = read_seqbegin(lock);

again, it won't take this lock for writing. And again, seq will be even.
And so on.

And this means that the code above is equivalent to

	do {
		seq = read_seqbegin(service_conn_lock);

		do_something();

	} while (read_seqretry(service_conn_lock, seq));

and this is what this patch does.
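
To make the "seq stays even" argument concrete, here is a minimal
single-threaded userspace sketch. Everything named toy_* is a hypothetical
stand-in written for this illustration, not kernel API: the helpers only
mirror the even/odd dispatch described above, the "writer" is simulated by
bumping the sequence by 2 in the middle of a pass, and the exclusive path
is reduced to a counter.

```c
#include <assert.h>

/* Toy stand-in for seqlock_t (an assumption of this sketch, not the kernel type). */
struct toy_seqlock {
	unsigned int sequence;	/* even: no writer in progress */
	int excl_taken;		/* how many times the exclusive path ran */
};

static int toy_read_seqbegin(const struct toy_seqlock *sl)
{
	/* The real read_seqbegin() spins until the count is even. */
	assert(!(sl->sequence & 1));
	return (int)sl->sequence;
}

static int toy_read_seqretry(const struct toy_seqlock *sl, int seq)
{
	return sl->sequence != (unsigned int)seq;
}

/* Same even/odd dispatch as described for read_seqbegin_or_lock(). */
static void toy_read_seqbegin_or_lock(struct toy_seqlock *sl, int *seq)
{
	if (!(*seq & 1))
		*seq = toy_read_seqbegin(sl);	/* lockless pass */
	else
		sl->excl_taken++;		/* stands in for read_seqlock_excl() */
}

static int toy_need_seqretry(const struct toy_seqlock *sl, int seq)
{
	return !(seq & 1) && toy_read_seqretry(sl, seq);
}

/* A complete write_seqlock()/write_sequnlock() pair racing with the reader. */
static void toy_writer_interferes(struct toy_seqlock *sl)
{
	sl->sequence += 2;
}

/*
 * The loop under discussion. The simulated writer interferes during the
 * first `races` passes. Returns the number of passes; *final_seq receives
 * the last value of "seq".
 */
static int toy_misused_loop(struct toy_seqlock *sl, int races, int *final_seq)
{
	int seq = 0, passes = 0;

	do {
		toy_read_seqbegin_or_lock(sl, &seq);
		passes++;			/* do_something() */
		if (races > 0) {
			races--;
			toy_writer_interferes(sl);
		}
	} while (toy_need_seqretry(sl, seq));

	*final_seq = seq;
	return passes;
}
```

With three simulated races the loop needs four passes, all of them
lockless: excl_taken stays 0 and "seq" is even on exit, which is the
behaviour a plain read_seqbegin()/read_seqretry() loop would have.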

Yes, this is confusing. Again, even the documentation is wrong! That is why
I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
to change the semantics of need_seqretry() to enforce the locking on the 2nd
pass.
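
Until then, a caller that really wants the lock on the 2nd pass has to make
"seq" odd by hand, in the style of the nextseq trick in
thread_group_cputime(). A minimal single-threaded userspace sketch of that
shape follows. The toy_* names are hypothetical stand-ins for this
illustration, not kernel API; the writer is simulated by bumping the
sequence by 2, and it is assumed that no writer can run while the exclusive
path is held.

```c
#include <assert.h>

/* Toy stand-in for seqlock_t (an assumption of this sketch, not the kernel type). */
struct toy_seqlock {
	unsigned int sequence;	/* even: no writer in progress */
	int excl_taken;		/* how many times the exclusive path ran */
	int excl_held;		/* 1 while the exclusive path is held */
};

static int toy_read_seqbegin(const struct toy_seqlock *sl)
{
	assert(!(sl->sequence & 1));	/* the real one spins until even */
	return (int)sl->sequence;
}

static void toy_read_seqbegin_or_lock(struct toy_seqlock *sl, int *seq)
{
	if (!(*seq & 1)) {
		*seq = toy_read_seqbegin(sl);	/* lockless pass */
	} else {
		sl->excl_taken++;		/* stands in for read_seqlock_excl() */
		sl->excl_held = 1;
	}
}

static int toy_need_seqretry(const struct toy_seqlock *sl, int seq)
{
	return !(seq & 1) && sl->sequence != (unsigned int)seq;
}

static void toy_done_seqretry(struct toy_seqlock *sl, int seq)
{
	if (seq & 1)
		sl->excl_held = 0;		/* stands in for read_sequnlock_excl() */
}

/*
 * Lockless first pass, locked second pass. The simulated writer may
 * interfere `races` times, but only while the reader is lockless.
 * Returns the number of passes.
 */
static int toy_locking_loop(struct toy_seqlock *sl, int races)
{
	int seq, nextseq = 0, passes = 0;

	do {
		seq = nextseq;
		toy_read_seqbegin_or_lock(sl, &seq);
		passes++;			/* do_something() */
		if (races > 0 && !(seq & 1)) {
			races--;
			sl->sequence += 2;	/* writer raced with us */
		}
		nextseq = 1;	/* if the lockless pass failed, take the lock */
	} while (toy_need_seqretry(sl, seq));

	toy_done_seqretry(sl, seq);
	return passes;
}
```

Here a single race is enough to push the 2nd pass onto the exclusive path,
so the loop ends after two passes no matter how many writers are queued up;
with no race at all it ends after one lockless pass and never touches the
lock.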

Oleg.