Re: [PATCH v1 01/11] seqlock: provide lockdep-free raw_seqcount_t variant

From: David Hildenbrand
Date: Fri Dec 17 2021 - 17:03:03 EST


On 17.12.21 22:28, Thomas Gleixner wrote:
> On Fri, Dec 17 2021 at 12:30, David Hildenbrand wrote:
>> Sometimes it is required to have a seqcount implementation that uses
>> a structure with a fixed and minimal size -- just a bare unsigned int --
>> independent of the kernel configuration. This is especially valuable, when
>> the raw_ variants of the seqlock function will be used and the additional
>> lockdep part of the seqcount_t structure remains essentially unused.
>>
>> Let's provide a lockdep-free raw_seqcount_t variant that can be used via
>> the raw functions to have a basic seqlock.
>>
>> The target use case is embedding a raw_seqcount_t in the "struct page",
>> where we really want a minimal size and cannot tolerate a sudden grow of
>> the seqcount_t structure resulting in a significant "struct page"
>> increase or even a layout change.
>

Hi Thomas,

thanks for your feedback!

> Cannot tolerate? Could you please provide a reason and not just a
> statement?

Absolutely.

"struct page" is supposed to have a minimal size with a fixed layout.
Embedding something inside such a structure can change the fixed layout
in a way that it can just completely breaks any assumptions on location
of values.

Therefore, embedding a complex structure in it is usually avoided -- and
if we have to (spin_lock), we work around sudden size increases.

There are ways around it: allocate the lock and only store the pointer
in the struct page. But that most certainly adds complexity, which is
why I want to avoid it for now.


I'll extend that answer and add it to the patch description.

>
>> Provide raw_read_seqcount_retry(), to make it easy to match to
>> raw_read_seqcount_begin() in the code.
>>
>> Let's add a short documentation as well.
>>
>> Note: There might be other possible users for raw_seqcount_t where the
>> lockdep part might be completely unused and just wastes memory --
>> essentially any users that only use the raw_ function variants.
>
> Even when the reader side uses raw_seqcount_begin/retry() the writer
> side still can use the non-raw variant which validates that the
> associated lock is held on write.

Yes, that's my understanding as well.

>
> Aside of that your proposed extra raw sequence count needs extra care
> vs. PREEMPT_RT and this want's to be very clearly documented. Why?
>
> The lock association has two purposes:
>
> 1) Lockdep coverage which unearthed bugs already

Yes, that's a real shame to lose.

>
> 2) PREEMPT_RT livelock prevention
>
> Assume the following:
>
> spin_lock(wrlock);
> write_seqcount_begin(seq);
>
> ---> preemption by a high priority reader
>
> seqcount_begin(seq); <-- live lock
>
> The RT substitution does:
>
> seqcount_begin(seq)
> cnt = READ_ONCE(seq->sequence);
>
> if (cnt & 1) {
> lock(s->lock);
> unlock(s->lock);
> }
>
> which prevents the deadlock because it makes the reader block on
> the associated lock, which allows the preempted writer to make
> progress.
>
> This applies to raw_seqcount_begin() as well.
>
> I have no objections against the construct itself, but this has to be
> properly documented vs. the restriction this imposes.

Absolutely, any input is highly appreciated.

But to mention it again: whatever you can do with raw_seqcount_t, you
can do with seqcount_t, and there are already users relying completely
on the raw_ function variants (see my other reply).

So the documentation should most probably be extended to cover the raw_
functions and seqcount_t in general.

>
> As you can see above the writer side therefore has to ensure that it
> cannot preempted on PREEMPT_RT, which limits the possibilities what you
> can do inside a preemption (or interrupt) disabled section on RT enabled
> kernels. See Documentation/locking/locktypes.rst for further information.

It's going to be used for THP, which are currently incompatible with
PREEMPT_RT (disabled in the Kconfig). But preemption is also disabled
because we're using bit_spin_lock(), which does a bit_spin_lock().

Certainly worth documenting!

Thanks for your input!

--
Thanks,

David / dhildenb