RE: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t

From: Reshetova, Elena
Date: Thu Nov 02 2017 - 07:05:04 EST


Sorry for delayed reply, but I was actually reading and trying to understand
all the involved notions, so it took a while...

> On Fri, Oct 27, 2017 at 06:49:55AM +0000, Reshetova, Elena wrote:
> > Could we possibly have a bit more elaborate discussion on this?
> >
> > Or alternatively, what then should be the correct way for a certain
> > variable (that behaves like a standard refcounter) to check if the
> > relaxed memory ordering is ok?
>
> The refcount_t includes sufficient ordering for normal reference
> counting. It provides a happens-before relation wrt. dropping the
> reference, such that all object accesses happen before we drop the
> reference; because at that point the object can go away and any further
> access would be UAF.
>
> So if the thing being replaced is purely a reference count, all should
> be well.

Yes, I understand this, but people are asking a totally different level of
detail and inside analysis and given that there are changes, they have a
point. So, that's why I am trying to figure the "whole story" below.

>
> > This is what Thomas was asking me to do for core kernel conversions
> > and this is what I don't know how to do correctly.
>
> Thomas asked you to verify that nothing relies on the stronger ordering
> provided by atomic_dec_and_test(). This means you have to audit the code
> with an eye towards memory ordering.
>
> Now the futex one is actually OK. Thomas just wants you to mention that
> refcount_dec_and_test is weaker and explain that in this case that is
> fine since pi_state::refcount is in fact a pure refcount and
> put_pi_state() does not in fact rely on further ordering.
>
> Just mentioning the above gives the maintainer:
>
> 1) the information he needs to perform review, mentioning memory
> ordering changes means he needs to look at it.
>
> 2) you identify the place where it changes (put_pi_state) which again
> aids him in review by limiting the amount of code he needs to double
> check.
>
> 3) that you actually looked at the problem; giving more trust you
> actually know wth you're doing.

Well refcount_dec_and_test() is not the only function that has different
memory ordering specifics. So, the full answer then for any arbitrary case
according to your points above would be smth like:

for each substituted function (atomic_* --> refcount_*) that actually
has changes in memory ordering *** perform the following:
- mention the difference
- mention the actual code place where the change would affect (
various put and etc. functions)
- potentially try to understand if it would make a difference for
this code (again here is where I am not sure how to do it properly)


*** the actual list of these functions to me looks like:
1) atomic_inc_not_zero -> refcount_inc_not_zero.
Change is from underlying atomic_cmpxchg() to atomic_try_cmpxchg_relaxed ()
First one implies SMP-conditional general memory barrier
(smp_mb()) on each side of the actual operation (at least according to documentation,
In reality it goes through so many changes, ifdefs and conditionals that one gets lost
Easily in the process).
Second one according to the comment implies no memory barriers whatsoever, BUT
"provides a control dependency which will order future stores against the inc"
So, every store (not load) operation (I guess we are talking here only about store operations that
touch the same object, but I wonder how it is defined in terms of memory location?
(overlapping?) that comes inside "if refcount_inc_not_zero(){}" cause would only
be executed if functions returns true.
So, practically what we might "loose" here is any updates on the object protected by this
refcounter done by another CPU. But for this purpose we expect the developer to take
some other lock/memory barrier into use, right? We only care of incrementing the
refcount atomically and make sure we don't do anything with object unless it is ready for
us to be used. If yes, then I guess it might be a big
change for the code that previously relied on atomic-provided smp_mb() barriers and now
instead needs to take an explicit locks/barriers by itself.

2) atomic_** -> refcount_add_not_zero. Fortunately these are super rare and need to see
per each case dependent on actual atomic function substituted.

3) atomic_add() --> refcount_add(). This should not make any change since both do not
provide memory ordering at all, but there is a comment in the refcount.c that says that
refcount_add " provide a control dependency and thereby orders future stores".
How is it done given that refcount_add is void returning function??
I am fully confused with this one.

4) atomic_dec () --> refcount_dec (). This one we have discussed extensively
before. Again first one implies SMP-conditional general memory barrier (smp_mb()) on each
side of the actual operation. Second one only provides "release" ordering meaning that prior
both loads and stores must be completed before the operation.
So, is it correct to express the difference in this way:

atomic_dec (): refcount_dec ():

smp_mb(); smp_mb();
do_atomic_dec_operation; do_atomic_dec_operation;
smp_mb(); /*note no any barrier here! */
/ * we assume that we hold the lock
* or anything like this on object
* anyway, otherwise code is broken
* to start with */
5) atomic_dec_and_test () --> refcount_dec_and_test (). Same as 4)
but in addition we have a control flow barrier.
So, is it correct to express the difference in this way:

atomic_dec_and_test (): refcount_dec_and_test ():

smp_mb(); smp_mb();
if (do_atomic_dec_and_test;) { if (do_atomic_dec_and_test;){
/* 1 --> 0 transition happened */ /* 1 --> 0 transition happened */
smp_mb(); /* in refcounter case we assume
do_anything including free(); that we are the last user of object
so that no concurrent access can happen*/
/* control_flow_barrier here */ <------ which one btw????
/* only need to guarantee that we free after
reaching zero so we are good here */
} }

6) atomic_sub_and_test () --> refcount_sub_and_test (). Identical to 5)
7) atomic_add_unless(&var, -1, 1) --> refcount_dec_not_one().
Should be identical to 4)


Lock functions such as refcount_dec_and_lock() & refcount_dec_and_mutex_lock()
Provide exactly the same guarantees as they atomic counterparts.

Is the above a correct view on this? I think that instead of putting this whole info
in the individual commits, how about making some documentation subsection
(in memory-barriers or atomics or on its own, don't know what is better) with
examples like above? Then in each individual commit I can point to the respective line
in the doc and say that here is a particular memory guarantee change and in the case
of this particular code it would apply to your put_pi_state() or whatever else function.
Otherwise keeping this info only in commits do not benefit any future users of refcount.

>
> > Also, I got exactly the same question from xfs maintainer,
> > so if we provide an API that we hope will be used correctly in the
> > future, we should have a way for people to verify it.
>
> I actually replied to Dave (but have not checked if there's further
> email on the thread). I object to the claim that the code is correct if
> he cannot explain the memory ordering.
>
> Aside from that; if/when he explain the ordering requirements of those
> sites and those do indeed require more than refcount_dec_and_test()
> provides we should add a comment exactly explaining the ordering along
> with additional barriers, such as smp_mb__after_atomic().
>
> > Maybe it is just me, but I would think that having a way to verify
> > that your code is ok with this refcounter-specific relaxed memory
> > ordering applies to any memory ordering requirements, refcounters are
> > just a subset with certain properties, so is then the full answer is
> > to figure out how to verify any memory ordering requirement of your
> > code?
>
> Yes; and this can be quite tricky, but undocumented ordering
> requirements are a bug that need to be fixed. Mostly the code is
> 'simple' and refcount really does the right and sufficient thing.
>
> If the maintainer knows or suspects more ordering is required he can
> help out; but for that to happen you have to, at the very least, do 1&2
> above.

As I said, I want to do more if we are going this way. That's why I am trying to
create some kind of type of easy to read doc that would explain people the
differences and won't necessary imply reading the whole memory-barrier.txt
doc (more on my reading of it below).

>
> > We can also document this logic in more docs or even maybe try to
> > create a better documentation for current memory ordering bits since
> > it is not the most easiest read at the moment. Otherwise this might be
> > just bad enough reason for many people to avoid refcount_t type if it
> > is just easier to tell "let's take just old atomic, we knew it somehow
> > worked before"....
>
> That's just voodoo programming; and while that is rife I have absolutely
> no sympathy for it. Memory ordering is hard, but that doesn't mean we
> should just blindly sprinkle memory barriers around until it 'works'.
>
> So while memory-barriers.txt is a longish document, it is readable with
> a bit of time and effort. There are also tools being developed that can
> help people validate their code.

Could you please point to the tools? Might help my understanding also.

So, I actually went and tried to __slowly__ read the memory-barriers.txt
which indeed makes a lot of sense when you read it in full (not partially
like it did it before). But the issue with this doc is not so much the length
but not always consequent way of giving the information to unprepared reader.
I have many examples that I have wondered for a while what is meant with
certain term or notion somewhere at the beginning of the doc
only to discover a fully specified answer somewhere after line 1800.
I do have many suggestions that might help to improve
the readability of the document but it would take me at least a day to write
them down (and it is going to be a long read for anyone),
so before I go do it, I want to know if anyone is interested to hear :)
Anyway this would be a separate exercise to do if people want.

Best Regards,
Elena.

>
> There are enough people around that actually get this stuff (it really
> is not _that_ hard, but it does require a functional brain) so if you
> can describe the problem with sufficient detail we can help out getting
> the ordering right (or state its broken :-).