Re: [PATCH v2] tools: memory-model: Make plain accesses carry dependencies

From: Boqun Feng
Date: Tue Dec 06 2022 - 15:53:05 EST


On Tue, Dec 06, 2022 at 12:46:58PM -0800, Boqun Feng wrote:
> On Mon, Dec 05, 2022 at 11:18:13AM -0500, stern@xxxxxxxxxxxxxxxxxxx wrote:
> > On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote:
> > > > Besides, could you also explain a little bit why only "data;rfi" can be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's because there are special cases when compilers can figure out a condition being true or an address being constant therefore break the dependency
> > >
> > > Oh, good question. A bit hard for me to write down the answer clearly
> > > (which some people will claim means that I don't understand it well myself,
> > > but I beg to differ :) :( :) ).
>
> Nah, I think your answer is clear to me ;-)
>
> > >
> > > In a nutshell, it's because x ->data [Plain] ->rfi y ->... z fulfils
> > > the same role as storing something in a register and then using it in
> > > a subsequent computation; x ->ctrl y ->... z (and ->addr) don't. So
> > > it's not due to smart compilers, just the fact that the other two
> > > cases seem unrelated to the problem being solved, and including them
> > > might introduce some unsoundness (not that I have checked if they do).
>
> So it's about whether a value can have a dataflow from x to y, right? In
> that case registers and memory cells should be treated the same by
> compilers, therefore we can extend the dependencies.
> >
> > More can be said here. Consider the following simple example:
> >
> > void P0(int *x, int *y)
> > {
> > 	int r1, r2;
> > 	int a[10];
> >
> > 	r1 = READ_ONCE(*x);
> > 	a[r1] = 1;
> > 	r2 = a[r1];
> > 	WRITE_ONCE(*y, r2);
> > }
> >
> > There is an address dependency from the READ_ONCE to the plain store in
> > a[r1]. Then there is an rfi and a data dependency to the WRITE_ONCE.
> >
> > But in this example, the WRITE_ONCE is _not_ ordered after the
> > READ_ONCE, even though they are linked by (addr ; rfi ; data). The
> > compiler knows that the value of r1 does not change between the two
> > plain accesses, so it knows that it can optimize the code to be:
> >
> > r1 = READ_ONCE(*x);
> > r2 = 1;
> > WRITE_ONCE(*y, r2);
> > a[r1] = r2;
> >
> > And then the CPU can execute the WRITE_ONCE before the READ_ONCE. This
> > shows that (addr ; rfi) must not be included in the carry-deps relation.
> >
> > You may be able to come up with a similar argument for (ctrl ; rfi),
> > although it might not be quite as clear.
> >
>
> Thank you, Alan! One question though, can a "smart" compiler optimize
> out the case below, with the same logic?
>
> void P0(int *x, int *y, int *a)
> {
> 	int r1, r2;
>
> 	r1 = READ_ONCE(*x); // A
>
> 	*a = r1 & 0xffff; // B
>
> 	r2 = *a & 0xffff0000; // C
>
> 	WRITE_ONCE(*y, r2); // D
>
> }
>
> I think we have A ->data B ->rfi C ->data D, however a "smart" compiler
> can figure out that r2 is actually zero, right? And the code gets
> optimized to:
>
> r1 = READ_ONCE(*x);
> r2 = 0;
> WRITE_ONCE(*y, r2);
> *a = r1 & 0xffff;
>
> and break the dependency.
>
> I know that our memory model is actually unaware of the difference
> between syntactic dependencies and semantic dependencies, so one may argue that in
> the (data; rfi) example above the compiler optimization is outside the
> scope of LKMM, but won't the same reasoning apply to the (addr; rfi)
> example from you? The WRITE_ONCE() _syntactically_ depends on the load of
> a[r1], therefore even if a "smart" compiler can figure out the value, LKMM

I guess it should be that r2 (i.e. the load of a[r1]) _syntactically_
depends on the value of r1.

Regards,
Boqun

> won't take that into consideration.
>
> Am I missing something subtle here?
>
> Regards,
> Boqun
>
> > Alan