Re: [PATCH] compiler.h: Clarify comment about the need for barrier_data()

From: Arvind Sankar
Date: Fri Oct 16 2020 - 09:10:31 EST


On Fri, Oct 16, 2020 at 08:13:44AM +0000, David Laight wrote:
> From: Arvind Sankar
> > Sent: 15 October 2020 23:01
> ,,,
> > I think it's helpful to have the more detailed explanation about
> > register variables -- at first glance, it's a bit mystifying as to why
> > the compiler would think that the asm can't access the stack. Spilling
> > registers to the stack is actually an undesirable side-effect of the
> > workaround.
>
> That is the very bit that just confuses things.
> The data the memzero_explictit() is trying to clear is (probably)
> on-stack already - it won't be in registers.
>

Are you saying the explanation is confusing things?

What I think is confusing is the fact that the compiler believes that an
asm with a memory clobber cannot access a variable that happens to be in
memory, and the comment is explaining how the compiler came to that
conclusion. The comment is already saying that this applies to LLVM
(unlike GCC) even if the variable isn't actually in registers.

> If it were in registers you wouldn't need the memset().

There's obviously no guarantee of where the compiler decided to keep the
variables. This isn't so clear-cut: for SHA, there is a 256-byte array
that you can be pretty sure will be in memory, but there are also 10 u32
variables which may or may not be in registers depending on how many
registers the arch has and how clever the compiler was in allocating
them.

>
> Actually I suspect that the memset() is inlined so that is
> just assigns zeros to all the variables.
> This will be done using 'virtual registers' that cache the
> on-stack value.
> You then need to do something to force the instructions to flush
> the 'virtual registers' back to stack to be generated.

This is definitely getting too much into the weeds. What the compiler
knows is that memset does nothing other than writing to the variable,
and if the variable is never used after that, then the memset can be
eliminated. Whether the memset ends up getting inlined or not is not
relevant here: clang doesn't inline it, for eg.