Re: [PATCH] vfio: Fix endianness handling for emulated BARs

From: Alexander Graf
Date: Tue Jun 24 2014 - 09:22:59 EST



On 24.06.14 15:01, Alexey Kardashevskiy wrote:
On 06/24/2014 10:52 PM, Alexander Graf wrote:
On 24.06.14 14:50, Alexey Kardashevskiy wrote:
On 06/24/2014 08:41 PM, Alexander Graf wrote:
On 24.06.14 12:11, Alexey Kardashevskiy wrote:
On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:

Working on big endian being an accident may be a matter of perspective
:-)

The comment remains that this patch doesn't actually fix anything except
the overhead on big endian systems doing redundant byte swapping and
maybe the philosophy that vfio regions are little endian.
Yes, that works by accident because technically VFIO is a transport and
thus shouldn't perform any endian swapping of any sort, which remains
the responsibility of the end driver which is the only one to know
whether a given BAR location is a a register or some streaming data
and in the former case whether it's LE or BE (some PCI devices are BE
even ! :-)

But yes, in the end, it works with the dual "cancelling" swaps and the
overhead of those swaps is probably drowned in the noise of the syscall
overhead.

I'm still not a fan of iowrite vs iowritebe, there must be something we
can use that doesn't have an implicit swap.
Sadly there isn't ... In the old day we didn't even have the "be"
variant and readl/writel style accessors still don't have them either
for all archs.

There is __raw_readl/writel but here the semantics are much more than
just "don't swap", they also don't have memory barriers (which means
they are essentially useless to most drivers unless those are platform
specific drivers which know exactly what they are doing, or in the rare
cases such as accessing a framebuffer which we know never have side
effects).

Calling it iowrite*_native is also an abuse of the namespace.
Next thing we know some common code
will legitimately use that name.
I might make sense to those definitions into a common header. There have
been a handful of cases in the past that wanted that sort of "native
byte order" MMIOs iirc (though don't ask me for examples, I can't really
remember).

If we do need to define an alias
(which I'd like to avoid) it should be something like vfio_iowrite32.
Ping?

We need to make a decision whether to move those xxx_native() helpers
somewhere (where?) or leave the patch as is (as we figured out that
iowriteXX functions implement barriers and we cannot just use raw
accessors) and fix commit log to explain everything.
Is there actually any difference in generated code with this patch applied
and without? I would hope that iowrite..() is inlined and cancels out the
cpu_to_le..() calls that are also inlined?
iowrite32 is a non-inline function so conversions take place so are the
others. And sorry but I fail to see why this matters. We are not trying to
accelerate things, we are removing redundant operations which confuse
people who read the code.
The confusion depends on where you're coming from. If you happen to know
that "iowrite32" writes in LE, then the LE conversion makes a lot of sense.
It was like this (and this is just confusing):

iowrite32(le32_to_cpu(val), io + off);

What would make sense (according to you and I would understand this) is this:

iowrite32(cpu_to_le32(val), io + off);


Or I missed your point, did I?

No, you didn't miss it. I think for people who know how iowrite32() works the above is obvious. I find the fact that iowrite32() writes in LE always pretty scary though ;).

So IMHO we should either create new, generic iowrite helpers that don't do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.


Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/