Re: [PATCH v1 2/2] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory

From: Lorenzo Pieralisi
Date: Tue Sep 26 2023 - 12:13:08 EST


On Tue, Sep 26, 2023 at 02:52:13PM +0100, Catalin Marinas wrote:
> On Tue, Sep 26, 2023 at 10:31:38AM +0200, Lorenzo Pieralisi wrote:
> > Currently, KVM for ARM64 maps at stage 2 memory that is
> > considered device (ie using pfn_is_map_memory() to discern
> > between device memory and memory itself) with DEVICE_nGnRE
> > memory attributes; this setting overrides (as per the ARM
> > architecture [1]) any device MMIO mapping present at stage
> > 1, resulting in a set-up whereby a guest operating system
> > can't determine device MMIO mapping memory attributes on its
> > own but it is always overriden by the KVM stage 2 default.
> >
> > This set-up does not allow guest operating systems to map
> > device memory on a page by page basis with combined attributes
> > other than DEVICE_nGnRE,
>
> Well, it also has the option of DEVICE_nGnRnE ;).

That's true - we have to fix it up.

"This set-up does not allow guest operating systems to choose
device memory attributes on a page by page basis independently
from KVM stage 2 mappings,..."

> > which turns out to be an issue in that
> > guest operating systems (eg Linux) may request to map
> > devices MMIO regions with memory attributes that guarantee
> > better performance (eg gathering attribute - that for some
> > devices can generate larger PCIe memory writes TLPs)
> > and specific operations (eg unaligned transactions) such as
> > the NormalNC memory type.
> >
> > The default device stage 2 mapping was chosen in KVM
> > for ARM64 since it was considered safer (ie it would
> > not allow guests to trigger uncontained failures
> > ultimately crashing the machine) but this turned out
> > to be imprecise.
> >
> > Failures containability is a property of the platform
> > and is independent from the memory type used for MMIO
> > device memory mappings (ie DEVICE_nGnRE memory type is
> > even more problematic than NormalNC in terms of containability
> > since eg aborts triggered on loads cannot be made synchronous,
> > which make them harder to contain); this means that,
> > regardless of the combined stage1+stage2 mappings a
> > platform is safe if and only if device transactions cannot trigger
> > uncontained failures; reworded, the default KVM device
> > stage 2 memory attributes play no role in making device
> > assignment safer for a given platform and therefore can
> > be relaxed.
> >
> > For all these reasons, relax the KVM stage 2 device
> > memory attributes from DEVICE_nGnRE to NormalNC.
> >
> > This puts guests in control (thanks to stage1+stage2
> > combined memory attributes rules [1]) of device MMIO
> > regions memory mappings, according to the rules
> > described in [1] and summarized here ([(S1) = Stage1][(S2) = Stage2]):
> >
> > �S1���������� |�� S2��������� |� Result
> > �NORMAL-WB����|� NORMAL-NC����|� NORMAL-NC
> > �NORMAL-WT����|� NORMAL-NC����|� NORMAL-NC
> > �NORMAL-NC����|� NORMAL-NC����|� NORMAL-NC
> > �DEVICE<attr>�|� NORMAL-NC����|� DEVICE<attr>
>
> Not sure what's wrong with my font setup as I can't see the above table
> but I know it from the Arm ARM.
>
> Anyway, the text looks fine to me. Thanks for putting it together
> Lorenzo.
>
> One thing not mentioned here is that pci-vfio still maps such memory as
> Device-nGnRnE in user space and relaxing this potentially creates an
> alias. But such alias is only relevant of both the VMM and the VM try to
> access the same device which I doubt is a realistic scenario.

I can update the log and repost, fixing the comment above as well (need
to check what's wrong with the table characters).

Thanks,
Lorenzo