Re: [PATCH v7 00/22] Support SDEI Virtualization

From: Gavin Shan
Date: Sun Jun 26 2022 - 21:19:14 EST


Hi Marc,

On 6/24/22 11:12 PM, Marc Zyngier wrote:
On Thu, 23 Jun 2022 07:11:08 +0100,
Gavin Shan <gshan@xxxxxxxxxx> wrote:
On 5/27/22 6:02 PM, Gavin Shan wrote:
This series intends to virtualize Software Delegated Exception Interface
(SDEI), which is defined by DEN0054C (v1.1). It allows the hypervisor to
deliver NMI-like SDEI events to the guest, and it's needed by Async PF to
deliver page-not-present notifications from the hypervisor to the guest.
The code and the required qemu changes can be found at:

https://developer.arm.com/documentation/den0054/c
https://github.com/gwshan/linux ("kvm/arm64_sdei")
https://github.com/gwshan/qemu ("kvm/arm64_sdei")

The design is quite straightforward and follows the specification. The
(SDEI) events are classified into shared and private ones according
to their scope. A shared event is system or VM scoped, but a private
event is vcpu scoped. This implementation doesn't support shared
events because all the needed events are private. Besides, critical
events aren't supported by the implementation either. It means all events
are normal in terms of priority.

Several objects (data structures) are introduced to help with event
registration, enablement, disablement, unregistration, reset,
delivery and handling.

* kvm_sdei_event_handler
SDEI event handler, which is provided through EVENT_REGISTER
hypercall, is called when the SDEI event is delivered from
host to guest.
* kvm_sdei_event_context
The saved (preempted) context when SDEI event is delivered
for handling.
* kvm_sdei_vcpu
SDEI events and their states.

The patches are organized as below:

PATCH[01-02] Preparatory work to extend smccc_get_argx() and refactor
hypercall routing mechanism
PATCH[03] Adds SDEI virtualization infrastructure
PATCH[04-16] Supports various SDEI hypercalls and event handling
PATCH[17] Exposes SDEI capability
PATCH[18-19] Support SDEI migration
PATCH[20] Adds document about SDEI
PATCH[21-22] SDEI related selftest cases

The previous revisions can be found:

v6: https://lore.kernel.org/lkml/20220403153911.12332-4-gshan@xxxxxxxxxx/T/
v5: https://lore.kernel.org/kvmarm/20220322080710.51727-1-gshan@xxxxxxxxxx/
v4: https://lore.kernel.org/kvmarm/20210815001352.81927-1-gshan@xxxxxxxxxx/
v3: https://lore.kernel.org/kvmarm/20210507083124.43347-1-gshan@xxxxxxxxxx/
v2: https://lore.kernel.org/kvmarm/20210209032733.99996-1-gshan@xxxxxxxxxx/
v1: https://lore.kernel.org/kvmarm/20200817100531.83045-1-gshan@xxxxxxxxxx/


Copying Oliver's new email address (oliver.upton@xxxxxxxxx).

Please let me know if I need to rebase and repost the series.

My main issue with this series is that it is a solution in search of a
problem. It is only an enabler for Asynchronous Page Fault support,
and:

- as far as I know, the core Linux/arm64 maintainers have no plan to
support APF. Without it, this is a pointless exercise. And even with
it, this introduces a Linux specific behaviour in an otherwise
architectural hypervisor (something I'm quite keen on avoiding)

- It gives an incentive to other hypervisor vendors to add random crap
to the Linux mm subsystem, which is even worse. At this stage, we
might as well go back to the Xen PV days altogether.

- I haven't seen any of the KVM/arm64 users actually asking for the
APF horror, and the cloud vendors I directly asked had no plan to
use it, and are not using it on their x86 systems either

- no performance data nor workloads that could help making an informed
decision have been disclosed, and the only argument in its favour
seems to be "but x86 has it" (hardly a compelling one)

Given the above, I don't see how to justify this series, as it has no
purpose on its own, no matter how well written it is.


Thank you for taking the time to review the series and provide comments.
A long time ago, I compared the features supported on x86 and arm64 to
sort out the gaps. Async page fault was one of the missing features. From
then on, I started to investigate x86's implementation and work on an
arm64 implementation. That is the history behind why I continue to work
on Async page fault for arm64. It means that, on my side, there is no
customer request asking for Async page fault support on arm64.

In order to support Async PF on arm64, two sets of changes are needed:
one in kvm/arm64 and one in the guest kernel. The Async page fault
service won't be enabled if either kvm/arm64 or the guest kernel doesn't
support it. The service is negotiated between host and guest. So I don't
think it would be a problem. It's true that Async page fault is only
beneficial to a Linux host and Linux guests, until it gets supported by
other guest kernels.
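
To illustrate the negotiation described above, here is a minimal sketch.
The structure and function names are hypothetical and are not taken from
the series or from any existing KVM ABI; the only point is that the
service stays disabled unless both sides advertise support.

```c
#include <stdbool.h>

/* Hypothetical capability flags, for illustration only. */
struct apf_caps {
	bool host_supports_apf;   /* kvm/arm64 implements Async PF */
	bool guest_supports_apf;  /* guest kernel implements Async PF */
};

/*
 * The service is enabled only when both host and guest support it.
 * If either side lacks support, the guest simply runs without it.
 */
bool apf_negotiate(const struct apf_caps *caps)
{
	return caps->host_supports_apf && caps->guest_supports_apf;
}
```

With this shape, a guest kernel without Async PF support running on a
capable host (or the other way around) sees no change in behaviour.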

The SDEI implementation follows the specification. It's true that
Async PF isn't specified by the arm64 architecture. However, it's not
an architectural feature on x86 either. I guess the benefits count here.
The reason we need Async PF (and SDEI virtualization) is the benefit.

If I'm correct, Async PF has been used broadly on x86 because of
'post-copy live migration', which relies on userfaultfd. 'Async page fault'
is explicitly mentioned in its documentation
(linux/Documentation/admin-guide/mm/userfaultfd.rst), as quoted below.
It's the most important motivation to support Async PF.

Yeah, performance data is definitely helpful to measure the benefit,
especially for Async page fault on arm64. I used to revise both
series (SDEI virtualization and Async page fault) together, meaning
the 'Async page fault' series was revised whenever there were code
changes to the 'SDEI virtualization' series, until I found it would be
more practical to finalize 'SDEI virtualization' before working on
'Async page fault'. That's why I haven't posted a revised 'Async page
fault' series recently. However, I think the performance data released
at last year's KVM Forum is still relevant. I certainly need to collect
the performance data again when I continue to work on the 'Async page
fault' series after 'SDEI virtualization' is finalized.

https://static.sched.com/hosted_files/kvmforum2021/cb/sdei_apf_for_arm64_gavin.pdf
(On pages 14 and 15, 41% to 68% improvement in post-copy live migration)


Extracted from linux/Documentation/admin-guide/mm/userfaultfd.rst
------------------------------------------------------------------

QEMU/KVM
========

QEMU/KVM is using the ``userfaultfd`` syscall to implement postcopy live
migration. Postcopy live migration is one form of memory
externalization consisting of a virtual machine running with part or
all of its memory residing on a different node in the cloud. The
``userfaultfd`` abstraction is generic enough that not a single line of
KVM kernel code had to be modified in order to add postcopy live
migration to QEMU.

Guest async page faults, ``FOLL_NOWAIT`` and all other ``GUP*`` features work
just fine in combination with userfaults. Userfaults trigger async
page faults in the guest scheduler so those guest processes that
aren't waiting for userfaults (i.e. network bound) can keep running in
the guest vcpus.

Thanks,
Gavin