Re: [PATCH v2 1/3] x86/tdx: Add TDX Guest event notify interrupt support

From: Huang, Kai
Date: Fri Apr 14 2023 - 09:34:51 EST


On Wed, 2023-04-12 at 20:41 -0700, Kuppuswamy Sathyanarayanan wrote:
> Host-guest event notification via configured interrupt vector is useful
> in cases where a guest makes an asynchronous request and needs a
> callback from the host to indicate the completion or to let the host
> notify the guest about events like device removal. One usage example is,
> callback requirement of GetQuote asynchronous hypercall.
>
> In TDX guest, SetupEventNotifyInterrupt hypercall can be used by the
> guest to specify which interrupt vector to use as an event-notify
> vector from the VMM. Details about the SetupEventNotifyInterrupt
> hypercall can be found in TDX Guest-Host Communication Interface
> (GHCI) Specification, section "VP.VMCALL<SetupEventNotifyInterrupt>".
>
> As per design, VMM will post the event completion IRQ using the same
> CPU on which SetupEventNotifyInterrupt hypercall request is received.
> So allocate an IRQ vector from "x86_vector_domain", and set the CPU
> affinity of the IRQ vector to the CPU on which
> SetupEventNotifyInterrupt hypercall is made.
>
> Add tdx_register_event_irq_cb()/tdx_unregister_event_irq_cb()
> interfaces to allow drivers register/unregister event noficiation
^
to register/unregister
> handlers.
>
>

[...]

> +/* Reserve an IRQ from x86_vector_domain for TD event notification */
> +static int __init tdx_event_irq_init(void)
> +{
> + struct irq_alloc_info info;
> + struct irq_cfg *cfg;
> + int irq;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> + return 0;
> +
> + init_irq_alloc_info(&info, NULL);
> +
> + /*
> + * Event notification vector will be delivered to the CPU
> + * in which TDVMCALL_SETUP_NOTIFY_INTR hypercall is requested.
^
on which (to be consistent with the changelog)

> + * So set the IRQ affinity to the current CPU (CPU 0).
> + */
> + info.mask = cpumask_of(0);

I think we should use smp_processor_id() to get the CPU id even for BSP.
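
I.e., something like:

info.mask = cpumask_of(smp_processor_id());

(and cpu_to_node(smp_processor_id()) for the node in irq_domain_alloc_irqs()
below).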

> +
> + irq = irq_domain_alloc_irqs(x86_vector_domain, 1, cpu_to_node(0), &info);
> + if (irq <= 0) {
> + pr_err("Event notification IRQ allocation failed %d\n", irq);
> + return -EIO;
> + }
> +
> + irq_set_handler(irq, handle_edge_irq);
> +
> + /* Since the IRQ affinity is set, it cannot be balanced */
> + if (request_irq(irq, tdx_event_irq_handler, IRQF_NOBALANCING,
> + "tdx_event_irq", NULL)) {
> + pr_err("Event notification IRQ request failed\n");
> + goto err_free_domain_irqs;
> + }

Firstly, this comment isn't right. The @info->mask is only used to allocate the
IRQ's vector, but it doesn't have anything to do with the IRQ's affinity. In
fact, irq_domain_alloc_irqs() will set the IRQ to the default affinity.

The comment should be something like below instead:

/*
 * The IRQ cannot be migrated because VMM always injects the vector
 * event to the cpu on which the SetupEventNotifyInterrupt TDVMCALL
 * is called. Set the IRQ with IRQF_NOBALANCING to prevent its
 * affinity from being changed.
 */

That being said, you actually haven't set the IRQ's affinity to the BSP yet
before request_irq(). IIUC you can either:

1) Explicitly call irq_set_affinity() after irq_domain_alloc_irqs() to set the
affinity to the BSP (see the sketch below); or

2) Use __irq_domain_alloc_irqs(), which allows you to pass the affinity
directly, to allocate the IRQ instead of irq_domain_alloc_irqs():

struct irq_affinity_desc desc = { };

cpumask_set_cpu(smp_processor_id(), &desc.mask);

irq = __irq_domain_alloc_irqs(..., &desc);
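
For 1), the equivalent would be roughly (untested):

irq = irq_domain_alloc_irqs(x86_vector_domain, 1,
			    cpu_to_node(smp_processor_id()), &info);
if (irq > 0)
	irq_set_affinity(irq, cpumask_of(smp_processor_id()));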

Personally I think 2) is more straightforward.

Using 2) also allows you to alternatively set desc.is_managed to 1 to make the
IRQ kernel managed. This prevents userspace from changing the IRQ affinity.
The kernel can still change the affinity, but no other kernel code will do
that.
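
I.e., in the snippet above, additionally set:

desc.is_managed = 1;

before calling __irq_domain_alloc_irqs().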

So both a kernel-managed affinity IRQ and IRQF_NOBALANCING should work. I have
no strong opinion on which to use.

And IIUC if you already set the IRQ's affinity to the BSP before request_irq(),
then you don't need to do:

info.mask = cpumask_of(0);

before allocating the IRQ, as the vector will be allocated in request_irq() on
the BSP anyway.


> +
> + cfg = irq_cfg(irq);
> +
> + /*
> + * Since tdx_event_irq_init() is triggered via early_initcall(),
> + * it will called before secondary CPUs bringup. Since there is
> + * only one CPU, it complies with the requirement of executing
> + * the TDVMCALL_SETUP_NOTIFY_INTR hypercall on the same CPU where
> + * the IRQ vector is allocated.

IMHO this is unnecessarily complicated.

In fact, IMHO we can just have one simple comment at the very beginning to
explain the whole thing:

/*
 * TDX guest uses SetupEventNotifyInterrupt TDVMCALL to allow the
 * hypervisor to notify the TDX guest when needed, for instance,
 * when VMM finishes the GetQuote request from the TDX guest.
 *
 * The VMM always notifies the TDX guest via the vector specified in
 * the SetupEventNotifyInterrupt TDVMCALL to the cpu on which the
 * TDVMCALL is called. For simplicity, just allocate an IRQ (and a
 * vector) directly from x86_vector_domain for such notification and
 * pin the IRQ to the BSP.
 */

And then all the code follows.
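
Putting the pieces together (completely untested, just to show the idea --
using __irq_domain_alloc_irqs() with -1/false for irq_base/realloc), the whole
function could then look something like:

static int __init tdx_event_irq_init(void)
{
	struct irq_affinity_desc desc = { };
	struct irq_alloc_info info;
	struct irq_cfg *cfg;
	int irq;

	if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
		return 0;

	/* (the comment suggested above goes here) */

	init_irq_alloc_info(&info, NULL);

	/* Pin the IRQ to the CPU making the TDVMCALL (the BSP at this point) */
	cpumask_set_cpu(smp_processor_id(), &desc.mask);

	irq = __irq_domain_alloc_irqs(x86_vector_domain, -1, 1,
				      cpu_to_node(smp_processor_id()), &info,
				      false, &desc);
	if (irq <= 0) {
		pr_err("Event notification IRQ allocation failed %d\n", irq);
		return -EIO;
	}

	irq_set_handler(irq, handle_edge_irq);

	/*
	 * The IRQ cannot be migrated because VMM always injects the vector
	 * event to the cpu on which the SetupEventNotifyInterrupt TDVMCALL
	 * is called. Set IRQF_NOBALANCING to prevent the affinity from
	 * being changed.
	 */
	if (request_irq(irq, tdx_event_irq_handler, IRQF_NOBALANCING,
			"tdx_event_irq", NULL)) {
		pr_err("Event notification IRQ request failed\n");
		goto err_free_domain_irqs;
	}

	cfg = irq_cfg(irq);

	if (_tdx_hypercall(TDVMCALL_SETUP_NOTIFY_INTR, cfg->vector, 0, 0, 0)) {
		pr_err("Event notification hypercall failed\n");
		goto err_free_irq;
	}

	tdx_event_irq = irq;

	return 0;

err_free_irq:
	free_irq(irq, NULL);
err_free_domain_irqs:
	irq_domain_free_irqs(irq, 1);

	return -EIO;
}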

> + *
> + * Register callback vector address with VMM. More details
> + * about the ABI can be found in TDX Guest-Host-Communication
> + * Interface (GHCI), sec titled
> + * "TDG.VP.VMCALL<SetupEventNotifyInterrupt>".
> + */
> + if (_tdx_hypercall(TDVMCALL_SETUP_NOTIFY_INTR, cfg->vector, 0, 0, 0)) {
> + pr_err("Event notification hypercall failed\n");
> + goto err_free_irqs;
> + }
> +
> + tdx_event_irq = irq;
> +
> + return 0;
> +
> +err_free_irqs:
> + free_irq(irq, NULL);
> +err_free_domain_irqs:
> + irq_domain_free_irqs(irq, 1);
> +
> + return -EIO;
> +}
> +early_initcall(tdx_event_irq_init)
> +
>

[...]