Re: [PATCH] irqchip/gic-v4.1: Optimize the delay time of the poll on the GICR_VPENDBASER.Dirty bit

From: Marc Zyngier
Date: Wed Sep 16 2020 - 04:39:48 EST


On 2020-09-16 08:04, lushenming wrote:
> Hi,
>
> Our team just discussed this issue again and consulted our GIC hardware
> design team. They think the RD can afford busy waiting. So we still think
> maybe 0 is better, at least for our hardware.
>
> In addition, if not 0, as I said before, in our measurement, it takes only
> hundreds of nanoseconds, or 1~2 microseconds, to finish parsing the VPT
> in most cases. So maybe 1 microsecond, or smaller, is more appropriate.
> Anyway, 10 microseconds is too much.
>
> But it has to be said that it does depend on the hardware implementation.

Exactly. And given that the only publicly available implementation is
a software model, I am reluctant to change "performance"-related things
based on benchmarks that can't be verified, for what appears to me to be
a micro-optimization.

> Besides, I'm not sure where the start and end points of the total scheduling
> latency of a vcpu that you mentioned are, since it includes many events. Is
> the parse time of the VPT not clear enough?

Measure the time it takes from kvm_vcpu_load() to the point where the vcpu
enters the guest. How much, in proportion, do these 1/2/10us represent?
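
To make the question concrete, here is a minimal sketch of the ratio being
asked for. The helper name poll_delay_share_pct() and both of its parameters
are made up for illustration; the inputs would have to come from real
timestamps taken around kvm_vcpu_load() and guest entry, which nobody has
posted in this thread.

```c
#include <stdint.h>

/*
 * Hypothetical helper: share of the load-to-entry latency that a given
 * poll delay represents, in percent.
 *
 * poll_delay_ns:  the delay under discussion (1000, 2000 or 10000 ns).
 * total_entry_ns: measured time from kvm_vcpu_load() to guest entry.
 *                 This must be measured; it is not known from the thread.
 */
double poll_delay_share_pct(uint64_t poll_delay_ns, uint64_t total_entry_ns)
{
	if (total_entry_ns == 0)
		return 0.0;	/* no measurement, nothing to report */

	return 100.0 * (double)poll_delay_ns / (double)total_entry_ns;
}
```

With a purely hypothetical total of 100us, a 10us delay would come out at
10% and a 1us delay at 1%. A different total changes the picture entirely,
which is why the measurement matters.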

Also, a better(?) course of action would maybe be to consider whether we
should split the its_vpe_schedule() call into two distinct operations: one
that programs the VPE to be resident, and another that polls the Dirty bit
*much later* on the entry path, giving the GIC a chance to work in parallel
with the CPU on the entry path.

If your HW is as quick as you say it is, it would pretty much guarantee
a clear read of GICR_VPENDBASER without waiting.
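
A rough user-space sketch of that split, under loud assumptions: struct
mock_rd, mock_tick(), vpe_make_resident(), vpe_wait_vpt_parsed() and
demo_spins() are all invented for illustration and are not the driver's
functions. The mock "redistributor" clears Dirty after a fixed number of
ticks, standing in for the VPT parse. The Valid/Dirty bit positions are what
I believe the register layout to be, so double-check them against the
architecture spec.

```c
#include <stdint.h>

/* Assumed GICR_VPENDBASER bit positions (verify against the spec). */
#define MOCK_VPENDBASER_VALID	(1ULL << 63)
#define MOCK_VPENDBASER_DIRTY	(1ULL << 60)

/* Hypothetical stand-in for a redistributor that is parsing a VPT. */
struct mock_rd {
	uint64_t vpendbaser;
	unsigned int parse_ticks_left;
};

/* Advance the mock hardware by one unit of time. */
static void mock_tick(struct mock_rd *rd)
{
	if (rd->parse_ticks_left > 0 && --rd->parse_ticks_left == 0)
		rd->vpendbaser &= ~MOCK_VPENDBASER_DIRTY;
}

/* Step 1: make the VPE resident and return without waiting. */
static void vpe_make_resident(struct mock_rd *rd, unsigned int parse_ticks)
{
	rd->vpendbaser = MOCK_VPENDBASER_VALID;
	rd->parse_ticks_left = parse_ticks;
	if (parse_ticks > 0)
		rd->vpendbaser |= MOCK_VPENDBASER_DIRTY;
}

/*
 * Step 2, run much later on the entry path: poll Dirty.
 * Returns how many reads saw Dirty still set, or -1 on timeout.
 */
static int vpe_wait_vpt_parsed(struct mock_rd *rd, unsigned int max_polls)
{
	unsigned int spins;

	for (spins = 0; spins <= max_polls; spins++) {
		if (!(rd->vpendbaser & MOCK_VPENDBASER_DIRTY))
			return (int)spins;
		mock_tick(rd);	/* time passes while the CPU spins */
	}
	return -1;
}

/*
 * Make the VPE resident, do 'overlap_ticks' worth of unrelated entry
 * work while the mock GIC parses, then poll.
 */
int demo_spins(unsigned int parse_ticks, unsigned int overlap_ticks)
{
	struct mock_rd rd;

	vpe_make_resident(&rd, parse_ticks);
	while (overlap_ticks--)
		mock_tick(&rd);	/* CPU busy elsewhere, GIC keeps parsing */
	return vpe_wait_vpt_parsed(&rd, 100);
}
```

In this toy model, demo_spins(3, 0) polls immediately and spins 3 times,
while demo_spins(3, 5) overlaps the parse with other work and returns 0,
i.e. the first read of the register is already clean. Whether real entry-path
work is long enough to hide the parse is exactly what would need measuring.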

M.
--
Jazz is not dead. It just smells funny...