Re: [PATCH v2 1/1] xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32

From: Boris Ostrovsky
Date: Fri Nov 05 2021 - 16:41:53 EST



On 11/4/21 1:59 PM, Dongli Zhang wrote:
Hi Boris,

On 11/1/21 10:34 AM, Boris Ostrovsky wrote:
On 10/27/21 9:25 PM, Dongli Zhang wrote:
The sched_clock() can be used very early since
commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition,
with commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock time from
0"), kdump kernel in Xen HVM guest may panic at very early stage when
accessing &__this_cpu_read(xen_vcpu)->time as in below:

setup_arch()
  -> init_hypervisor_platform()
      -> x86_init.hyper.init_platform = xen_hvm_guest_init()
          -> xen_hvm_init_time_ops()
              -> xen_clocksource_read()
                  -> src = &__this_cpu_read(xen_vcpu)->time;

This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
embedded inside 'shared_info' during early stage until xen_vcpu_setup() is
used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address.

However, when Xen HVM guest panic on vcpu >= 32, since
xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when
vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic.

This patch delays xen_hvm_init_time_ops() to later in
xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is
registered when the boot vcpu is >= 32.

Another option is to always delay xen_hvm_init_time_ops() for any vcpus
(including vcpu=0). Since to delay xen_hvm_init_time_ops() may lead to
clock backward issue,

This is referring to
https://lists.xenproject.org/archives/html/xen-devel/2021-10/msg01516.html I
assume?
No.

So far there are clock forward (e.g., from 0 to 65345) issue and clock backward
issue (e.g., from 2.432 to 0).

The clock forward issue can be resolved by above link to enforce clock update
during vcpu info registration. However, so far I am only able to reproduce clock
forward when "taskset -c 33 echo c > /proc/sysrq-trigger".

That is, I am not able to see any clock forward drift during regular boot (on
CPU=0), without the fix at hypervisor side.

The clock backward issue is because the native clock source is used if we delay
the initialization of xen clock source. We will see a backward when the source
is switched from native to xen.


  it is preferred to avoid that for regular boot (The
pv_sched_clock=native_sched_clock() is used at the very beginning until
xen_sched_clock() is registered). That requires to adjust
xen_sched_clock_offset. That's why we only delay xen_hvm_init_time_ops()
for vcpu>=32.

We delay only on VCPU>=32 because we want to avoid the clock going backwards due
to hypervisor problem pointed to be the link above, not because we need to
adjust xen_sched_clock_offset (which we could if we wanted).
I will add that.

Just to clarify that so far I am not able to reproduce the clock backward issue
during regular boot (on CPU=0), when the fix is not available at hypervisor side.


FTR, Dongli and I had a chat and he is going to provide another version of the patch where time initialization is deferred for all vcpus.



-boris