Re: [PATCH linux v3 3/9] xen: introduce xen_vcpu_id mapping

From: David Vrabel
Date: Wed Sep 07 2016 - 05:10:21 EST


On 07/09/16 10:07, Vitaly Kuznetsov wrote:
> Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:
>
>> On Tue, 6 Sep 2016, Vitaly Kuznetsov wrote:
>>> Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:
>>>
>>>> On Mon, 5 Sep 2016, Vitaly Kuznetsov wrote:
>>>>> Julien Grall <julien.grall@xxxxxxx> writes:
>>>>>
>>>>>> Hi Vitaly,
>>>>>>
>>>>>> On 26/07/16 13:30, Vitaly Kuznetsov wrote:
>>>>>>> It may happen that Xen's and Linux's ideas of vCPU id diverge. In
>>>>>>> particular, when we crash on a secondary vCPU we may want to do kdump
>>>>>>> and unlike plain kexec where we do migrate_to_reboot_cpu() we try booting
>>>>>>> on the vCPU which crashed. This doesn't work very well for PVHVM guests as
>>>>>>> we have a number of hypercalls where we pass vCPU id as a parameter. These
>>>>>>> hypercalls either fail or do something unexpected. To solve the issue
>>>>>>> introduce percpu xen_vcpu_id mapping. ARM and PV guests get direct mapping
>>>>>>> for now. Boot CPU for PVHVM guest gets its id from CPUID. With secondary
>>>>>>> CPUs it is a bit more trickier. Currently, we initialize IPI vectors
>>>>>>> before these CPUs boot so we can't use CPUID. Use ACPI ids from MADT
>>>>>>> instead.
>>>>>>>
>>>>>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
>>>>>>> ---
>>>>>>> Changes since v2:
>>>>>>> - Use uint32_t for xen_vcpu_id mapping [Julien Grall]
>>>>>>>
>>>>>>> Changes since v1:
>>>>>>> - Introduce xen_vcpu_nr() helper [David Vrabel]
>>>>>>> - Use ACPI ids instead of vLAPIC ids /2 [Andrew Cooper, Jan Beulich]
>>>>>>> ---
>>>>>>> arch/arm/xen/enlighten.c | 10 ++++++++++
>>>>>>> arch/x86/xen/enlighten.c | 23 ++++++++++++++++++++++-
>>>>>>> include/xen/xen-ops.h | 6 ++++++
>>>>>>> 3 files changed, 38 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>>>>>>> index 75cd734..fe32267 100644
>>>>>>> --- a/arch/arm/xen/enlighten.c
>>>>>>> +++ b/arch/arm/xen/enlighten.c
>>>>>>> @@ -46,6 +46,10 @@ struct shared_info *HYPERVISOR_shared_info = (void *)&xen_dummy_shared_info;
>>>>>>> DEFINE_PER_CPU(struct vcpu_info *, xen_vcpu);
>>>>>>> static struct vcpu_info __percpu *xen_vcpu_info;
>>>>>>>
>>>>>>> +/* Linux <-> Xen vCPU id mapping */
>>>>>>> +DEFINE_PER_CPU(uint32_t, xen_vcpu_id) = U32_MAX;
>>>>>>> +EXPORT_PER_CPU_SYMBOL(xen_vcpu_id);
>>>>>>> +
>>>>>>> /* These are unused until we support booting "pre-ballooned" */
>>>>>>> unsigned long xen_released_pages;
>>>>>>> struct xen_memory_region xen_extra_mem[XEN_EXTRA_MEM_MAX_REGIONS] __initdata;
>>>>>>> @@ -179,6 +183,9 @@ static void xen_percpu_init(void)
>>>>>>> pr_info("Xen: initializing cpu%d\n", cpu);
>>>>>>> vcpup = per_cpu_ptr(xen_vcpu_info, cpu);
>>>>>>>
>>>>>>> + /* Direct vCPU id mapping for ARM guests. */
>>>>>>> + per_cpu(xen_vcpu_id, cpu) = cpu;
>>>>>>> +
>>>>>>
>>>>>> We did some internal testing on ARM64 with the latest Linux kernel
>>>>>> (4.8-rc4) and noticed that this patch is breaking SMP support. Sorry
>>>>>> for noticing the issue that late.
>>>>>
>>>>> Sorry for the breakage :-(
>>>>>
>>>>>>
>>>>>> This function is called on the running CPU whilst some code (e.g
>>>>>> init_control_block in drivers/xen/events/events_fifo.c) is executed
>>>>>> whilst preparing the CPU on the boot CPU.
>>>>>>
>>>>>> So xen_vcpu_nr(cpu) will always return 0 in this case and
>>>>>> init_control_block will fail to execute.
>>>>>>
>>>>>
>>>>> I see,
>>>>>
>>>>> CPU_UP_PREPARE event happens before xen_starting_cpu() is called.
>>>>>
>>>>>
>>>>>> I am not sure how to fix. I guess we could setup per_cpu(xen_vcpu_id,
>>>>>> *) in xen_guest_init. Any opinions?
>>>>>
>>>>> As we're not doing kexec on ARM we can fix the immediate issue. I don't
>>>>> know much about ARM and unfortunatelly I don't have a setup to test but
>>>>> it seems there is no early_per_cpu* infrastructure for ARM so we may fix
>>>>> it with the following:
>>>>>
>>>>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>>>>> index 3d2cef6..f193414 100644
>>>>> --- a/arch/arm/xen/enlighten.c
>>>>> +++ b/arch/arm/xen/enlighten.c
>>>>> @@ -170,9 +170,6 @@ static int xen_starting_cpu(unsigned int cpu)
>>>>> pr_info("Xen: initializing cpu%d\n", cpu);
>>>>> vcpup = per_cpu_ptr(xen_vcpu_info, cpu);
>>>>>
>>>>> - /* Direct vCPU id mapping for ARM guests. */
>>>>> - per_cpu(xen_vcpu_id, cpu) = cpu;
>>>>> -
>>>>> info.mfn = virt_to_gfn(vcpup);
>>>>> info.offset = xen_offset_in_page(vcpup);
>>>>>
>>>>> @@ -330,6 +327,7 @@ static int __init xen_guest_init(void)
>>>>> {
>>>>> struct xen_add_to_physmap xatp;
>>>>> struct shared_info *shared_info_page = NULL;
>>>>> + int cpu;
>>>>>
>>>>> if (!xen_domain())
>>>>> return 0;
>>>>> @@ -380,7 +378,8 @@ static int __init xen_guest_init(void)
>>>>> return -ENOMEM;
>>>>>
>>>>> /* Direct vCPU id mapping for ARM guests. */
>>>>> - per_cpu(xen_vcpu_id, 0) = 0;
>>>>> + for_each_possible_cpu(cpu)
>>>>> + per_cpu(xen_vcpu_id, cpu) = cpu;
>>>>>
>>>>> xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
>>>>> if (xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
>>>>>
>>>>> (not tested, if we can't use for_each_possible_cpu() that early we'll
>>>>> have to check against NR_CPUS instead).
>>>>
>>>> Kind of defeat the purpose of xen_vcpu_id, but I guess it should work.
>>>>
>>>>> But unfortunatelly we'll have to get back to this in future. Turns out
>>>>> we need to know Xen's idea of vCPU id _before_ this vCPU starts
>>>>> executing code.
>>>>
>>>> Why?
>>>
>>> E.g. for FIFO event channels we currently call
>>> evtchn_fifo_cpu_notification() for CPU_UP_PREPARE event (and do a
>>> hypercall which requires us to know vcpu id) and this happens before the
>>> secodary cpu starts. I'm not sure this can't be changed in future.
>>>
>>>>
>>>>> On x86 we used ACPI_ID from MADT. Is there anything like
>>>>> it on ARM?
>>>>
>>>> MPIDR:
>>>>
>>>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388e/CIHEBGFG.html
>>>>
>>>
>>> Thanks, but doesn't it work the same way CPUID instruction works? In
>>> that case we won't be able to use it to figure out other CPU's id.
>>
>> I don't know that much about cpuid, but the virtual MPIDR is constructed
>> from the vcpu id right now:
>>
>> v->arch.vmpidr = MPIDR_SMP | vcpuid_to_vaffinity(v->vcpu_id);
>>
>> [...]
>>
>> static inline register_t vcpuid_to_vaffinity(unsigned int vcpuid)
>> {
>> register_t vaff;
>>
>> vaff = (vcpuid & 0x0f) << MPIDR_LEVEL_SHIFT(0);
>> vaff |= ((vcpuid >> 4) & MPIDR_LEVEL_MASK) << MPIDR_LEVEL_SHIFT(1);
>>
>> return vaff;
>> }
>
> This could work but only in case there is a way to get MPIDR for _other_
> cpu (e.g. CPU0 needs to get MPIDR of CPU1 when CPU1 is not yet runnning)
> or we'll have to change the machinery of how we bring up secondary CPUs
> - e.g. CPUn starts, writes its id somewhere and 'hangs' waiting for CPU0
> to set up event channels.

Perhaps we can set up event channels on the CPU being brought up? Local
irqs would need to have been disabled, but I don't see any other
restriction.

David
>