Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a HiSilicon ARM64 platform.

From: Wei Xu
Date: Thu Jun 28 2018 - 06:21:10 EST


Hi James,

On 2018/6/28 9:45, James Morse wrote:
> Hi Wei,
>
> On 27/06/18 14:26, Wei Xu wrote:
>> Sorry, I should highlight that I have only updated the default value
>> of CONFIG_NR_CPUS by menuconfig in the previous mail.
>> That is why it showed dirty.
>
> (menuconfig changes don't show up like this)

Thanks!
Sorry, yes, you are right.
I no longer see "dirty" after resetting proc.S.

>
>
> More than 64 CPUs ... Is this system running more VMs than it has VMIDs? Too-few
> VMIDs does work with KVM, it's just going to trigger rollover frequently.
>

No, we just ran one VM.

> Just to check, what kernel version is the host running? Does it have commit
> f0cf47d939d0 ("KVM: arm/arm64: Close VMID generation race")
> (looks like that went in as a fix for v4.17-rc3)

Yes, the host is running 4.18-rc2, the same as the guest, which includes the above commit.

>
> Are you running (lots) of other VMs whenever this happens? Do they have multiple
> vcpus? (I'm thinking of the scenario in that patch's description)

No, we just ran one VM with 1 vCPU.

>
> Is the host system otherwise idle when this happens?
> (If not, can you reproduce the issue without exhausting the VMIDs?)
>
>
> It may be that writing back the page-table entries with the MMU off, and
> changing the cache maintenance are just changing the timing of something else.
>

Yes, maybe. We are now debugging this together with the SoC guys.
Thanks!

Best Regards,
Wei

>
> Thanks,
>
> James