Re: AMD erratum 665 on f15h processor?

From: Borislav Petkov
Date: Mon Dec 18 2017 - 16:05:57 EST


When you reply, please hit reply-to-all in your mail client so that
mailing lists get CCed too.

On Mon, Dec 18, 2017 at 07:54:52PM +0300, Andrew Randrianasulu wrote:
> On Monday 18 December 2017 16:22:15 you wrote:
> > + kvm ML.
> >
> > On Mon, Dec 18, 2017 at 06:01:21AM +0300, Andrew Randrianasulu wrote:
> > > On Sunday 17 December 2017 23:52:05 you wrote:
> > > > On Sun, Dec 17, 2017 at 12:04:28PM +0300, Andrew Randrianasulu wrote:
> > > > > Hello!
> > > > >
> > > > > I was trying to investigate why none of my old kernels boot on
> > > > > my relatively new machine. Kernels 4.10+ naturally boot (I use
> > > > > 4.14.3 right now), but old kernels die early...
> > > > >
> > > > > After some digging I found this
> > > > > https://patchwork.kernel.org/patch/9311567/
> > > > >
> > > > > The patch talks about family 12h, but my machine has this CPU:
> > > > >
> > > > > [ 0.056000] smpboot: CPU0: AMD FX(tm)-4300 Quad-Core Processor (family: 0x15, model: 0x2, stepping: 0x0)
> > > > > [ 0.056000] Performance Events: Fam15h core perfctr, AMD PMU driver.
> > > >
> > > > Yes, your machine is not affected by that erratum. So far so good.
> > > >
> > > > I have a hard time understanding the rest of your mail: you're talking
> > > > about old kernels not booting on a new machine, but then you paste a
> > > > qemu 32-bit guest kernel boot log and after that I'm lost.
> > > >
> > > > Perhaps you should try again by explaining in detail what exactly
> > > > you're trying to do and how exactly you're going about doing that...
> > >
> > > Hi, Borislav!
> > >
> > > I was trying to boot a few self-made liveCD/DVDs; they use self-compiled
> > > kernels in the 3.2-4.2 range. None of those old disks boots in qemu if I
> > > set the CPU type to 'host'. I have a whole collection of old kernels since
> > > 2011, and none of them work anymore! An even older CD with 2.6.23.something
> > > simply rebooted on the physical machine after isolinux loaded the kernel
> > > and initrd. But 2.6.27.9 worked, at least in qemu (I don't really want to
> > > reboot the machine because of some stuff in tmpfs). Because 4.2.0-i486 was
> > > my previous failsafe kernel, and it most likely will not work anymore, I
> > > guess I will use 4.12.0-x64. I was just trying to find any change
> > > explaining this error, and your fix was the closest I was able to find in
> > > this time interval (2015-2017). Maybe it was just some unrelated, purely
> > > software bug in the AMD detection code. I spent some time trying to figure
> > > out how to copy/paste from qemu; finally the -curses interface worked.
> > >
> > > I think I missed this misbehavior because I mostly used plain qemu,
> > > without -cpu host (but with -enable-kvm), so everything worked.
> >
> > So -cpu host means:
> >
> > x86 host KVM processor with all supported host features (only
> > available in KVM mode)
> >
> > which would theoretically mean that those guest kernel configs shouldn't
> > boot on the baremetal box either, if they fail on the guest.
> >
> > But who knows what's happening.
> >
> > You can send me the .config of a guest kernel which fails, along
> > with the exact qemu command line, and I'll try it out here.
>
> .config attached.
>
> To reproduce, just launch qemu like this (I just tried it):
>
> qemu-system-i386 -kernel /home/admin/slax-build/boot/vmlinuz -cpu host --enable-kvm
>
> Of course, replace the path to the kernel image with your own. I can also
> attach the binary image, but I think it will be of little use to you...

Nah, I built it using your .config.

So my guest stops very early in the BIOS with

"Failed to allocate space for phdrs

-- System halted."

Then I looked at this:

https://bugzilla.kernel.org/show_bug.cgi?id=114671

and there's a patch

https://bugzilla.kernel.org/attachment.cgi?id=209601&action=diff&collapsed=&headers=1&format=raw

With it, it booted a bit further. But I still couldn't see any output.

So I booted with my cmdline to see more output and it did say:

general protection fault: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-i486+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
task: c05b9a80 ti: c05b2000 task.ti: c05b2000
EIP: 0060:[<c010e390>] EFLAGS: 00210293 CPU: 0
EIP is at cpu_has_amd_erratum+0x24/0xb0
EAX: 00210bf7 EBX: 00000001 ECX: c0010140 EDX: c044ccf4
ESI: c0616900 EDI: c044ccf8 EBP: c05b3f68 ESP: c05b3f58
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: ffc77000 CR3: 006ae000 CR4: 00040690
Stack:
02008140 00000000 c0616900 00000000 c05b3fa8 c010ec8b f5001d80 0000001e
00000000 00000000 00000009 00000010 00000000 c0616900 00000000 c05b3fa8
c010cf58 c0616900 c0616900 c061695c c05b3fc8 c010d156 c061698b c061695c
Call Trace:
[<c010ec8b>] init_amd+0x5ee/0x631
[<c010cf58>] ? get_cpu_cap+0x121/0x126
[<c010d156>] identify_cpu+0x1f9/0x37d
[<c0624a18>] identify_boot_cpu+0xd/0x80
[<c0624abd>] check_bugs+0x8/0x35
[<c061ea42>] start_kernel+0x32a/0x339
[<c061e2c2>] i386_start_kernel+0x8c/0x90
Code: cf 5b c0 89 e5 5d c3 55 89 e5 57 56 53 51 89 c6 8b 1a 8d 7a 04 81 fb ff ff 00 00 77 57 8b 40 2c 0f ba e0 09 73 4e b9 40 01 01 c0 <0f> 32 89 45 f0 89 d8 89 d1 99 39 ca 77 3b 72 05 3b 5d f0 73 34
EIP: [<c010e390>] cpu_has_amd_erratum+0x24/0xb0 SS:ESP 0068:c05b3f58
---[ end trace 7fb9e71b486a229a ]---
Kernel panic - not syncing: Attempted to kill the idle task!
---[ end Kernel panic - not syncing: Attempted to kill the idle task!

which is exactly like the splat you posted, and it fails here:

Code: cf 5b c0 89 e5 5d c3 55 89 e5 57 56 53 51 89 c6 8b 1a 8d 7a 04 81 fb ff ff 00 00 77 57 8b 40 2c 0f ba e0 09 73 4e b9 40 01 01 c0 <0f> 32 89 45 f0 89 d8 89 d1 99 39 ca 77 3b 72 05 3b 5d f0 73 34
All code
========
0: cf iret
1: 5b pop %rbx
2: c0 89 e5 5d c3 55 89 rorb $0x89,0x55c35de5(%rcx)
9: e5 57 in $0x57,%eax
b: 56 push %rsi
c: 53 push %rbx
d: 51 push %rcx
e: 89 c6 mov %eax,%esi
10: 8b 1a mov (%rdx),%ebx
12: 8d 7a 04 lea 0x4(%rdx),%edi
15: 81 fb ff ff 00 00 cmp $0xffff,%ebx
1b: 77 57 ja 0x74
1d: 8b 40 2c mov 0x2c(%rax),%eax
20: 0f ba e0 09 bt $0x9,%eax
24: 73 4e jae 0x74
26: b9 40 01 01 c0 mov $0xc0010140,%ecx
2b:* 0f 32 rdmsr <-- trapping instruction
2d: 89 45 f0 mov %eax,-0x10(%rbp)
30: 89 d8 mov %ebx,%eax
32: 89 d1 mov %edx,%ecx
34: 99 cltd
35: 39 ca cmp %ecx,%edx
37: 77 3b ja 0x74
39: 72 05 jb 0x40
3b: 3b 5d f0 cmp -0x10(%rbp),%ebx
3e: 73 34 jae 0x74

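because it tries to read from a non-existent MSR, 0xc0010140
(MSR_AMD64_OSVW_ID_LENGTH). The bt $0x9 right before it is the
X86_FEATURE_OSVW test, so the guest CPU advertises OSVW but then faults
when the kernel reads the OSVW MSR. Maybe this is because of the -cpu
host emulation or so, but those MSRs do get virtualized, see

2b036c6b861d ("KVM: SVM: Add support for AMD's OSVW feature in guests")

so I'd defer to the kvm/qemu people to explain what exactly the deal is
here.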

What I do is use -cpu Opteron_G5, which is also F15h, and that works.
Oh, and I'd use 64-bit kernels; 32-bit is not tested nearly as
extensively.

HTH.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.