Re: [PATCH v3] x86: svm: use kvm_fast_pio_in()

From: Radim Krčmář
Date: Tue Mar 03 2015 - 15:42:39 EST


2015-03-03 13:48-0600, Joel Schopp:
> >> + unsigned long new_rax = kvm_register_read(vcpu, VCPU_REGS_RAX);
> > Shouldn't we handle writes in EAX differently than in AX and AL, because
> > of implicit zero extension?
> I don't think the implicit zero extension hurts us here, but maybe there
> is something I'm missing that I need to understand. Could you explain this
> further?

According to APM vol.2, 2.5.3 Operands and Results, when using EAX,
we should zero upper 32 bits of RAX:

Zero Extension of Results. In 64-bit mode, when performing 32-bit
operations with a GPR destination, the processor zero-extends the 32-bit
result into the full 64-bit destination. Both 8-bit and 16-bit
operations on GPRs preserve all unwritten upper bits of the destination
GPR. This is consistent with legacy 16-bit and 32-bit semantics for
partial-width results.

Is IN not covered?
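
To make the rule concrete, here is a minimal user-space sketch (not kernel
code), assuming IN follows the general rule quoted above. The helper name
merge_in_result and its signature are hypothetical; it only shows how the old
RAX value and the I/O data would be combined for each operand size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a KVM function): merge the result of an IN
 * of `size` bytes into the old 64-bit RAX value.
 */
static uint64_t merge_in_result(uint64_t old_rax, const void *data,
				unsigned int size)
{
	uint64_t val = 0;

	assert(size == 1 || size == 2 || size == 4);
	memcpy(&val, data, size);	/* assumes a little-endian host, as on x86 */

	switch (size) {
	case 1:		/* IN AL: bits 63:8 are preserved */
		return (old_rax & ~0xffULL) | val;
	case 2:		/* IN AX: bits 63:16 are preserved */
		return (old_rax & ~0xffffULL) | val;
	default:	/* IN EAX: the result is zero-extended to 64 bits */
		return val;
	}
}
```

A plain memcpy of 4 bytes over the old RAX would instead leave bits 63:32
untouched, which is the difference in question.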

> >> + BUG_ON(!vcpu->arch.pio.count);
> >> + BUG_ON(vcpu->arch.pio.count * vcpu->arch.pio.size > sizeof(new_rax));
> > (Looking at it again, a check for 'vcpu->arch.pio.count == 1' would be
> > sufficient.)
> I prefer the checks that are there now after your last review,
> especially since surrounded by BUG_ON they only run on debug kernels.

BUG_ON is checked on essentially all kernels that run KVM.
(All distribution-based configs should have it.)

If we wanted to validate the size, then this is strictly better:

  BUG_ON(vcpu->arch.pio.count != 1 || vcpu->arch.pio.size > sizeof(new_rax));
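
A small user-space sketch of the difference between the two checks. The
struct pio_req type and the function names are hypothetical stand-ins for the
vcpu->arch.pio fields, and new_rax is modelled as uint64_t (assuming a 64-bit
host) so the comparison is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the vcpu->arch.pio fields used in the patch. */
struct pio_req {
	unsigned long count;
	unsigned int size;
};

/* The two checks from the patch, expressed as "request is acceptable". */
static bool patch_check_ok(const struct pio_req *pio)
{
	return pio->count != 0 &&
	       pio->count * pio->size <= sizeof(uint64_t);
}

/* The stricter check: exactly one element that fits into the register. */
static bool strict_check_ok(const struct pio_req *pio)
{
	return pio->count == 1 && pio->size <= sizeof(uint64_t);
}

/* Anything the stricter check accepts is also accepted by the patch's checks. */
static void assert_strict_implies_patch(const struct pio_req *pio)
{
	assert(!strict_check_ok(pio) || patch_check_ok(pio));
}
```

For example, count = 2 with size = 4 passes the patch's checks but not the
stricter one, and two 4-byte elements do not correspond to a single IN into
RAX.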

> >> + memcpy(&new_rax, vcpu->arch.pio_data,
> >> + vcpu->arch.pio.count * vcpu->arch.pio.size);
> >> + trace_kvm_pio(KVM_PIO_IN, vcpu->arch.pio.port, vcpu->arch.pio.size,
> >> + vcpu->arch.pio.count, vcpu->arch.pio_data);
> >> + kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
> >> + vcpu->arch.pio.count = 0;
> > I think it is better to call emulator_pio_in_emulated directly, like
> >
> > emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size,
> > vcpu->arch.pio.port, &new_rax, 1);
> > kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
> >
> > because we know that vcpu->arch.pio.count != 0.
> I think two extra lines of code in my patch vs your suggestion are worth
> it to a) reduce execution path length b) increase readability c) avoid
> breaking the abstraction by not checking the return code d) avoid any
> future bugs introduced by changes to the function that would return a value
> other than 1.

True, it is horrible, the attached patch should have addressed (c) and
(d), and it could be inlined to match (a).

Pasting the same code creates bug opportunities when we forget to modify
all places. This class of problems can be harder to deal with than (c)
and (d), because we can't simply print all callers.

> > Refactoring could avoid the weird vcpu->ctxt->vcpu conversion.
> > (A better name is always welcome.)
> The pointer chasing is making me dizzy. I'm not sure why
> emulator_pio_in_emulated takes an x86_emulate_ctxt when all it does is
> immediately translate that to a vcpu and never use the x86_emulate_ctxt,
> why not pass the vcpu in the first place?

It is a part of x86_emulate_ops, where ctxt is more important ...
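
A simplified user-space sketch of that pattern, assuming the context is
embedded in the vcpu. All type and function names here (emu_ctxt, emu_ops,
fake_vcpu, ctxt_to_vcpu, fake_pio_in) are hypothetical stand-ins, not the
real KVM definitions.

```c
#include <assert.h>
#include <stddef.h>

struct emu_ctxt;	/* stand-in for struct x86_emulate_ctxt */

/* Stand-in for x86_emulate_ops: every callback receives the context. */
struct emu_ops {
	int (*pio_in)(struct emu_ctxt *ctxt, int size, unsigned short port,
		      void *val, unsigned int count);
};

struct emu_ctxt {
	const struct emu_ops *ops;
};

/* Stand-in for a vcpu that embeds its emulation context. */
struct fake_vcpu {
	int id;
	struct emu_ctxt emulate_ctxt;
};

/* container_of-style conversion from the embedded context back to the vcpu. */
static struct fake_vcpu *ctxt_to_vcpu(struct emu_ctxt *ctxt)
{
	return (struct fake_vcpu *)((char *)ctxt -
				    offsetof(struct fake_vcpu, emulate_ctxt));
}

/* Backend callback: it only needs the vcpu, so it converts right away. */
static int fake_pio_in(struct emu_ctxt *ctxt, int size, unsigned short port,
		       void *val, unsigned int count)
{
	struct fake_vcpu *vcpu = ctxt_to_vcpu(ctxt);

	(void)size; (void)port; (void)val; (void)count;
	assert(&vcpu->emulate_ctxt == ctxt);
	return vcpu->id;	/* illustrative only: shows the vcpu was recovered */
}

static const struct emu_ops fake_ops = { .pio_in = fake_pio_in };
```

The emulator core only ever sees the context and the ops table, so the
callback signature is dictated by that side; a caller that already holds the
vcpu has to pass the embedded context and let the callback convert it back.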