Re: [PATCH 0/4] AER-KVM: Error containment of PCI pass-thru devices assigned to KVM guests

From: Stefan Hajnoczi
Date: Tue Nov 20 2012 - 08:40:59 EST


On Tue, Nov 20, 2012 at 06:31:48AM +0000, Pandarathil, Vijaymohan R wrote:
> Add support for error containment when a PCI pass-thru device assigned to a KVM
> guest encounters an error. This is for PCIe devices/drivers that support AER
> functionality. When the OS is notified of an error in a device either
> through the firmware first approach or through an interrupt handled by the AER
> root port driver, concerned subsystems are notified by invoking callbacks
> registered by these subsystems. The device is also marked as tainted till the
> corresponding driver recovery routines are successful.
>
> The KVM module registers for notification of such errors. In the KVM callback
> routine, a global counter is incremented to keep track of the error
> notification. Before each CPU enters guest mode to execute guest code,
> checks are done to see whether the impacted device belongs to the guest.
> If it does, the QEMU hypervisor for the guest is informed and the guest
> is immediately brought down, preventing or minimizing the chance of bad
> data being written out by the guest driver after the device has
> encountered an error.

I'm surprised that the hypervisor would shut down the guest when PCIe
AER kicks in for a pass-through device. Shouldn't we pass the AER event
into the guest and deal with it there?

The equivalent of this policy on physical hardware would be resetting the
CPU or powering down the machine on AER. That doesn't sound right.

Stefan