Re: [PATCH v2 2/2] Intel-IOMMU, intr-remap: source-id checking

From: Ingo Molnar
Date: Tue May 19 2009 - 07:51:29 EST



* Weidong Han <weidong.han@xxxxxxxxx> wrote:

> To support domain-isolation usages, the platform hardware must be
> capable of uniquely identifying the requestor (source-id) for each
> interrupt message. Without source-id checking for interrupt
> remapping , a rouge guest/VM with assigned devices can launch
> interrupt attacks to bring down anothe guest/VM or the VMM itself.
>
> This patch adds source-id checking for interrupt remapping, and
> then really isolates interrupts for guests/VMs with assigned
> devices.
>
> Because PCI subsystem is not initialized yet when set up IOAPIC
> entries, use read_pci_config_byte to access PCI config space
> directly.
>
> Signed-off-by: Weidong Han <weidong.han@xxxxxxxxx>
> ---
> arch/x86/kernel/apic/io_apic.c | 6 +++
> drivers/pci/intr_remapping.c | 90 ++++++++++++++++++++++++++++++++++++++-
> drivers/pci/intr_remapping.h | 2 +
> include/linux/dmar.h | 11 +++++
> 4 files changed, 106 insertions(+), 3 deletions(-)

Code structure looks nice now. (and i susect you have tested this on
real and relevant hardware?) I've Cc:-ed Eric too ... does this
direction look good to you too Eric?

Have a few minor nits only:

> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index 30da617..3d10c68 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -1559,6 +1559,9 @@ int setup_ioapic_entry(int apic_id, int irq,
> irte.vector = vector;
> irte.dest_id = IRTE_DEST(destination);
>
> + /* Set source-id of interrupt request */
> + set_ioapic_sid(&irte, apic_id);
> +
> modify_irte(irq, &irte);
>
> ir_entry->index2 = (index >> 15) & 0x1;
> @@ -3329,6 +3332,9 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_ms
> irte.vector = cfg->vector;
> irte.dest_id = IRTE_DEST(dest);
>
> + /* Set source-id of interrupt request */
> + set_msi_sid(&irte, pdev);
> +
> modify_irte(irq, &irte);
>
> msg->address_hi = MSI_ADDR_BASE_HI;
> diff --git a/drivers/pci/intr_remapping.c b/drivers/pci/intr_remapping.c
> index 946e170..9ef7b0d 100644
> --- a/drivers/pci/intr_remapping.c
> +++ b/drivers/pci/intr_remapping.c
> @@ -10,6 +10,8 @@
> #include <linux/intel-iommu.h>
> #include "intr_remapping.h"
> #include <acpi/acpi.h>
> +#include <asm/pci-direct.h>
> +#include "pci.h"
>
> static struct ioapic_scope ir_ioapic[MAX_IO_APICS];
> static int ir_ioapic_num;
> @@ -405,6 +407,61 @@ int free_irte(int irq)
> return rc;
> }
>
> +int set_ioapic_sid(struct irte *irte, int apic)
> +{
> + int i;
> + u16 sid = 0;
> +
> + if (!irte)
> + return -1;
> +
> + for (i = 0; i < MAX_IO_APICS; i++)
> + if (ir_ioapic[i].id == apic) {
> + sid = (ir_ioapic[i].bus << 8) | ir_ioapic[i].devfn;
> + break;
> + }

Please generally put extra curly braces around each multi-line loop
body. One reason beyond readability is robustness: the above
structure can be easily extended in a buggy way via [mockup patch
hunk]:

> sid = (ir_ioapic[i].bus << 8) | ir_ioapic[i].devfn;
> break;
> }
> + if (!sid)
> + break;

And note that if this slips in by accident how unobvious this bug is
during patch review - the loop head context is not present in the
3-line default context and the code "looks" correct at a glance.

With extra braces, such typos or mismerges:

> }
> }
> + if (!sid)
> + break;

stick out during review like a sore thumb :-)

> + if (sid == 0) {
> + printk(KERN_WARNING "Failed to set source-id of "
> + "I/O APIC (%d), because it is not under "
> + "any DRHD\n", apic);
> + return -1;
> + }

please try to keep kernel messages on a single line - even if
checkpatch complains. Also, it's a good idea to use pr_warning(),
it's shorter by 8 characters.

> +
> + irte->svt = 1; /* requestor ID verification SID/SQ */
> + irte->sq = 0; /* comparing all 16-bit of SID */
> + irte->sid = sid;

this is a borderline suggestion:

Note how you already lined up the _comments_ vertically, so you did
notice that it makes sense to align vertically. The same visual
arguments can be made for the initialization itself too:

> +
> + irte->svt = 1; /* requestor ID verification SID/SQ */
> + irte->sq = 0; /* comparing all 16-bit of SID */
> + irte->sid = sid;
> +
> + return 0;

But ... it might make even more sense to introduce a set_irte()
helper method, so it can all be written in a single line as:

set_irte(irte, 1, 0, sid);

and explain common values in the set_irte() function's description -
that way those comments above (and below) dont have to be made at
the usage sites.

> +}
> +
> +int set_msi_sid(struct irte *irte, struct pci_dev *dev)
> +{
> + struct pci_dev *tmp;
> +
> + if (!irte || !dev)
> + return -1;
> +
> + tmp = pci_find_upstream_pcie_bridge(dev);

variable names like 'tmp' are only used if they are _really_ short
in scope. Here a much better name would be 'bridge'.

> + if (!tmp) { /* PCIE device or integrated PCI device */
> + irte->svt = 1; /* verify requestor ID verification SID/SQ */
> + irte->sq = 0; /* comparing all 16-bit of SID */
> + irte->sid = (dev->bus->number << 8) | dev->devfn;
> + return 0;
> + }
> +
> + if (tmp->is_pcie) { /* this is a PCIE-to-PCI/PCIX bridge */
> + irte->svt = 2; /* verify request ID verification SID */
> + irte->sid = (tmp->bus->number << 8) | dev->bus->number;

is irte->sq left alone intentionally?

> + } else { /* this is a legacy PCI bridge */
> + irte->svt = 1; /* verify requestor ID verification SID/SQ */
> + irte->sq = 0; /* comparing all 16-bit of SID */
> + irte->sid = (tmp->bus->number << 8) | tmp->devfn;
> + }
> + if (tmp->is_pcie) { /* this is a PCIE-to-PCI/PCIX bridge */
> + irte->svt = 2; /* verify request ID verification SID */
> + irte->sid = (tmp->bus->number << 8) | dev->bus->number;

here too?

> + } else { /* this is a legacy PCI bridge */
> + irte->svt = 1; /* verify requestor ID verification SID/SQ */
> + irte->sq = 0; /* comparing all 16-bit of SID */
> + irte->sid = (tmp->bus->number << 8) | tmp->devfn;
> + }
> +
> + return 0;
> +}
> +
> static void iommu_set_intr_remapping(struct intel_iommu *iommu, int mode)
> {
> u64 addr;
> @@ -610,6 +667,35 @@ error:
> return -1;
> }
>
> +static void ir_parse_one_ioapic_scope(struct acpi_dmar_device_scope *scope,
> + struct intel_iommu *iommu)
> +{
> + struct acpi_dmar_pci_path *path;
> + u8 bus;
> + int count;
> +
> + bus = scope->bus;
> + path = (struct acpi_dmar_pci_path *)(scope + 1);
> + count = (scope->length - sizeof(struct acpi_dmar_device_scope))
> + / sizeof(struct acpi_dmar_pci_path);
> +
> + while (--count > 0) {
> + /*
> + * Access PCI directly due to the PCI
> + * subsystem isn't initialized yet.
> + */
> + bus = read_pci_config_byte(bus, path->dev, path->fn,
> + PCI_SECONDARY_BUS);
> + path++;
> + }
> +
> + ir_ioapic[ir_ioapic_num].bus = bus;
> + ir_ioapic[ir_ioapic_num].devfn = PCI_DEVFN(path->dev, path->fn);
> + ir_ioapic[ir_ioapic_num].iommu = iommu;
> + ir_ioapic[ir_ioapic_num].id = scope->enumeration_id;

this too can be aligned vertically.

> + ir_ioapic_num++;
> +}
> +
> static int ir_parse_ioapic_scope(struct acpi_dmar_header *header,
> struct intel_iommu *iommu)
> {
> @@ -634,9 +720,7 @@ static int ir_parse_ioapic_scope(struct acpi_dmar_header *header,
> " 0x%Lx\n", scope->enumeration_id,
> drhd->address);
>
> - ir_ioapic[ir_ioapic_num].iommu = iommu;
> - ir_ioapic[ir_ioapic_num].id = scope->enumeration_id;
> - ir_ioapic_num++;
> + ir_parse_one_ioapic_scope(scope, iommu);

how much nicer this helper looks like now!

> }
> start += scope->length;
> }
> diff --git a/drivers/pci/intr_remapping.h b/drivers/pci/intr_remapping.h
> index ca48f0d..dd35780 100644
> --- a/drivers/pci/intr_remapping.h
> +++ b/drivers/pci/intr_remapping.h
> @@ -3,6 +3,8 @@
> struct ioapic_scope {
> struct intel_iommu *iommu;
> unsigned int id;
> + u8 bus; /* PCI bus number */
> + u8 devfn; /* PCI devfn number */

hm, in kernel data structures we usually encode devfn as unsigned
int - and sometimes even bus. Only 8 bits will be used of both so
it's the same end result, but it results in more efficient
instructions without byte shuffling.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/