Re: [PATCH] pci: Warn if BME cannot be turned off during kexec

From: Deepa Dinamani
Date: Mon Jan 06 2020 - 14:38:44 EST


On Mon, Jan 6, 2020 at 5:54 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> Hi Deepa,
>
> Thanks for the patches. Since these two patches touch the same piece
> of code in pci_device_shutdown(), they conflict with each other. I
> could resolve this myself, but maybe you could make them a series that
> applies cleanly together?

Sure, will make this a series.

> Can you also please edit the subject lines so they match the
> convention (use "git log --oneline drivers/pci/pci-driver.c" to see
> it).

Will do.

> On Sat, Jan 04, 2020 at 02:50:52PM -0800, Deepa Dinamani wrote:
> > BME not being off is a security risk, so for whatever
> > reason if we cannot disable it, print a warning.
>
> "BME" is not a common term in drivers/pci; can you use "Bus Master
> Enable" (to match the PCIe spec) or "PCI_COMMAND_MASTER" (to match the
> Linux code)?

Will do.

> Can you also explain why this is a security risk? It looks like we
> disable bus mastering if the device is in D0-D3hot. If the device is
> in D3cold, it's powered off, so we can't read/write config space. But
> if it's in D3cold, the device is powered off, so it can't be a bus
> master either, so why would we warn about it?

I was mainly concerned about the PCI_UNKNOWN state here. Maybe we can
add a specific check for the unknown state if that is preferable.

> > Signed-off-by: Deepa Dinamani <deepa.kernel@xxxxxxxxx>
> > ---
> > drivers/pci/pci-driver.c | 8 ++++++--
> > 1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 0454ca0e4e3f..6c866a81f46c 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -491,8 +491,12 @@ static void pci_device_shutdown(struct device *dev)
> > * If it is not a kexec reboot, firmware will hit the PCI
> > * devices with big hammer and stop their DMA any way.
> > */
> > - if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
> > - pci_clear_master(pci_dev);
> > + if (kexec_in_progress) {
> > + if (likely(pci_dev->current_state <= PCI_D3hot))
>
> No need to use "likely" here unless you can measure a difference. I
> doubt this is a performance path.
>
> > + pci_clear_master(pci_dev);
> > + else
> > + dev_warn(dev, "Unable to turn off BME during kexec");
>
> How often and for what sort of devices would you expect this warning
> to be emitted? If this is a common situation and the user can't do
> anything about it, the warnings will clutter the logs and train users
> to ignore them.

This is not a common situation. I think I have seen this only a couple
of times in my kexec experiments. I have not found any documentation
about if a device can go into an unknown power state and still be
harmful or not. But, if you need more testing, I could check the patch
into the Google datacenter code and sweep the logs to see if these get
printed at all.

-Deepa