Re: [PATCH v2 2/4] pciehp: Use link change notifications for hot-plugand removal

From: Guenter Roeck
Date: Mon Dec 16 2013 - 21:36:10 EST


On 12/16/2013 05:14 PM, Bjorn Helgaas wrote:
On Mon, Dec 16, 2013 at 10:39 AM, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
On Sun, Dec 15, 2013 at 05:18:26PM -0700, Bjorn Helgaas wrote:
On Sun, Dec 15, 2013 at 4:24 PM, Rajat Jain <rajatjain@xxxxxxxxxxx> wrote:

Once again: the way I interpret this is: * Always enable Link events.
* Disable presence events if attention button is present.

That sounds like a good plan to me.

How about Diag_Reset from MPT2SAS and others? link could up and down


I am assuming you are referring to

static int _base_diag_reset(struct MPT2SAS_ADAPTER *ioc, int sleep_flag)

Which as far as I could understand would cause link to go down and come up
again without the kernel knowing anything about it? ...

In general, I guess the question is when a link goes down and back up,
whether or not we want to treat it as a hot unplug followed by a hotplug. I
think there may be cases such as AER (or the one Yinghai mentions) where we
don't want to treat it as a hotplug (see note below). And there may be cases
that we definitely want to treat it as hotplug (need link events!).
Situation gets more complex since there may be pciehp slots downstream of a
link getting reset.

It seems to me that we are saying that a mechanism is needed so that a
voluntary Link flap is NOT treated like a hotplug temporarily? ...

Notes: * it may not OK, if the kernel thinks the device is accessible when
it is really not. What if during this downtime, someone tries to access the
device (say userspace)? * How do we know after the link up, that the device
is really the same. If during this reset, the device changed its
"character", say a different class? I think a rescan should be mandated
after every such event. * Do we need to unload and reload the driver after
the link reset, since it can be a different device?

I am quite dubious about the idea of a voluntary link flap. If the link goes
down and comes back up, I don't see how we can make any assumptions about what
device is there.

I let Alex talk me into pciehp_reset_slot(), which disables presence detect
interrupts while resetting a device, so we already have a bit of precedent for
the idea. But even in that case, the device could easily come out of reset as
a different device, e.g., if the reset caused it to load updated firmware.

I would feel much better if we treated link down as a remove and did a rescan
on the link up.

Agreed. Question is if we might need some means for a driver to tell the PCIe
core about an upcoming "planned" link flap. pciehp_reset_slot() doesn't seem
to address the condition where a driver resets a connected chip by other means
than by calling pciehp_reset_slot(). Still not sure what happens when the
mpt2sas driver issues its diagnostic reset, to take Yinghai's example (or if
there would be a cleaner way to implement such a reset).

In my opinion we should not add the concept of a planned link flap.
We already have pci_reset_function(), and we can probably make that
deal with link up/down events internally, so I think we should try to
use that if we can. I think it's too much of a mess to try to support
link flaps for random driver-initiated resets that don't go through
the PCI core.

Perfectly fine with me.

That probably means going through and identifying all the drivers we
can find that do their own internal resets, and converting them.
We'll likely miss some, since the mechanisms are driver-specific. And

Also might be difficult - that kind of work really asks for having
the hardware available for testing.

Guenter

maybe there are some driver resets that think they add value over the
core's pci_reset_function() (I'm not sure what that would be, but I'm
open to discussion about it).

Bjorn



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/