Re: [git pull] PCI fixes

From: Kenji Kaneshige
Date: Wed Dec 07 2011 - 02:59:16 EST


(2011/12/07 1:14), Linus Torvalds wrote:
On Tue, Dec 6, 2011 at 12:08 AM, Kenji Kaneshige
<kaneshige.kenji@xxxxxxxxxxxxxx> wrote:

In the past, pciehp waited 100 ms instead of 1 sec after checking the link
state. This 100 ms was based on the PCIe spec.

I would like to point out that these kinds of delays are *really*
annoying to users. And they add up.

One second per se is not a huge problem, but imagine that you're
hotplugging some regular user device (think thunderbolt or something
that we'd expect normal users to see). First one second for the kernel
to even see it, then some random udev rules, then some disk spinup
times or whatever, and soon a few delays here and a few delays there,
and it takes say three seconds for the folder to show up on the
desktop (or whatever acknowledgement of "yes, I see your device now").

That's a *long* time, and it's irritating to the user. It makes the
user think "the machine is slow".

We used to have this exact problem with USB hotplugging with slow
devices, so I know. It's still not necessarily immediate, but it's
better than it has been.

One second *total* is what people will consider pretty much immediate.
Any more than that is "thumb twiddling time".

And quite frankly, an unconditional one-second delay here seems bad.
Two seconds was unacceptable, one second is just bad.

This is based on the PCIe description "software must allow 1 second after
the Data Link Layer Link Active bit reads 1b before it is permitted to
determine that a hot plugged device which fails to return a Successful
Completion for a Valid Configuration Request is a broken device".

Quite frankly, the way that reads to me says "you must wait at most 1s
before you consider a device broken".

But a *successful* read of the LT bit should abort the wait early. So
that good devices that aren't broken can complete setup much faster.

Please try to make something like that work. Instead of always waiting
for one second, wait for up to one second only for failure cases. Any
possibility of that?

Clearly most devices are perfectly fine almost immediately. It's sad
to wait for good devices for no good reason.
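
The "wait up to one second only for failure cases" idea can be sketched as
a bounded polling loop. This is only an illustration in plain user-space C,
not pciehp code: wait_for_device(), the ready/sleep callbacks and the
fake_dev test double are all hypothetical names, and the step and timeout
values are left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: returns true once the device answers correctly. */
typedef bool (*ready_fn)(void *ctx);
/* Hypothetical callback: sleeps for the given number of milliseconds. */
typedef void (*sleep_fn)(unsigned int ms);

/*
 * Poll ready() every step_ms and stop as soon as it succeeds.
 * Returns the number of milliseconds waited on success, or -1 if the
 * device was still not ready after timeout_ms.
 */
static int wait_for_device(ready_fn ready, void *ctx, sleep_fn sleep_ms,
                           unsigned int step_ms, unsigned int timeout_ms)
{
    unsigned int waited = 0;

    assert(step_ms > 0);
    for (;;) {
        if (ready(ctx))
            return (int)waited;   /* good device: no further delay */
        if (waited >= timeout_ms)
            return -1;            /* only now treat it as broken */
        sleep_ms(step_ms);
        waited += step_ms;
    }
}

/* Simulated device (illustration only): ready after N failed polls. */
struct fake_dev {
    unsigned int polls_left;
};

static bool fake_ready(void *ctx)
{
    struct fake_dev *dev = ctx;

    if (dev->polls_left == 0)
        return true;
    dev->polls_left--;
    return false;
}

/* Simulated sleep (illustration only): returns immediately. */
static void fake_sleep(unsigned int ms)
{
    (void)ms;
}
```

With a loop of this shape, a device that responds on the first poll costs
no extra delay, and the full timeout is paid only when the device never
responds.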


Thank you for the comments. Yes, I think we can put more effort into this.

To improve this, I think what we need is a way to know when configuration
requests to the hot-added device start working. One possible idea so far is
to enable CRS Software Visibility and check the vendor ID. The
pci_scan_device() function already has a vendor ID check, but the code to
enable CRS was removed for some reason.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ad7edfe0490877864dc0312e5f3315ea37fc4b3a
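
As a rough illustration of the vendor ID check, here is a minimal sketch of
how the first config dword might be interpreted when CRS Software
Visibility is enabled. It assumes, as I understand the PCIe spec, that a
Vendor ID read completed with CRS status returns 0x0001 in the vendor
field. The enum and classify_vendor_dword() are hypothetical names, not the
actual pci_scan_device() code.

```c
#include <stdint.h>

/* Hypothetical result codes for one Vendor ID/Device ID dword read. */
enum probe_result {
    PROBE_ABSENT,   /* nothing usable answered at this address */
    PROBE_RETRY,    /* device asked us to retry (CRS), poll again */
    PROBE_PRESENT   /* device returned a real vendor ID */
};

/*
 * Classify the dword read from config offset 0 (vendor ID in the low
 * 16 bits, device ID in the high 16 bits).
 */
static enum probe_result classify_vendor_dword(uint32_t l)
{
    /* All-ones or all-zeros halves: treat as no device. */
    if (l == 0xffffffffu || l == 0x00000000u ||
        l == 0x0000ffffu || l == 0xffff0000u)
        return PROBE_ABSENT;

    /* Assumption: vendor ID 0x0001 signals a CRS completion. */
    if ((l & 0xffffu) == 0x0001u)
        return PROBE_RETRY;

    return PROBE_PRESENT;
}
```

A check like this could be polled after link-up: PROBE_RETRY means keep
waiting, while PROBE_PRESENT means configuration requests have started
working and the remaining wait can be skipped.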

Regards,
Kenji Kaneshige