Removing PCI CRS code could cause problems

From: Matthew Wilcox
Date: Sun Jan 20 2008 - 09:35:56 EST



I just remembered why I put in the software-visible Config Retry Status
code. With conventional PCI, devices must return within something like
2^25 clock cycles (about a second for 33MHz busses). PCIe devices can
return CRS to the root complex which will automatically retry -- but to
the CPU, this looks like a read that hasn't returned yet.
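As a rough check of the "about a second" figure above, the 2^25-cycle budget can be converted to wall-clock time. This is only an arithmetic sketch: the names PCI_INIT_CYCLES and cycles_to_ms are made up for illustration, and the bus is assumed to run at a nominal 33 MHz.

```c
#include <stdint.h>

/* Initialisation budget quoted in the mail: 2^25 bus clock cycles. */
#define PCI_INIT_CYCLES (UINT64_C(1) << 25)

/*
 * Convert a cycle count into milliseconds for a bus clocked at `hz`.
 * Integer division, so the result is rounded down.
 * (Hypothetical helper, not a kernel function.)
 */
static uint64_t cycles_to_ms(uint64_t cycles, uint64_t hz)
{
	return cycles * 1000 / hz;
}
```

Calling cycles_to_ms(PCI_INIT_CYCLES, 33000000) gives 1016, i.e. just over one second, which agrees with the estimate in the text.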

My concern was/is that devices which take a long time to initialise may
cause watchdog timers to fire. I don't know whether there are any CPUs which
might enter error recovery, but I think the native Linux watchdog might
well trigger.
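The alternative that software-visible CRS allows can be sketched roughly as follows. With CRS Software Visibility enabled, a Vendor ID read that hits a not-yet-ready device completes immediately with Vendor ID 0x0001, so the OS can sleep between polls instead of stalling the CPU on one long read. This is a simulation under stated assumptions, not the kernel's code: wait_for_device, struct fake_dev, the callbacks, and the ID value 0x12348086 are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vendor ID value the root complex reports while the device signals CRS. */
#define CRS_VENDOR_ID 0x0001u

/* Hypothetical stand-ins for the platform's config read and sleep routines. */
typedef uint32_t (*read_id_fn)(void *ctx);
typedef void (*sleep_ms_fn)(void *ctx, unsigned int ms);

/*
 * Poll the Vendor/Device ID dword until the device stops reporting CRS.
 * Sleeps 1, 2, 4, ... ms between polls; gives up once at least
 * `timeout_ms` has been slept. Returns true and stores the ID on success.
 */
static bool wait_for_device(read_id_fn read_id, sleep_ms_fn sleep_ms,
			    void *ctx, unsigned int timeout_ms, uint32_t *id)
{
	unsigned int delay = 1, waited = 0;

	for (;;) {
		uint32_t val = read_id(ctx);

		if ((val & 0xffff) != CRS_VENDOR_ID) {
			*id = val;
			return true;
		}
		if (waited >= timeout_ms)
			return false;
		sleep_ms(ctx, delay);
		waited += delay;
		delay *= 2;
	}
}

/* Simulated device: reports CRS for `busy_reads` reads, then a made-up ID. */
struct fake_dev {
	unsigned int busy_reads;
	unsigned int slept_ms;
};

static uint32_t fake_read_id(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->busy_reads > 0) {
		d->busy_reads--;
		return 0xffff0001u;	/* CRS: Vendor ID 0x0001, other bits all ones */
	}
	return 0x12348086u;		/* placeholder Device/Vendor ID */
}

static void fake_sleep(void *ctx, unsigned int ms)
{
	struct fake_dev *d = ctx;

	d->slept_ms += ms;		/* record the time instead of really sleeping */
}
```

The point of the design is that every individual config read returns promptly, so the time spent waiting on a slow device is spent in a sleep the scheduler and watchdog can see, rather than inside a single outstanding read.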

Now, I have no evidence that this is going to happen. And we definitely
have buggy hardware to contend with (that caused the initial reversion).
But the commit ad7edfe0490877864dc0312e5f3315ea37fc4b3a isn't as risk-free
as it might seem.

Would you accept a patch which just black-listed the bridge that was
causing problems? Or possibly we don't even need that if we take Ivan's
patch to use conf1 accesses instead of mmconfig for conventional config
space -- I understand the device in question to only have problems when
CRS is combined with mmconfig.
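A black-list of the kind suggested above could look roughly like the sketch below: a table of bridges for which software-visible CRS is left disabled, consulted before the feature is turned on. The thread does not give the IDs of the problem bridge, so the entry 0xabcd:0x1234 is a placeholder, and struct pci_id, crs_blacklist and crs_sv_allowed are illustrative names rather than kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* Bridges known to misbehave with CRS; the single entry is a placeholder. */
static const struct pci_id crs_blacklist[] = {
	{ 0xabcd, 0x1234 },
};

/* Return true if software-visible CRS may be enabled for this bridge. */
static bool crs_sv_allowed(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(crs_blacklist) / sizeof(crs_blacklist[0]);

	for (size_t i = 0; i < n; i++) {
		if (crs_blacklist[i].vendor == vendor &&
		    crs_blacklist[i].device == device)
			return false;
	}
	return true;
}
```

In the kernel itself this kind of per-device exception is normally expressed as a PCI quirk rather than a hand-rolled table; the table form is used here only to keep the example self-contained.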

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."