Re: Regression from 2.6.26: Hibernation (possibly suspend) brokenon Toshiba R500 (bisected)

From: Linus Torvalds
Date: Tue Dec 02 2008 - 19:42:20 EST

Next message: Rusty Russell: "Re: [RFC 5/8] param: arch_get_boot_command_line()"
Previous message: Jonathan Bastien-Filiatrault: "Re: EXT4 ENOSPC Bug"
In reply to: Rafael J. Wysocki: "Re: Regression from 2.6.26: Hibernation (possibly suspend) broken on Toshiba R500 (bisected)"
Next in thread: Rafael J. Wysocki: "Re: Regression from 2.6.26: Hibernation (possibly suspend) broken on Toshiba R500 (bisected)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 3 Dec 2008, Rafael J. Wysocki wrote:
>
> Hm, what about (from the copy of /proc/iomem without the patch at
> http://www.sisk.pl/kernel/debug/mainline/2.6.28-rc7/rc7-nopatch/iomem):
>
> 88000000-8bffffff : PCI Bus 0000:03
> 88000000-8bffffff : PCI CardBus 0000:04
> 8c000000-91ffffff : PCI Bus 0000:03
> 8c000000-8fffffff : PCI CardBus 0000:04
>
> (1) Why two ranges are allocated for 0000:03 without the patch while there is
> only one range with the patch:
>
> 88000000-880fffff : PCI Bus 0000:03

So look at the lspci information that has more details for info on this.

Basically, a Cardbus controller (as well as a PCI bridge) has _two_ very
different MMIO windows - one for prefetchable MMIO, and one for
non-prefetchable.

And the windows should match up, although it's always ok to map a
prefetchable region in a non-prefetchable window (although you may lose
some performance, since the upstream will now not be able to prefetch).

And it's not really different, exactly because 00:1e.0 is a transparent
bridge. That "transparency" means that there is an implied decoding window
DESPITE a lack of an explicit one - it may just be a few PCI cycles slower
(or not, depending on how the bridge is implemented. It's probably
technically quite ok for a transparent bridge to ignore its IO/MEM window
contents entirely).

So those "PCI Bus 0000:03" windows are technically irrelevant, except as a
possible performance improvement. The bridge will forward PCI cycles
whether they are there or not. That is what "transparent" means.

> (2) Why are they so large without the patch while with the patch they are much
> smaller (O(2^28) vs O(2^21) if I'm not mistaken)?

Because without the patch, we'll size the PCI bridge windows according to
the default setup of the CardBus controller, namely:

- two IO regions of size 'pci_cardbus_io_size' (256 bytes)

- one prefetchable and one non-prefetchable IO region of size
'pci_cardbus_mem_size' (64MB by default, you can change it with
the 'pci=cbmemsize=xyz' kernel command line parameter but nobody ever
does)

See pci_bus_size_cardbus() in drivers/pci/setup-bus.c for details.

HOWEVER - when the alignment is wrong, we skip all that, and don't set up
the CardBus regions there at all, because the whole "pbus_size_mem()"
thing will just give up.

And then what happens is that we won't set up the CardBus bridge during
PCI bus setup AT ALL, but later, by the Yenta code. And that one will try
to do something else entirely.

[ And yes, if you think that it might be a good idea to try to share the
code and not have two different cardbus sizing logics, I'm not going to
argue against that AT ALL! ]

In the yenta code, we don't try to to just have a fixed maximum size,
because that code is literally designed to be a fallback case for when the
PCI sizing fails (ie literally "oops, we had so little space that we
couldn't use the default max size!" case).

For the yenta code, see drivers/pcmcia/yenta_socket.c ("Yenta" is just the
name for the CardBus standard programming model), and notice all the logic
there, see:

- BRIDGE_MEM_MAX/ACC/MIN and BRIDGE_IO_MAX/ACC/MIN for "max",
"acceptable" and "minimum" values respectively.

Realize thatt he "MAX" value is just 4MB, which is obviously smaller
than the 64MB mentioned above, and that's exactly because this is
assumed to be a "uhhuh, we ran out of space" case.

- See yenta_allocate_res() and the helper functions above it that in
addition try to shrink the window further if it doesn't fit.

- Notice how the Yenta code - unlike the generic PCI code - does _not_
try to allocate parent PCI bridge memory window resources, so if we
fall into this case, we're going to depend on previously set up windows
and/or the fact that a transparent bridge doesn't need them.

So this explains why in one case you'd see a 64M resource, and in another
just a 4M resource.

_Most_ cardbus cards by far only need a few kB of IOMEM resources, but
there are some crazy people who do CardBus graphics cards and/or video
capture cards, and they really do want 32MB+ of MMIO, which is why we try
to get such a big CardBus MMIO window by default.

As mentioned, you could try to just use

pci=cbmemsize=4M

on the kernel command line, and see if that also hides the bug. It's
entirely possible that your suspend/resume problem is not so much about
the cardbus allocation itself, as about some other memory area that wants
to use it (eg hidden RAM used by the bios SMM code, whatever).

> (3) Why are they overlapping with the ranges for CardBus 0000:04, although
> without the patch they aren't? Is that actually correct at all?

See the earlier explanation of the topology, and realize the
overlappingness is actually _required_ in order for a PCI bridge to work,
and forward the PCI cycles to the right lower bus.

But then, with a transparent bridge, if nobody else overlaps, it will
do what's called "negative decode", which just means that "if nobody else
said they wanted this PCI cycle, I'll decode it and forward it to the
lower bus", so a transparent bridge doesn't _need_ the overlap.

Do

/sbin/lspci -t

to understand the relationship, in particular see how device 0:1e.0 is the
one that bridges _to_ that CardBus controller.

See the comments about "topology" in my previous email.

> > So in many ways, the debug patch that gets the alignment wrong (on
> > purpose) is really the inferior one. Plain -rc7 seems to do everything
> > right.
>
> Well, I'm not sure ...

I'm pretty sure.

The fact that you then have hibernation issues is almost certainly due to
something else. Most likely something else that we don't know about from a
resource angle that now got "hidden" by the fact that we programmed the
0:1e.0 bridge to forward to the cardbus controller rather than to some
insane power management chip.

It's why we have all those quirks in drivers/pci/quirks.c. It's very
possible that your chipset is missing some quirk.

> > So are you saying that the unpatched kernel still reliably doesn't
> > hibernate for you, while the (arguably _incorrect_) patched kernel
> > reliably does hibernate?
>
> Yes, I am.

So see what happens if you add

pci=cbmemsize=4M

to make the cardbus allocations smaller. Perhaps that will just change the
allocations enough that now it doesn't cover something else.

Oh, and please do a "lspci -vvxxx" (as root) too so that I can see the
actual values in your PCI config space. We have two quirks already
triggering for you:

pci 0000:00:1f.0: quirk: region d800-d87f claimed by ICH6 ACPI/GPIO/TCO
pci 0000:00:1f.0: quirk: region eec0-eeff claimed by ICH6 GPIO

but I'm maybe there's another one missing.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Rusty Russell: "Re: [RFC 5/8] param: arch_get_boot_command_line()"
Previous message: Jonathan Bastien-Filiatrault: "Re: EXT4 ENOSPC Bug"
In reply to: Rafael J. Wysocki: "Re: Regression from 2.6.26: Hibernation (possibly suspend) broken on Toshiba R500 (bisected)"
Next in thread: Rafael J. Wysocki: "Re: Regression from 2.6.26: Hibernation (possibly suspend) broken on Toshiba R500 (bisected)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]