Re: Regression from 2.6.26: Hibernation (possibly suspend) broken on Toshiba R500 (bisected)

From: Rafael J. Wysocki
Date: Tue Dec 02 2008 - 19:01:09 EST


On Wednesday, 3 of December 2008, Linus Torvalds wrote:
>
> On Tue, 2 Dec 2008, Rafael J. Wysocki wrote:
> >
> > * dmesg output including one hibernation-resume cycle from 2.6.28-rc7 with the
> > debug patch (appended for completness):
> >
> > http://www.sisk.pl/kernel/debug/mainline/2.6.28-rc7/dmesg-rc7-patched-prep.log
> >
> > * dmesg output including one hibernation-resume cycle from 2.6.28-rc7 without
> > the debug patch:
> >
> > http://www.sisk.pl/kernel/debug/mainline/2.6.28-rc7/dmesg-rc7-nopatch-prep.log
>
> As with Frans, the debug patch seems to make no difference what-so-ever.
> Yes, the cardbus regions get allocated differently, but they're fine in
> either case, and arguably (exactly as with Frans) the debug patch actually
> makes things uglier by actively getting the alignment wrong, and skipping
> cardbus setup until later.

Hm, what about (from the copy of /proc/iomem without the patch at
http://www.sisk.pl/kernel/debug/mainline/2.6.28-rc7/rc7-nopatch/iomem):

88000000-8bffffff : PCI Bus 0000:03
88000000-8bffffff : PCI CardBus 0000:04
8c000000-91ffffff : PCI Bus 0000:03
8c000000-8fffffff : PCI CardBus 0000:04

(1) Why two ranges are allocated for 0000:03 without the patch while there is
only one range with the patch:

88000000-880fffff : PCI Bus 0000:03

(copy of the file at
http://www.sisk.pl/kernel/debug/mainline/2.6.28-rc7/rc7-patched/iomem)?
That seems to look like a difference to me.

(2) Why are they so large without the patch while with the patch they are much
smaller (O(2^28) vs O(2^21) if I'm not mistaken)?

(3) Why are they overlapping with the ranges for CardBus 0000:04, although
without the patch they aren't? Is that actually correct at all?

> That's what your patch (without debugging) should have resulted in too,
> except you'd not have seen the "bad alignment flags" printout, of course
> (but you probably would have seen the "bad alignment 0: [...]" one).

Yes, I saw that:

bad alignment flags 21200 4000000 (0)
pci 0000:03:0b.0: BAR 9 bad alignment 0: [0x000000-0x3ffffff]
bad alignment flags 20200 4000000 (0)
pci 0000:03:0b.0: BAR 10 bad alignment 0: [0x000000-0x3ffffff]

> In fact, I'm starting to think I know why we set up the prefetch window
> without the patch, and why we don't with it - because with the patch, the
> PCI code ends up never seeing any valid prefetchable region for the
> cardbus controller at all, so it never even bothers to try to set up a
> prefetchable window.
>
> So in many ways, the debug patch that gets the alignment wrong (on
> purpose) is really the inferior one. Plain -rc7 seems to do everything
> right.

Well, I'm not sure ...

> > * diff between the two:
> >
> > http://www.sisk.pl/kernel/debug/mainline/2.6.28-rc7/dmesg-rc7_nopatch-rc7_patched.diff
>
> Gaah. Using "-U 0" is likely the least readable form of diffs there
> exists, even if it makes the diff smaller.

Sorry.

To me it's more readable this way, but well.

> > This part of the diff (+ is the patched one) seems to be particularly
> > interesting to me, especially the overlapping MEM windows for 0000:00:1e.0 and
> > 0000:03:0b.0 (may that be the reason for the observed failures?):
>
> No, those are very much on purpose.
>
> Device 0000:00:1e.0 is the PCI bridge that bridges to PCI bus#3, so the
> MEM window is very much intentional - exactly because MMIO goes through
> that PCI bridge bus to get to bus#3, which is where the cardbus controller
> is.
>
> IOW, the topology is as follows:
> - CPU is on the root bus (bus #0)
> - device 00:1e.0 is the PCI bridge to bus #3
> - device 03:0b.0 is the CardBus bridge (to bus #4)
> and any actual cardbus cards (if you had any) would be on that bus #4, so
> they'd be named "04:xx.y".
>
> Now, that PCI bridge 00:1e.0 is a transparent bridge (aka "[Subtractive
> decode]" in your lspci output - as compared to the other bridges that say
> "[Normal decode]"), which means that you don't actually _have_ to set up
> any MMIO window on them, since the bridge will forward _any_ PCI cycles
> that don't get responded to by any other PCI device.
>
> But having an explicit window is still generally a good idea, since it
> should allow the PCI bridge to pick up the PCI cycles earlier (no need to
> wait to see if others respond to it), and possibly allows for better
> prefetching behavior. So again, the dmesg and the PCI layout actually
> looks _better_ without the hacky patch.
>
> So are you saying that the unpatched kernel still reliably doesn't
> hibernate for you, while the (arguably _incorrect_) patched kernel
> reliably does hibernate?

Yes, I am.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/