swsusp logic error?

From: martin f krafft
Date: Tue Feb 08 2005 - 15:41:47 EST


I am trying to get swsusp working on a 2.6.10 Debian kernel
(2.6.10-1-686, custom compile, enabling only CONFIG_SOFTWARE_SUSPEND
and leaving CONFIG_PM_STD_PARTITION empty) on this Sony Vaio Z1RSP
Centrino 1.7 Pentium M laptop... without much success. Whenever
I enter swsusp mode, the kernel reports that it cannot find the swap
space and aborts.

I checked the code and found the following problem:

swsusp_swap_check() calls is_resume_device(..), which compares the
resume device (the one specified in CONFIG_PM_STD_PARTITION, unless
overridden by the 'resume' kernel boot parameter) against each of the
available swap partitions.

IMHO, the problem is not with the swap partitions, but rather with
the handling of the resume_file variable. A dev_t is just an
integer, and to compare the devices, is_resume_device(..) converts
the device node of each swap file to a dev_t, using the MKDEV(..)
macro. For me, the swap partition is hda2, and MKDEV correctly
returns the dev_t for 3:2.
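
To make the "just an integer" point concrete, here is a minimal
user-space sketch of the encoding. It assumes the 2.6-era layout
from include/linux/kdev_t.h (minor number in the low 20 bits); the
sketch_* names are hypothetical stand-ins, not the kernel's own
identifiers.

```c
#include <assert.h>

/* Assumption: same split as the 2.6 kernel's internal dev_t,
 * 12 bits of major above 20 bits of minor. */
#define SKETCH_MINORBITS 20
#define SKETCH_MINORMASK ((1U << SKETCH_MINORBITS) - 1)

typedef unsigned int sketch_dev_t;  /* hypothetical stand-in for dev_t */

/* Pack a major:minor pair into one integer, as MKDEV(..) does. */
static inline sketch_dev_t sketch_mkdev(unsigned int major, unsigned int minor)
{
    assert(major < (1U << 12));          /* major must fit in 12 bits */
    assert(minor <= SKETCH_MINORMASK);   /* minor must fit in 20 bits */
    return (major << SKETCH_MINORBITS) | minor;
}

/* Unpack the two halves again. */
static inline unsigned int sketch_major(sketch_dev_t dev)
{
    return dev >> SKETCH_MINORBITS;
}

static inline unsigned int sketch_minor(sketch_dev_t dev)
{
    return dev & SKETCH_MINORMASK;
}
```

Under that assumption, hda2 (3:2) packs to 0x300002, while an
integer that was never set stays 0 and unpacks to 0:0.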

However, in is_resume_device, the resume_device variable is 0, which
translates to the 0:0 device. On inspection, this is no surprise:

resume_device is a static variable in swsusp.c, so it starts out as
0. It is only ever assigned in one place: swsusp_read(), which is
called to restore a memory image from swap. That image can never be
created in the first place, because is_resume_device(..) always
fails when it compares against the never-assigned resume_device.

I tried to rectify the situation by duplicating the line

resume_device = name_to_dev_t(resume_file);

to the beginning of the swsusp_swap_check() function, so that it
gets set to the dev_t corresponding to the device identified in
resume_file before is_resume_device(..) is called.
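
The ordering problem, and the intended effect of the change, can be
modelled in user space. This is only a sketch under stated
assumptions: every model_* name is hypothetical, the swap list is
hard-coded to a single hda2 entry, and model_name_to_dev_t() is a
toy table lookup rather than the kernel's name_to_dev_t(..).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int model_dev_t;   /* stand-in for dev_t */
#define MODEL_MKDEV(ma, mi) (((model_dev_t)(ma) << 20) | (mi))

struct model_swap {
    const char *name;
    unsigned int major, minor;
};

/* Hypothetical swap list: one active swap partition, hda2 (3:2). */
static const struct model_swap model_swaps[] = {
    { "/dev/hda2", 3, 2 },
};
#define MODEL_NSWAPS (sizeof(model_swaps) / sizeof(model_swaps[0]))

static const char *model_resume_file = "/dev/hda2";
static model_dev_t model_resume_device;  /* static: starts at 0, i.e. 0:0 */

/* Toy replacement for name_to_dev_t(..): look the name up in the table. */
static model_dev_t model_name_to_dev_t(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < MODEL_NSWAPS; i++)
        if (strcmp(model_swaps[i].name, name) == 0)
            return MODEL_MKDEV(model_swaps[i].major, model_swaps[i].minor);
    return 0;
}

/* Same shape as the check described above: compare two integers. */
static int model_is_resume_device(const struct model_swap *s)
{
    return MODEL_MKDEV(s->major, s->minor) == model_resume_device;
}

/* Returns the index of the matching swap entry, or -1 if none matches.
 * resolve_first != 0 selects the proposed ordering. */
int model_swap_check(int resolve_first)
{
    model_resume_device = 0;  /* reset so each call models a fresh boot */
    if (resolve_first)
        model_resume_device = model_name_to_dev_t(model_resume_file);
    for (size_t i = 0; i < MODEL_NSWAPS; i++)
        if (model_is_resume_device(&model_swaps[i]))
            return (int)i;
    return -1;
}
```

In this model, model_swap_check(0) finds no swap device because
every entry is compared against 0:0, while model_swap_check(1)
matches hda2. The model says nothing about whether the real
name_to_dev_t(..) is safe to call at that point.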

However, name_to_dev_t(..) evidently does more than convert a name
to a dev_t... in particular, it crashes the kernel when called from
swsusp_swap_check(). If I execute

echo platform >| disk; echo disk >| state

from the shell (zsh), the kernel reports a crash in the zsh process.
The top of the trace is

[<c0134780>] swsusp_swap_check+0x30/0x100

and the corresponding disassembly is available from

http://rafb.net/paste/results/HV8eCI97.txt

The Code section at the bottom of the crash dump is about 2.5 lines
of 'cc cc ...', and the kernel then reports

<6>zsh[6632] exited with preempt_count 1.

The machine is then pretty much dead. The network interface reports
too much work at the interrupt. I can still switch virtual consoles,
but I cannot type, and sysrq does not work.

Anyway, I have no more time to work on this, unfortunately.
Hopefully my analysis helps to solve that problem.

--
martin; (greetings from the heart of the sun.)
\____ echo mailto: !#^."<*>"|tr "<*> mailto:"; net@madduck

invalid/expired pgp subkeys? use subkeys.pgp.net as keyserver!
spamtraps: madduck.bogus@xxxxxxxxxxx

"there are two major products that come out of berkeley: lsd and unix.
we don't believe this to be a coincidence."
-- jeremy s. anderson
