Re: Corrupted files after suspend to disk

From: Andreas Hartmann
Date: Wed Mar 07 2012 - 06:08:08 EST


richard -rw- weinberger wrote:
> On Wed, Mar 7, 2012 at 12:54 AM, Andreas Hartmann
> <andihartmann@xxxxxxxxxxxxxxx> wrote:
>> Andreas Hartmann wrote:
>>> Rafael J. Wysocki wrote:
>>>> On Friday, February 17, 2012, richard -rw- weinberger wrote:
>>>>> On Thu, Feb 16, 2012 at 11:26 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>>> On Thu, 16 Feb 2012, Rafael J. Wysocki wrote:
>>>>>>
>>>>>>>> FWIW, we've been seeing a number of hard to diagnose failures
>>>>>>>> with suspend to disk for the last few releases in Fedora.
>>>>>>>> Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
>>>>>>>> for a while, but there's no smoking gun that really explains what's
>>>>>>>> getting into these states. Further complicating things, is that it
>>>>>>>> doesn't seem to be 100% reproducible.
>>>>>>>
>>>>>>> I wonder if that's reproducible with the filesystems freezing patch I posted
>>>>>>> some time ago (it will need some rebasing to apply to the current mainline or
>>>>>>> 3.2.y).
>>>>>
>>>>> Where can I find this patch?
>>>>> I'll happily test it.
>>>>> But it may take some time as the bug is not easy to reproduce.
>>>>
>>>> This is the last version posted:
>>>>
>>>> http://marc.info/?l=linux-kernel&m=132775832509351&w=4
>>>>
>>>> However, it only may help if you use the kernel-based hibernation i.e.
>>>> "echo disk > /sys/power/state" (that may be worth testing without the
>>>> patch too, but Fedora is using this AFAICS, so it probably has that
>>>> problem too).
>>>
>>> I'm having the same problem. Please take a look at the following bug
>>> report at suse for more information:
>>>
>>> https://bugzilla.novell.com/show_bug.cgi?id=732908
>>>
>>> Do you know, which way of suspending openSUSE uses in 12.1?
>>
>> I changed SLEEP_MODULE="uswsusp" to "kernel" in
>> /usr/lib/pm-utils/defaults and tested your patch mentioned above with
>> linux 3.2.9.
>>
>> Unfortunately the behaviour didn't change at all - I can see the same
>> problems as before.
>>
>> I tested with and without X. I tested with the call "pm-hibernate" and
>> with "echo disk > /sys/power/state". I always could see the corrupted
>> files after 2 to 4 times of hibernating / resuming.
>>
>>
>> Kind regards,
>> Andreas
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>
> On my system kernel suspend *seems* to work.
> I've seen no corrupted files so far.
>
> But sometimes the resume is failing. (One out of 5 resumes fails).
> I was unable to get any kernel output.
>
> So I'm not sure whether this is the same issue
> or another one. :-\

I'm pretty sure that this is the same issue. What you describe
matches my own findings here.
I even had resumes where the machine came up again but was unusable,
because some relevant libraries had been corrupted: it was no longer
possible to unlock the password-protected screen saver, and logging in
at the shell wasn't possible either, because the newly started bash
crashed.
Rebooting with CTRL-ALT-DEL often doesn't work either, for lack of a
working bash.
Afterwards, though, I could see in the log files that there had been
file corruption (e.g. in ~/.xsession-errors).

It's very easy for me to reproduce the problem: it happens more or
less every time, without doing anything special. A running X session
isn't even necessary; runlevel 3 is enough to trigger the problem.

I check the MD5 sums of the files in some directories after each
resume with this script:

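#!/bin/sh
# Verify the checksums recorded in each directory after a resume.

dir="/bin /sbin /lib /lib64 /usr/lib64 /usr/bin /usr/sbin"

for i in $dir
do
    echo "$i"
    # Skip the directory if it cannot be entered.
    cd "$i" || continue
    # Print only the files whose checksum no longer matches.
    md5sum -c md5sum.out | grep -v ': OK$'
done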


The initial creation is done directly after a fresh boot with this script:

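#!/bin/sh
# Record a checksum baseline in each directory right after a fresh boot.

dir="/bin /sbin /lib /lib64 /usr/lib64 /usr/bin /usr/sbin"

for i in $dir
do
    echo "$i"
    # Skip the directory if it cannot be entered.
    cd "$i" || continue
    # Remove the old list first so that it is not checksummed itself.
    rm -f md5sum.out
    md5sum * > md5sum.out
done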
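
A variant of the two scripts above could be sketched as follows. It is
only a sketch, assuming GNU coreutils and findutils: it keeps the
checksum lists outside the checked directories and looks at regular
files only (symlinks are skipped, unlike with "md5sum *"). STATE_DIR,
list_file, record_dir and check_dir are hypothetical names, not part of
the scripts above.

```shell
#!/bin/sh
# Sketch only: one script for both the baseline and the check.
# STATE_DIR and all function names are hypothetical.
# STATE_DIR must be an absolute path, because the functions change directory.

STATE_DIR="${STATE_DIR:-/var/tmp/md5-baseline}"

# Map a directory to its list file, e.g. /usr/bin -> $STATE_DIR/_usr_bin.md5
list_file() {
    echo "$STATE_DIR/$(echo "$1" | tr '/' '_').md5"
}

# Record checksums of the regular files directly inside directory $1.
record_dir() {
    mkdir -p "$STATE_DIR" || return 1
    ( cd "$1" && find . -maxdepth 1 -type f -exec md5sum {} + ) > "$(list_file "$1")"
}

# Print only the files in directory $1 that fail verification.
# The exit status is non-zero if any file fails.
check_dir() {
    ( cd "$1" && md5sum --quiet -c "$(list_file "$1")" )
}

# Usage: script record DIR...   (after a fresh boot)
#        script check DIR...    (after each resume)
case "${1:-}" in
    record) shift; for d in "$@"; do record_dir "$d"; done ;;
    check)  shift; for d in "$@"; do echo "$d"; check_dir "$d"; done ;;
esac
```

Run it once with "record" and the directory list after a fresh boot,
then with "check" and the same list after each resume.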


What does your filesystem layout look like? I'm using the following
layout (openSUSE 12.1):

- /dev/sda
  - /dev/sda1 -> /boot
  - /dev/sda2 -> cr_sda2 (encrypted partition, opened with cryptsetup luksOpen ...)
- cr_sda2 is a PV for LVM
- The PV belongs to the VG "system"
- The following LVs are part of the VG "system":
    /root
    swap
    /usr
    /var
    /home
    /opt

cr_sda2 is decrypted during initrd.


Kind regards,
Andreas