Re: Oops in swsusp

From: Andreas Steinmetz
Date: Mon Apr 11 2005 - 11:19:51 EST


Rafael J. Wysocki wrote:
> Hi,
>
> On Monday, 11 of April 2005 01:17, Andreas Steinmetz wrote:
>
>>Pavel,
>>during testing of the encrypted swsusp_image on x86_64 I did get an Oops
>>from time to time at memcpy+11 called from swsusp_save+1090 which turns
>>out to be the memcpy in copy_data_pages() of swsusp.c.
>>The Oops is caused by a NULL pointer (I don't remember if it was source
>>or destination).
>
>
> It's quite important, however. If it's the destination, it's probably a bug in
> swsusp. Otherwise, the problem is more serious. Could you, please,
> add BUG_ON(!pbe->address) right before the memcpy() and retest?

Here's the result:

BUG_ON(!pbe->address) hits, so is seems to be a swsusp problem.

So I added a BUG_ON(!to_copy) as shown below (mangled for mailing):

if (saveable(zone, &zone_pfn)) {
struct page * page;
BUG_ON(!to_copy);
page = pfn_to_page(zone_pfn + zone->zone_start_pfn);
pbe->orig_address = (long) page_address(page);
/* copy_page is not usable for copying task structs. */
BUG_ON(!pbe->address);
memcpy((void *)pbe->address, (void *)pbe->orig_address,
PAGE_SIZE);
pbe++;
to_copy--;
}

The BUG_ON(!to_copy) hits, too.

So it seems that the result sum of saveable() increases between
count_data_pages() and copy_data_pages(). The problem seems to be
easiest hit when starting a larger GUI application and suspending during
application startup. It does ususally take a few suspend/resume
iterations to hit the bug.

Pavel,
the crypto stuff in swsusp.c starts in data_write() which is called
after copy_data_pages() if I'm right. In this case the crypto stuff
can't be the culprit.
--
Andreas Steinmetz SPAMmers use robotrap@xxxxxxxx
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/