Re: [RFC][PATCH] swsusp: support creating bigger images

From: Rafael J. Wysocki
Date: Tue Apr 25 2006 - 04:27:30 EST


Hi,

On Tuesday 25 April 2006 00:16, Pavel Machek wrote:
> > The appended patch allows swsusp to break the "50% of the normal zone" limit.
> > This is achieved by using the observation that pages mapped by frozen
> > userland tasks need not be copied before saving.
>
> I've not followed the patch too carefully, but it is not as bad as I
> expected...
>
> > With this patch applied I was able to save (and restore ;-)) ~800 MB suspend
> > images on a box with 1 GB of RAM.
>
> And did it also work second time? ;-).

Yes, somethinkg like 10 times in a row. :-)

> Okay, so it can be done, and patch does not look too bad. It still
> scares me. Is 800MB image more responsive than 500MB after resume?

Yes, it is, slightly, but I think 800 meg images are impractical for
performance reasons (like IMO everything above 500 meg with the current
hardware). However this means we can save 80% of RAM with the patch
and that should be 400 meg instead of 250 on a 500 meg machine, or
200 meg instead of 125 on a 250 meg machine.

Apart from this we save some memory allocations and copyings during
suspend.

> I guess we can revert to old behaviour by simply returning 1 from
> need_to_copy, right?

Yes.

> I assume that need_to_copy returns 1 in case page is shared by current
> and some other process?

Yes. [If the page is shared with the current task it has to mapped by it, in
which case it will be copied.]

> > [Please don't beat me very hard, just couldn't resist. ;-)]
>
> Well, you are about to force me to learn about mm internals. Plus you
> force everyone who tries to modify swsusp. ... it may be okay if
> benefit is great enough and if it gets proper testing. Not 2.6.17
> material.

Obviously not, and yes it requires a _lot_ of testing especially on boxes
with 512 meg of RAM or less.

> Is benefit worth it?

Well, that depends. I think for boxes with 1 GB of RAM or more it's just
unnecessary (as of today, but this may change if faster disks are available).
On boxes with 512 MB of RAM or less it may change a lot as far as the
responsiveness after resume is concerned.

Anyway do you think it may go into -mm (unless Andrew shoots it down,
that is ;-))?

> > --- linux-2.6.17-rc1-mm3.orig/include/linux/rmap.h 2006-04-22 10:34:33.000000000 +0200
> > +++ linux-2.6.17-rc1-mm3/include/linux/rmap.h 2006-04-22 10:34:45.000000000 +0200
> > @@ -104,6 +104,12 @@ pte_t *page_check_address(struct page *,
> > */
> > unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
> >
> > +#ifdef CONFIG_SOFTWARE_SUSPEND
> > +int page_mapped_by_current(struct page *);
> > +#else
> > +static inline int page_mapped_by_current(struct page *page) { return 0; }
> > +#endif /* CONFIG_SOFTWARE_SUSPEND */
>
> I'd leave it undefined. That will prevent nasty surprise when someone
> tries to use it w/o CONFIG_SOFTWARE_SUSPEND set.

OK

> > @@ -251,6 +253,14 @@ int restore_special_mem(void)
> > return ret;
> > }
> >
> > +/* Represents a stacked allocated page to be used in the future */
> > +struct res_page {
> > + struct res_page *next;
> > + char padding[PAGE_SIZE - sizeof(void *)];
> > +};
> > +
> > +static struct res_page *page_list;
> > +
> > static int pfn_is_nosave(unsigned long pfn)
> > {
> > unsigned long nosave_begin_pfn = __pa(&__nosave_begin) >>
> > PAGE_SHIFT;
>
> Is this part of some other patch?

No. That's because I use a linked list of allocated pages that has been
defined in a slightly different way in
swsusp-use-less-memory-during-resume.patch,
so I just redefine and reuse it.

> > @@ -490,6 +613,11 @@ void swsusp_free(void)
> > buffer = NULL;
> > }
> >
> > +void swsusp_free(void)
> > +{
> > + free_image();
> > + restore_active_inactive_lists();
> > +}
>
> This still scares me. Nice test would be to
> save/restore_active_inactive_lists repeatedly in a loop ... on running
> system ... aha, but it probably can't work outside of refrigerator?

restore_active_inactive_lists() only does something if suspend fails
(otherwise the lists active_pages and inactive_pages are empty).

The suspend failure may be simulated by running swapoff and suspendig
with "echo disk > /sys/power/state". However it wouldn't be interesting to
do this in a loop because it will make the same set of pages go out of the
LRU lists and back in a round robin fashion.

I did the the following test:
- ran swapoff
- tried to suspend with "echo disk > /sys/power/state" (which failed,
obviously)
- checked the numbers of active/inactive pages in /proc/meminfo
- ran swapon
- set image_size to 0
- suspended with "echo disk > /sys/power/state" to verify if the right number
of pages would be reclaimed
- resumed

at it worked, apparently.

> > case SNAPSHOT_UNFREEZE:
> > if (!data->frozen)
> > break;
> > + if (data->ready) {
> > + error = -EPERM;
> > + break;
> > + }
> > down(&pm_sem);
> > thaw_processes();
> > enable_nonboot_cpus();
>
> Error from UNFREEZE is not nice:
>
> Unfreeze:
> unfreeze(snapshot_fd);
> return error;
> }
> ... we don't handle it.

It will only occur if we don't call free_snapshot() before unfreeze() which is
a bug anyway. We prevent it from occuring by always calling free_snapshot()
before unfreeze(). :-)

> OTOH I guess it will be all fixed on exit()?

Well, sort of, but I think the LRU lists always have to be restored before
thawing processes (eg. kswapd).

Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/