Re: Software Suspend [HELP: buffer&schedule problems]

Gabor Kuti (ksx@sch.bme.hu)
Wed, 2 Dec 1998 14:32:25 +0100 (MET)


> > Hi
> >
> > A thought just occurred to me. Would this software suspend be usable on
> > machines without apm? Eg instead of shutting down, you save the state to
> > disk, and next time you boot it loads the state from disk. I would really
> > like to have this feature. Apart from that, it would just be cool:)
>
> This brings all kinds of ideas to mind ... I remember working in the lab once
> and there was a power cut. I would have given anything to be able to have
> reloaded a previous 'state' of the machine (say from the previous night's
> save -- like an autosave?) and carried on. I am sure there are loads of
> applications which would benefit from something like
> --> machine hangs (interrupted, ...)
> ----> reboot if no response for a bit (implemented) or on power up
> -------> load last saved state from disk to memory
>
> and away we go ... Just a thought
Yes, it is meant to save state on machines [any arch] _without_ APM.

Making snapshots will not be possible with this kind of implementation,
because a suspended image depends on the swap state [we try to swap out
every process, so the image won't be that big] and on the state of the
filesystems as well.
This is a drawback [I can't suspend one state, run another [not using
that swap!] and then continue the first one]. It is also bad because nfs
filesystems may change in the meantime [and why wouldn't they?].
Every cache that can be shrunk is shrunk [inode, buffer, dcache], but I
don't know what to do with the entries that are in use..
Maybe we should revalidate them? But what if we suspended in the
middle of writing a buffer [and hadn't marked it dirty yet - then we
load a previous state..].
And what if we had marked it... to revalidate we would have to flush it to
disk as it is [with the modification unfinished]... That is not a problem
if we continue with the original state, but running another one [or a new
one] would definitely be bad.. FS corruption ... :O

=========================================================================
===================> Problem description starts here <===================
And anyway.. here come my 2 problems I can't cope with..
One is buffer revalidation, the other is how to stop processes from
running while suspending.

I repeat them [I don't know if they arrived or if I just deleted that
letter].

So the two problems are [look out scheduler and buffer gurus :)]

1) How to 'revalidate' buffers? They will be out of date, because we
modify the page allocation while writing our image [which we have just
duplicated, including the buffers that can't be dropped [buffers in
use!]]. And after resume they won't match the disk [even more so if we
had run another [new] state on this root partition].
We could delete the image before restoring pages, but root is r/o and
anyway.. we have to think about possible nfs connections [their pages are
cached, aren't they?], and those may change in the meantime.
OR - write the buffers out as they are and read them back on resume. [It's
probably bad if we boot up _without_ resuming and write to our
root partition.. the same with nfs. So we really should revalidate
somehow]
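
To make the question concrete, here is a toy user-space model of one
possible meaning of 'revalidate'. It is only a sketch and not kernel code:
toy_buffer, toy_revalidate and the in-memory "disk" array are all
hypothetical names invented for this example. A clean buffer that no longer
matches the disk is simply re-read; a dirty one that no longer matches is
reported as a conflict, which is exactly the case with no obvious answer.

```c
#include <assert.h>
#include <string.h>

#define TOY_BLOCK_SIZE 8

/* Hypothetical stand-in for a cached buffer restored from the image. */
struct toy_buffer {
    int  blocknr;               /* which block of the toy disk it caches */
    int  dirty;                 /* 1 if it holds unwritten modifications */
    char data[TOY_BLOCK_SIZE];  /* cached contents */
};

/*
 * Compare a restored buffer with the current on-disk block.
 * Returns  0 if the buffer still matches the disk,
 *          1 if it was clean and stale, and has been re-read,
 *         -1 if it is dirty AND the disk changed underneath it.
 */
static int toy_revalidate(struct toy_buffer *bh, char disk[][TOY_BLOCK_SIZE])
{
    assert(bh != NULL && disk != NULL);

    if (memcmp(bh->data, disk[bh->blocknr], TOY_BLOCK_SIZE) == 0)
        return 0;               /* still up to date */

    if (bh->dirty)
        return -1;              /* conflict: neither copy is clearly right */

    memcpy(bh->data, disk[bh->blocknr], TOY_BLOCK_SIZE);
    return 1;                   /* stale clean buffer, reloaded from disk */
}
```

Calling toy_revalidate() on every restored buffer right after resume would
handle the clean ones; the -1 case is the open problem described above.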

2) When shrinking memory [to swap] I called try_to_free_pages with the gfp
mask __GFP_WAIT. And it slept [of course :)]. So other processes were
still running :(. [my mpeg player e.g.]. How to disable them? [It made an
endless loop of swapping out/in :I :)].
If I make them TASK_INTERRUPTIBLE then they may wake up because they were
already on another wait_queue. If I make them TASK_UNINTERRUPTIBLE then on
resume I have to wake all processes up, and if a process doesn't check
whether what it was waiting for has actually happened, there will
definitely be problems.
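
The wake-up worry can be shown with a small user-space model. This is a
sketch under my own assumptions, not kernel code: toy_waiter, toy_wake and
the two step functions are hypothetical names. A blanket wake-up on resume
is harmless for a waiter that rechecks its condition and goes back to
sleep, but a waiter that treats "I was woken" as "my event happened"
carries on too early.

```c
#include <assert.h>
#include <stddef.h>

enum toy_state { TOY_RUNNING, TOY_SLEEPING };

/* Hypothetical model of a process sleeping on some event. */
struct toy_waiter {
    enum toy_state state;
    int event_happened;  /* the condition the process is waiting for */
    int proceeded;       /* set once it continues past the wait */
};

/* Blanket wake-up, as on resume: says nothing about the event itself. */
static void toy_wake(struct toy_waiter *w)
{
    assert(w != NULL);
    w->state = TOY_RUNNING;
}

/* A waiter that rechecks its condition after every wake-up. */
static void toy_step_checked(struct toy_waiter *w)
{
    if (w->state != TOY_RUNNING)
        return;
    if (!w->event_happened) {
        w->state = TOY_SLEEPING;  /* spurious wake-up: sleep again */
        return;
    }
    w->proceeded = 1;
}

/* A waiter that assumes a wake-up means the event has arrived. */
static void toy_step_unchecked(struct toy_waiter *w)
{
    if (w->state != TOY_RUNNING)
        return;
    w->proceeded = 1;             /* wrong if the wake-up was spurious */
}
```

Waking both kinds of waiter without setting event_happened leaves the
checked one asleep again and lets the unchecked one run ahead, which is the
failure I am afraid of.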

I was thinking of a shortcut in schedule() [if suspending is in progress
then don't run any process other than the suspending one]. But it's _ugly_.
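
Roughly what I mean by the shortcut, as a toy user-space model. It is a
sketch only and does not touch the real schedule(): toy_task,
suspending_task and toy_pick_next are hypothetical names, and the
round-robin policy is just an assumption to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct toy_task {
    int pid;
    int runnable;  /* 1 if the task wants the CPU */
};

/* Points at the suspending task while a suspend is in progress, else NULL. */
static struct toy_task *suspending_task = NULL;

/*
 * Pick the next task round-robin, starting after index 'last'.
 * While a suspend is in progress only the suspending task is eligible.
 * Returns NULL when nothing is eligible (the "idle" case).
 */
static struct toy_task *toy_pick_next(struct toy_task *tasks, size_t n,
                                      size_t last)
{
    assert(tasks != NULL && n > 0);

    for (size_t i = 1; i <= n; i++) {
        struct toy_task *t = &tasks[(last + i) % n];

        if (!t->runnable)
            continue;
        if (suspending_task != NULL && t != suspending_task)
            continue;  /* the shortcut: skip everybody else */
        return t;
    }
    return NULL;
}
```

With suspending_task set, every other runnable task is passed over, so a
sleep inside the suspending code can no longer hand the CPU to something
like the mpeg player.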

The other idea was not to wait for I/O in shrink_memory [so calling
ll_rw_block won't cause wait_on_page] but to call
run_task_queue(tq_disk); directly. But this may turn out buggy later, e.g.
when ide.c waits for the disk to be ready, which may sleep for about half
a second. [look at the note].

I'm working on 2.1.125. [Will be upgraded of course].
I really want to post this patch [so ppl can judge it :)] but without
these two things it is worth nearly nothing.. :( [and _then_ will come hw
state restoring..]

Seasons
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
"One who has time to complain has time to submit patches." <chinese proverb>
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
