Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659

From: Rafael J. Wysocki
Date: Thu Jan 26 2012 - 19:57:49 EST


On Thursday, January 26, 2012, Srivatsa S. Bhat wrote:
> On 01/26/2012 05:21 AM, Rafael J. Wysocki wrote:
>
> > Hi,
> >>
> >> SNAPSHOT_CREATE_IMAGE has a check for data->ready such as:
> >>
> >> if (data->mode != O_RDONLY || !data->frozen || data->ready) {
> >> error = -EPERM;
> >> break;
> >> }
> >>
> >> data->ready would be set to 1 only under SNAPSHOT_CREATE_IMAGE. However,
> >> SNAPSHOT_FREE (invoked at the place shown above) will reset the value to 0.
> >> This makes it possible for hibernation_snapshot() and hence
> >> freeze_workqueues_begin() to be called a second time, which is unfortunate.
> >
> > Yes, I obviously forgot about that code path when I was working on the commit
> > that introduced the problem. :-(
> >
> > Thanks a lot for the great analysis, it's really helpful!
> >
>
>
> Welcome :-) It was fun!
>
>
> >> And actually, the patch I posted in my previous mail is not really the right
> >> long-term fix, though it might fix the particular issue that Jiri is facing..
> >>
> >> Because, allowing hibernation_snapshot() to get called a second time while
> >> kernel threads are still frozen brings us to the same situation that commit
> >> 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory)
> >> tried to prevent! IOW, a call to hibernate_preallocate_memory() would be
> >> done inside hibernation_snapshot(), when kernel threads are frozen.. which
> >> is known to break XFS, to give one example as mentioned in the changelog
> >> of the above commit.
> >
> > That's exactly right.
> >
> >> So, the right way to fix this IMHO, would be to split up thaw_processes()
> >> just like freezing phase:
> >>
> >> /* freezes or thaws user space processes */
> >> freeze_processes() - thaw_processes()
> >>
> >> /* freezes or thaws kernel threads */
> >> freeze_kernel_threads() - thaw_kernel_threads()
> >>
> >> We have to insert this thaw_kernel_threads() at appropriate places in such a
> >> way as to not require another ioctl if possible... Then things would be
> >> more symmetric (and hence more easy to understand) and we can avoid getting
> >> into strange situations as discussed here.
> >>
> >> But before we venture into that, it would be good to know if the patch posted
> >> in the previous mail fixes the particular problem reported in this thread,
> >> atleast just to see if there are other problems lurking that we aren't aware
> >> of yet..
> >
> > Jiri has already said that the patch works.
> >
> > I think we could avoid the issue entirely by introducing thaw_kernel_threads
> > and making SNAPSHOT_FREE call it. No other changes should be necessary.
> >
> > IOW, Jiri, does the patch below help?
> >
> > [BTW, the freeze_tasks()'s kerneldoc seems to be outdated. Tejun?]
> >
> > ---
>
>
> This is exactly the kind of fix I was suggesting.. Thanks Rafael!
>
> I have a small request for a comment. Please see below.
> I have a question too, but for that I'll have to reply to my earlier
> thread so that I can comment on the userspace code.
>
> > include/linux/freezer.h | 2 ++
> > kernel/power/process.c | 19 +++++++++++++++++++
> > kernel/power/user.c | 1 +
> > 3 files changed, 22 insertions(+)
> >
> > Index: linux/include/linux/freezer.h
> > ===================================================================
> > --- linux.orig/include/linux/freezer.h
> > +++ linux/include/linux/freezer.h
> > @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
> > extern int freeze_processes(void);
> > extern int freeze_kernel_threads(void);
> > extern void thaw_processes(void);
> > +extern void thaw_kernel_threads(void);
> >
> > static inline bool try_to_freeze(void)
> > {
> > @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
> > static inline int freeze_processes(void) { return -ENOSYS; }
> > static inline int freeze_kernel_threads(void) { return -ENOSYS; }
> > static inline void thaw_processes(void) {}
> > +static inline void thaw_kernel_threads(void) {}
> >
> > static inline bool try_to_freeze(void) { return false; }
> >
> > Index: linux/kernel/power/process.c
> > ===================================================================
> > --- linux.orig/kernel/power/process.c
> > +++ linux/kernel/power/process.c
> > @@ -188,3 +188,22 @@ void thaw_processes(void)
> > printk("done.\n");
> > }
> >
> > +void thaw_kernel_threads(void)
> > +{
> > + struct task_struct *g, *p;
> > +
> > + pm_nosig_freezing = false;
> > + printk("Restarting kernel threads ... ");
> > +
> > + thaw_workqueues();
> > +
> > + read_lock(&tasklist_lock);
> > + do_each_thread(g, p) {
> > + if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
> > + __thaw_task(p);
> > + } while_each_thread(g, p);
> > + read_unlock(&tasklist_lock);
> > +
> > + schedule();
> > + printk("done.\n");
> > +}
> > Index: linux/kernel/power/user.c
> > ===================================================================
> > --- linux.orig/kernel/power/user.c
> > +++ linux/kernel/power/user.c
> > @@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file *
> > swsusp_free();
> > memset(&data->handle, 0, sizeof(struct snapshot_handle));
> > data->ready = 0;
>
>
> It would be nice to have a comment here explaining why we call
> thaw_kernel_threads() here.

I agree and I'll add one in the final version.

> (Such a comment would avoid confusion when people
> look at SNAPSHOT_CREATE_IMAGE and SNAPSHOT_FREE and wonder why there is
> thawing involved, while the corresponding freezing is nowhere in sight..
> Of course the freezing is hidden inside hibernation_snapshot(), but that
> might not be immediately apparent to everyone.)

Sure.

> > + thaw_kernel_threads();
>
> > break;
>
> >
> > case SNAPSHOT_PREF_IMAGE_SIZE:

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/