Re: [PATCH] PM: hibernate: Fix a bug in copying the zero bitmap to safe pages

From: Pavan Kondeti
Date: Mon Oct 02 2023 - 23:08:43 EST


On Sat, Sep 30, 2023 at 07:37:13AM -0400, Brian Geffon wrote:
> On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti
> <quic_pkondeti@xxxxxxxxxxx> wrote:
> >
> Hi Pavankumar,
>
> > The following crash is observed 100% of the time during resume from
> > hibernation on an x86 QEMU system.
> >
> > [ 12.931887] ? __die_body+0x1a/0x60
> > [ 12.932324] ? page_fault_oops+0x156/0x420
> > [ 12.932824] ? search_exception_tables+0x37/0x50
> > [ 12.933389] ? fixup_exception+0x21/0x300
> > [ 12.933889] ? exc_page_fault+0x69/0x150
> > [ 12.934371] ? asm_exc_page_fault+0x26/0x30
> > [ 12.934869] ? get_buffer.constprop.0+0xac/0x100
> > [ 12.935428] snapshot_write_next+0x7c/0x9f0
> > [ 12.935929] ? submit_bio_noacct_nocheck+0x2c2/0x370
> > [ 12.936530] ? submit_bio_noacct+0x44/0x2c0
> > [ 12.937035] ? hib_submit_io+0xa5/0x110
> > [ 12.937501] load_image+0x83/0x1a0
> > [ 12.937919] swsusp_read+0x17f/0x1d0
> > [ 12.938355] ? create_basic_memory_bitmaps+0x1b7/0x240
> > [ 12.938967] load_image_and_restore+0x45/0xc0
> > [ 12.939494] software_resume+0x13c/0x180
> > [ 12.939994] resume_store+0xa3/0x1d0
> >
> > The commit being fixed introduced a bug in copying the zero bitmap
> > to safe pages. A temporary bitmap is allocated in prepare_image()
> > to make a copy of the zero bitmap after the unsafe pages are marked.
> > Freeing this temporary bitmap later leaves the unsafe-page state
> > inconsistent. Since the free bit is left set for the temporary
> > bitmap's pages after they are freed, these pages are treated as unsafe
> > when they are allocated again. This results in an incorrect calculation
> > of the number of pages pre-allocated for the image.
> >
> > nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages;
> >
> > allocated_unsafe_pages is estimated to be higher than the actual
> > count, which results in running short of pages in safe_pages_list.
> > Hence the crash is observed in get_buffer() due to a NULL pointer
> > access of safe_pages_list.
>
> Rafael pulled https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=f0c7183008b41e92fa676406d87f18773724b48b
> which addresses the NULL pointer dereference. Regardless, the code
> shouldn't be touching the list directly and should be using
> __get_safe_page().

Thanks for pointing me to this. I have verified hibernation with this
commit applied on top of v6.6-rc3 and it works as expected.

This commit is currently queued for v6.7. Can it be included in the
next -rc, or do we have to apply the patch I sent to make sure that
hibernation works on v6.6 when it is released?
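
For anyone following the accounting issue from my patch description, here
is a toy userspace model of it. This is only a sketch, not kernel code:
toy_pool, toy_alloc(), toy_free(), toy_run() and toy_nr_pages() are all
hypothetical names, the "free bit" is modelled as a plain bool array, and
the clear_bit_on_free knob exists purely to contrast the two outcomes. It
is not meant to describe how either patch fixes the problem.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_PAGES 8  /* hypothetical pool size */

struct toy_pool {
	bool in_use[TOY_NR_PAGES];
	bool free_bit[TOY_NR_PAGES];	/* set bit = page looks unsafe */
	unsigned int allocated_unsafe_pages;
};

/* Hand out the lowest unused page; count it as unsafe if its bit is already set. */
static int toy_alloc(struct toy_pool *p)
{
	for (int i = 0; i < TOY_NR_PAGES; i++) {
		if (p->in_use[i])
			continue;
		p->in_use[i] = true;
		if (p->free_bit[i])
			p->allocated_unsafe_pages++;
		p->free_bit[i] = true;
		return i;
	}
	return -1;
}

/* Release a page, optionally leaving its bit behind (the stale state). */
static void toy_free(struct toy_pool *p, int i, bool clear_bit)
{
	assert(i >= 0 && i < TOY_NR_PAGES);
	p->in_use[i] = false;
	if (clear_bit)
		p->free_bit[i] = false;
}

/* Same arithmetic as the nr_pages formula quoted above. */
static long toy_nr_pages(long nr_zero_pages, long nr_copy_pages,
			 long nr_highmem, long allocated_unsafe_pages)
{
	return (nr_zero_pages + nr_copy_pages) - nr_highmem -
	       allocated_unsafe_pages;
}

/*
 * Allocate and free a two-page "temporary bitmap", then allocate two more
 * pages and report how many of them were counted as unsafe.
 */
static unsigned int toy_run(bool clear_bit_on_free)
{
	struct toy_pool p = {0};
	int a = toy_alloc(&p);
	int b = toy_alloc(&p);

	toy_free(&p, a, clear_bit_on_free);
	toy_free(&p, b, clear_bit_on_free);
	toy_alloc(&p);
	toy_alloc(&p);
	return p.allocated_unsafe_pages;
}
```

In this model toy_run(false) counts both reused pages as unsafe, while
toy_run(true) counts none. Feeding those counts into toy_nr_pages() with
made-up values (nr_zero_pages = 10, nr_copy_pages = 100, nr_highmem = 0)
gives 108 instead of 110, i.e. fewer pages pre-allocated than needed.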

>
> >
> > Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file")
> > Signed-off-by: Pavankumar Kondeti <quic_pkondeti@xxxxxxxxxxx>

Thanks,
Pavan