Re: Re: Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

From: Minchan Kim
Date: Mon Nov 30 2015 - 23:44:55 EST


On Tue, Dec 01, 2015 at 11:31:41AM +0900, kyeongdon.kim wrote:
> On 2015-12-01 ìì 8:18, Minchan Kim wrote:
> > Hi Kyeongdon,
> >
> > On Mon, Nov 30, 2015 at 07:42:02PM +0900, kyeongdon.kim wrote:
> >
> >> > On Fri, Nov 27, 2015 at 01:10:49PM +0900, Sergey Senozhatsky wrote:
> >> >> From: Kyeongdon Kim <kyeongdon.kim@xxxxxxx>
> >> >>
> >> >> When we're using LZ4 multi compression streams for zram swap,
> >> >> we found out page allocation failure message in system running test.
> >> >> That was not only once, but a few(2 - 5 times per test).
> >> >> Also, some failure cases were continually occurring to try allocation
> >> >> order 3.
> >> >>
> >> >> In order to make parallel compression private data, we should call
> >> >> kzalloc() with order 2/3 in runtime(lzo/lz4). But if there is no order
> >> >> 2/3 size memory to allocate in that time, page allocation fails.
> >> >> This patch makes to use vmalloc() as fallback of kmalloc(), this
> >> >> prevents page alloc failure warning.
> >> >>
> >> >> After using this, we never found warning message in running test, also
> >> >> It could reduce process startup latency about 60-120ms in each case.
> >> >>
> >> >> For reference a call trace :
> >> >>
> >> >> Binder_1: page allocation failure: order:3, mode:0x10c0d0
> >> >> CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty
> >> > #20
> >> >> Call trace:
> >> >> [<ffffffc0002069c8>] dump_backtrace+0x0/0x270
> >> >> [<ffffffc000206c48>] show_stack+0x10/0x1c
> >> >> [<ffffffc000cb51c8>] dump_stack+0x1c/0x28
> >> >> [<ffffffc0002bbfc8>] warn_alloc_failed+0xfc/0x11c
> >> >> [<ffffffc0002bf518>] __alloc_pages_nodemask+0x724/0x7f0
> >> >> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
> >> >> [<ffffffc0002ed6a4>] kmalloc_order_trace+0x38/0xd8
> >> >> [<ffffffc0005d9738>] zcomp_lz4_create+0x2c/0x38
> >> >> [<ffffffc0005d78f0>] zcomp_strm_alloc+0x34/0x78
> >> >> [<ffffffc0005d7a58>] zcomp_strm_multi_find+0x124/0x1ec
> >> >> [<ffffffc0005d7c14>] zcomp_strm_find+0xc/0x18
> >> >> [<ffffffc0005d8fa0>] zram_bvec_rw+0x2fc/0x780
> >> >> [<ffffffc0005d9680>] zram_make_request+0x25c/0x2d4
> >> >> [<ffffffc00040f8ac>] generic_make_request+0x80/0xbc
> >> >> [<ffffffc00040f98c>] submit_bio+0xa4/0x15c
> >> >> [<ffffffc0002e8bb0>] __swap_writepage+0x218/0x230
> >> >> [<ffffffc0002e8c04>] swap_writepage+0x3c/0x4c
> >> >> [<ffffffc0002c7384>] shrink_page_list+0x51c/0x8d0
> >> >> [<ffffffc0002c7e88>] shrink_inactive_list+0x3f8/0x60c
> >> >> [<ffffffc0002c86c8>] shrink_lruvec+0x33c/0x4cc
> >> >> [<ffffffc0002c8894>] shrink_zone+0x3c/0x100
> >> >> [<ffffffc0002c8c10>] try_to_free_pages+0x2b8/0x54c
> >> >> [<ffffffc0002bf308>] __alloc_pages_nodemask+0x514/0x7f0
> >> >> [<ffffffc0002bf5f8>] __get_free_pages+0x14/0x5c
> >> >> [<ffffffc0003446cc>] proc_info_read+0x50/0xe4
> >> >> [<ffffffc0002f5204>] vfs_read+0xa0/0x12c
> >> >> [<ffffffc0002f59c8>] SyS_read+0x44/0x74
> >> >> DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> >> >> 0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
> >> >>
> >> >> [minchan: change vmalloc gfp and adding comment about gfp]
> >> >> [sergey: tweak comments and styles]
> >> >> Signed-off-by: Kyeongdon Kim <kyeongdon.kim@xxxxxxx>
> >> >> Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
> >> >
> >> > Kyeongdon, Could you test this patch on your device?
> >> >
> >> > Thanks.
> >>
> >> Sorry to have kept you waiting,
> >> Obviously, I couldn't see allocation fail message with this patch.
> >> But, there is something to make some delay(not sure yet this is normal).
> >
> > You mean new changes makes start-up delay of your application sometime
> > still,
> > but not frequent like old?
> >
> I couldn't see start-up delay during my test after this patch.
> But, I checked the return value from alloc function like the below :
>
> static void *zcomp_lz4_create(void)
> <snip>
> ret = kzalloc(LZ4_MEM_COMPRESS, GFP_NOIO | __GFP_NORETRY |
> __GFP_NOWARN | __GFP_NOMEMALLOC);
> printk("%s: %d: ret = %p\n",__func__,__LINE__,ret); //line 32
> if (!ret) {
> ret = __vmalloc(LZ4_MEM_COMPRESS,
> GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN |
> __GFP_NOMEMALLOC | __GFP_ZERO | __GFP_HIGHMEM,
> PAGE_KERNEL);
> printk("%s: %d: ret = %p\n",__func__,__LINE__,ret); //line 38
> }
> return ret;

OK, I got it.

>
> log message :
> [ 352.226014][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.226035][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.226791][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.226809][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.230348][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.230369][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.230460][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.230485][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.230507][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.230520][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.230608][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.230619][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.230888][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.230902][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.231406][0] zcomp_lz4_create: 32: ret = ffffffc002088000
> [ 352.234024][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234060][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.234359][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234384][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.234618][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234639][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.234667][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234685][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.234738][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234748][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.234800][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234816][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.234852][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.234865][0] zcomp_lz4_create: 38: ret = (null)
> [ 352.235136][0] zcomp_lz4_create: 32: ret = (null)
> [ 352.235179][0] zcomp_lz4_create: 38: ret = ffffff80016a4000
>
> I thought this pattern from vmalloc is not normal.
> >>
> >> static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
> >> {
> >> <snip>
> >>
> >> zstrm->private = comp->backend->create();
> >> ^ // sometimes, return 'null' continually(2-5times)
> >
> > Hmm, I think it is caused by __GFP_NOMEMALLOC.
> > Could you test it without the flag?
> >
> >>
> >> As you know, if there is 'null' return, this function is called again to
> >> get a memory in while() loop. I just checked this one with printk().
> >>
> >> If you guys don't mind, I'll test more with trace log to check time
> > delay.
> >
> > No problem.
> >
> >>
> >> However, If this is fully expectable status to you.
> >> I think I don't need to do it.
> >
> > It's not what I expected. Actually, I thought failure of vmalloc
> > in that place should be *really really* rare. I think it's caused by
> > __GFP_NOMEMALLOC so I want to see test result without the flag.
> >
> > Thanks for the careful test!
> >
> You're welcome.
>
> After I removed flag '__GFP_NOMEMALLOC', I couldn't find return 'null'
> from vmalloc until now.

Thanks for the test.
You said you cannot see the delay any more so it's fine as it is.

Regardless of it, I want to remove __GFP_NOMEMALLOC in the gfp_mask.
The reason we decided to add vmalloc fallback is that we expect that
it is likely to succeed a few order-0 pages to allocate (sizeof(buffer)
> PAGE_SIZE) although it was failed by kmalloc.
But what I missing is that zram-swap is working in direct reclaim
normally so it is more flexible to use emegency memory pool(
e,g, zsmalloc doesn't use __GFP_NOMEMALLOC) so there is no point
to use __GFP_NOMEMALLOC in here.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/