Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and fix blkcg percpu alloc deadlock

From: Andrew Morton
Date: Thu Dec 22 2011 - 18:16:52 EST


On Thu, 22 Dec 2011 15:00:47 -0800
Tejun Heo <tj@xxxxxxxxxx> wrote:

> Hello, Andrew.
>
> On Thu, Dec 22, 2011 at 02:54:26PM -0800, Andrew Morton wrote:
> > > These stats are userland visible and quite useful ones if blkcg is in
> > > use. I don't really see how these can be removed.
> >
> > What stats?
>
> The ones allocated in the last patch. blk_group_cpu_stats.

What last patch?

I can find no occurrence of "blk_group_cpu_stats" on linux-kernel or in
the kernel tree.

> > For starters, doing pagetable allocation on the I/O path sounds nutty.
> >
> > Secondly, GFP_NOIO is a *weaker* allocation mode than GFP_KERNEL. By
> > permitting it with this patchset, we have a kernel which is more likely
> > to get oom failures. Fixing the kernel to not perform GFP_NOIO
> > allocations for these counters will result in a more robust kernel.
> > This is a good thing, which improves the kernel while avoiding adding
> > more complexity elsewhere.
> >
> > This patchset is the worst option and we should try much harder to avoid
> > applying it!
>
> The stats are per cgroup - request_queue pair. We don't want to
> allocate for all of them for each combination as there are
> configurations with stupid number of request_queues and silly many
> cgroups and #cgroups * #request_queue * #cpus can be huge. So, we
> want on-demand allocation. While the stats are important, they are
> not critical and allocations can be opportunistic. If the allocation
> fails this time, we can try it for the next time.

Without code to look at I am at a loss.

request_queues are allocated in blk_alloc_queue_node(), which uses
GFP_KERNEL (and also mysteriously takes a gfp_t arg).

> So, yeah, the suggested solution fits the problem. If you have a
> better idea, please don't be shy.

Unsure which solution you're referring to here.
