Re: [PATCHSET] mempool, percpu, blkcg: fix percpu stat allocation and remove stats_lock

From: Andrew Morton
Date: Wed Mar 07 2012 - 18:06:19 EST


On Wed, 7 Mar 2012 09:55:56 -0500
Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:

> On Tue, Mar 06, 2012 at 01:55:31PM -0800, Andrew Morton wrote:
>
> [..]
> > > > hoo boy that looks like an infinite loop. What's going on here?
> > >
> > > If allocation fails, I am trying to allocate it again in infinite loop.
> > > What should I do? Try it after sleeping a bit? Or give up after certain
> > > number of tries? This is in worker thread context though, so main IO path
> > > is not impacted.
> >
> > > On a non-preemptible uniprocessor kernel it's game over, isn't it?
> > Unless someone frees some memory from interrupt context it is time for
> > the Big Red Button.
>
> Yes. It's an issue on non-preemptible UP kernels. I changed the logic to
> msleep(10) before retrying. Tested on a UP non-preemptible kernel with
> an always-failing allocation and things are fine.
>
> >
> > I'm not sure what to suggest, really - if an allocation failed then
> > there's nothing the caller can reliably do to fix that. The best
> > approach is to fail all the way back to userspace with -ENOMEM.
>
> As user space is not waiting for this allocation, -ENOMEM is really
> not an option.

Well, it would have to be -EIO, because the block layer is stupid about
errnos.

> >
> > In this context I suppose you could drop a warning into the logs then
> > bale out and retry on the next IO attempt.
>
> Yes, that also can be done. I found msleep(10) to be an easier solution than
> removing the group from the list and trying again when new IO comes in. Is this
> acceptable?

Seems a bit sucky to me. That allocation isn't *needed* for the kernel
to be able to complete the IO operation. It's just that we
(mis)designed things so that we're dependent upon it succeeding. Sigh.

msleep() will cause that kernel thread to contribute to load average
when it is in this state. Intentional?
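
A minimal userspace sketch of the "warn, bail out, retry on the next IO"
alternative discussed above, for comparison with looping or sleeping in
the worker. Every name here (struct group, group_try_alloc_stats(),
group_account_io(), failing_alloc()) is hypothetical and not taken from
the patchset; the allocator is passed in only so that failure can be
simulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the patchset. */
struct group_stats { unsigned long ios; };

struct group {
    struct group_stats *stats;  /* NULL until an allocation succeeds */
    bool warned;                /* log the failure only once */
};

/* Allocator signature, injectable so that failure can be simulated. */
typedef void *(*alloc_fn)(size_t size);

/*
 * One allocation attempt per call. On failure: warn once, return false,
 * and leave the retry to the next IO. Never loops and never sleeps.
 */
static bool group_try_alloc_stats(struct group *g, alloc_fn alloc)
{
    assert(g != NULL && alloc != NULL);

    if (g->stats)
        return true;

    g->stats = alloc(sizeof(*g->stats));
    if (!g->stats) {
        if (!g->warned) {
            fprintf(stderr, "stats allocation failed, retrying on next IO\n");
            g->warned = true;
        }
        return false;
    }
    g->stats->ios = 0;
    return true;
}

/* IO path: the IO proceeds either way; it is only counted if stats exist. */
static void group_account_io(struct group *g, alloc_fn alloc)
{
    if (group_try_alloc_stats(g, alloc))
        g->stats->ios++;
}

/* Helper for experiments: an allocator that always fails. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

With failing_alloc the IO is simply not counted; a later call with
malloc picks up where it left off. The cost of this approach is that
IOs issued while the allocation keeps failing go unaccounted.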

> [..]
> >
> > btw, speaking of uniprocessor: please do perform a uniprocessor build
> > and see what impact the patch has upon the size(1) output for the .o
> > files. We should try to minimize the pointless bloat for the UP
> > kernel.
>
> But this logic is required both for UP and SMP kernels. So bloat on UP
> is not unnecessary?

UP doesn't need a per-cpu variable, hence it doesn't need to run
alloc_percpu() at all. For UP all we need to do is to aggregate a
`struct blkio_group_stats' within `struct blkg_policy_data'?

This could still be done with suitable abstraction and wrappers.
Whether that's desirable depends on how fat the API ends up, I guess.
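
A rough userspace sketch of what such wrappers might look like. This
is an illustration under assumptions, not the patchset's code: the
SKETCH_SMP switch, SKETCH_NR_CPUS, struct policy_data and the pd_stats_*()
helpers are all made-up names, and calloc() stands in for a per-cpu
allocator. Callers only ever use the three helpers, so the UP build gets
an embedded struct with no allocation and no failure path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for `struct blkio_group_stats'. */
struct group_stats { unsigned long sectors; };

#ifdef SKETCH_SMP
#define SKETCH_NR_CPUS 4        /* assumed CPU count for the sketch */

struct policy_data {
    struct group_stats *stats;  /* one slot per CPU; allocation can fail */
};

static int pd_stats_init(struct policy_data *pd)
{
    assert(pd != NULL);
    pd->stats = calloc(SKETCH_NR_CPUS, sizeof(*pd->stats));
    return pd->stats ? 0 : -1;
}

static struct group_stats *pd_stats(struct policy_data *pd, int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    return &pd->stats[cpu];     /* extra pointer hop on every access */
}

static void pd_stats_exit(struct policy_data *pd)
{
    free(pd->stats);
    pd->stats = NULL;
}
#else
struct policy_data {
    struct group_stats stats;   /* embedded: nothing to allocate */
};

static int pd_stats_init(struct policy_data *pd)
{
    assert(pd != NULL);
    pd->stats.sectors = 0;
    return 0;                   /* cannot fail */
}

static struct group_stats *pd_stats(struct policy_data *pd, int cpu)
{
    (void)cpu;                  /* only one CPU, only one copy */
    return &pd->stats;
}

static void pd_stats_exit(struct policy_data *pd)
{
    (void)pd;
}
#endif
```

Built without SKETCH_SMP, pd_stats_init() always returns 0 and
pd_stats() is a plain address-of, so the retry question never arises
on UP. The price is a second code path to maintain.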

> I ran size(1) on block/blk-cgroup.o with and without the patch and I can
> see some bloat.
>
> Without patch (UP kernel)
> -------------------------
> # size block/blk-cgroup.o
>    text    data     bss     dec     hex filename
>   12950    5248      50   18248    4748 block/blk-cgroup.o
>
> With patch (UP kernel)
> ----------------------
> # size block/blk-cgroup.o
>    text    data     bss     dec     hex filename
>   13316    5376      58   18750    493e block/blk-cgroup.o

Yeah.

The additional text imposes runtime overhead, but there's also
additional cost from things like the extra pointer hops to access the
per-cpu data.
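
For reference, the growth between the two size(1) listings quoted above
can be worked out with a few lines of C. The figures are copied from the
quoted output; the struct and function names are only illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* One row of size(1) output, decimal columns only. */
struct size_row { long text, data, bss, dec; };

/* Figures copied from the quoted size(1) output for block/blk-cgroup.o. */
static const struct size_row up_before = { 12950, 5248, 50, 18248 };
static const struct size_row up_after  = { 13316, 5376, 58, 18750 };

/* Per-section growth from `before' to `after'. */
static struct size_row size_delta(struct size_row before, struct size_row after)
{
    /* size(1) reports dec as text + data + bss; sanity-check both rows. */
    assert(before.text + before.data + before.bss == before.dec);
    assert(after.text + after.data + after.bss == after.dec);

    struct size_row d = {
        after.text - before.text,
        after.data - before.data,
        after.bss  - before.bss,
        after.dec  - before.dec,
    };
    return d;
}

/* Print the growth with explicit signs. */
static void print_delta(const char *name, struct size_row d)
{
    printf("%s: text %+ld, data %+ld, bss %+ld, total %+ld bytes\n",
           name, d.text, d.data, d.bss, d.dec);
}
```

size_delta(up_before, up_after) comes to +366 bytes of text, +128 of
data and +8 of bss, 502 bytes in total on the UP build.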
