Re: Help Resource Counters Scale better (v4)

From: Daisuke Nishimura
Date: Tue Aug 11 2009 - 23:26:50 EST


On Tue, 11 Aug 2009 16:31:59 -0700, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, 11 Aug 2009 20:14:05 +0530
> Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
>
> > Enhancement: Remove the overhead of root based resource counter accounting
> >
> > From: Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx>
> >
> > This patch reduces the resource counter overhead (mostly spinlock)
> > associated with the root cgroup. This is a part of the several
> > patches to reduce mem cgroup overhead. I had posted other
> > approaches earlier (including using percpu counters). Those
> > patches will be a natural addition and will be added iteratively
> > on top of these.
> >
> > The patch stops resource counter accounting for the root cgroup.
> > The data for display is derived from the statistics we maintain
> > via mem_cgroup_charge_statistics (which is more scalable).
> >
> > The test results I see on a 24-way system show that:
> >
> > 1. The lock contention disappears from /proc/lock_stat
> > 2. The results of the test are comparable to running with
> > cgroup_disable=memory.
> >
> > Please test/review.
>
> I don't get it.
>
> The patch appears to skip accounting altogether for the root memcgroup
> and then adds some accounting back in for swap. Or something like
> that. How come? Do we actually not need the root memcgroup
> accounting?
>
IIUC, this patch doesn't remove the root memcgroup accounting. It just moves the
root memcgroup's counter from res_counter to cpustat[cpu], which reduces
res_counter lock contention, especially on large machines.
Using res_counter (take the lock, check the limit, charge) for the root memcgroup
would be overkill, because the root memcgroup no longer has a limit
(since memcg-remove-the-overhead-associated-with-the-root-cgroup.patch).
MEM_CGROUP_STAT_SWAPOUT is needed to show memsw.usage_in_bytes for the root
memcgroup, because we haven't had a cpustat[cpu] counter for swap accounting so far.
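
To make the difference concrete, here is a rough userspace sketch of the two
counting styles. It is not the kernel code: every name ending in _sketch, the
CPU count, and the 64-byte cache-line size are my assumptions for illustration
only. The first counter takes one shared lock and checks a limit on every
charge; the second just bumps a per-CPU slot and sums the slots when someone
reads the value.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count, illustration only */

/* res_counter style: one shared lock, limit check on every charge. */
struct res_counter_sketch {
	atomic_flag lock;	/* stand-in for the kernel spinlock */
	uint64_t usage;
	uint64_t limit;
};

static int res_counter_charge_sketch(struct res_counter_sketch *rc, uint64_t val)
{
	int ret = 0;

	/* Every CPU spins on this one flag: the contended path. */
	while (atomic_flag_test_and_set_explicit(&rc->lock, memory_order_acquire))
		;
	if (rc->usage + val > rc->limit)
		ret = -1;	/* over the limit, refuse the charge */
	else
		rc->usage += val;
	atomic_flag_clear_explicit(&rc->lock, memory_order_release);
	return ret;
}

/* Per-CPU stat style: no lock, no limit, one slot per CPU. */
struct percpu_stat_sketch {
	struct {
		int64_t count;
		char pad[56];	/* pad to 64 bytes, an assumed cache-line size */
	} cpu[NR_CPUS_SKETCH];
};

/* Assumes the caller is the only writer of slot 'cpu'. */
static void stat_add_sketch(struct percpu_stat_sketch *s, int cpu, int64_t val)
{
	s->cpu[cpu].count += val;
}

/* Readers pay the cost instead: sum all slots for the displayed value. */
static int64_t stat_read_sketch(const struct percpu_stat_sketch *s)
{
	int64_t sum = 0;

	for (int i = 0; i < NR_CPUS_SKETCH; i++)
		sum += s->cpu[i].count;
	return sum;
}
```

The per-CPU variant can only work where no limit has to be enforced, which is
why it fits the root memcgroup. The summed value is also only approximate
while other CPUs are still updating their slots.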

> IOW, the changelog sucks ;)
>
> Is this an alternative approach to using percpu_counters, or do we do
> both or do we choose one or the other? res_counter_charge() really is
> quite sucky.
>
> The patch didn't have a signoff.
>
> It would be nice to finalise those performance testing results and
> include them in the new, improved patch description.
>
Agreed.


Thanks,
Daisuke Nishimura.