Re: [patch 0/7] cpuset writeback throttling

From: David Rientjes
Date: Thu Nov 06 2008 - 15:37:06 EST


On Thu, 6 Nov 2008, KAMEZAWA Hiroyuki wrote:

> > Agreed. This patchset is admittedly from a different time when cpusets
> > was the only relevant extension that needed to be done.
> >
> BTW, what is the problem this patch wants to fix ?
> 1. avoid slow-down of memory allocation by triggering write-out earlier.
> 2. avoid OOM by throttling dirty pages.
>
> About 1, memcg's dirty_ratio can help if mounted as
> mount -t cgroup none /somewhere/ -o cpuset,memory
> (If the user can accept overheads of memcg.)
> If implemented.
>

Yeah, it needs to be generalized to its own cgroup so that it doesn't
depend on either CONFIG_CPUSETS or CONFIG_CGROUP_MEM_RES_CTLR. If we get
the dirty and writeback page statistics added to memcg, this becomes much
simpler.

> About 2, A Google guy posted OOM handler cgroup to linux-mm.
>

Yeah, this could enable one of the workarounds that Christoph earlier
described: the oom handler has the ability to notify userspace and allows
it to defer invoking the oom killer if there's an alternative way to
remedy the situation. So the oom handler posted to linux-mm could work by
doing a sync anytime it ran low on memory, but the objective of this
patchset is different.

The idea here is to implement per-cpuset (and now per-memcg) dirty and
background dirty ratios to avoid using the global sysctls. This is
currently problematic for users of cpusets who divide their machine for
batches of tasks, usually for NUMA optimizations: a cpuset, for example,
can represent 40% of the system's memory and if the global dirty ratio is
set to 50%, we still won't begin writeback even if all the memory in the
cpuset is dirty.
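
To make the arithmetic concrete, here is a minimal sketch of the comparison.
It is not kernel code: dirty_threshold() and over_dirty_limit() are
hypothetical helpers, and the sketch assumes the threshold is simply a
percentage of a domain's total pages, ignoring how the kernel actually
computes dirtyable memory.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): number of dirty pages allowed
 * in a memory domain of total_pages pages under a percentage dirty_ratio.
 */
static uint64_t dirty_threshold(uint64_t total_pages, unsigned int dirty_ratio)
{
	return total_pages * dirty_ratio / 100;
}

/*
 * Hypothetical helper: true once nr_dirty exceeds the threshold computed
 * for the given domain size, i.e. the point where writeback would begin.
 */
static bool over_dirty_limit(uint64_t nr_dirty, uint64_t total_pages,
			     unsigned int dirty_ratio)
{
	return nr_dirty > dirty_threshold(total_pages, dirty_ratio);
}
```

With a system of 1,000,000 pages and a cpuset holding 40% of them, a fully
dirty cpuset has 400,000 dirty pages. Measured against the global domain at
a 50% ratio the threshold is 500,000 pages, so over_dirty_limit() stays
false. Measured against the cpuset's own 400,000 pages at the same ratio the
threshold is 200,000 pages, and the limit is crossed well before the cpuset
is entirely dirty.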

> > If we are to support memcg-specific dirty ratios, that requires the
> > aforementioned statistics to be collected so that the calculation is even
> > possible. The series at
> >
> > http://marc.info/?l=linux-kernel&m=122123225006571
> > http://marc.info/?l=linux-kernel&m=122123241106902
> >
> Yes, we (memcg) need this kind of thing.
>

Andrea, what's the status of the patch to add dirty and writeback
statistics to memcg? I don't see it in the October 30 mmotm or any
followup discussion on it.

> > is a step in that direction, although I'd prefer to see NR_UNSTABLE_NFS
> > extracted separately from MEM_CGROUP_STAT_FILE_DIRTY so
> > throttle_vm_writeout() can also use the new statistics.
> >

Is this possible in a second version?