Re: [RFC][0/3] Virtual address space control for cgroups (v2)

From: Balbir Singh
Date: Fri Mar 28 2008 - 14:17:35 EST


Paul Menage wrote:
> On Thu, Mar 27, 2008 at 8:59 PM, Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
>> > Java (or at least, Sun's JRE) is an example of a common application
>> > that does this. It creates a huge heap mapping at startup, and faults
>> > it in as necessary.
>> >
>>
>> Isn't this controlled by the java -Xms/-Xmx options?
>>
>
> Probably - that was just an example, and the behaviour of Java isn't
> exactly unreasonable. A different example would be an app that maps a
> massive database file, but only pages small amounts of it in at any
> one time.
>
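To make the "map a lot, touch a little" pattern concrete, here is a minimal userspace sketch, assuming Linux with mmap(2) and mincore(2). The function name sparse_map_resident_pages() and its parameters are made up for illustration and come from nobody's patches. It maps a large anonymous region, writes to a few widely spaced pages, and counts how many pages mincore() reports as resident:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of anonymous memory, write one byte
 * into each of `touches` evenly spaced pages, and return the number of pages
 * that mincore() reports as resident afterwards. Returns -1 on any error.
 */
static long sparse_map_resident_pages(size_t len, size_t touches)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || touches == 0 || len < touches * (size_t)page)
        return -1;

    size_t npages = len / (size_t)page;
    len = npages * (size_t)page;

    /* The whole range counts as virtual address space right away. */
    unsigned char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                               -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Physical pages are only faulted in when they are written to. */
    size_t stride = npages / touches;
    for (size_t i = 0; i < touches; i++)
        base[i * stride * (size_t)page] = 1;

    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec != NULL && mincore(base, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* low bit set: page is resident */
    }

    free(vec);
    munmap(base, len);
    return resident;
}
```

Calling sparse_map_resident_pages(256UL << 20, 4) on a typical Linux box should report a resident count well below the 65536 4 KiB pages that make up the mapping, although the exact number depends on things like transparent hugepages. A virtual address space limit would charge the full 256 MiB, while a physical memory limit would only see the pages actually touched.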
>> I understand, but
>>
>> 1. The system by default enforces overcommit on most distros, so why should we
>> not have something similar, and just as flexible, for cgroups?
>
> Right, I guess I should make it clear that I'm *not* arguing that we
> shouldn't have a virtual address space limit subsystem.
>
> My main arguments in this and my previous email were to back up my
> assertion that there is a significant set of real-world cases where
> it doesn't help, and hence it should be a separate subsystem that can
> be turned on or off as desired.
>
> It strikes me that when split into its own subsystem, this is going to
> be very simple - basically just a resource counter and some file
> handlers. We should probably have something like
> include/linux/rescounter_subsys_template.h, so you can do:
>
> #define SUBSYS_NAME va
> #define SUBSYS_UNIT_SUFFIX in_bytes
> #include <linux/rescounter_subsys_template.h>
>
> then all you have to add are the hooks to call the rescounter
> charge/uncharge functions and you're done. It would be nice to have a
> separate trivial subsystem like this for each of the rlimit types, not
> just virtual address space.
>

OK, I'll consider doing a separate controller, once we get the mm->owner issue
sorted out.
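
For the sake of discussion, here is a rough userspace sketch of what such a template might generate. It is only an illustration under stated assumptions: a single macro stands in for the proposed #define-then-#include header so that it fits in one file, and struct rescounter, DEFINE_RESCOUNTER_SUBSYS and every generated function name are hypothetical, not the kernel's res_counter API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel resource counter. */
struct rescounter {
    uint64_t usage;    /* amount currently charged */
    uint64_t limit;    /* maximum amount that may be charged */
    uint64_t failcnt;  /* number of charges rejected by the limit */
};

/* Charge `val` units; fail with -ENOMEM if that would exceed the limit. */
static int rescounter_charge(struct rescounter *rc, uint64_t val)
{
    if (rc->usage > rc->limit || val > rc->limit - rc->usage) {
        rc->failcnt++;
        return -ENOMEM;
    }
    rc->usage += val;
    return 0;
}

/* Give back `val` units that were charged earlier. */
static void rescounter_uncharge(struct rescounter *rc, uint64_t val)
{
    assert(val <= rc->usage);
    rc->usage -= val;
}

/*
 * Hypothetical template: stamp out one counter plus thin wrappers for a
 * named subsystem. `suffix` plays the role of SUBSYS_UNIT_SUFFIX.
 */
#define DEFINE_RESCOUNTER_SUBSYS(name, suffix)                           \
    static struct rescounter name##_counter = { 0, UINT64_MAX, 0 };      \
    static int name##_charge(uint64_t val)                               \
    { return rescounter_charge(&name##_counter, val); }                  \
    static void name##_uncharge(uint64_t val)                            \
    { rescounter_uncharge(&name##_counter, val); }                       \
    static uint64_t name##_usage_##suffix(void)                          \
    { return name##_counter.usage; }                                     \
    static void name##_set_limit_##suffix(uint64_t limit)                \
    { name##_counter.limit = limit; }

/* Roughly what "#define SUBSYS_NAME va" plus the #include would expand to. */
DEFINE_RESCOUNTER_SUBSYS(va, in_bytes)
```

With that in place, the only subsystem-specific work left would be the hooks: call va_charge() where address space is created (mmap, brk and so on, failing the operation on -ENOMEM) and va_uncharge() where it is torn down. A real controller would also need one counter per cgroup and locking, both of which this sketch leaves out.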

>> > And specifying
>> > them manually requires either unusually clueful users (most of whom
>> > have enough trouble figuring out how much physical memory they'll
>> > need, and would just set very high virtual address space limits) or
>> > sysadmins with way too much time on their hands ...
>> >
>>
>> It's a one-time thing to set up for sysadmins.
>>
>
> Sure, it's a one-time thing to set up *if* your cluster workload is
> completely static.
>
>> > As I said, I think focussing on ways to tell apps that they're running
>> > low on physical memory would be much more productive.
>> >
>>
>> We intend to do that as well. We intend to have user space OOM notification.
>
> We've been playing with a user-space OOM notification system at Google
> - it's on my TODO list to push it to mainline (as an independent
> subsystem, since either cpusets or the memory controller can be used
> to cause OOMs that are localized to a cgroup). What we have works
> pretty well but I think our interface is a bit too much of a kludge at
> this point.

It's good to know you have something generic working. I was planning to start
work on it later.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL