Re: [PATCH 1/6] AKT - Tunable structure and registration routines

From: Andi Kleen
Date: Tue Feb 13 2007 - 05:51:45 EST


On Tuesday 13 February 2007 11:18, Nadia Derbey wrote:
> Andi Kleen wrote:
> > Nadia.Derbey@xxxxxxxx writes:
> >
> >>+
> >>+This feature aims at making the kernel automatically change tunable
> >>+values as it sees resources running out.
> >
> >
> > The only reason we have resource limits is to avoid DoS when one
> > resource consumes too much memory. When there is no such danger,
> > there isn't any reason to have a limit at all and it could just be
> > eliminated (or set to unlimited by default).
>
> Automatic tuning is a way to set the limit to unlimited, in a sense,
> isn't it? With this feature, we can leave the default limits as they
> are for every-day usage, and when a particular application runs on
> the machine, allow the limit to grow as needed.

That would effectively be no limit, so why not just do away with them
completely? You have to solve the DoS issues first of course, either way.

> > Your feature doesn't address the DoS issue, and without that there
> > isn't any reason to have limits at all. So what's the point?
>
> As I told Eric Biederman in another mail, DoS protection is ensured in
> AKT by exporting the min and max values for each tunable to sysfs
> (actually Eric complained about these min and max :-( ). These are RW
> attributes that make it possible for a sysadmin to set the max value a
> tunable can ever reach, instead of letting it grow to huge values.

Then you could just always set the limit to that max value and you would
get the same effect.

A min limit doesn't make much sense to me because nearly all Linux data
structures are allocated on demand anyway (excluding mempools and a few
special cases), and min is defined as the current number of used
resources.
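
For reference, a clamped, sysfs-exposed tunable of the kind described
boils down to something like this toy module. All names here (akt_demo,
tun_cur, tun_max) are made up, locking is omitted, and it is written
against a current kernel rather than being AKT's actual registration
code:

#include <linux/module.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <linux/kernel.h>

static unsigned long tun_cur = 0x2000;	/* current value, auto-tuned */
static unsigned long tun_max = 0x10000;	/* admin ceiling, RW via sysfs */

static ssize_t max_show(struct kobject *kobj, struct kobj_attribute *attr,
			char *buf)
{
	return sprintf(buf, "%lu\n", tun_max);
}

static ssize_t max_store(struct kobject *kobj, struct kobj_attribute *attr,
			 const char *buf, size_t count)
{
	unsigned long v;

	if (kstrtoul(buf, 0, &v))
		return -EINVAL;
	tun_max = v;			/* sysadmin sets the new ceiling */
	if (tun_cur > tun_max)		/* auto-tuning may never exceed it */
		tun_cur = tun_max;
	return count;
}

static struct kobj_attribute max_attr = __ATTR(max, 0644, max_show, max_store);

static struct kobject *akt_kobj;

static int __init akt_demo_init(void)
{
	akt_kobj = kobject_create_and_add("akt_demo", kernel_kobj);
	if (!akt_kobj)
		return -ENOMEM;
	return sysfs_create_file(akt_kobj, &max_attr.attr);
}

static void __exit akt_demo_exit(void)
{
	kobject_put(akt_kobj);
}

module_init(akt_demo_init);
module_exit(akt_demo_exit);
MODULE_LICENSE("GPL");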

> Agree with you, BUT between the default max_files and the "too many
> files" situation, there is a gap that can be crossed by automatically
> tuning max_files, isn't there? e.g. the max_files default value is
> NR_FILE (0x2000), while Oracle expects to have it set to 0x10000.

The reason NR_FILE is relatively low by default is mostly that the
existing ulimits suck :-). I.e., if you want to limit a user's memory
consumption, you limit the number of the user's processes and the number
of files per process -- then the maximum memory they can tie up in file
structures is the process ulimit times the per-process file limit
[in theory; in practice there are holes, like in-flight fds passed over
Unix sockets].

But of course that ends up with either NR_FILE being far too low, the
per-process limit being far too low, or too much memory allowed anyway.
In practice it doesn't work particularly well.
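
To make the arithmetic concrete: with, say, limits of 1024 processes and
1024 open files each, one user can in theory hold about a million file
structures. A quick userspace check of that worst case on a given box
(this is just the multiplication above, nothing more):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
	struct rlimit nproc, nofile;

	if (getrlimit(RLIMIT_NPROC, &nproc) ||
	    getrlimit(RLIMIT_NOFILE, &nofile)) {
		perror("getrlimit");
		return 1;
	}
	if (nproc.rlim_cur == RLIM_INFINITY ||
	    nofile.rlim_cur == RLIM_INFINITY) {
		printf("one of the limits is unlimited -- no bound at all\n");
		return 0;
	}
	/* Worst case: every allowed process holds every allowed fd open. */
	printf("worst-case open files for this user: %llu\n",
	       (unsigned long long)nproc.rlim_cur *
	       (unsigned long long)nofile.rlim_cur);
	return 0;
}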

The real solution for that is "beancounters", which have been posted in
several forms recently (from OpenVZ and from Google). Basically they add
real limits (for much more than just file descriptors) per uid or
container.

With such infrastructure in place, NR_FILE could be set to unlimited, as
long as there is a reasonable (large) default per uid.
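
To sketch the idea with made-up names (the real OpenVZ and Google patches
are more elaborate and properly atomic): each uid carries a counter that
every allocation charges against, and the allocation fails at the per-uid
limit rather than at a global one:

#include <stdio.h>

struct user_bean {
	unsigned long files;		/* file structs currently charged */
	unsigned long files_max;	/* per-uid hard limit */
};

/* Charge one file against the uid; deny the allocation at the limit. */
static int bc_charge_file(struct user_bean *bc)
{
	if (bc->files >= bc->files_max)
		return -1;	/* a kernel version would return -ENFILE */
	bc->files++;		/* a kernel version would use atomics */
	return 0;
}

static void bc_uncharge_file(struct user_bean *bc)
{
	bc->files--;
}

int main(void)
{
	struct user_bean u = { .files = 0, .files_max = 0x10000 };

	if (bc_charge_file(&u) == 0)
		printf("charged: %lu/%lu\n", u.files, u.files_max);
	bc_uncharge_file(&u);
	return 0;
}

With a charge like that at every allocation site, the DoS bound comes
from the per-uid limit, and a global NR_FILE stops doing any real work.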

Similar reasoning applies to other resources which are currently limited.

I think you're just attacking the symptoms here instead of the basic issue.

-Andi