Re: [RFC] Parallelize IO for e2fsck

From: Theodore Tso
Date: Sat Jan 26 2008 - 08:23:20 EST


On Fri, Jan 25, 2008 at 05:55:51PM -0800, Bryan Henderson wrote:
> I was surprised to see AIX do late allocation by default, because IBM's
> traditional style is bulletproof systems. A system where a process can be
> killed at unpredictable times because of resource demands of unrelated
> processes doesn't really fit that style.
>
> It's really a fairly unusual application that benefits from late
> allocation: one that creates a lot more virtual memory than it ever
> touches. For example, a sparse array. Or am I missing something?

I guess it depends on how far you try to do "bulletproof". OSF/1 used
to use "bulletproof" as its default --- and I had to turn it off on
tsx-11.mit.edu (the first North American ftp server for Linux :-),
because the difference was something like 50 ftp daemons versus over
500 on the same server. It reserved VM space for the text segement of
every single process, since at least in theory, it's possible for
every single text page to get modified using ptrace if (for example) a
debugger were to set a break point on every single page of every
single text segement of every single ftp daemon.

You can also see potential problems for Java programs. Suppose you
had some gigantic Java Application (say, Lotus Notes, or Websphere
Application Server) which is taking up many, many, MANY gigabytes of
VM space. Now suppose the Java application needs to fork and exec
some trivial helper program. For that tiny instant, between the fork
and exec, the VM requirements in "bulletproof" mode would double,
since while 99.9999% of the time programs will immediately discard the
VM upon the exec, there is always the possibility that the child
process will touch every single data page, forcing a copy on write,
and never do the exec.

There are of course different levels of "bulletproof" between the
extremes of "totally bulletproof" and "late binding" from an
algorithmic standpoint. For example, you could ignore the needed
pages caused by ptrace(); more challenging would be to how to handle
the fork/exec semantics, although there could be kludges such as
strongly encouraging applications to use an old-fashed BSD-style
vfork() to guarantee that the child couldn't double VM requirements
between the vfork() and exec(). I certainly can't say for sure what
the AIX designers had in mind, and why they didn't choose one of the
more intermediate design choices.

However, it is fair to say that "100% bulletproof" can require
reserving far more VM resources than you might first expect. Even a
company which is highly incented to sell large amounts of hardware,
such as Digital, might not have wanted their OS to be only able to
support an embarassingly small number of simultaneous ftpd
connections. I know this for sure because the OSF/1 documentation,
when discussing their VM tuning knobs, specifically talked about the
scenario that I ran into with tsx-11.mit.edu.

Regards,

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/