Re: [Patch] Scale pidhash_shift/pidhash_size up based on num_possible_cpus().
From: Stephen Champion
Date: Mon Aug 04 2008 - 09:13:43 EST
Eric W. Biederman wrote:
> Robin Holt <holt@xxxxxxx> writes:
>> Oops, confusing details. That was a different problem we had been
>> tracking.
>
> Which leads back to the original question. What were you measuring
> that showed improvement with a larger pid hash size?
>
> Almost by definition a larger hash table will perform better. However
> my intuition is that we are talking about something that should be in
> the noise for most workloads.

Robin asked me to chime in on this, as I did the early "look at that"
work and suggested it to Robin.
I noticed the potential for increasing pidhash_shift while chasing down a
patch to our kernel (2.6.16 stable based) which had proc_pid_readdir()
calling find_pid() for init_task through the highest pid #. This patch
caused a rather serious problem on a 2048 core Altix. Before
identifying the culprit, I increased pidhash_shift. This made a *huge*
difference: enough to get the box marginally functional while I tracked
down the origins of the problem.
After backing out the problematic patch, I took a look at pidhash_shift
in normal circumstances. With pidhash_shift == 12, running only a few
common services and monitoring tools (sendmail, nagios, etc., for ~28k
active processes, mostly of the kernel variety), the 20 cpu boot cpuset
we use on that system to confine normal system processes and interactive
logins was spending >1% of its time in find_pid(), and an 'ls /proc >
/dev/null' took >0.4s. With pidhash_shift == 16, the timing went to
<0.2s, and the total time spent in find_pid() was reduced to noise level.
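
For a rough feel of why the shift matters, here is a minimal sketch of
the average chain length per bucket. It assumes a uniform hash and one
hash entry per process, neither of which is guaranteed in the real
kernel; avg_chain_length() and print_chain_lengths() are hypothetical
helper names, not kernel functions.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Average entries per bucket for a table of (1 << shift) buckets.
 * Assumption: entries are spread uniformly across the buckets.
 */
static double avg_chain_length(unsigned long nr_entries, unsigned int shift)
{
    assert(shift < 8 * sizeof(unsigned long));
    return (double)nr_entries / (double)(1UL << shift);
}

/* Print the expected chain length for the two shifts compared above. */
static void print_chain_lengths(unsigned long nr_procs)
{
    printf("shift 12: %.2f entries/bucket\n", avg_chain_length(nr_procs, 12));
    printf("shift 16: %.2f entries/bucket\n", avg_chain_length(nr_procs, 16));
}
```

With ~28k entries this works out to just under 7 entries per bucket at
shift 12 and under half an entry per bucket at shift 16, so a lookup
at the larger size rarely has to walk a chain at all.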
In addition to raising the limit on larger systems, it looked reasonable
to scale the pid hash with the # processors instead of memory. While I
observed variably high process:cpu ratios on small systems (2c - 32c),
they also have relatively few processes. The 192c - 2048c systems I was
able to look at were all hovering at 13 +/- 2 processes per cpu, even
with wildly varying memory sizes.
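
As an illustration only, scaling by processor count could look
something like the sketch below. It is not the submitted patch: the
function name, the 13-per-cpu constant (taken from the ratio observed
above) and both clamp values are assumptions made for the example.

```c
#include <assert.h>

#define PROCS_PER_CPU 13U   /* assumed ratio, from the 13 +/- 2 observed above */
#define MIN_SHIFT      4U   /* hypothetical floor for small systems */
#define MAX_SHIFT     16U   /* hypothetical ceiling */

/*
 * Pick the smallest shift such that (1 << shift) buckets cover the
 * expected number of processes, clamped to [MIN_SHIFT, MAX_SHIFT].
 */
static unsigned int pidhash_shift_for_cpus(unsigned int nr_cpus)
{
    unsigned long want = (unsigned long)nr_cpus * PROCS_PER_CPU;
    unsigned int shift = MIN_SHIFT;

    assert(nr_cpus > 0);
    while (shift < MAX_SHIFT && (1UL << shift) < want)
        shift++;
    return shift;
}
```

Under these assumptions a 2048 cpu system gets shift 15 and a 192 cpu
system gets shift 12. Small systems, where the process:cpu ratio is
much more variable, would probably want a higher floor than the one
used here.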
Despite more recent changes in proc_pid_readdir, my results should apply
to current source. It looks like both the old 2.6.16 implementation and
the current version will call find_pid (or equivalent) once for each
successive getdents() call on /proc, excepting when the cursor is on the
first entry. A quick look, and we have 88 getdents64() calls for both
'ps' and 'ls /proc' with 29k processes running, which appears to be the
primary source of find_pid() calls.
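
For anyone who wants to reproduce the count without strace, a minimal
userspace sketch follows. It is Linux-only, issues getdents64()
directly through syscall(), and counts every call including the final
one that returns 0. count_getdents_calls() is a hypothetical helper,
and the number it reports depends on the buffer size the caller passes
in, so it will only match 'ps' or 'ls' if the same buffer size is used.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Count the getdents64() calls needed to read all of 'path' with a
 * caller-supplied buffer. Returns -1 if the directory cannot be
 * opened or a call fails.
 */
static long count_getdents_calls(const char *path, char *buf, size_t buf_size)
{
    long calls = 0;
    long n;
    int fd;

    assert(buf != NULL && buf_size > 0);

    fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    do {
        n = syscall(SYS_getdents64, fd, buf, buf_size);
        calls++;            /* counts the terminating 0-return call too */
    } while (n > 0);

    close(fd);
    return n < 0 ? -1 : calls;
}
```

Calling it on "/proc" with a 32k buffer should give a number in the
same ballpark as the strace figure; 29k entries over 88 calls is
roughly 330 entries per call.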
It's not giganormous, although I probably could come up with a pointless
microbenchmark to show it's 300% better. Importantly, it does
noticeably improve normal interactive tools like 'ps' and 'top', and a
performance visualization tool developed by a customer (nodemon)
refreshes faster. For a 512k init allocation, that seems like a very
good deal.
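
The 512k figure can be checked with a small sketch, assuming a single
table whose buckets are one pointer each (as with the kernel's
struct hlist_head) on a 64-bit system. struct bucket_head and
table_bytes() are stand-ins written for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct hlist_head: a single pointer. */
struct bucket_head {
    void *first;
};

/* Bytes needed for one table of (1 << shift) bucket heads. */
static size_t table_bytes(unsigned int shift)
{
    assert(shift < 8 * sizeof(size_t));
    return ((size_t)1 << shift) * sizeof(struct bucket_head);
}
```

With 8-byte pointers, table_bytes(16) is 65536 * 8 = 524288 bytes
(512k), against 32k for table_bytes(12).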
I'd like to lose 20,000 kernel processes in addition to growing the pid
hash!