Re: [GIT PULL] cputime patch for 2.6.30-rc6

From: Martin Schwidefsky
Date: Mon May 25 2009 - 07:36:21 EST


On Mon, 25 May 2009 13:09:26 +0200
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Mon, 2009-05-25 at 12:50 +0200, Martin Schwidefsky wrote:
> > On Tue, 19 May 2009 11:00:35 +0200
> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > > So, I'm really not objecting too much to the patch at hand, but I'd love
> > > to find a solution to this problem.
> >
> > It is not hard to solve the problem for /proc/uptime, e.g. like this:
> >
> > static u64 uptime_jiffies = INITIAL_JIFFIES;
> > static struct timespec ts_uptime;
> > static struct timespec ts_idle;
> >
> > static int uptime_proc_show(struct seq_file *m, void *v)
> > {
> > 	cputime_t idletime;
> > 	u64 now;
> > 	int i;
> >
> > 	now = get_jiffies_64();
> > 	if (uptime_jiffies != now) {
> > 		uptime_jiffies = now;
> > 		idletime = cputime_zero;
> > 		for_each_possible_cpu(i)
> > 			idletime = cputime64_add(idletime,
> > 						 kstat_cpu(i).cpustat.idle);
> > 		do_posix_clock_monotonic_gettime(&ts_uptime);
> > 		monotonic_to_bootbased(&ts_uptime);
> > 		cputime_to_timespec(idletime, &ts_idle);
> > 	}
> >
> > 	seq_printf(m, "%lu.%02lu %lu.%02lu\n",
> > 		   (unsigned long) ts_uptime.tv_sec,
> > 		   (ts_uptime.tv_nsec / (NSEC_PER_SEC / 100)),
> > 		   (unsigned long) ts_idle.tv_sec,
> > 		   (ts_idle.tv_nsec / (NSEC_PER_SEC / 100)));
> > 	return 0;
> > }
> >
> > For /proc/stat it is less clear. Just storing the values in static
> > variables is not such a good idea as there are lots of values.
> > 10*NR_CPUS + NR_IRQS values to be exact. With NR_CPUS in the thousands
> > this will waste quite a bit of memory.
>
> Right, I know of for_each_possible_cpu() loops that took longer than a
> jiffy and caused general melt-down -- not saying the loop for idle time
> will be one such loop, but then it seems silly anyway: who's
> incrementing the idle time when we're idle?

Psst, I do ;-) Look at the arch_idle_time macro in fs/proc/stat.c.

> I really prefer using things like percpu_counter/vmstat that have error
> bounds that scale with the number of cpus in the system.
>
> We simply have to start educating people that numbers on the global
> state of the machine are inaccurate (they were anyway, because by the
> time the userspace bits that read the /proc file get scheduled again the
> numbers will have changed again).

That is one problem, the other is that the values you'll get are not
atomic in any way. Not even the totals in /proc/stat match the sum over
the cpus.

> There's a variant of Heisenberg's uncertainty principle applicable to
> (parallel) computers in that one either gets concurrency or accuracy on
> global state, you cannot have both.

If the time you need to generate a value is longer than the maximum
error, you do have a problem.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/