Re: [PATCH] sched: Provide iowait counters

From: Andrew Morton
Date: Sat Jul 25 2009 - 01:05:15 EST


On Fri, 24 Jul 2009 21:48:22 -0700 Arjan van de Ven <arjan@xxxxxxxxxxxxxxx> wrote:

> Andrew Morton wrote:
> > On Fri, 24 Jul 2009 21:33:02 -0700 Arjan van de Ven <arjan@xxxxxxxxxxxxxxx> wrote:
> >
> >> Andrew Morton wrote:
> >>> On Mon, 20 Jul 2009 11:31:47 -0700 Arjan van de Ven <arjan@xxxxxxxxxxxxxxx> wrote:
> >>>
> >>>> For counting how long an application has been waiting for (disk) IO,
> >>>> there currently is only the HZ sample driven information available, while
> >>>> for all other counters in this class, a high resolution version is
> >>>> available via CONFIG_SCHEDSTATS.
> >>>>
> >>>> In order to make an improved bootchart tool possible, we also need
> >>>> a higher resolution version of the iowait time.
> >>>>
> >>>> This patch below adds this scheduler statistic to the kernel.
> >>> Doesn't this duplicate the delay accounting already available via the
> >>> taskstats interface?
> >> we have how long we wait. we do not have how long we iowait afaik...
> >> at least not in nanosecond granularity. (We do have the sampled data, but that
> >> is millisecond sampled data, not very useful for making charts based on time
> >> to show the sequence of events)
> >
> > See include/linux/sched.h's definition of task_delay_info - u64
> > blkio_delay is in nanoseconds. It uses
> > do_posix_clock_monotonic_gettime() internally.
>
> looks like it does.. too bad we don't expose that data in a /proc/<pid>/delay or something field
> like we do with the scheduler info...
>

I thought we did deliver a few of the taskstats counters via procfs,
but maybe I dreamed it. It would have been a rather bad thing to do.

taskstats has a large advantage over /proc-based things: it delivers a
packet to the monitoring process(es) when the monitored task exits. So
with no polling at all it is possible to gather all that information
about the just-completed task. This isn't possible with /proc.

There's a patch on the list now to teach taskstats to emit a packet at
fork- and exec-time too.

The monitored task can also be polled at any time during its execution,
just as with /proc files.

Please consider switching whatever-you're-working-on over to use
taskstats rather than adding (duplicative) things to /proc (which
require CONFIG_SCHED_DEBUG, btw).

If there's stuff missing from taskstats then we can add it - it's
versioned and upgradeable and is a better interface. It's better
to make taskstats stronger than it is to add /proc/pid fields, methinks.
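
As a minimal sketch of what a consumer might do with the nanosecond
counters once it has a taskstats reply in hand: blkio_count and
blkio_delay_total are the field names in struct taskstats, but the
struct blkio_sample and both helpers below are hypothetical stand-ins
for illustration, not kernel or library code, and the netlink plumbing
is left out entirely.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in holding only the two block I/O delay fields
 * copied out of a struct taskstats reply.
 */
struct blkio_sample {
	uint64_t blkio_count;       /* number of block I/O waits seen */
	uint64_t blkio_delay_total; /* cumulative wait time, nanoseconds */
};

/* Nanoseconds of block I/O wait accumulated between two samples. */
uint64_t blkio_delay_delta_ns(const struct blkio_sample *prev,
			      const struct blkio_sample *cur)
{
	return cur->blkio_delay_total - prev->blkio_delay_total;
}

/* Average wait per block I/O in milliseconds; 0 if nothing was counted. */
double blkio_avg_delay_ms(const struct blkio_sample *s)
{
	if (s->blkio_count == 0)
		return 0.0;
	return (double)s->blkio_delay_total / (double)s->blkio_count / 1e6;
}
```

A charting tool could poll the task periodically and plot
blkio_delay_delta_ns() per interval, then use the exit-time packet
for the final totals.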

