Re: [PATCH] fix granularity of task_u/stime(), v2

From: Stanislaw Gruszka
Date: Mon Nov 23 2009 - 05:13:02 EST


On Fri, Nov 20, 2009 at 11:00:21AM +0900, Hidetoshi Seto wrote:
> >>> Could you please test this patch, if it solve all utime decrease
> >>> problems for you:
> >>>
> >>> http://patchwork.kernel.org/patch/59795/
> >>>
> >>> If you confirm it work, I think we should apply it. Otherwise
> >>> we need to go to propagate task_{u,s}time everywhere, which is not
> >>> (my) preferred solution.
> >> That patch will create another issue, it will allow a process to hide
> >> from top by arranging to never run when the tick hits.
> >
>
> Yes, nowadays there are many threads on high speed hardware,
> such process can exist all around, easier than before.
>
> E.g. assume that there are 2 tasks:
>
> Task A: interrupted by timer few times
> (utime, stime, se.sum_exec_runtime) = (50, 50, 1000000000)
> => total of runtime is 1 sec, but utime + stime is 100 ms
>
> Task B: interrupted by timer many times
> (utime, stime, se.sum_exec_runtime) = (50, 50, 10000000)
> => total of runtime is 10 ms, but utime + stime is 100 ms

How probable is it that a task runs for a very long time without getting
any ticks? I know it is possible, otherwise we would not see utime
decreasing after the do_sys_times() siglock fix, but how probable is it?

> You can see task_[su]time() works well for these tasks.
>
> > What about that?
> >
> > diff --git a/kernel/sched.c b/kernel/sched.c
> > index 1f8d028..9db1cbc 100644
> > --- a/kernel/sched.c
> > +++ b/kernel/sched.c
> > @@ -5194,7 +5194,7 @@ cputime_t task_utime(struct task_struct *p)
> > }
> > utime = (cputime_t)temp;
> >
> > - p->prev_utime = max(p->prev_utime, utime);
> > + p->prev_utime = max(p->prev_utime, max(p->utime, utime));
> > return p->prev_utime;
> > }
>
> I think this makes things worse.
>
> without this patch:
> Task A prev_utime: 500 ms (= accurate)
> Task B prev_utime: 5 ms (= accurate)
> with this patch:
> Task A prev_utime: 500 ms (= accurate)
> Task B prev_utime: 50 ms (= not accurate)
>
> Note that task_stime() calculates prev_stime using this prev_utime:
>
> without this patch:
> Task A prev_stime: 500 ms (= accurate)
> Task B prev_stime: 5 ms (= accurate)
> with this patch:
> Task A prev_stime: 500 ms (= accurate)
> Task B prev_stime: 0 ms (= not accurate)
>
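
To make the numbers above easier to check, here is a minimal userspace
sketch of the scaling, with and without the max(p->utime, utime) change.
It is not the kernel code: struct demo_task, the demo_* helpers and the
use of plain milliseconds are assumptions made only for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant task_struct fields, all in ms. */
struct demo_task {
	uint64_t utime;		/* tick-sampled user time */
	uint64_t stime;		/* tick-sampled system time */
	uint64_t rtime;		/* precise runtime (se.sum_exec_runtime) */
	uint64_t prev_utime;	/* last value reported, never decreases */
	uint64_t prev_stime;
};

static inline uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* Scale rtime by the sampled utime/(utime + stime) ratio.
 * patched != 0 models the proposed max(p->utime, utime) change. */
uint64_t demo_task_utime(struct demo_task *p, int patched)
{
	uint64_t total = p->utime + p->stime;
	uint64_t utime = p->rtime;	/* assumption: no samples, all user time */

	if (total)
		utime = p->rtime * p->utime / total;
	if (patched)
		utime = max_u64(p->utime, utime);

	p->prev_utime = max_u64(p->prev_utime, utime);
	return p->prev_utime;
}

/* stime is whatever is left of rtime after subtracting the reported utime. */
uint64_t demo_task_stime(struct demo_task *p, int patched)
{
	uint64_t utime = demo_task_utime(p, patched);

	if (p->rtime >= utime)
		p->prev_stime = max_u64(p->prev_stime, p->rtime - utime);
	return p->prev_stime;
}
```

With Task B = {50, 50, 10, 0, 0}, the unpatched variant gives
prev_utime = 5 and prev_stime = 5, while the patched variant gives
prev_utime = 50 and leaves prev_stime at 0, because the reported utime
already exceeds the whole runtime.
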
> >
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index ce17760..8be5b75 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -914,8 +914,8 @@ void do_sys_times(struct tms *tms)
> > struct task_cputime cputime;
> > cputime_t cutime, cstime;
> >
> > - thread_group_cputime(current, &cputime);
> > spin_lock_irq(&current->sighand->siglock);
> > + thread_group_cputime(current, &cputime);
> > cutime = current->signal->cutime;
> > cstime = current->signal->cstime;
> > spin_unlock_irq(&current->sighand->siglock);
> >
> > It's on top of Hidetoshi patch and fix utime decrease problem
> > on my system.
>
> How about the stime decrease problem which can be caused by same
> logic?

Yes, the above patch screws up stime. The one below should be a bit
better, but it does not resolve the objections you have:

diff --git a/kernel/exit.c b/kernel/exit.c
index f7864ac..17491ad 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -91,6 +91,8 @@ static void __exit_signal(struct task_struct *tsk)
if (atomic_dec_and_test(&sig->count))
posix_cpu_timers_exit_group(tsk);
else {
+ cputime_t utime, stime;
+
/*
* If there is any task waiting for the group exit
* then notify it:
@@ -110,8 +112,16 @@ static void __exit_signal(struct task_struct *tsk)
* We won't ever get here for the group leader, since it
* will have been the last reference on the signal_struct.
*/
- sig->utime = cputime_add(sig->utime, task_utime(tsk));
- sig->stime = cputime_add(sig->stime, task_stime(tsk));
+
+ utime = task_utime(tsk);
+ stime = task_stime(tsk);
+ if (tsk->utime > utime || tsk->stime > stime) {
+ utime = tsk->utime;
+ stime = tsk->stime;
+ }
+
+ sig->utime = cputime_add(sig->utime, utime);
+ sig->stime = cputime_add(sig->stime, stime);
sig->gtime = cputime_add(sig->gtime, task_gtime(tsk));
sig->min_flt += tsk->min_flt;
sig->maj_flt += tsk->maj_flt;
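
The selection logic in that hunk, pulled out as a standalone sketch.
struct demo_times and exit_times() are hypothetical names used only
here, and the values are plain integers rather than cputime_t.

```c
#include <stdint.h>

/* Hypothetical pair of values that would be added to sig->utime/stime. */
struct demo_times {
	uint64_t utime;
	uint64_t stime;
};

/* Start from the scaled task_utime()/task_stime() values; if either raw
 * tick counter is larger than its scaled value, fall back to the raw
 * pair as a whole, as the hunk above does. */
struct demo_times exit_times(uint64_t raw_utime, uint64_t raw_stime,
			     uint64_t scaled_utime, uint64_t scaled_stime)
{
	struct demo_times t = { scaled_utime, scaled_stime };

	if (raw_utime > t.utime || raw_stime > t.stime) {
		t.utime = raw_utime;
		t.stime = raw_stime;
	}
	return t;
}
```

For Task B from the example (raw 50/50, scaled 5/5) this accounts 50/50
at exit, so the group sum does not drop, but the inaccuracy you pointed
out stays. Note also that the pair is switched together:
exit_times(50, 50, 80, 20) returns 50/50, so utime is taken from the raw
counter even though only stime was below it.
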

> According to my labeling, there are 2 unresolved problem [1]
> "thread_group_cputime() vs exit" and [2] "use of task_s/utime()".
>
> Still I believe the real fix for this problem is combination of
> above fix for do_sys_times() (for problem[1]) and (I know it is
> not preferred, but for [2]) the following:
>
> >> diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
> >> index 5c9dc22..e065b8a 100644
> >> --- a/kernel/posix-cpu-timers.c
> >> +++ b/kernel/posix-cpu-timers.c
> >> @@ -248,8 +248,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
> >>
> >> t = tsk;
> >> do {
> >> - times->utime = cputime_add(times->utime, t->utime);
> >> - times->stime = cputime_add(times->stime, t->stime);
> >> + times->utime = cputime_add(times->utime, task_utime(t));
> >> + times->stime = cputime_add(times->stime, task_stime(t));
> >> times->sum_exec_runtime += t->se.sum_exec_runtime;
> >>
> >> t = next_thread(t);
>

That works for me, and I agree that this is the right fix. Peter had
concerns about p->prev_utime races and about the additional need to
propagate task_{s,u}time() further into the posix-cpu-timers code.
However, I do not understand these problems.
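
For reference, a small sketch of why summing raw t->utime for live
threads while exited threads were folded in via task_utime() can make
the group total go backwards, and why using the scaled value in both
places avoids it. struct demo_thread and group_utime() are hypothetical
and only model the arithmetic, not the locking or the real data
structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread view: raw tick counter and scaled value. */
struct demo_thread {
	uint64_t raw_utime;	/* t->utime */
	uint64_t scaled_utime;	/* what task_utime(t) would return */
	int exited;		/* already accumulated into sig->utime */
};

/* Sum user time for the group. Exited threads always contribute their
 * scaled value, since __exit_signal() adds task_utime() to sig->utime.
 * live_uses_scaled selects what live threads contribute. */
uint64_t group_utime(const struct demo_thread *t, size_t n,
		     int live_uses_scaled)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (t[i].exited || live_uses_scaled)
			sum += t[i].scaled_utime;
		else
			sum += t[i].raw_utime;
	}
	return sum;
}
```

With two threads like Task B (raw 50, scaled 5), the raw variant reports
100 while both are alive and 55 after one exits, which is the decrease
seen from times(). With live_uses_scaled set, both calls report 10.
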

Stanislaw