Re: [PATCH] introduce get_mm_hiwater_xxx(), fixtaskstats->hiwater_xxx accounting

From: Oleg Nesterov
Date: Sun Dec 07 2008 - 11:21:34 EST


(sorry for delay, I am travelling till 11 Dec)

On 12/06, Balbir Singh wrote:
>
> * Hugh Dickins <hugh@xxxxxxxxxxx> [2008-12-06 09:56:19]:
>
> > On Sat, 6 Dec 2008, Balbir Singh wrote:
> > >
> > > Yes, true and getdelays can display all the exported information.
> > >
> > > The race does seem concerning, I would vote for keeping the update in
> > > there and disabling preemption around the update, so that hiwater
> > > cannot swing back and forth.
> >
> > ?? Oleg is _fixing_ a race by removing the update from do_exit();
> > and he is fixing the way the hiwater info was collected in tsacct.c.
>
> I see that change and the reasoning seems accurate that we can query
> the task at anytime, but I worry that if taskstats is not enabled, we'll
> never call update_hiwater.* on the exiting task.

With this patch, even if taskstats _is_ enabled, we never call update_
on do_exit() path. Because there is no point to do this.

> I wonder if a thread came in and like Oleg said, did (without taskstats
> enabled)
>
> free(malloc(some size)), followed by exit()
>
> whether task_mem() would show the correct results for hiwater.*.

unlike taskstats, task_mem() doesn't rely on update_hiwater_xxx(),
it reads the current values and calculates the maximum. And this is
the "right thing".

update_hiwater_xxx() is only needed when we are going to decrease
the current value, so we can lose the info if we don't calculate
the maximum right now.

We can disable preemption around update_ in do_exit(), but this
doesn't close the race. We can even disable irqs but this (in
theory) is not enough either. But the main point we do not need
to update.

And please note that taskstats was wrong even if update_ was not
racy. Exactly because it relies on update_ in do_exit(), but it
should not.

As for ru_maxrss accounting, we can keep these update_hiwater_xxx()
calls in do_exit() and then use mm->hiwater_xxx directly, but we
should check group_dead in that case. I don't really think this
would be cleaner/better, and then we have the similar problems with
CLONE_VM tasks.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/