Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()

From: Konstantin Khlebnikov
Date: Fri Apr 20 2012 - 15:23:32 EST

Oleg Nesterov wrote:
On 04/13, Oleg Nesterov wrote:

Damn. Konstantin, I have to admit, I'll try to find another technical
reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
even with this fix ;)

Most probably I am wrong, but it looks overcomplicated. Somehow I
irrationally dislike the fact that you moved mm_release() out of exit_mm().

And perhaps you can help me to discredit your patch?

It turns out I do not really understand this code in do_exit():

	/* sync mm's RSS info before statistics gathering */
	if (tsk->mm)
		sync_mm_rss(tsk->mm);

Which "statistics gathering" ? Probably I missed something, but
after a quick grep it seems to me that this is only needed for
xacct_add_tsk().

So why can't we simply add sync_mm_rss() into xacct_add_tsk()?

Yes, this way we do not "account" put_user(clear_child_tid) but
I think we do not care.

Why don't we care? Each thread can corrupt these counters by one page.
I do not think we should be satisfied with nearly accurate rss accounting:
+/- one page for each clone()-exit().
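
The drift described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct mm_model, struct task_model and the model_* helpers are hypothetical names invented for illustration, not kernel code. The model keeps a per-task delta that becomes visible in the shared counter only when it is flushed, so a page touched after the last flush is never accounted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; the names only echo the kernel's. */
struct mm_model {
	long rss;		/* shared counter, what readers of the mm see */
};

struct task_model {
	struct mm_model *mm;
	long rss_delta;		/* per-task cached delta, not yet in mm->rss */
};

/* Fast path: a page fault only bumps the task-local delta. */
static void model_fault_in_page(struct task_model *t)
{
	t->rss_delta++;
}

/* Flush the task-local delta into the shared counter. */
static void model_sync_mm_rss(struct task_model *t)
{
	assert(t->mm != NULL);
	t->mm->rss += t->rss_delta;
	t->rss_delta = 0;
}

/*
 * Simulate one clone()-exit() cycle with two page faults.
 * sync_last == 0: flush before the final fault (a late fault is missed).
 * sync_last != 0: flush after the final fault.
 * Returns the delta that was never flushed, i.e. the accounting error.
 */
static long model_thread_lifetime(struct mm_model *mm, int sync_last)
{
	struct task_model t = { .mm = mm, .rss_delta = 0 };

	model_fault_in_page(&t);	/* ordinary fault during the thread's life */
	if (!sync_last)
		model_sync_mm_rss(&t);	/* early flush */
	model_fault_in_page(&t);	/* late fault taken on the exit path */
	if (sync_last)
		model_sync_mm_rss(&t);	/* flush after the last possible fault */
	return t.rss_delta;		/* whatever is left here is lost */
}
```

In this model, calling model_thread_lifetime() with sync_last == 0 leaves one page unaccounted per cycle, while sync_last != 0 leaves none. Whether a late fault actually happens in the real exit path depends on the put_user(clear_child_tid) case discussed in this thread.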

Actually I don't really like this per-task rss-delta.
Probably it would be better to use per-cpu counters.
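
A rough sketch of the per-cpu alternative, again as a user-space model and not as a proposal for the actual kernel interface: MODEL_NR_CPUS, struct percpu_rss_model and the percpu_rss_* helpers are hypothetical names. Each CPU updates only its own slot and a reader sums all slots, so the deltas do not live in the task and nothing has to be flushed when a task exits.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4		/* hypothetical fixed CPU count for the model */

struct percpu_rss_model {
	long delta[MODEL_NR_CPUS];	/* one slot per CPU, written only by that CPU */
};

/* Update path: touch only the slot of the CPU the caller runs on. */
static void percpu_rss_add(struct percpu_rss_model *c, int cpu, long pages)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	c->delta[cpu] += pages;
}

/* Read path: sum every slot to get the total. */
static long percpu_rss_read(const struct percpu_rss_model *c)
{
	long sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += c->delta[cpu];
	return sum;
}
```

The trade-off in this sketch is that reads become more expensive, since they walk all slots, and each mm needs storage proportional to the number of CPUs.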

IOW, what do you think about the trivial patch below? Uncompiled,
untested, probably incomplete. acct_update_integrals() looks
suspicious too.

what a mess! =)


--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *sta
 	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
 	mm = get_task_mm(p);
 	if (mm) {
+		sync_mm_rss(mm);
 		/* adjust to KB unit */
 		stats->hiwater_rss = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
 		stats->hiwater_vm = get_mm_hiwater_vm(mm) * PAGE_SIZE / KB;
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -643,6 +643,8 @@ static void exit_mm(struct task_struct *
 	mm_release(tsk, mm);
 	if (!mm)
 		return;
+	sync_mm_rss(mm);
 	/*
 	 * Serialize with any possible pending coredump.
 	 * We must hold mmap_sem around checking core_state
@@ -960,9 +962,6 @@ void do_exit(long code)

-	/* sync mm's RSS info before statistics gathering */
-	if (tsk->mm)
-		sync_mm_rss(tsk->mm);
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *m
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
 	old_mm = current->mm;
-	sync_mm_rss(old_mm);
 	mm_release(tsk, old_mm);

 	if (old_mm) {
+		sync_mm_rss(old_mm);
 		/*
 		 * Make sure that if there is a core dump in progress
 		 * for the old mm, we get out and die instead of going

