Re: Some questions about linux kernel.

From: Andrea Arcangeli (andrea@suse.de)
Date: Sun Mar 12 2000 - 23:19:02 EST


On Sun, 12 Mar 2000, Adam wrote:

>
>> The idea I had a few weeks ago to solve the problem and so to find the
>> hog (and that I'll experiment with in real-life 2.3.x soon) is to add a
>> per-task page fault rate (ala avg_slice). Once we know the page fault
>> rate and the time of the last fault per each process, we'll be almost able
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> to find out the memory hog without possible mistakes and we won't need
>> anything else.
>
>That sounds interesting. I would be interested to see actual statistics
>of the per-task page faults.
>
>A few things which come to my mind:
>
> Some box with a long uptime. init on it has been running since it
>was booted up, so it is likely to accumulate a lot of page faults over
>time. For example I have here this box with 484 days of uptime. Let's
>say that tomorrow "the evil program" starts; it might take a while
>before it accumulates more page faults than init. ... but I guess we
>don't need to worry about this scenario since I think you talk about
>fault RATE not total faults.

Exactly. That's the whole point. Only the _rate_ is interesting (the
total number of faults is obviously not interesting). And even the rate
can be uninteresting if the task hasn't taken a page fault since
yesterday (see the underlined line in my previous email quoted above).

Among the tasks that have been faulting recently (say in the last 30
seconds), the one with the highest fault rate is the one to kill. That's
what I'll try out very soon. I believe it will work fine.
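
Roughly like this, just to show the selection logic I have in mind (a
purely illustrative user-space sketch: the task_stats struct, the
fault_rate/last_fault fields and the 30 second window are invented for
the example, it's not real kernel code):

#include <stdio.h>

/* purely illustrative: the per-task data the kernel would have to track */
struct task_stats {
        const char *comm;          /* task name                      */
        unsigned long fault_rate;  /* smoothed page faults per sec   */
        unsigned long last_fault;  /* time of the most recent fault  */
};

#define RECENT_WINDOW 30           /* "faulting recently" = last 30s */

/* pick the highest fault rate among the tasks that faulted recently */
static struct task_stats *select_hog(struct task_stats *tasks, int n,
                                     unsigned long now)
{
        struct task_stats *hog = NULL;
        int i;

        for (i = 0; i < n; i++) {
                if (now - tasks[i].last_fault > RECENT_WINDOW)
                        continue;  /* quiet since "yesterday": skip  */
                if (!hog || tasks[i].fault_rate > hog->fault_rate)
                        hog = &tasks[i];
        }
        return hog;
}

int main(void)
{
        struct task_stats tasks[] = {
                { "init",    2,    100 },  /* old, quiet task        */
                { "syslogd", 40,   995 },  /* busy but moderate      */
                { "leaker",  9000, 999 },  /* huge recent fault rate */
        };
        struct task_stats *hog = select_hog(tasks, 3, 1000);

        printf("kill %s\n", hog ? hog->comm : "(none)");
        return 0;
}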

> How about kswapd or syslogd? They are going to be trying to do
>their jobs when the system is heavily loaded and are likely to generate
>lots of page faults too. I think they could become potential targets,
>just because they are trying to do their job.

kswapd can't do page faults.

As for syslogd, if it has a higher page fault _rate_ than all the other
tasks, it's probably the hog as well (it may have had a bug that caused
it to explode) and it's right to kill it.

The only delicate thing to get right is the collapse time, when the
system is OOM and almost dead. At that time the page fault rate
decreases to one fault per looots of time. We'll have to use a
well-tuned hysteresis on the page fault rate calculation to avoid
getting fooled by the very low fault rate during the memory collapse.
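
Something along these lines for the rate calculation (again only a
user-space sketch; the 7/8 decay factor and the field names are made up
for the example): the rate is a decayed average ala avg_slice, so a few
quiet seconds while the box is collapsing only erode it slowly instead
of zeroing it:

#include <stdio.h>

/* made-up per-task state, for illustration only */
struct fault_state {
        unsigned long rate;        /* smoothed faults per second        */
        unsigned long last_update; /* when the rate was last recomputed */
        unsigned long faults;      /* faults seen since last_update     */
};

/* fold the faults of the elapsed interval into the smoothed rate */
static void update_rate(struct fault_state *s, unsigned long now)
{
        unsigned long elapsed = now - s->last_update;
        unsigned long sample;

        if (!elapsed)
                return;
        sample = s->faults / elapsed;   /* instantaneous faults/sec */

        /*
         * Decayed average: keep 7/8 of the old rate, mix in 1/8 of the
         * new sample.  A short quiet period during the collapse only
         * shaves the remembered rate gradually, so the real hog still
         * shows the highest rate when we go looking for it.
         */
        s->rate = (s->rate * 7 + sample) / 8;

        s->faults = 0;
        s->last_update = now;
}

int main(void)
{
        struct fault_state s = { 8000, 0, 0 };
        unsigned long t;

        /* the system collapses: almost no faults for five seconds */
        for (t = 1; t <= 5; t++) {
                s.faults = 1;
                update_rate(&s, t);
                printf("t=%lus rate=%lu\n", t, s.rate);
        }
        return 0;
}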

>Just some thoughts. I'm NOT saying the idea is bad. I'm just CURIOUS
>what the stats will be.

Once we have the page fault rate and the time of the last fault, we can
also make them available via /proc for statistics purposes, btw.
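
E.g. a trivial user-space reader could then do something like this (the
/proc/<pid>/fault_rate file and its two-number format are invented here
only to show the idea, nothing like it exists yet):

#include <stdio.h>

/* hypothetical: /proc/<pid>/fault_rate holding "rate last_fault" */
int main(int argc, char **argv)
{
        char path[64];
        unsigned long rate, last_fault;
        FILE *f;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <pid>\n", argv[0]);
                return 1;
        }
        snprintf(path, sizeof(path), "/proc/%s/fault_rate", argv[1]);

        f = fopen(path, "r");
        if (!f) {
                perror(path);
                return 1;
        }
        if (fscanf(f, "%lu %lu", &rate, &last_fault) != 2) {
                fprintf(stderr, "%s: unexpected format\n", path);
                fclose(f);
                return 1;
        }
        fclose(f);

        printf("pid %s: %lu faults/sec, last fault at %lu\n",
               argv[1], rate, last_fault);
        return 0;
}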

Andrea



