Re: [RFC][PATCH] ps command race fix take2 [1/4] list token

From: Eric W. Biederman
Date: Wed Aug 23 2006 - 02:09:33 EST


KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> writes:

> On Tue, 22 Aug 2006 10:56:08 -0600
> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) wrote:
>
>> KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> writes:
>>
>> > This is ps command race fix take2. Unfortunately, against 2.6.18-rc4.
>> > I'll rebase this to appropriate kernel if O.K. (I think this is RFC)
>> >
>> > This patch implements Paul Jackson's idea, 'inserting false link in task
> list'.
>>
>> Currently the tasklist_lock is one of the more highly contended locks in
>> the kernel. Adding an extra place it is taken is undesirable.
> yes. taking lock is a probem.
> I know current readdir() uses 8192 bytes buffer for getdents64(). Then,
> maybe write-lock will be acquired all-tgids/400+ times for inserting token
> (in 32bit system).
>
>> If could see a better algorithm for sending a signal to all processes
>> in a process groups we could remove the tasklist_lock entirely.
>>
> ??
> Sorry, could you explain more ?

The core problem is not when there is a single user. The problem is
that no matter how large the system gets we have a single lock. So it
gets increasingly contended.

I almost removed the tasklist_lock from all read paths. But as it
happens sending a signal to a process group is an atomic operation
with respect to fork so that path has to take the lock, or else
we get places where "kill -9 -pgrp" fails to kill every process in
the process group. Which is even worse.


>> In addition you only solves half the readdir problems. You don't solve
>> the seek problem which is returning to an offset you had been to
>> before. A relatively rare case but...
>>
> Ah, I should add lseek handler for proc root. Okay.

Hmm. Possibly. Mostly what I was thinking is that a token in the
list simply cannot solve the problem of a guaranteeing lseek to a
previous position works. I really haven't looked closely on
how you handle that case.

>> > Good point of this approach is cost of searching task is O(N) (N=num of
> tgids).
>> > Bad point is lock and kmalloc/kfree.
>> > I didin't modified thread_list and cpuset's proc list, maybe future work.
>> >
>> > If searching pid bitmap is better, please take Erics.
>>
>> My patch at least needs a good changelog but I believe it will work
>> better and can be further improved with a better pid data structure
>> if there is actually a problem there. Given that I don't take
>> any locks it should be much friendlier at scale, and the code
>> was simpler.
> yes. it has several good points and simple.
> My patch's point is just using task_list if we can, because it exists for
> keeping
> all tasks(tgids).

One of the reasons I have an issue with it, is that with the
impending introduction of multiple pid spaces is that the task list
really isn't what we want to traverse.

>> However I will miss a few newly forked processes and I don't think your
>> technique will miss any. Still neither will miss a process that
>> existed the entire time.
>>
>> If nothing else I think it was worth posting so we could contrast the two.
>>
> please post again. I think comparing the two is good.
> I will post take3 with improved comments and lseek handler, and so
> on.

I intend to, I'm unfortunately busy in another direction at the
moment.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/