Re: [RFC] process wide itimer cruft

From: Peter Zijlstra
Date: Tue Feb 03 2009 - 12:52:11 EST


On Tue, 2009-02-03 at 18:23 +0100, Oleg Nesterov wrote:
> On 02/03, Peter Zijlstra wrote:
> >
> > On Mon, 2009-02-02 at 09:53 +0100, Peter Zijlstra wrote:
> >
> > I'm punting the sum-all-threads work off to a workqueue,
>
> I don't really understand how this works, but I didn't try to read
> this part carefully. For example, when we call thread_group_cputime()
> we don't really get the "group" statistics immediately? But this looks
> very interesting anyway.

That's because a thread group can be extremely large and can take longer
than a jiffy to sum up -- this is the situation that started all this
itimer tinkering.
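
To make the cost concrete, here is a minimal user-space sketch of the
kind of walk involved. It is not the kernel's thread_group_cputime();
the struct and function names are made up for illustration. The point
is only that the work grows linearly with the number of threads.

```c
/* Illustration only: hypothetical names, not kernel code. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-thread CPU time, in arbitrary ticks. */
struct thread_times {
	uint64_t utime;
	uint64_t stime;
};

/* Group-wide totals produced by the walk below. */
struct group_times {
	uint64_t utime;
	uint64_t stime;
};

/* Visit every thread once and accumulate its times: O(n) in threads. */
static struct group_times sum_group(const struct thread_times *t, size_t n)
{
	struct group_times total = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		total.utime += t[i].utime;
		total.stime += t[i].stime;
	}
	return total;
}
```

With tens of thousands of threads, a loop like this run from the timer
tick is where the time goes.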

However, Ingo spoke to me on IRC and suggested another approach, which
I'm currently working on -- hopefully done tomorrow.

> > The remaining option is to make signal struct itself rcu freed, but
> > before I do that, I thought I'd run this code by some folks.
>
> I think we should follow Ingo's suggestion: we should make ->signal
> refcountable, we should never clear task->signal, it should be freed
> by __put_task_struct()'s path.

Right, that'd make a lot of sense.

> In fact I was going to make these patches last week, will try
> to do it this week. But we need another counter for that, we can't use
> signal->count.

I'm not sure I fully understand all that code yet, although I've
been staring at it for the past day or so.

->live -- the number of associated tasks,
->count -- not quite a refcount?

I can see that adding a third counter for reference counting could solve
things, but can we start by clarifying the exact semantics of these two?
If only for future readers.
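
As a rough illustration of the split under discussion, here is a
user-space sketch using C11 atomics. It is an assumption about how a
"live" count and a separate lifetime refcount could be kept apart, not
the actual signal_struct code; every name in it (sig_sketch, sig_put()
and so on) is hypothetical.

```c
/* Illustration only: hypothetical names, not kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct sig_sketch {
	atomic_int live;	/* threads that have not exited yet */
	atomic_int refs;	/* references keeping the memory valid */
	bool group_exited;	/* set when the last live thread exits */
};

/* Allocate the shared struct for a new single-threaded group. */
static struct sig_sketch *sig_alloc(void)
{
	struct sig_sketch *sig = calloc(1, sizeof(*sig));

	if (!sig)
		return NULL;
	atomic_init(&sig->live, 1);
	atomic_init(&sig->refs, 1);
	return sig;
}

/* A new thread joins: one more live task, one more reference. */
static void sig_thread_add(struct sig_sketch *sig)
{
	atomic_fetch_add(&sig->live, 1);
	atomic_fetch_add(&sig->refs, 1);
}

/*
 * A thread exits: only ->live drops here. The thread's reference stays
 * until its task struct is freed, so the pointer never has to be cleared.
 * Returns true for the last live thread, which does the group teardown.
 */
static bool sig_thread_exit(struct sig_sketch *sig)
{
	if (atomic_fetch_sub(&sig->live, 1) == 1) {
		sig->group_exited = true;
		return true;
	}
	return false;
}

/* Drop one lifetime reference; returns true if the struct was freed. */
static bool sig_put(struct sig_sketch *sig)
{
	if (atomic_fetch_sub(&sig->refs, 1) == 1) {
		free(sig);
		return true;
	}
	return false;
}
```

In this sketch the group-wide exit work hangs off ->live reaching zero,
while the memory is released only by the final sig_put(), which would
correspond to the __put_task_struct() path mentioned above.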

> This blows signal_struct a bit, but otoh with this change we can
> move some fields (for example, ->group_leader) to signal_struct.
> And we can do many simplifications. Just for example, __sched_setscheduler()
> takes ->siglock just to read signal->rlim[].

Could you shed a bit of light on the distinction between sighand and
signal?

> > @@ -96,14 +105,16 @@ static void __exit_signal(struct task_struct *tsk)
> > spin_lock(&sighand->siglock);
> >
> > posix_cpu_timers_exit(tsk);
> > - if (atomic_dec_and_test(&sig->count))
> > + if (!atomic_read(&sig->live)) {
> > posix_cpu_timers_exit_group(tsk);
>
> This doesn't look exactly right, but I don't see the "real" problems
> with this change.
>
> We can have a lot of threads which didn't even pass exit_notify(),
> another process can attach the cpu timer to us once we drop the
> locks. OK, no real problems afaics, because each sub-thread will
> in turn do posix_cpu_timers_exit_group() later.

Yeah, you can get multiple invocations of
posix_cpu_timers_exit_group(), and less summing of dead tasks; the
latter might be an issue.

> But this looks a bit too early. It is better to continue to account
> these threads, they can consume a lot of cpu. Anyway, this is a very
> minor issue.

Agreed.

> > - else {
> > + sig->curr_target = NULL;
>
> complete_signal() can crash if it hits ->curr_target = NULL, and
> we are still "visible" to signals even if sig->live == 0.

Ooh, missed that. Good catch indeed.

> > + } else {
> > /*
> > * If there is any task waiting for the group exit
> > * then notify it:
> > */
> > - if (sig->group_exit_task && atomic_read(&sig->count) == sig->notify_count)
> > + if (sig->group_exit_task &&
> > + atomic_read(&sig->live) == sig->notify_count)
>
> This looks wrong. de_thread() can hang forever, put_signal() doesn't
> wake up ->group_exit_task.
>
> I think we really need another counter, at least for now.

Don't rush on my account, Ingo's proposed solution doesn't need this.
