Re: [PATCH] softlockup: remove hung_task_check_count

From: Frédéric Weisbecker
Date: Wed Jan 21 2009 - 08:14:47 EST


2009/1/21 Ingo Molnar <mingo@xxxxxxx>:
>
> * Mandeep Singh Baines <msb@xxxxxxxxxx> wrote:
>
>> read_lock(&tasklist_lock);
>> do_each_thread(g, t) {
>> - if (!--max_count)
>> - goto unlock;
>> + if (!--max_count) {
>> + /*
>> + * Drop the lock every once in a while and resched if
>> + * necessary. Don't want to hold the lock too long.
>> + */
>> + get_task_struct(t);
>> + read_unlock(&tasklist_lock);
>> + max_count = HUNG_TASK_CHECK_COUNT;
>> + if (need_resched())
>> + schedule();
>> + read_lock(&tasklist_lock);
>> + put_task_struct(t);
>> + /*
>> + * t was unlinked from tasklist. Can't continue in this
>> + * case. Exit and try again next time.
>> + */
>> + if (t->state == TASK_DEAD)
>> + goto unlock;
>> + }
>
> firstly, this bit should move into a helper function. Also, why dont you
> do the need_resched() check first (it's very lighweight) - and thus only
> do the heavy ops (get-task-struct & tasklist_lock unlock) if that is set?
>
> But most importantly, isnt the logic above confused? --max_count will
> reach zero exactly once, and then we'll loop for a long time.
>
> Ingo
>

Thanks Mandeep for this patch.

By reading Ingo's review, I notice that the checks done by
check_hung_task() are quite lightweight and then quick.
I guess your loop is able to go through a good number of tasks before
actually needing to be resched.
Perhaps the max_count can be dropped actually, since it is a kind of
arbitrary chosen constant.
So you would just need to check need_resched() at each iteration.

ie (should be split in helper functions):

bool retry = false;

do {
read_lock(&tasklist_lock);
do_each_thread(g, t) {
/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
if (t->state == TASK_UNINTERRUPTIBLE)
check_hung_task(t, now, timeout);
if (need_resched()) {
get_task_struct(t);
read_unlock(&tasklist_lock);
schedule();
read_lock(&tasklist_lock);
put_task_struct(t);

if (t->state == TASK_DEAD) {
retry = true;
goto unlock;
}
}

} while_each_thread(g, t);
unlock:
read_unlock(&tasklist_lock);
} while(retry);


There are good chances that your "t" task hasn't died, but if so, you
can just retry the whole thing.
Perhaps that should require some tests to see if there is a good
average of t->stat == TASK_DEAD path not taken,
but of course, it depends on what the system is doing most.

What do you think?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/