Re: workqueue thing

From: Tejun Heo
Date: Mon Dec 21 2009 - 08:31:43 EST


Hello, Peter.

On 12/21/2009 06:22 PM, Peter Zijlstra wrote:
> On Mon, 2009-12-21 at 12:04 +0900, Tejun Heo wrote:
>> When IO goes wrong, in extreme
>> cases, it can easily take over thirty secs to recover and that's
>> required by the hardware specifications, so anything which ends up
>> waiting on IO can take a pretty long time. The only piece of code
>> which is necessary to support that is the code necessary to migrate
>> back tasks to CPUs when they come online again. It's not a lot of
>> ugly code.
>
> Why does it need to get migrated back, there are no affinity promises if
> you allow hotplug to continue, so it might as well complete and continue
> on the other cpu.
>
> And yes, it is a lot of very ugly code.

Migrating to an online but !active CPU is necessary so that rescuers
can be called during CPU_DOWN_PREPARE, which in turn is necessary to
guarantee forward progress during a CPU down operation.  Given that,
the only extra code needed purely for migrating workers back when a
CPU comes back online is a few tens of lines handling the
TRUSTEE_RELEASE case.  That's not a lot.  If we did it differently
(i.e. had unbound workers stop processing new works and just drain and
die), it would take more code.
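
To make the flow concrete, here's a very rough sketch of the hotplug
callback shape I'm describing.  The helper names (wq_rescue_on(),
wq_rebind_workers()) are made up for illustration and their bodies are
stubs; the actual trustee code is more involved.

#include <linux/cpu.h>
#include <linux/notifier.h>

/* bind a rescuer to @cpu and drain its works (stub) */
static void wq_rescue_on(unsigned int cpu) { }

/* TRUSTEE_RELEASE path: migrate unbound workers back to @cpu (stub) */
static void wq_rebind_workers(unsigned int cpu) { }

static int wq_cpu_callback(struct notifier_block *nb,
                           unsigned long action, void *hcpu)
{
        unsigned int cpu = (unsigned long)hcpu;

        switch (action & ~CPU_TASKS_FROZEN) {
        case CPU_DOWN_PREPARE:
                /* @cpu is still online here, so a rescuer can be
                   bound to it, guaranteeing forward progress */
                wq_rescue_on(cpu);
                break;
        case CPU_ONLINE:
                /* @cpu came back: migrate unbound workers back */
                wq_rebind_workers(cpu);
                break;
        }
        return NOTIFY_OK;
}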

I think you're primarily concerned with the scheduler modifications
and find the choose-between-two-masks on migration ugly.  I agree it's
not the prettiest thing in the world, but then again it's not a lot of
code.  It looks ugly mostly because of the way migration is
implemented and how the parameter is passed in.  API-wise, I think
making kthread_bind() synchronized against CPU onlineness should be
pretty clean.
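
Something along these lines is what I have in mind, API-wise.  This is
only a sketch: kthread_bind_online() is a made-up name and the details
differ from the actual patch, but it shows both the two-mask choice
and how the bind can be synchronized against CPU onlineness.

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/kthread.h>
#include <linux/sched.h>

/* bound kthreads may sit on an online but !active CPU; everything
   else is confined to cpu_active_mask */
static const struct cpumask *migration_mask(struct task_struct *p)
{
        return (p->flags & PF_THREAD_BOUND) ? cpu_online_mask
                                            : cpu_active_mask;
}

/* bind @k to @cpu with hotplug excluded, so the onlineness check
   and the bind can't race with the CPU going away */
static int kthread_bind_online(struct task_struct *k, unsigned int cpu)
{
        int ret = -ENODEV;

        get_online_cpus();
        if (cpu_online(cpu)) {
                kthread_bind(k, cpu);   /* @k must not be running yet */
                ret = 0;
        }
        put_online_cpus();
        return ret;
}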

Thanks.

--
tejun