Re: [PATCH 4/7] sched: implement force_cpus_allowed()

From: Tejun Heo
Date: Wed Dec 09 2009 - 00:25:03 EST


Hello,

On 12/08/2009 10:35 PM, Peter Zijlstra wrote:
>> Slow and indeterminism comes in different magnitudes.
>
> Determinism does _not_ come in magnitudes, its a very binary property,
> either it is or it is not.

That's semantics, right?  The whole process becomes closer to
deterministic as each of its components is made deterministic, so in
that sense "more deterministic" is a meaningful expression; it can
also describe how deterministic the behavior feels to human
perception.

> As to the order of slowness for unplug, that is about maximal, its _the_
> slowest path in the whole kernel.

Long running works may run for minutes.  They're slow enough that a
human being perceives the wait, so we're talking about pretty
different scales.

> Ok, maybe, but that is not what I would call a generic thread pool.

Sure, it's not completely generic, not yet anyway.  The main focus is
to solve the concurrency issues with the current workqueue
implementation.  With the concurrency issues solved, accommodating
long running works becomes quite easy - all we need to do is migrate
the unbound workers back during cpu up - and as we have a considerable
number of users which require such usage, it's reasonable to implement
it.
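
Very roughly, and with all the names except force_cpus_allowed() made
up for illustration (this isn't the actual code in the series), the
cpu-up path only needs to walk the workers which were left unbound
while the cpu was down and move them back:

	static int hotplug_cpu_up_callback(struct notifier_block *nb,
					   unsigned long action, void *hcpu)
	{
		unsigned int cpu = (unsigned long)hcpu;
		struct worker *worker;

		if (action != CPU_ONLINE)
			return NOTIFY_OK;

		/* rebind workers which were detached while the cpu was down */
		for_each_unbound_worker(cpu, worker)
			force_cpus_allowed(worker->task, cpumask_of(cpu));

		return NOTIFY_OK;
	}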

> So the reason we have tons of idle workqueues around are purely because
> of deadlock scenarios? Or is there other crap about?

No, that's part of the concurrency issue.  Works don't devour cpu
cycles or compete for them, but they do overlap each other frequently.
Works are often used to provide task context so that their users can
use sleeping synchronization constructs or wait for events.
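
For example (a toy, not any particular in-tree user), a work like the
following barely touches the cpu but needs full task context just to
be able to sleep:

	static void dev_reset_work(struct work_struct *work)
	{
		/* both of these may sleep, so task context is required */
		mutex_lock(&dev_mutex);
		wait_for_completion(&dev_reset_done);
		mutex_unlock(&dev_mutex);
	}

	/* somewhere in the driver */
	INIT_WORK(&dev_work, dev_reset_work);
	schedule_work(&dev_work);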

> So why not start simple and only have one thread per cpu (lets call it
> events/#) and run all works there. Then when you enqueue a work and
> events/# is already busy with a work from another wq, hand the work to a
> global event thread which will spawn a special single shot kthread for
> it, with a second exception for those reclaim wq's, for which you'll
> have this rescue thread which you'll bind to the right cpu for that
> work.
>
> That should get rid of all these gazillion threads we have, preserve the
> queue property and not be as invasive as your current thing.

That doesn't solve the concurrency problem at all, and we'd end up
bouncing a LOT of works around.

> If they're really as idle as reported you'll never need the fork-fest
> you currently propose, simply because there's not enough work.
>
> So basically, have events/# service the first non-empty cwq, when
> there's more non empty cwqs spawn them single shot threads, or use a
> rescue thread.

Idle doesn't mean they don't overlap. They aren't cpu cycle hungry
but are context hungry.

It just isn't possible to implement a shared worker pool which can
scale to the necessary level of concurrency without some sort of
dynamic worker management, which necessarily involves forking new
workers when they're needed and killing some of them off when there
are more than enough.
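
To put it concretely - and this is only a sketch of the idea, none of
the names below are from the actual patches - the management boils
down to something like:

	/* called when the last runnable worker is about to block */
	static void maybe_create_worker(struct worker_pool *pool)
	{
		/* nobody left to process queued works? fork a new worker */
		if (!pool->nr_idle && pool->nr_pending)
			create_worker(pool);	/* kthread_create() + wakeup */
	}

	/* called from a timer on an idle worker */
	static void maybe_destroy_worker(struct worker_pool *pool)
	{
		/* too many workers idling for too long? kill one off */
		if (too_many_idle(pool) &&
		    time_after(jiffies, pool->last_active + IDLE_TIMEOUT))
			destroy_worker(pool);	/* kthread_stop() */
	}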

As for the force_cpus_allowed() bit, I think it's a rather natural
interface to have, and maybe we can replace kthread_bind() with it or
implement kthread_bind() in terms of it.  It's the basic migration
function which adheres to the cpu hotplug/unplug synchronization
rules.
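
IOW, conceptually something like the following, assuming the signature
from this patch (the real kthread_bind() also sets PF_THREAD_BOUND and
so on, so this is only an illustration):

	void kthread_bind(struct task_struct *k, unsigned int cpu)
	{
		/* migrate and pin @k to @cpu, honoring the hotplug sync rules */
		force_cpus_allowed(k, cpumask_of(cpu));
	}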

>> I thought about adding an unbound pool of workers
>> for cpu intensive works for completeness but I really couldn't find
>> much use for that. If enough number of users would need something
>> like that, we can add an anonymous pool but for now I really don't see
>> the need to worry about that.
>
> And I though I'd heard multiple parties express interesting in exactly
> that, btrfs, bdi and pohmelfs come to mind, also crypto looks like one
> that could actually do some work.

Yeah, anything cryptography related or anything that crunches large
chunks of data would be a good candidate, but compared to the works
and kthreads we have just to have context and be able to wait for
things, they're a minority.  Plus, it's something we can continue to
work on if there are enough reasons to do so.  If accommodating long
running works there makes more sense, we'll do that, but for most of
them whether they're bound to a certain cpu or not just isn't
interesting at all, and all we need to serve them is the ability to
migrate the threads back during CPU UP.  It's a pretty isolated path.

Thanks.

--
tejun