Re: [PATCH rfc] workqueue: honour cond_resched() more effectively.

From: NeilBrown
Date: Mon Nov 09 2020 - 21:26:38 EST


On Mon, Nov 09 2020, Peter Zijlstra wrote:

> On Mon, Nov 09, 2020 at 01:50:40PM +0000, Trond Myklebust wrote:
>> On Mon, 2020-11-09 at 09:00 +0100, Peter Zijlstra wrote:
>
>> > I'm thinking the real problem is that you're abusing workqueues. Just
>> > don't stuff so much work into it that this becomes a problem. Or
>> > rather,
>> > if you do, don't lie to it about it.
>>
>> If we can't use workqueues to call iput_final() on an inode, then what
>> is the point of having them at all?
>
> Running short stuff, apparently.

Also running stuff that sleeps. If only does work in short bursts, and
sleeps between the works, it can run as long as it likes.
It is only sustained bursts that are currently not supported with
explicit code.

>
>> Neil's use case is simply a file that has managed to accumulate a
>> seriously large page cache, and is therefore taking a long time to
>> complete the call to truncate_inode_pages_final(). Are you saying we
>> have to allocate a dedicated thread for every case where this happens?
>
> I'm not saying anything, but you're trying to wreck the scheduler
> because of a workqueue 'feature'. The 'new' workqueues limit concurrency
> by design, if you're then relying on concurrency for things, you're
> using it wrong.
>
> I really don't know what the right answer is here, but I thoroughly hate
> the one proposed.

Oh good - plenty for room for improvement then :-)

I feel strongly that this should work transparently. Expecting people
too choose the right option to handle cases that don't often some up in
testing is naive.
A warning whenever a bound,non-CPU-intensive worker calls cond_resched()
is trivial to implement and extremely noise. As mentioned, I get twenty
just to boot.

One amusing example is rhashtable which schedule a worker to rehash a
table. This is expected to be cpu-intensive because it calls
cond_resched(), but it is run with schedule_work() - clearly not
realizing that will block other scheduled work on that CPU.

An amusing example for the flip-side is crypto/cryptd.c which creates a
WQ_CPU_INTENSIVE workqueue (cryptd) but the cryptd_queue_worker() has
a comment "Only handle one request at a time to avoid hogging crypto
workqueue." !!! The whole point of WQ_CPU_INTENSIVE is that you cannot
hog the workqueue!!

Anyway, I digress.... warning on ever cond_resched() generates lots of
warnings, including some from printk.... so any work item that might
ever print a message needs to be CPU_INTENSIVE???
I don't think that scales.

Is there some way the scheduler can help? Does the scheduler notice
"time to check on that CPU over there" and then:
- if it is in user-space- force it to schedule
- if it is in kernel-space (and preempt is disabled), then leave it
alone
??

If so, could there be a third case - if it is a bound,non-cpu-intensive
worker, switch it to cpu-intensive???

I wonder how long workers typically run - do many run long enough that
the scheduler might want to ask them to take a break?

Thanks,
NeilBrown

Attachment: signature.asc
Description: PGP signature