Re: [RFC] sched/eevdf: sched feature to dismiss lag on wakeup

From: Tobias Huschle
Date: Wed Mar 06 2024 - 06:32:30 EST


On Thu, Feb 29, 2024 at 09:06:16AM +0530, K Prateek Nayak wrote:
> (+ Xuewen Yan, Ke Wang)
>
> Hello Tobias,
>
<...>
> >
> > Questions:
> > 1. The kworker getting its negative lag occurs in the following scenario
> > - kworker and a cgroup are supposed to execute on the same CPU
> > - one task within the cgroup is executing and wakes up the kworker
> > - kworker with 0 lag, gets picked immediately and finishes its
> > execution within ~5000ns
> > - on dequeue, kworker gets assigned a negative lag
> > Is this expected behavior? With this short execution time, I would
> > expect the kworker to be fine.
> > For a more detailed discussion on this symptom, please see:
> > https://lore.kernel.org/netdev/ZWbapeL34Z8AMR5f@DESKTOP-2CCOB1S./T/
>
> Does the lag clamping path from Xuewen Yan [1] work for the vhost case
> mentioned in the thread? Instead of placing the task just behind the
> 0-lag point, clamping the lag seems to be more principled approach since
> EEVDF already does it in update_entity_lag().
>
> If the lag is still too large, maybe the above coupled with Peter's
> delayed dequeue patch can help [2] (Note: tree is prone to force
> updates)
>
> [1] https://lore.kernel.org/lkml/20240130080643.1828-1-xuewen.yan@xxxxxxxxxx/
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e62ef63a888c97188a977daddb72b61548da8417

I tried Peter's patches a while ago. Unfortunately, reducing the lag
is not sufficient in this particular case. The calling entity expects
the woken up kworker to run instantly.

For the woken-up kworker to have a chance of being scheduled right
away, it must not carry any negative lag. To guarantee that it is
scheduled, it would even need positive lag, enough to let it pass
all other entities on the queue.

Therefore I proposed to just wipe the negative lag in these cases,
which seems to map to strategy #2 of the underlying paper.
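
To make the effect concrete, here is a toy user-space model in C. It is
only a sketch, not kernel code: all names are hypothetical, the 0-lag
point is a plain weighted average, and eligibility is checked against
the average computed before the wakeup, which ignores the lag scaling
the real placement code does.

```c
#include <stdint.h>

/* Toy model only: these names are hypothetical, not kernel identifiers. */
struct toy_entity {
	int64_t vruntime;	/* virtual runtime consumed so far */
	int64_t weight;		/* load weight of the entity */
};

/* Weighted average vruntime of the queue, i.e. the 0-lag point. */
static int64_t toy_avg_vruntime(const struct toy_entity *q, int n)
{
	int64_t sum = 0, load = 0;

	for (int i = 0; i < n; i++) {
		sum += q[i].vruntime * q[i].weight;
		load += q[i].weight;
	}
	return load ? sum / load : 0;
}

/*
 * Place a waking entity at (avg - lag). With dismiss_negative_lag set,
 * a negative lag is wiped first, so the entity lands on the 0-lag point.
 */
static int64_t toy_place(int64_t avg, int64_t lag, int dismiss_negative_lag)
{
	if (dismiss_negative_lag && lag < 0)
		lag = 0;
	return avg - lag;
}

/* An entity is eligible when it is not ahead of the 0-lag point. */
static int toy_eligible(int64_t avg, int64_t vruntime)
{
	return vruntime <= avg;
}
```

With an average of 2000 and a lag of -500, the entity is placed at 2500
and is not eligible. With the negative lag wiped, it is placed at 2000
and is eligible, but it still does not pass anyone, which is why only a
positive lag would guarantee an immediate pick.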

The other way to think about this would be:
The assumption that woken-up tasks have a high probability of running
is no longer valid. In that case, the entity triggering the wakeup
has to explicitly give up the CPU. If there are no other tasks apart
from the two involved so far, the kworker has a good chance of being
scheduled. If the runqueue is busy, other tasks might intervene.
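
A toy illustration of that second option, again in user-space C with
hypothetical names. It assumes that "giving up the CPU" simply excludes
the waker from the next pick and that all remaining tasks are eligible,
so the pick is by earliest virtual deadline alone. The real yield path
behaves differently in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical names, every task is assumed eligible. */
struct toy_task {
	const char *name;
	int64_t deadline;	/* virtual deadline, smaller runs first */
};

/*
 * Pick the task with the earliest deadline, skipping the task that
 * just gave up the CPU. Returns NULL if nothing else is runnable.
 */
static const struct toy_task *
toy_pick_after_yield(const struct toy_task *q, size_t n,
		     const struct toy_task *yielder)
{
	const struct toy_task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (&q[i] == yielder)
			continue;
		if (!best || q[i].deadline < best->deadline)
			best = &q[i];
	}
	return best;
}
```

With only the waker and the kworker queued, the kworker is the only
candidate and gets picked. As soon as a third task with an earlier
deadline is on the queue, that task is picked instead.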

I keep playing around with these options, but the potential side
effects worry me.

>
<...>