Re: [RFC PATCH v3 00/16] Core scheduling v3

From: Vineeth Remanan Pillai
Date: Mon Oct 21 2019 - 08:30:35 EST


On Mon, Oct 14, 2019 at 5:57 AM Aaron Lu <aaron.lu@xxxxxxxxxxxxxxxxx> wrote:
>
> I now remembered why I used max().
>
> Assume rq1 and rq2's min_vruntime are both at 2000 and the core wide
> min_vruntime is also 2000. Also assume both runqueues are empty at the
> moment. Then task t1 is queued to rq1 and runs for a long time while rq2
> keeps empty. rq1's min_vruntime will be incremented all the time while
> the core wide min_vruntime stays at 2000 if min() is used. Then when
> another task gets queued to rq2, it will get really large unfair boost
> by using a much smaller min_vruntime as its base.
>
> To fix this, either max() is used as is done in my patch, or adjust
> rq2's min_vruntime to be the same as rq1's on each
> update_core_cfs_min_vruntime() when rq2 is found empty and then use
> min() to get the core wide min_vruntime. Looks not worth the trouble to
> use min().

Understood. I think this case is a special case where one runqueue is empty
and hence the min_vruntime of the core should match the progressing vruntime
of the active runqueue. If we use max as the core wide min_vruntime, I think
we may hit starvation elsewhere. On quick example I can think of is during
force idle. When a sibling is forced idle, and if a new task gets enqueued
in the force idled runq, it would inherit the max vruntime and would starve
until the other tasks in the forced idle sibling catches up. While this might
be okay, we are deviating from the concept that all new tasks inherits the
min_vruntime of the cpu(core in our case). I have not tested deeply to see
if there are any assumptions which may fail if we use max.

The modified patch actually takes care of syncing the min_vruntime across
the siblings so that, core wide min_vruntime and per cpu min_vruntime
always stays in sync.

Thanks,
Vineeth