Re: Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)

From: Abel Wu
Date: Mon Nov 20 2023 - 07:06:31 EST

Next message: Peter Zijlstra: "Re: [PATCH 5/5] x86/tsc: Make __use_tsc __ro_after_init"
Previous message: Vladimir Oltean: "Re: [PATCH net-next v7 15/16] net: ethtool: ts: Let the active time stamping layer be selectable"
In reply to: Peter Zijlstra: "Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)"
Next in thread: Abel Wu: "Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 11/20/23 6:56 PM, Peter Zijlstra Wrote:

On Sat, Nov 18, 2023 at 01:14:32PM +0800, Abel Wu wrote:

Hi Peter, I'm a little confused here. As we adopt placement strategy #1
when PLACE_LAG is enabled, the lag of that entity needs to be preserved.
Given that the weight doesn't change, we have:

vl' = vl

But in fact it is scaled on placement:

vl' = vl * W/(W + w)

(W+w)/W

Ah, right. I misunderstood (again) the comment which says:

vl_i = (W + w_i)*vl'_i / W

So the current implementation is:

v' = V - vl'

and what I was proposing is:

v' = V' - vl

and they are equal in fact.

Does this intended?

The scaling, yes that's intended and the comment explains why. So now
you have me confused too :-)

Specifically, I want the lag after placement to be equal to the lag we
come in with. Since placement will affect avg_vruntime (adding one
element to the average changes the average etc..) the placement also
affects the lag as measured after placement.

Yes. You did the math in an iterative fashion and mine is facing the
final state:

v' = V' - vlag
V' = (WV + wv') / (W + w)

which gives:

V' = V - w * vlag / W

Or rather, if you enqueue and dequeue, I want the lag to be preserved.
If you do not take placement into consideration, lag will dissipate real
quick.

And to illustrate my understanding of strategy #1:

@@ -5162,41 +5165,17 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
* vl_i is given by:
*
* V' = (\Sum w_j*v_j + w_i*v_i) / (W + w_i)
- * = (W*V + w_i*(V - vl_i)) / (W + w_i)
- * = (W*V + w_i*V - w_i*vl_i) / (W + w_i)
- * = (V*(W + w_i) - w_i*l) / (W + w_i)
- * = V - w_i*vl_i / (W + w_i)
- *
- * And the actual lag after adding an entity with vl_i is:
- *
- * vl'_i = V' - v_i
- * = V - w_i*vl_i / (W + w_i) - (V - vl_i)
- * = vl_i - w_i*vl_i / (W + w_i)
- *
- * Which is strictly less than vl_i. So in order to preserve lag
- * we should inflate the lag before placement such that the
- * effective lag after placement comes out right.
- *
- * As such, invert the above relation for vl'_i to get the vl_i
- * we need to use such that the lag after placement is the lag
- * we computed before dequeue.
+ * = (W*V + w_i*(V' - vl_i)) / (W + w_i)
+ * = V - w_i*vl_i / W
*
- * vl'_i = vl_i - w_i*vl_i / (W + w_i)
- * = ((W + w_i)*vl_i - w_i*vl_i) / (W + w_i)
- *
- * (W + w_i)*vl'_i = (W + w_i)*vl_i - w_i*vl_i
- * = W*vl_i
- *
- * vl_i = (W + w_i)*vl'_i / W
*/
load = cfs_rq->avg_load;
if (curr && curr->on_rq)
load += scale_load_down(curr->load.weight);
-
- lag *= load + scale_load_down(se->load.weight);
if (WARN_ON_ONCE(!load))
load = 1;
- lag = div_s64(lag, load);
+
+ vruntime -= div_s64(lag * scale_load_down(se->load.weight), load);
}
se->vruntime = vruntime - lag;

So you're proposing we do:

v = V - (lag * w) / (W + w) - lag

What I 'm proposing is:

V' = V - w * vlag / W

so we have:

v' = V' - vlag
= V - vlag * w/W - vlag
= V - vlag * (W + w)/W

which is exactly the same as current implementation.

?

That can be written like:

v = V - (lag * w) / (W+w) - (lag * (W+w)) / (W+w)
= V - (lag * (W+w) + lag * w) / (W+w)
= V - (lag * (W+2w)) / (W+w)

And that turns into a mess AFAICT.

Let me repeat my earlier argument. Suppose v,w,l are the new element.
V,W are the old avg_vruntime and sum-weight.

Then: V = V*W / W, and by extention: V' = (V*W + v*w) / (W + w).

The new lag, after placement:

l' = V' - v = (V*W + v*w) / (W+w) - v
= (V*W + v*w) / (W+w) - v * (W+w) / (W+v)
= (V*W + v*w -v*W - v*w) / (W+w)
= (V*W - v*W) / (W+w)
= W*(V-v) / (W+w)
= W/(W+w) * (V-v)

Substitute: v = V - (W+w)/W * l, my scaling thing, to obtain:

l' = W/(W+w) * (V - (V - (W+w)/W * l))
= W/(W+w) * (V - V + (W+w)/W * l)
= W/(W+w) * (W+w)/W * l
= l

So by scaling, we've preserved lag across placement.

That make sense?

Yes, I think I won't misunderstand again for the 3rd time :)

Thanks!
Abel

Next message: Peter Zijlstra: "Re: [PATCH 5/5] x86/tsc: Make __use_tsc __ro_after_init"
Previous message: Vladimir Oltean: "Re: [PATCH net-next v7 15/16] net: ethtool: ts: Let the active time stamping layer be selectable"
In reply to: Peter Zijlstra: "Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)"
Next in thread: Abel Wu: "Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]