Re: [PATCH] psi: fix possible trigger missing in the window

From: Suren Baghdasaryan
Date: Mon Dec 20 2021 - 14:58:55 EST


On Fri, Dec 17, 2021 at 10:03 PM Zhaoyang Huang <huangzhaoyang@xxxxxxxxx> wrote:
>
> loop Suren

Thanks.


>
> On Fri, Dec 17, 2021 at 2:08 PM Huangzhaoyang <huangzhaoyang@xxxxxxxxx> wrote:
> >
> > From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> >
> > There could be missing wake up if the rest of the window remain the
> > same stall states as the polling_total updates for every polling_min_period.

Could you please expand on this description? I'm unclear what the
problem is. I assume "polling_min_period" in this description refers
to the group->poll_min_period.

From the code, it looks like the change makes update_triggers() call
window_update() on every poll once a new stall has been recorded for
the trigger state, and keep doing so until the tracking window is
complete. I don't see the point of calling window_update() if there was
no stall change since the last call to window_update(): the resulting
growth cannot increase without a new stall.

Maybe what you want to achieve here is more than one trigger per
window if the stall limit was breached? If so, then this goes against
the design for psi triggers, in which we rate-limit the number of
generated triggers per tracking window (see:
https://elixir.bootlin.com/linux/latest/source/kernel/sched/psi.c#L545).
Please clarify the issue and the intentions here.
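
As a rough illustration of the two points above, here is a simplified
userspace model, not the kernel code. The names struct window,
window_growth() and may_signal() are made up for this sketch, and the
arithmetic only approximates what window_update() and the per-window
rate limit in update_triggers() do. It shows that, with no new stall,
repeated calls within a window cannot report a larger growth, and that
at most one event is signalled per window.

```c
#include <stdint.h>

/* Simplified userspace model of a psi tracking window (an assumption). */
struct window {
	uint64_t size;        /* window length */
	uint64_t start_time;  /* time the current window began */
	uint64_t start_value; /* stall total when the window began */
	uint64_t prev_growth; /* growth recorded for the previous window */
};

/*
 * Growth in the current window plus a linearly decayed share of the
 * previous window's growth. If 'total' does not change between calls,
 * the result can only stay the same or shrink as the window ages.
 */
static uint64_t window_growth(struct window *w, uint64_t now, uint64_t total)
{
	uint64_t elapsed = now - w->start_time;
	uint64_t growth = total - w->start_value;

	if (elapsed > w->size) {
		/* Window complete: start a new one from this point. */
		w->start_time = now;
		w->start_value = total;
		w->prev_growth = growth;
	} else {
		growth += w->prev_growth * (w->size - elapsed) / w->size;
	}
	return growth;
}

/*
 * Hypothetical helper: returns 1 if an event may be signalled at 'now',
 * allowing at most one event per tracking window.
 */
static int may_signal(uint64_t now, uint64_t *last_event_time,
		      uint64_t win_size)
{
	if (now < *last_event_time + win_size)
		return 0;
	*last_event_time = now;
	return 1;
}
```

In this model, calling window_growth() twice with the same total yields
a second value no larger than the first, so a threshold that was not
crossed on the first call would not be crossed on the second either.
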
Thanks!

> >
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> > ---
> > include/linux/psi_types.h | 2 ++
> > kernel/sched/psi.c | 30 ++++++++++++++++++------------
> > 2 files changed, 20 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
> > index 0a23300..9533d2e 100644
> > --- a/include/linux/psi_types.h
> > +++ b/include/linux/psi_types.h
> > @@ -132,6 +132,8 @@ struct psi_trigger {
> >
> > /* Refcounting to prevent premature destruction */
> > struct kref refcount;
> > +
> > + bool new_stall;
> > };
> >
> > struct psi_group {
> > diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> > index 1652f2b..402718c 100644
> > --- a/kernel/sched/psi.c
> > +++ b/kernel/sched/psi.c
> > @@ -458,9 +458,12 @@ static void psi_avgs_work(struct work_struct *work)
> > static void window_reset(struct psi_window *win, u64 now, u64 value,
> > u64 prev_growth)
> > {
> > + struct psi_trigger *t = container_of(win, struct psi_trigger, win);
> > +
> > win->start_time = now;
> > win->start_value = value;
> > win->prev_growth = prev_growth;
> > + t->new_stall = false;
> > }
> >
> > /*
> > @@ -515,7 +518,6 @@ static void init_triggers(struct psi_group *group, u64 now)
> > static u64 update_triggers(struct psi_group *group, u64 now)
> > {
> > struct psi_trigger *t;
> > - bool new_stall = false;
> > u64 *total = group->total[PSI_POLL];
> >
> > /*
> > @@ -523,19 +525,26 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > * watchers know when their specified thresholds are exceeded.
> > */
> > list_for_each_entry(t, &group->triggers, node) {
> > - u64 growth;
> > -
> > /* Check for stall activity */
> > if (group->polling_total[t->state] == total[t->state])
> > continue;
> >
> > /*
> > - * Multiple triggers might be looking at the same state,
> > - * remember to update group->polling_total[] once we've
> > - * been through all of them. Also remember to extend the
> > - * polling time if we see new stall activity.
> > + * update the trigger if there is new stall which will be
> > + * reset when run out of the window
> > */
> > - new_stall = true;
> > + t->new_stall = true;
> > +
> > + memcpy(&group->polling_total[t->state], &total[t->state],
> > + sizeof(group->polling_total[t->state]));
> > + }
> > +
> > + list_for_each_entry(t, &group->triggers, node) {
> > + u64 growth;
> > +
> > + /* check if new stall happened during this window*/
> > + if (!t->new_stall)
> > + continue;
> >
> > /* Calculate growth since last update */
> > growth = window_update(&t->win, now, total[t->state]);
> > @@ -552,10 +561,6 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > t->last_event_time = now;
> > }
> >
> > - if (new_stall)
> > - memcpy(group->polling_total, total,
> > - sizeof(group->polling_total));
> > -
> > return now + group->poll_min_period;
> > }
> >
> > @@ -1152,6 +1157,7 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
> > t->last_event_time = 0;
> > init_waitqueue_head(&t->event_wait);
> > kref_init(&t->refcount);
> > + t->new_stall = false;
> >
> > mutex_lock(&group->trigger_lock);
> >
> > --
> > 1.9.1
> >