Re: [RESEND PATCH v4 1/1] psi: stop relying on timer_pending for poll_work rescheduling

From: Suren Baghdasaryan
Date: Fri Oct 21 2022 - 15:54:56 EST


On Thu, Oct 20, 2022 at 7:11 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> On 10 Oct 2022 15:57:44 -0700 Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > Psi polling mechanism is trying to minimize the number of wakeups to
> > run psi_poll_work and is currently relying on timer_pending() to detect
> > when this work is already scheduled. This provides a window of opportunity
> > for psi_group_change to schedule an immediate psi_poll_work after
> > poll_timer_fn got called but before psi_poll_work could reschedule itself.
> > Below is the depiction of this entire window:
> >
> > poll_timer_fn
> >   wake_up_interruptible(&group->poll_wait);
> >
> > psi_poll_worker
> >   wait_event_interruptible(group->poll_wait, ...)
> >   psi_poll_work
> >     psi_schedule_poll_work
> >       if (timer_pending(&group->poll_timer)) return;
> >       ...
> >       mod_timer(&group->poll_timer, jiffies + delay);
>
> [...]
>
> >
> > -/* Schedule polling if it's not already scheduled. */
> > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > +/* Schedule polling if it's not already scheduled or forced. */
> > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > +                                   bool force)
> >  {
> >         struct task_struct *task;
> >
> >         /*
> > -        * Do not reschedule if already scheduled.
> > -        * Possible race with a timer scheduled after this check but before
> > -        * mod_timer below can be tolerated because group->polling_next_update
> > -        * will keep updates on schedule.
> > +        * atomic_xchg should be called even when !force to provide a
> > +        * full memory barrier (see the comment inside psi_poll_work).
> >          */
> > -       if (timer_pending(&group->poll_timer))
> > +       if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> >                 return;
>
> If poll_scheduled works, turning poll_timer, which only wakes up poll
> worker, to a delayed work also works because schedule_delayed_work()
> takes care of pending work, with the bonus of cutting poll worker.

Thanks for the suggestion, Hillf.
psi_poll_worker runs at a low FIFO priority so that normal tasks cannot
preempt PSI signal generation (see the sched_set_fifo_low() call inside
psi_poll_worker()). I don't think schedule_delayed_work() is usable as
is, because it queues the work on system_wq, whose workers run at normal
priority. I would probably need queue_delayed_work() with a dedicated
workqueue whose worker has worker->task set to the same FIFO priority.
However, I'm not sure it's worth creating a workqueue for the single
work item that might be queued on it...
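
For anyone following the quoted hunk, here is a minimal userspace sketch
of the poll_scheduled gate, with C11 atomic_exchange() standing in for
the kernel's atomic_xchg(). It is only a model: struct group_model, the
timer_arms counter and all three function names are hypothetical, and
the flag reset in poll_work_begin() is an assumption about how the
worker re-opens the gate, not something taken from the hunk above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the relevant bits of struct psi_group. */
struct group_model {
	atomic_int poll_scheduled;	/* 1 while a poll counts as scheduled */
	int timer_arms;			/* counts stand-ins for mod_timer() */
};

/*
 * Models the gate in psi_schedule_poll_work(): the exchange always runs,
 * so it acts as a full barrier even when force is set (atomic_exchange()
 * is sequentially consistent by default). Returns true if the timer
 * stand-in was armed.
 */
static bool schedule_poll_work(struct group_model *g, bool force)
{
	if (atomic_exchange(&g->poll_scheduled, 1) && !force)
		return false;	/* already scheduled and not forced */
	g->timer_arms++;	/* stand-in for mod_timer() */
	return true;
}

/* Assumption: the worker clears the flag so a later caller can reschedule. */
static void poll_work_begin(struct group_model *g)
{
	atomic_exchange(&g->poll_scheduled, 0);
}

/* Walks through the cases and returns how many times the timer was armed. */
static int model_selfcheck(void)
{
	struct group_model g;
	bool first, second, forced, after_reset;

	atomic_init(&g.poll_scheduled, 0);
	g.timer_arms = 0;

	first = schedule_poll_work(&g, false);	/* flag was 0: arms */
	second = schedule_poll_work(&g, false);	/* flag is 1: skipped */
	forced = schedule_poll_work(&g, true);	/* flag is 1 but forced: arms */
	poll_work_begin(&g);			/* worker re-opens the gate */
	after_reset = schedule_poll_work(&g, false);	/* flag was 0: arms */

	assert(first && !second && forced && after_reset);
	return g.timer_arms;
}
```

In this model a non-forced caller that finds the flag already set
returns without arming the timer, while a forced caller arms it
regardless. That is the behaviour the timer_pending() check could not
guarantee in the window described in the commit message.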
Thanks,
Suren.

>
> Hillf