Re: [PATCH 6/6] eventpoll: add support for min-wait

From: Soheil Hassas Yeganeh
Date: Tue Nov 08 2022 - 17:42:29 EST


On Tue, Nov 8, 2022 at 5:20 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
> > Is there a way to short cut the wait if the process is being terminated?
> >
> > We issues in production systems in the past where too many threads were
> > in epoll_wait and the process got terminated. It'd be nice if these
> > threads could exit the syscall as fast as possible.
>
> Good point, it'd be a bit racy though as this is called from the waitq
> callback and hence not in the task itself. But probably Good Enough for
> most use cases?

Sounds good. We can definitely do that as a follow up later.

> This should probably be a separate patch though, as it seems this
> affects regular waits too without min_wait set?
>
> >> @@ -1845,6 +1891,18 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
> >> ewq.timed_out = true;
> >> }
> >>
> >> + /*
> >> + * If min_wait is set for this epoll instance, note the min_wait
> >> + * time. Ensure the lowest bit is set in ewq.min_wait_ts, that's
> >> + * the state bit for whether or not min_wait is enabled.
> >> + */
> >> + if (ep->min_wait_ts) {
> >
> > Can we limit this block to "ewq.timed_out && ep->min_wait_ts"?
> > AFAICT, the code we run here is completely wasted if timeout is 0.
>
> Yep certainly, I can gate it on both of those conditions.

Thanks. I think that would help. You might also want to restructure the if/else
condition above but it's your call.

On Tue, Nov 8, 2022 at 5:29 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 11/8/22 3:25 PM, Willem de Bruijn wrote:
> >>> This would be similar to the approach that willemb@xxxxxxxxxx used
> >>> when introducing epoll_pwait2.
> >>
> >> I have, see other replies in this thread, notably the ones with Stefan
> >> today. Happy to do that, and my current branch does split out the ctl
> >> addition from the meat of the min_wait support for this reason. Can't
> >> seem to find a great way to do it, as we'd need to move to a struct
> >> argument for this as epoll_pwait2() is already at max arguments for a
> >> syscall. Suggestions more than welcome.
> >
> > Expect an array of two timespecs as fourth argument?
>
> Unfortunately even epoll_pwait2() doesn't have any kind of flags
> argument to be able to do tricks like that... But I guess we could do
> that with epoll_pwait3(), but it'd be an extra indirection for the copy
> at that point (copy array of pointers, copy pointer if not NULL), which
> would be unfortunate. I'd hate to have to argue that API to anyone, let
> alone Linus, when pushing the series.

I personally like what Willem suggested. It feels more natural to me
and as you suggested previously it can be a struct argument.

The overheads would be similar to any syscall that accepts itimerspec.

I understand your concern on "epoll_pwait3". I wish Linus would weigh
in here. :-)