Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

From: Andrew Morton
Date: Fri Jul 20 2018 - 16:44:35 EST


On Fri, 20 Jul 2018 13:05:59 -0700 Davidlohr Bueso <dave@xxxxxxxxxxxx> wrote:

> On Fri, 20 Jul 2018, Andrew Morton wrote:
>
> >On Fri, 20 Jul 2018 10:29:54 -0700 Davidlohr Bueso <dave@xxxxxxxxxxxx> wrote:
> >
> >> Hi,
> >>
> >> Both patches replace saving+restoring interrupts when taking the
> >> ep->lock (now the waitqueue lock), with just disabling local irqs.
> >> This shows immediate performance benefits in patch 1 for an epoll
> >> workload running on Xen.
> >
> >I'm surprised. Is spin_lock_irqsave() significantly more expensive
> >than spin_lock_irq()? Relative to all the other stuff those functions
> >are doing? If so, how come? Some architectural thing makes
> >local_irq_save() much more costly than local_irq_disable()?
>
> For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
> the cost of Xen is much higher.
>
> And at least considering ep_scan_ready_list(), the lock is taken/released
> twice, to deal with the ovflist when the ep->wq.lock is not held. To the
> point that it yields measurable results (see patch 1) across incremental
> thread counts.

Did you try measuring it on bare hardware?

> >
> >> The main concern we need to have with this
> >> sort of change in epoll is ep_poll_callback(), which is passed
> >> to the wait queue wakeup and is very often called in irq context;
> >> this patch does not touch this call.
> >
> >Yeah, these changes are scary. For the code as it stands now, and for
> >the code as it evolves.
>
> Yes, which is why I've been throwing lots of epoll workloads at it.

I'm sure. It's the "as it evolves" that is worrisome, and has caught
us in the past.

> >
> >I'd have more confidence if we had some warning mechanism if we run
> >spin_lock_irq() when IRQs are disabled, which is probably-a-bug. But
> >afaict we don't have that. Probably for good reasons - I wonder what
> >they are?

Well ignored ;)

We could open-code it locally. Add a couple of
WARN_ON_ONCE(irqs_disabled())? That might need re-benchmarking with
Xen but surely just reading the thing isn't too expensive?
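
To make the open-coded check concrete, here is a rough userspace model
of what it would catch. Nothing below is the real kernel code: every
model_* name and MODEL_WARN_ON_ONCE are hypothetical stand-ins, and the
"interrupt flag" is just a bool. The point it illustrates is that the
_irq unlock re-enables interrupts unconditionally, so taking the _irq
variant with IRQs already off silently changes the caller's state,
while the irqsave/irqrestore pair hands back whatever it found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: none of these are the real kernel primitives. */
static bool model_irqs_enabled = true;  /* stands in for the CPU interrupt flag */
static int model_warn_count;            /* number of warnings emitted so far */

struct model_lock { bool held; };

static bool model_irqs_disabled(void) { return !model_irqs_enabled; }

/* Mimics WARN_ON_ONCE(): complain the first time cond is true at this
 * call site, then stay quiet. */
#define MODEL_WARN_ON_ONCE(cond) do {                                   \
        static bool warned_;                                            \
        if ((cond) && !warned_) {                                       \
            warned_ = true;                                             \
            model_warn_count++;                                         \
            fprintf(stderr, "WARNING: %s at %s:%d\n",                   \
                    #cond, __FILE__, __LINE__);                         \
        }                                                               \
    } while (0)

/* irqsave flavour: remember the previous interrupt state, then disable. */
static void model_spin_lock_irqsave(struct model_lock *l, bool *flags)
{
    *flags = model_irqs_enabled;
    model_irqs_enabled = false;
    assert(!l->held);
    l->held = true;
}

/* irqrestore flavour: put back exactly the state that was saved. */
static void model_spin_unlock_irqrestore(struct model_lock *l, bool flags)
{
    assert(l->held);
    l->held = false;
    model_irqs_enabled = flags;
}

/* irq flavour with the proposed open-coded check in front of it. */
static void model_spin_lock_irq(struct model_lock *l)
{
    MODEL_WARN_ON_ONCE(model_irqs_disabled());
    model_irqs_enabled = false;
    assert(!l->held);
    l->held = true;
}

/* irq flavour unlock: enables interrupts no matter what the caller had. */
static void model_spin_unlock_irq(struct model_lock *l)
{
    assert(l->held);
    l->held = false;
    model_irqs_enabled = true;
}
```

In this model a caller that enters model_spin_lock_irq() with the flag
already cleared gets one warning, and comes back out of
model_spin_unlock_irq() with interrupts enabled, which is the
probably-a-bug case. The same caller going through the irqsave pair
gets its disabled state back untouched.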