Re: [PATCHSET v3 0/5] Add support for epoll min_wait

From: Jens Axboe
Date: Mon Nov 07 2022 - 16:39:01 EST


On 11/7/22 1:56 PM, Stefan Hajnoczi wrote:
> Hi Jens,
> NICs and storage controllers have interrupt mitigation/coalescing
> mechanisms that are similar.

Yep

> NVMe has an Aggregation Time (timeout) and an Aggregation Threshold
> (counter) value. When a completion occurs, the device waits until the
> timeout or until the completion counter value is reached.
>
> If I've read the code correctly, min_wait is computed at the beginning
> of epoll_wait(2). NVMe's Aggregation Time is computed from the first
> completion.
>
> It makes me wonder which approach is more useful for applications. With
> the Aggregation Time approach applications can control how much extra
> latency is added. What do you think about that approach?

We only tested the current approach, which measures time from entry
into epoll_wait, not from when the first event arrives. I suspect the
nvme approach is better suited to the hw side. The epoll timeout helps
ensure that we batch within xx usec, rather than xx usec plus whatever
the delay is until the first event arrives. That's why it's handled that
way currently, and it gives you a fixed batch latency.
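To make the difference concrete, here is a rough Python model of the two
timing policies discussed above. It is a simplification under stated
assumptions, not the patchset's code. All times are in usec relative to
entry into epoll_wait(2). The function names are hypothetical. The
entry-based model assumes that if nothing arrives before min_wait expires,
the wait returns as soon as the first event shows up. The NVMe-style model
covers only Aggregation Time plus Aggregation Threshold, with the window
opening at the first completion.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    return_time: int   # usec since entry when the wait returns
    events: list[int]  # arrival times of the events delivered in this batch


def entry_based(arrivals, min_wait):
    """Hypothetical model: min_wait timer started at wait entry (t=0)."""
    arrivals = sorted(arrivals)
    if not arrivals:
        raise ValueError("model assumes at least one event arrives")
    # If anything arrived within min_wait, return exactly at the deadline.
    # Otherwise (assumption) return as soon as the first event arrives.
    ret = max(min_wait, arrivals[0])
    return Batch(ret, [t for t in arrivals if t <= ret])


def first_event_based(arrivals, agg_time, threshold):
    """Hypothetical model of NVMe-style aggregation: the window opens at the
    first completion and closes after agg_time or at threshold events."""
    arrivals = sorted(arrivals)
    if not arrivals:
        raise ValueError("model assumes at least one event arrives")
    deadline = arrivals[0] + agg_time
    in_window = [t for t in arrivals if t <= deadline]
    if len(in_window) >= threshold:
        # Counter reached first: return when the threshold-th event arrives.
        ret = in_window[threshold - 1]
    else:
        # Timer expired first: return at first arrival + agg_time.
        ret = deadline
    return Batch(ret, [t for t in arrivals if t <= ret])


if __name__ == "__main__":
    arrivals = [30, 40, 90]
    print(entry_based(arrivals, min_wait=50))
    print(first_event_based(arrivals, agg_time=50, threshold=8))
```

With events arriving at 30, 40 and 90 usec and a 50 usec window, the
entry-based model returns at 50 usec with two events, while the
first-event model returns at 80 usec (30 + 50) with the same two events.
That is the "xx usec plus the delay until the first event" difference: the
entry-based policy caps total batch time measured from entry, while the
NVMe-style policy caps the extra latency added after the first completion.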

--
Jens Axboe