Re: [PATCH] eventfd: support delayed wakeup for non-semaphore eventfd to reduce cpu utilization

From: Christian Brauner
Date: Wed Apr 19 2023 - 05:12:37 EST


On Tue, Apr 18, 2023 at 08:15:03PM -0600, Jens Axboe wrote:
> On 4/17/23 10:32 AM, Wen Yang wrote:
> >
> > On 2023/4/17 22:38, Jens Axboe wrote:
> >> On 4/16/23 5:31 AM, wenyang.linux@xxxxxxxxxxx wrote:
> >>> From: Wen Yang <wenyang.linux@xxxxxxxxxxx>
> >>>
> >>> For the NON SEMAPHORE eventfd, if its counter has a nonzero value,
> >>> then a read(2) returns 8 bytes containing that value, and the counter's
> >>> value is reset to zero. Therefore, in the NON SEMAPHORE scenario,
> >>> N event_writes vs ONE event_read is possible.
> >>>
> >>> However, the current implementation wakes up the read thread immediately
> >>> in eventfd_write so that the cpu utilization increases unnecessarily.
> >>>
> >>> By adding a configurable delay after eventfd_write, these unnecessary
> >>> wakeup operations are avoided, thereby reducing cpu utilization.
> >> What's the real world use case of this, and what would the expected
> >> delay be there? With using a delayed work item for this, there's
> >> certainly a pretty wide grey zone in terms of delay where this would
> >> perform considerably worse than not doing any delayed wakeups at all.
> >
> >
> > Thanks for your comments.
> >
> > We have found that the CPU usage of the message middleware is high in
> > our environment, because sensor messages from MCU are very frequent
> > and constantly reported, possibly several hundred thousand times per
> > second. As a result, the message receiving thread is frequently
> > awakened to process short messages.
> >
> > The following is the simplified test code:
> > https://github.com/w-simon/tests/blob/master/src/test.c
> >
> > And the test code in this patch is further simplified.
> >
> > Finally, only a configuration item has been added here, allowing users
> > to make more choices.
>
> I think you'd have a higher chance of getting this in if the delay
> setting was per eventfd context, rather than a global thing.

That patch seems really weird. Is that an established paradigm to
address problems like this through a configured wakeup delay? Because
naively this looks like a pretty brutal hack.
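
For reference, the non-semaphore behaviour the patch description relies on
(N writes collapsing into one read) can be observed from userspace. The
following is only a minimal sketch: the helper name first_read_after_writes()
is hypothetical and comes from neither the patch nor the linked test.c, and
EFD_NONBLOCK is added purely so the read cannot block when nothing was written.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an eventfd with the given flags, write the
 * value 1 to it `n` times, then perform a single read.
 * Returns the 8-byte value delivered by that read, or 0 on any error
 * (including EAGAIN when the counter is still zero).
 */
static uint64_t first_read_after_writes(int flags, unsigned int n)
{
	uint64_t one = 1, value = 0;
	unsigned int i;
	int fd;

	/* EFD_NONBLOCK: a read on a zero counter fails instead of sleeping. */
	fd = eventfd(0, flags | EFD_NONBLOCK);
	if (fd < 0)
		return 0;

	/* Each write adds 1 to the kernel-side counter. */
	for (i = 0; i < n; i++) {
		if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
			close(fd);
			return 0;
		}
	}

	/*
	 * Without EFD_SEMAPHORE the read returns the whole counter and resets
	 * it to zero; with EFD_SEMAPHORE it returns 1 and decrements by one.
	 */
	if (read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value))
		value = 0;

	close(fd);
	return value;
}
```

Calling first_read_after_writes(0, 5) should hand back 5 from a single read,
while first_read_after_writes(EFD_SEMAPHORE, 5) should hand back 1. This only
illustrates the counter semantics; it says nothing about when the reader is
woken, which is the part the patch proposes to change.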