Re: [PATCH] eventfd: support delayed wakeup for non-semaphore eventfd to reduce cpu utilization

From: Wen Yang
Date: Wed Apr 19 2023 - 11:30:10 EST



On 2023/4/19 17:12, Christian Brauner wrote:
On Tue, Apr 18, 2023 at 08:15:03PM -0600, Jens Axboe wrote:
On 4/17/23 10:32 AM, Wen Yang wrote:
On 2023/4/17 22:38, Jens Axboe wrote:
On 4/16/23 5:31 AM, wenyang.linux@xxxxxxxxxxx wrote:
From: Wen Yang <wenyang.linux@xxxxxxxxxxx>

For the NON SEMAPHORE eventfd, if its counter has a nonzero value,
then a read(2) returns 8 bytes containing that value, and the counter's
value is reset to zero. Therefore, in the NON SEMAPHORE scenario,
N eventfd writes can be consumed by ONE eventfd read.
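
The coalescing described above can be seen from user space. The following is a minimal sketch, not part of the patch: the helper name coalesced_read_after_writes is made up for illustration, and EFD_NONBLOCK is used only so that a read on a zero counter fails with EAGAIN instead of blocking.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/*
 * Hypothetical helper (not from the patch): add 1 to a fresh
 * non-semaphore eventfd `n` times, then read it once.
 * Returns the value delivered by that single read, or 0 if the
 * counter was zero or any call failed.
 */
uint64_t coalesced_read_after_writes(unsigned int n)
{
    /* No EFD_SEMAPHORE flag: a read returns the whole counter and resets it. */
    int fd = eventfd(0, EFD_NONBLOCK);
    if (fd < 0)
        return 0;

    uint64_t one = 1;
    uint64_t value = 0;

    for (unsigned int i = 0; i < n; i++) {
        if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(fd);
            return 0;
        }
    }

    /* A single read drains everything the n writes accumulated. */
    if (read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value))
        value = 0;

    close(fd);
    return value;
}
```

Calling coalesced_read_after_writes(5) performs five writes and a single read that returns 5. The point of the patch description is that the reader only needs one wakeup for all of those writes.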

However, the current implementation wakes up the reading thread immediately
in eventfd_write, which increases CPU utilization unnecessarily.

By adding a configurable delay after eventfd_write, these unnecessary
wakeup operations are avoided, thereby reducing cpu utilization.
What's the real world use case of this, and what would the expected
delay be there? With using a delayed work item for this, there's
certainly a pretty wide grey zone in terms of delay where this would
perform considerably worse than not doing any delayed wakeups at all.

Thanks for your comments.

We have found that the CPU usage of the message middleware is high in
our environment, because sensor messages from the MCU are reported
constantly and very frequently, possibly several hundred thousand times
per second. As a result, the message receiving thread is woken up
frequently to process short messages.

The following is the simplified test code:
https://github.com/w-simon/tests/blob/master/src/test.c

And the test code in this patch is further simplified.

Finally, this patch only adds a configuration item, which gives users
an additional option.
I think you'd have a higher chance of getting this in if the delay
setting was per eventfd context, rather than a global thing.

Thank you.
We will follow your suggestion and change the global configuration to a per-eventfd setting.

That patch seems really weird. Is that an established paradigm to
address problems like this through a configured wakeup delay? Because
naively this looks like a pretty brutal hack.

Thanks.

Your concern may be that a crude delay could cause additional problems, which is indeed worth considering.

Meanwhile, prolonged and frequent eventfd_write calls are effectively another type of attack.

If we change it to this:

When continuous eventfd_write calls reach a certain threshold within a short period of time, a delay is added as a penalty.

Do you think this is acceptable?
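
To make the proposal concrete, here is a rough user-space model of the threshold check, assuming a simple fixed-window counter. It is only a sketch: struct write_burst, burst_should_delay and all field names are hypothetical, and nothing here is taken from the patch or from the kernel's eventfd code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-eventfd bookkeeping; names are illustrative only. */
struct write_burst {
    uint64_t window_start_ns; /* when the current window began */
    uint64_t window_len_ns;   /* length of one window */
    uint32_t threshold;       /* writes allowed per window before the penalty */
    uint32_t count;           /* writes seen in the current window */
};

/*
 * Record one write at time now_ns (monotonic, in nanoseconds).
 * Returns true if this write exceeds the per-window threshold,
 * i.e. the wakeup would be delayed as a penalty.
 */
bool burst_should_delay(struct write_burst *b, uint64_t now_ns)
{
    /* Start a new window once the current one has expired. */
    if (now_ns - b->window_start_ns >= b->window_len_ns) {
        b->window_start_ns = now_ns;
        b->count = 0;
    }

    /* Saturate instead of wrapping during a very long burst. */
    if (b->count < UINT32_MAX)
        b->count++;

    return b->count > b->threshold;
}
```

With threshold = 3 and a 1000 ns window, the first three writes in a window are woken up immediately, the fourth and later ones would be delayed, and the count starts over when the next window begins.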


--

Best wishes,

Wen