Re: [RFC PATCH] poll(): add poll_wait_set_exclusive()

From: Mathieu Desnoyers
Date: Wed Oct 06 2010 - 15:04:42 EST


* Linus Torvalds (torvalds@xxxxxxxxxxxxxxxxxxxx) wrote:
> On Wed, Oct 6, 2010 at 10:56 AM, Mathieu Desnoyers
> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> > Executive summary:
> >
> > Addition of the new internal API:
>
> Executive summary: no.
>
> You need to explain first how you could _ever_ use this without
> breaking select/poll semantics totally.
>
> IOW, you need to explain the user space interface first. Before you do
> that, this patch is total and utter crap that is expressly designed to
> be used only in a manner that is a pure bug.

You are right. My approach breaks the select/poll semantics. This is why I'm
asking for input on whether we want to solve the more general problem. For the
moment, the poll_wait_set_exclusive() solution was only meant to be used for
debugfs kernel pseudo-files, which fall outside the scope of POSIX.

Maybe what I am trying to do is too far from the poll() semantics and does not
apply in the general case, but I clearly see the need, at least in the use-case
detailed below, to wake up only one thread at a time, whether we call this
"poll" or something else. One way to make it available more generally might be
to add a new open() flag, and to require that every open() of a given file use
that flag in order to get the "wake up only one thread" behavior.

For reference, here is the use-case: the user-space daemon typically runs one
thread per CPU, each with a handle on many file descriptors. Each thread waits
for data to become available using poll(). To follow the poll semantics, when
data becomes available on a file descriptor, the kernel wakes up all threads
at once, but in my case only one of them will successfully consume the data
(the splice() or read() of every other thread fails with -ENODATA). With many
threads, these useless wakeups add unwanted overhead and limit scalability.
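
To make the wakeup pattern concrete, here is a minimal user-space sketch. It is
not the daemon's code: a non-blocking pipe stands in for the debugfs
pseudo-file, so the losing threads see read() fail with EAGAIN rather than
-ENODATA, and the names run_herd(), waiter() and struct herd_stats are made up
for the illustration. Several threads poll() the same descriptor, one byte is
written, and the sketch counts how many threads saw POLLIN versus how many
actually got the data.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

#define MAX_THREADS 64

/* Hypothetical bookkeeping shared by all waiter threads. */
struct herd_stats {
	int fd;			/* shared read end of the pipe */
	atomic_int woken;	/* threads whose poll() reported POLLIN */
	atomic_int consumed;	/* threads whose read() returned the byte */
};

static void *waiter(void *arg)
{
	struct herd_stats *s = arg;
	struct pollfd pfd = { .fd = s->fd, .events = POLLIN };
	char byte;

	/* Wait up to 500 ms for the descriptor to become readable. */
	if (poll(&pfd, 1, 500) > 0 && (pfd.revents & POLLIN)) {
		atomic_fetch_add(&s->woken, 1);
		/* Non-blocking read: every thread but one fails with EAGAIN. */
		if (read(s->fd, &byte, 1) == 1)
			atomic_fetch_add(&s->consumed, 1);
	}
	return NULL;
}

/* Returns 0 on success and fills *woken and *consumed, -1 on error. */
int run_herd(int nthreads, int *woken, int *consumed)
{
	pthread_t tids[MAX_THREADS];
	struct herd_stats s;
	int fds[2], started, i, ret = 0;

	if (nthreads < 1 || nthreads > MAX_THREADS || pipe(fds) != 0)
		return -1;
	/* Make reads non-blocking so losing threads fail instead of hanging. */
	if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	s.fd = fds[0];
	atomic_init(&s.woken, 0);
	atomic_init(&s.consumed, 0);

	for (started = 0; started < nthreads; started++) {
		if (pthread_create(&tids[started], NULL, waiter, &s) != 0) {
			ret = -1;
			break;
		}
	}

	poll(NULL, 0, 100);	/* plain 100 ms sleep: let the waiters block */
	if (write(fds[1], "x", 1) != 1)	/* a single unit of data */
		ret = -1;

	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	close(fds[0]);
	close(fds[1]);

	*woken = atomic_load(&s.woken);
	*consumed = atomic_load(&s.consumed);
	assert(*consumed <= 1);	/* one byte can be read at most once */
	return ret;
}
```

Calling run_herd(4, &woken, &consumed) should always leave consumed at 1, while
woken can be anywhere from 1 to 4 depending on scheduling. Every wakeup beyond
the first is the wasted work described above.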

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com