Re: Linux's implementation of poll() not scalable?

From: Andi Kleen (ak@suse.de)
Date: Tue Oct 24 2000 - 03:54:53 EST


On Mon, Oct 23, 2000 at 09:06:11PM -0700, Linus Torvalds wrote:
>
>
> On Tue, 24 Oct 2000, Andi Kleen wrote:
> >
> > I don't see the problem. You have the poll table allocated in the kernel,
> > the drivers directly change it and the user mmaps it (I was not proposing
> > to let poll make a kiobuf out of the passed array)
>
> That's _not_ how poll() works at all.

But that is how /dev/poll would work. Sorry for not marking that clearly.

>
> We don't _have_ a poll table in the kernel, and no way to mmap it. The
> poll() tables get created dynamically based on the stuff that the user
> has set up in the table. And the user can obviously change the fd's etc in
> the table directly, so in order for the caching to work you need to do
> various games with page table dirty or writable bits, or at least test
> dynamically whether the poll table is the same as it was before.

Actually, assuming the user does not duplicate fds in the poll array, it
would always work out the lazy way (and when he duplicates fds I would
say the behaviour is undefined -- no spec says in what order poll
must process fds in the supplied poll table).

Either you find the correct fd at the offset you cached, or the user has
changed the table and you need to recompute all cached offsets.

No need to play pte tricks.
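
A minimal user-space sketch of that lazy check, to make the idea concrete.
It is only a model, not the code from any patch: struct poll_cache, MAX_FD,
cache_rebuild() and cache_lookup() are hypothetical names, and the table is
an ordinary array rather than an mmap'ed one.

```c
#include <assert.h>
#include <stddef.h>
#include <poll.h>

#define MAX_FD 64  /* hypothetical limit, only for this sketch */

/* Hypothetical cache: for each fd, the index it had in the user's table. */
struct poll_cache {
    int offset_of[MAX_FD];   /* -1 means "not in the table" */
    unsigned long rebuilds;  /* how often the slow path ran */
};

/* Slow path: recompute every cached offset from the current table.
 * With duplicated fds the last entry wins, i.e. the result is arbitrary. */
static void cache_rebuild(struct poll_cache *c,
                          const struct pollfd *tbl, size_t n)
{
    for (int fd = 0; fd < MAX_FD; fd++)
        c->offset_of[fd] = -1;
    for (size_t i = 0; i < n; i++)
        if (tbl[i].fd >= 0 && tbl[i].fd < MAX_FD)
            c->offset_of[tbl[i].fd] = (int)i;
    c->rebuilds++;
}

/* Return the table index for fd, or -1 if fd is not in the table.
 * Fast path: the cached slot still holds fd. Otherwise the user has
 * changed the table, so rebuild all offsets and look again. */
static int cache_lookup(struct poll_cache *c,
                        const struct pollfd *tbl, size_t n, int fd)
{
    assert(fd >= 0 && fd < MAX_FD);
    int off = c->offset_of[fd];
    if (off >= 0 && (size_t)off < n && tbl[off].fd == fd)
        return off;
    cache_rebuild(c, tbl, n);
    return c->offset_of[fd];
}
```

A driver-side event for some fd would call cache_lookup() and set revents
at the returned index. In this sketch a lookup for an fd that is not in the
table at all takes the slow path every time.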

>
> Sure, it's doable, and apparently Solaris does something like this. But
> what _is_ the overhead of the Solaris code for small number of fd's? I bet
> it really is quite noticeable. I also suspect it is very optimized toward
> an unchanging poll-table.

For a small number of fds you use the fast_poll/fast_select that I
implemented in the patch I sent to you.

>
> > What is your favourite interface then ?
>
> I suspect a good interface that can easily be done efficiently would
> basically be something where the user _does_ do the equivalent of a
> read-only mmap() of poll entries - and explicit and controlled
> "add_entry()" and "remove_entry()" controls, so that the kernel can
> maintain the cache without playing tricks.
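
For illustration, the explicit interface described in the quoted paragraph
could be modelled in user space roughly as follows. Only the names
add_entry() and remove_entry() come from the text above; struct
interest_set, MAX_ENTRIES, entries_view() and all signatures are
assumptions made for this sketch, and no real kernel API is implied.

```c
#include <assert.h>
#include <stddef.h>
#include <poll.h>

#define MAX_ENTRIES 64  /* hypothetical limit, only for this sketch */

/* Hypothetical set of poll entries owned by the "kernel" side. */
struct interest_set {
    struct pollfd entries[MAX_ENTRIES];
    size_t n;
};

/* Add fd with the given event mask.
 * Returns 0 on success, -1 if the set is full or fd is already present. */
static int add_entry(struct interest_set *s, int fd, short events)
{
    assert(fd >= 0);
    if (s->n == MAX_ENTRIES)
        return -1;
    for (size_t i = 0; i < s->n; i++)
        if (s->entries[i].fd == fd)
            return -1;  /* duplicates are rejected up front */
    s->entries[s->n].fd = fd;
    s->entries[s->n].events = events;
    s->entries[s->n].revents = 0;
    s->n++;
    return 0;
}

/* Remove fd. Returns 0 on success, -1 if fd was not in the set.
 * The last entry is moved into the hole, so order is not preserved. */
static int remove_entry(struct interest_set *s, int fd)
{
    for (size_t i = 0; i < s->n; i++) {
        if (s->entries[i].fd == fd) {
            s->entries[i] = s->entries[s->n - 1];
            s->n--;
            return 0;
        }
    }
    return -1;
}

/* Stand-in for the read-only mapping: the caller may look, not write. */
static const struct pollfd *entries_view(const struct interest_set *s,
                                         size_t *n)
{
    *n = s->n;
    return s->entries;
}
```

Because every change goes through the two calls, the owner of the set
always knows where each fd lives and never has to detect user edits.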

I think that if you declare the case of duplicated fd entries in a poll
array undefined, you don't need the controlled add/remove_entry.
[And from a quick glimpse at the Single Unix spec, it does not say anything
about the order of fd processing in poll, so the spec is definitely
flexible enough for that.]

-Andi

