Re: Timings for optimised poll(2)

Richard Gooch (rgooch@atnf.CSIRO.AU)
Tue, 26 Aug 1997 21:54:42 +1000


Gerhard Mack writes:
> After watching this debate on poll(2) vs select(2) go in circles I have
> some questions.
>
> 1: is the proposed poll(2) faster?
> 2: any disadvantages in implementing it? What changes would apps need to
> make to accommodate it?

OK, I'll answer this, since you asked (I was hoping to be able to wait
until my other suggestion about optimising polling was adopted, since
it will make improvements more impressive:-).
These tests are done on a Pentium 100, with a patched 2.1.51. The
patches disable wait queue manipulation for zero timeout polls. Note
that the bulk of the time (2 milliseconds for 1000 descriptors) is
taken up by calling indirect poll functions in the file operations
structure: once that is fixed the improvements with poll2(2) will be
more significant.
Note that these tests *do not* include the time taken for the
application to scan the results.

The following syscalls are tested:

1) select(2), where the inner loop is:
    memcpy (&i_fds, &input_fds, sizeof i_fds);
    memcpy (&o_fds, &output_fds, sizeof o_fds);
    memcpy (&e_fds, &exception_fds, sizeof e_fds);
    nready = select (max_fd + 1, &i_fds, &o_fds, &e_fds, &tv);

2) poll(2), where the inner loop is:
    nready = poll (pollfd_array + start_index, num_to_poll, 0);

3) poll2(2), where the inner loop is:
    nready = poll2 (poll2ifd_array + start_index, user_ptrs + start_index,
                    poll2ofd_array, num_to_poll, 0);
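
For anyone wanting to repeat measurements of this kind, here is a minimal
sketch of a timing harness for the first two cases. It is not the code
behind the numbers in this message: the helper names time_select_us() and
time_poll_us(), the use of clock_gettime(), and the restriction to the
input set only are assumptions made for illustration. poll2() is only a
proposed syscall, so it is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

/* Microseconds elapsed between two CLOCK_MONOTONIC readings. */
static double elapsed_us(const struct timespec *start,
                         const struct timespec *end)
{
    return (end->tv_sec - start->tv_sec) * 1e6
         + (end->tv_nsec - start->tv_nsec) / 1e3;
}

/* Hypothetical helper: mean cost in microseconds of one zero-timeout
   select(2) call. Only the input set is tested here (an assumption,
   the loop in this message also passes output and exception sets). */
double time_select_us(const fd_set *input_fds, int max_fd, int iterations)
{
    struct timespec start, end;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        fd_set i_fds;
        struct timeval tv = { 0, 0 };   /* zero timeout: do not block */

        /* select(2) overwrites its sets, so restore them every pass. */
        memcpy(&i_fds, input_fds, sizeof i_fds);
        (void) select(max_fd + 1, &i_fds, NULL, NULL, &tv);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_us(&start, &end) / iterations;
}

/* Hypothetical helper: mean cost in microseconds of one zero-timeout
   poll(2) call over nfds entries. No copy is needed because poll(2)
   writes its results to the separate revents field. */
double time_poll_us(struct pollfd *fds, nfds_t nfds, int iterations)
{
    struct timespec start, end;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++)
        (void) poll(fds, nfds, 0);      /* timeout 0: return at once */
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_us(&start, &end) / iterations;
}
```

Calling each helper with a large iteration count and the same set of
descriptors gives per-call figures comparable in kind to those below,
though the absolute values will depend on the kernel and hardware.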

Test 1: checking descriptors 24-1023:
select: 2202 microseconds
poll: 2680 microseconds
poll2: 2140 microseconds

Test 2: checking descriptors 924-1023:
select: 513 microseconds
poll: 278 microseconds
poll2: 224 microseconds

Test 3: checking descriptors 1014-1023:
select: 323 microseconds
poll: 60 microseconds
poll2: 47 microseconds

One thing I noticed was that once the wait queue manipulation was
removed, poll2(2) improved its absolute performance compared with
select(2): I guess that removing those wait queue manipulations
reduced the number of cache misses, since poll2(2) has a larger memory
access footprint (the input structure is 8 bytes per fd, compared with
3 bits per fd for select(2)).
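
To put rough numbers on that footprint difference, here is a small sketch
that computes the input size for each interface. The function names are
hypothetical, struct pollfd stands in for the poll2(2) input structure
(an assumption, based only on the 8 bytes per fd quoted above), and the
select(2) figure rounds up to whole bytes, whereas a real kernel may
round up to whole words.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Bytes of input for a poll-style call: one structure per descriptor. */
size_t poll_input_bytes(size_t num_fds)
{
    return num_fds * sizeof(struct pollfd);
}

/* Bytes of input for select(2): three bitmaps (input, output,
   exception), each holding one bit per descriptor from 0 up to
   max_fd, rounded up to whole bytes. */
size_t select_input_bytes(size_t max_fd_plus_one)
{
    assert(max_fd_plus_one > 0);
    return 3 * ((max_fd_plus_one + 7) / 8);
}
```

Where struct pollfd is 8 bytes (an int and two shorts), 1000 descriptors
need 8000 bytes of poll-style input, while select(2) with max_fd = 1023
needs 384 bytes for its three sets.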

Regards,

Richard....