Re: RFC: A revised timerfd API

From: Michael Kerrisk
Date: Tue Sep 18 2007 - 05:30:25 EST

Next message: Peter Zijlstra: "Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)"
Previous message: Andrew Morton: "Re: 2.6.23-rc6-mm1"
In reply to: Thomas Gleixner: "Re: RFC: A revised timerfd API"
Next in thread: Thomas Gleixner: "Re: RFC: A revised timerfd API"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Thomas,

> On Tue, 2007-09-18 at 09:30 +0200, Michael Kerrisk wrote:
> > ====> a) Add an argument (a multiplexing timerfd() system call)
> > Disadvantage:
> > Jon Corbet pointed out
> > (http://thread.gmane.org/gmane.linux.kernel/559193/focus=570709 )
> > that this interface was starting to look like a multiplexing syscall,
> > because there is no case where all of the arguments are used (see
> > the use-case descriptions in the earlier mail).
> >
> > I'm inclined to agree with Jon; therefore one of the remaining
> > solutions may be preferable
>
> I agree. It's ugly.

Fair enough. I mainly tried to do things that way to minimize
the change from the Davide's original interface.

> > ====> b) Create a timerfd interface analogous to POSIX timers
> >
> > Create an interface analogous to POSIX timers:
> >
> > fd = timerfd_create(clockid, flags);
> >
> > timerfd_settime(fd, flags, newtimervalue, &time_to_next_expire);
> >
> > timerfd_gettime(fd, &time_to_next_expire);
> >
> > Under this proposal, the manner of making a timer that does not
> > need "get-while-set" functionality remains fairly simple:
> >
> > fd = timerfd_create(clockid);
> >
> > timerfd_settime(fd, flags, newtimervalue, NULL);
> >
> > Advantage: this would be a clean, fully functional API, and well
> > understood by virtue of its analogy with the POSIX timers API.
> >
> > Disadvantage: 3 new system calls, rather than 1.
> >
> > This solution would be sufficient, IMO, but one of the
> > next solutions might be better.
>
> I'm not scared by the 3 system calls. I rather fear that we end up
> reimplementing half of the existing posix timer code.

Yes. Perhaps some refactoring might be required, if we went
down this route.

> > ====> c) Integrate timerfd with POSIX timers
> >
> > Make a very simple timerfd call that is integrated with the
> > POSIX timers API. The POSIX timers API is detailed here:
> > http://linux.die.net/man/3/timer_create
> > http://linux.die.net/man/3/timer_settime
> >
> > Under the POSIX timers API, a new timer is created using:
> >
> > int timer_create(clockid_t clockid, struct sigevent *evp,
> > timer_t *timerid);
> >
> > We could then have a timerfd() call that returns a file descriptor
> > for the newly created 'timerid':
> >
> > fd = timerfd(timer_t timerid);
> >
> > We could then use the POSIX timers API to operate on the timer
> > (start it / modify it / fetch timer value):
> >
> > int timer_settime(timer_t timerid, int flags,
> > const struct itimerspec *value,
> > struct itimerspec *ovalue);
> > int timer_gettime(timer_t timerid, struct itimerspec *value);
> >
> > And then read() from 'fd' as before.
> >
> > In the simple case (no "get" or "get-while-setting" functionality),
> > the use of API (c) would be:
> >
> > timer_create(clockid, &evp, &timerid);
> >
> > fd = timerfd(timerid);
> >
> > timer_settime(timerid, flags, &newvalue, NULL));
> >
> > Advantages:
> > 1. Integration with an existing API.
> > 2. Adds just a single system call.
> > 3. It _might_ be possible to construct an interface that allows
> > userland programs to do things like creating a timer fd for
> > a POSIX timer that was created via some library that doesn't
> > actually know about timer fds. (I can already see problems with
> > this, since that library will already expect to be delivering
> > timer notifications somehow (via threads or signals), and it may
> > be difficult to make the two notification mechanisms play
> > together in a sane way. But maybe someone else has a take on
> > this that can rescue this idea.)
> >
> > Disadvantages:
> > 1. Starts to get a little more clunky to use in the simple
> > case shown above.
> >
> > This strikes me as a more attractive solution than (b), if we can do
> > it properly -- that means: if we can achieve advantage 3
> > in some reasonable way. If we can't achieve that, then probably
> > the next solution is better.
>
> The main problem here is, that there is no way to tell the posix timer
> code that the delivery of the timer is through the file descriptor and
> not via the usual posix timer mechanisms. We need something like the
> SIGEV_TIMERFD flag to make the posix timer code aware of that.

Well, I left it it kind of open whether the expiration
notification might be delivered via both the traditional
mechanism, and via the tiemrfd. But I realize that all
may get overly complex.

> > ====> d) extend the POSIX timers API
> >
> > Under the POSIX timers API, the evp argument of timer_create() is a
> > structure that allows the caller to specify how timer expirations
> > should be notified. There are the following possibilities
> > (differentiated by the value assigned to evp.sigev_notify):
> >
> > i) notify via a signal: the caller specifies which signal the
> > kernel should deliver when the timer expires.
> > (SIGEV_SIGNAL)
> > ii) notify by delivering a signal to the thread whose thread ID
> > is specified in evp. (This is Linux specific.)
> > (SIGEV_THREAD_ID)
> > iii) notify via a thread: when the timer expires, the system starts
> > a new thread which receives an argument that was specified in
> > the evp structure. (SIGEV_THREAD)
> > iv) no notification: the caller can monitor the timer state using
> > timer_gettime(). (SIGEV_NONE)
> >
> > In all of the above cases, the return value from timer_create()
> > is 0 for success, or -1 for failure.
> >
> > We could extend the interface as follows:
> >
> > 1) Add a new flag for evp.sigev_notify: SIGEV_TIMERFD.
> > This flag indicates that the caller wants timer
> > notification via a file descriptor.
> > 2) Whenevp.sigev_notify == SIGEV_TIMERFD, have a successful
> > timer_create() call return a file descriptor (i.e., an
> > integer >= 0).
> >
> > Advantages:
> > 1. Integration with an existing API.
> > 2. No new system calls are required.
> > 3. This idea might even have a chance of getting standardized in
> > POSIX one day, since (IMO) it integrates fairly cleanly with
> > an existing API.
> >
> > Disadvantages:
> > 1. The fact that the return value of a successful timer_create()
> > is different for the SIGEV_TIMERFD case is a bit of a wart.
>
> What happens on close(fd) ? Is the posix timer automatically destroyed ?

I would say not (see also my reply to David Härdeman.)

> Is the file descriptor invalidated when the timer is destroyed via
> timer_delete(timer_id) ? The automatic file descriptor creation is a bit
> ugly.

Yes, it is a little ugly.

> I'd rather see a combination of c) and d) as a solution:
>
> Notify the posix timer code that the timer delivery is done via the file
> descriptor mechanism (SIGEV_TIMERFD).
>
> Use a new syscall to open a file descriptor on that timer.
>
> When the file descriptor is closed the timer is not destroyed, but
> delivery disabled (analogous to the SIGEV_NONE case), so you can reopen
> and reactivate it later on.
>
> This way we have it nicely integrated into the posix timer code and keep
> the existing semantics of posix timers intact.
>
> We need to think about the open file descriptor in the timer_delete()
> case as well, but this should be not too hard to sort out.

This seems like a workable idea also. But note David Härdeman's
critique of options c & d: the existence of a coupled timerfd
and a timerid means that the application must maintain a mapping
between the two, so that after an epoll call (for example) that
says the timerfd is ready, the timer can be manipulated using
the corresponding timerfd. This isn't IMO a fatal flaw, but
it does make the API a little more clumsy.

Cheers,

Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Peter Zijlstra: "Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)"
Previous message: Andrew Morton: "Re: 2.6.23-rc6-mm1"
In reply to: Thomas Gleixner: "Re: RFC: A revised timerfd API"
Next in thread: Thomas Gleixner: "Re: RFC: A revised timerfd API"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]