Re: [RFCv4] timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes

From: Thomas Gleixner
Date: Thu Mar 10 2011 - 16:58:13 EST


On Thu, 10 Mar 2011, Alexander Shishkin wrote:
> On Thu, Mar 10, 2011 at 10:52:18AM +0100, Thomas Gleixner wrote:
> Sure. The time daemon that we have here has to stop automatic time updates
> when some other program changes system time *and* keep that setting
> effective. Currently, when "the other program" changes the system time
> right before the time daemon changes it, this time setting will be overwritten
> and lost. I'm thinking that it could be solved with something like
>
> clock_swaptime(clockid, new_timespec, old_timespec);
>
> but something tells me that it will not be welcome either.

Aside from that, it won't work. You have no reference for what
old_timespec means.

The whole problem space is full of race conditions and will always be
horrible hackery when we try to piggyback on clock_was_set(), as we
have no idea what actually happened and when. clock_was_set() is
asynchronous. While we can somehow get an event on a counter which
tells us that the clock was set, any attempt to return useful
information beyond the fact that the counter changed is going to be
inconsistent one way or the other.

It really takes more than that to make this consistent for all the
use cases which are interested in notifications and unconditional
timer cancellation when the underlying clock is set.

After twisting my brain around the corner cases for a while, I think
the only feasible approach to avoid all the lurking races is to:

1) Provide a syscall which returns the current offset of
CLOCK_REALTIME vs. CLOCK_MONOTONIC. That offset is changed when
CLOCK_REALTIME is set.

2) Provide a mechanism to consistently check the CLOCK_REALTIME
vs. CLOCK_MONOTONIC offset and to notify about changes.

3) Extend the clock_nanosleep() flags with TIMER_CANCEL_ON_CLOCK_SET

When the flag is set, the rmtp pointer, which is currently used to
copy the remaining time to user space, must point to the previously
retrieved CLOCK_REALTIME offset.

clock_nanosleep() then checks the offset provided by user space
using the mechanism from #2 and hooks the caller into the
notification mechanism. If the offset has changed before the timer is
enqueued, the syscall returns immediately with an appropriate error
code. If the offset changes after the check, then any enqueued timer
will be cancelled and an appropriate error code returned.

Note: This won't work for signal-based timers, as we have no sane way
to notify user space about a forced cancellation of the timer. Even
if we could come up with some extra signal for this, it's not worth
the trouble and the mess it's going to create.

4) Extend timerfd_settime() like #3, if necessary

I'd prefer to avoid that, but I can see the charm of the poll
facility which is provided by timerfd.

Again, we could reuse the otmr pointer of timerfd_settime() to
provide the offset as an incoming parameter when the corresponding
flag is set, and basically do the same thing as clock_nanosleep() in
the setup path: check the offset consistently.

It needs some thought on the return values from poll and how to
handle read, but that's a solvable problem, as we can reasonably
restrict this functionality to timers which do not rearm themselves.

That should solve the most urgent problem of cron-like battery
wasters. It should also be a reasonable notification mechanism for
others who are just interested in the fact that the clock was set, as
those can simply arm a timer which expires somewhere in the next
decade. If the clock is not set within that time frame, battery life
won't suffer from that once-in-a-decade regular timer expiry wakeup.
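For reference, the "timer somewhere in the next decade" idea maps onto the timerfd interface that later landed in Linux 3.0 as TFD_TIMER_CANCEL_ON_SET: an absolute CLOCK_REALTIME timerfd armed with that flag makes read() fail with ECANCELED when the real-time clock changes discontinuously. That is the merged interface, not the offset-passing design proposed above. The helper names and TEN_YEARS_SEC below are hypothetical, and the #ifndef fallback is only for C library headers that predate the flag.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/timerfd.h>

/* Fallback for older headers; value as in the kernel's linux/timerfd.h. */
#ifndef TFD_TIMER_CANCEL_ON_SET
#define TFD_TIMER_CANCEL_ON_SET (1 << 1)
#endif

/* Roughly ten years in seconds; precision does not matter here. */
#define TEN_YEARS_SEC ((time_t)10 * 365 * 24 * 60 * 60)

/* One-shot absolute expiry about a decade after 'now'. */
static struct itimerspec decade_expiry(const struct timespec *now)
{
    struct itimerspec its;
    assert(now != NULL);
    its.it_interval.tv_sec = 0;   /* one-shot: not self-rearming */
    its.it_interval.tv_nsec = 0;
    its.it_value.tv_sec = now->tv_sec + TEN_YEARS_SEC;
    its.it_value.tv_nsec = now->tv_nsec;
    return its;
}

/* Returns a timerfd whose read() fails with ECANCELED once
 * CLOCK_REALTIME is set, or -1 on error. */
static int arm_clock_set_watch(void)
{
    struct timespec now;
    if (clock_gettime(CLOCK_REALTIME, &now) < 0)
        return -1;

    int fd = timerfd_create(CLOCK_REALTIME, TFD_CLOEXEC);
    if (fd < 0)
        return -1;

    struct itimerspec its = decade_expiry(&now);
    if (timerfd_settime(fd, TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET,
                        &its, NULL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Blocks until the clock is set (returns 1) or the timer expires
 * (returns 0); returns -1 on any other error, including EINTR. */
static int wait_clock_set(int fd)
{
    uint64_t expirations;
    ssize_t n = read(fd, &expirations, sizeof(expirations));
    if (n < 0 && errno == ECANCELED)
        return 1;
    return n == (ssize_t)sizeof(expirations) ? 0 : -1;
}
```

A cron-like daemon would call arm_clock_set_watch() once, then loop on wait_clock_set() (or poll the fd alongside its other descriptors) and recompute its schedule whenever it returns 1.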

It's not going to solve the "stop updating time when something else
set the clock" requirement, but as I argued before, there is no point
in even thinking about that.

Thanks,

tglx