Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure

From: Thomas Gleixner
Date: Wed Apr 22 2015 - 04:45:57 EST


On Tue, 21 Apr 2015, Thomas Gleixner wrote:
> On Tue, 21 Apr 2015, Arnd Bergmann wrote:
> > I know there are concerns about this, in particular because C11 and
> > POSIX both require tv_nsec to be 'long', unlike timeval->tv_usec,
> > which is a 'suseconds_t' and can be defined as 'long long'.
> >
> > a)
> >
> > struct timespec {
> > time_t tv_sec;
> > long long tv_nsec; /* or typedef long long snseconds_t */
> > };
> >
> > This is not directly compatible with C11 or POSIX.1-2008, but it
> > matches what we do inside of 64-bit kernels, so probably has the
> > highest chance of working correctly in practice
>
> After reading Linus rant in the x32 thread again (thanks for the
> reminder), and looking at b/c/d - which rate between ugly and butt
> ugly - I think we should go for a) and screw POSIX and C11 as those
> committee dinosaurs seem to completely ignore the 2038 problem on
> 32bit machines. At least I have not found any hint that these folks
> care at all. So why should we comply to something which is completely
> useless?
>
> That also makes the question about the upper 32bits check moot, so
> it's the simplest and clearest of the possible solutions.

Second thoughts after some sleep.

So the outcome of this is going to be that user space libraries will
not expose the syscall variant of

syscall_timespec64 {
s64 tv_sec;
s64 tv_nsec;
};

to applications. The libs will translate them to spec conforming

timespec {
time_t tv_sec;
long tv_nsec;
};

anyway. That means we have two translation steps on 32bit systems:

1) user space timespec -> syscall timespec64

2) syscall timespec64 -> scalar nsec s64 (ktime_t)

and the other way round. The kernel internal representation is simply
s64 (nsec) based all over the place.

So we could save one translation step if we implement new syscalls
which have a scalar nsec interface instead of the timespec/timeval
cruft and let user space do the translation to whatever it wants.

So

sys_clock_nanosleep(const clockid_t which_clock, int flags,
const struct timespec __user *expires,
struct timespec __user *reminder)

would get the new syscall variant:

sys_clock_nanosleep_ns(const clockid_t which_clock, int flags,
const s64 expires, s64 __user *reminder)

I personally would welcome such an interface as it makes user space
programming simpler. Just (re)arming a periodic nanosleep based on
absolute expiry time is horrible stupid today:

struct timespec expires;
....
while ()
expires.tv_nsec += period.tv_nsec;
expires.tv_sec += period.tv_sec;
normalize_timespec(&expires);
sys_clock_nanosleep(CLOCK_ID, ABS, &expires, NULL);

So with a scalar interface this would reduce to:

s64 expires;
....
while ()
expires += period;
sys_clock_nanosleep_ns(CLOCK_ID, ABS, &expires, NULL);

There is a difference both in text and storage size plus the avoidance
of the two translation steps (one translation step on 64bit).

I know that this is non portable, but OTOH if I look at the non
portable mechanisms which are used by data bases, java VMs and other
apps which exist to squeeze the last cycles out of the system, there
is certainly some value to that.

The portable/spec conforming apps can still use the user space
assisted translated timespec/timeval mechanisms.

There is one caveat though: sys_clock_gettime and sys_gettimeofday
will still need a syscall_timespec64 variant. We have no double
translation steps there because we maintain the timespec
representation in the timekeeping code for performance reasons to
avoid the division in the syscall interface. But everything else can
do nicely without the timespec cruft.

We really should talk to libc folks and high performance users about
this before blindly adding a gazillion of new timespec64 based
interfaces.

Thoughts?

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/