Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure

From: Arnd Bergmann
Date: Tue Apr 21 2015 - 05:00:34 EST


On Monday 20 April 2015 22:40:57 Thomas Gleixner wrote:
> On Mon, 20 Apr 2015, Baolin Wang wrote:
> > @@ -771,6 +771,7 @@ SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id,
> > struct itimerspec __user *, setting)
> > {
> > struct itimerspec cur_setting;
> > + struct itimerspec64 cur_setting64;
> > struct k_itimer *timr;
> > struct k_clock *kc;
> > unsigned long flags;
> > @@ -781,10 +782,16 @@ SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id,
> > return -EINVAL;
> >
> > kc = clockid_to_kclock(timr->it_clock);
> > - if (WARN_ON_ONCE(!kc || !kc->timer_get))
> > + if (WARN_ON_ONCE(!kc || (!kc->timer_get && !kc->timer_get64))) {
> > ret = -EINVAL;
> > - else
> > - kc->timer_get(timr, &cur_setting);
> > + } else {
> > + if (kc->timer_get64) {
> > + kc->timer_get64(timr, &cur_setting64);
> > + cur_setting = itimerspec64_to_itimerspec(cur_setting64);
> > + } else {
> > + kc->timer_get(timr, &cur_setting);
> > + }
> > + }
>
> This is really horrible. You add a metric ton of conditionals to every
> syscall just to remove them later again. I have not yet checked the
> end result, but this approach is error prone as hell and just
> introduces completely useless code churn.
>
> It's useless because you do not factor out the guts of the syscall
> functions so we can reuse the very same logic for the future 2038 safe
> syscalls which we need to introduce for 32bit machines.

Hi Thomas,

I should probably go ahead and introduce all the new syscalls in a
separate patch series. I have my own ideas about how to get there,
in a slightly different way from what you propose:

> Take a look at the compat syscalls. They do the right thing.
>
> COMPAT_SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id,
> struct compat_itimerspec __user *, setting)
> {
> long err;
> mm_segment_t oldfs;
> struct itimerspec ts;
>
> oldfs = get_fs();
> set_fs(KERNEL_DS);
> err = sys_timer_gettime(timer_id,
> (struct itimerspec __user *) &ts);
> set_fs(oldfs);
> if (!err && put_compat_itimerspec(setting, &ts))
> return -EFAULT;
> return err;
> }

As a side note, I want to kill off the get_fs()/set_fs() calls in
the process. These always make me dizzy when I try to work out whether
there is a potential security hole (there is not in this case), and
they can be slow on some architectures.

> So we can be clever and do the following:
>
> 1) Preparatory work in posix-timer.c (Patch #1)
>
> 2) Do the 64bit infrastructure work in posix-timer.c (Patch #2)
>
> The result is two simple to review patches with minimal code churn.

This all sounds great, good idea.

> The nice thing is that once we introduce new syscalls for 32bit
> machines, e.g. sys_timer_gettime64(), all we need to do is:
>
> /* Get the time remaining on a POSIX.1b interval timer. */
> SYSCALL_DEFINE2(timer_gettime64, timer_t, timer_id,
> struct itimerspec64 __user *, setting)
> {
> struct itimerspec64 cur_setting64;
> int ret = __timer_gettime(timer_id, &cur_setting64);
>
> if (!ret && copy_to_user(setting, &cur_setting, sizeof (cur_setting)))
> return -EFAULT;
>
> return ret;
> }
>
> And on 64bit timer_gettime64() and timer_gettime() are the same, so we
> just need to do a clever mapping of timer_gettime() to
> timer_gettime64(). Not rocket science....
>
> For 32 bit we provide the old timer_gettime() for non converted
> applications:
>
> #ifdef CONFIG_32BIT_OLD_TIMESPEC_SYSCALLS
> /* Get the time remaining on a POSIX.1b interval timer in the old timespec format. */
> SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id,
> struct itimerspec __user *, setting)
> {
> struct itimerspec64 cur_setting64;
> struct itimerspec cur_setting;
> int ret = __timer_gettime(timer_id, &cur_setting64);
>
> if (!ret) {
> cur_setting = itimerspec64_to_itimerspec(&cur_setting64);
>
> if (copy_to_user(setting, &cur_setting, sizeof (cur_setting)))
> return -EFAULT;
> }
> return ret;
> }
> #endif
>
> Simple, isn't it? No useless churn. Proper refactoring for the next
> step. No useless copying for 64 bit.

My preferred solution is one where we end up with the same syscalls for
both 32-bit and 64-bit, and basically use the compat_sys_timer_gettime()
implementation (or a simplified version) for the existing , something like this:

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index ef8187f9d28d..71e74a47a2cf 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -267,7 +267,7 @@
258 i386 set_tid_address sys_set_tid_address
259 i386 timer_create sys_timer_create compat_sys_timer_create
260 i386 timer_settime sys_timer_settime compat_sys_timer_settime
-261 i386 timer_gettime sys_timer_gettime compat_sys_timer_gettime
+261 i386 timer_gettime compat_sys_timer_gettime
262 i386 timer_getoverrun sys_timer_getoverrun
263 i386 timer_delete sys_timer_delete
264 i386 clock_settime sys_clock_settime compat_sys_clock_settime
@@ -365,3 +365,4 @@
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
358 i386 execveat sys_execveat stub32_execveat
+359 i386 timer_gettime64 sys_timer_gettime
diff --git a/kernel/compat.c b/kernel/compat.c
index 24f00610c575..15d4f5abb492 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -723,18 +723,14 @@ COMPAT_SYSCALL_DEFINE4(timer_settime, timer_t, timer_id, int, flags,
COMPAT_SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id,
struct compat_itimerspec __user *, setting)
{
- long err;
- mm_segment_t oldfs;
- struct itimerspec ts;
+ struct itimerspec64 ts;
+ long ret;

- oldfs = get_fs();
- set_fs(KERNEL_DS);
- err = sys_timer_gettime(timer_id,
- (struct itimerspec __user *) &ts);
- set_fs(oldfs);
- if (!err && put_compat_itimerspec(setting, &ts))
- return -EFAULT;
- return err;
+ ret = __timer_gettime(timer_id, &ts);
+ if (ret)
+ return ret;
+
+ return put_compat_itimerspec(setting, &ts);
}

COMPAT_SYSCALL_DEFINE2(clock_settime, clockid_t, which_clock,
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 31ea01f42e1f..c601f1969f5a 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -766,32 +766,27 @@ common_timer_get(struct k_itimer *timr, struct itimerspec *cur_setting)
cur_setting->it_value = ktime_to_timespec(remaining);
}

+static int put_compat_itimerspec(struct __kernel_itimerspec64 __user *dst,
+ const struct itimerspec64 *src)
+{
+ if (__put_timespec64(&src->it_interval, &dst->it_interval) ||
+ __put_timespec64(&src->it_value, &dst->it_value))
+ return -EFAULT;
+ return 0;
+}
+
/* Get the time remaining on a POSIX.1b interval timer. */
SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id,
- struct itimerspec __user *, setting)
+ struct __kernel_itimerspec64 __user *, setting)
{
- struct itimerspec cur_setting;
- struct k_itimer *timr;
- struct k_clock *kc;
- unsigned long flags;
- int ret = 0;
-
- timr = lock_timer(timer_id, &flags);
- if (!timr)
- return -EINVAL;
+ struct itimerspec64 cur_setting;
+ int ret;

- kc = clockid_to_kclock(timr->it_clock);
- if (WARN_ON_ONCE(!kc || !kc->timer_get))
- ret = -EINVAL;
- else
- kc->timer_get(timr, &cur_setting);
+ ret = __timer_gettime(timer_id, &cur_setting);
+ if (ret)
+ return ret;

- unlock_timer(timr, flags);
-
- if (!ret && copy_to_user(setting, &cur_setting, sizeof (cur_setting)))
- return -EFAULT;
-
- return ret;
+ return put_itimerspec(setting, &cur_setting);
}

/*


Note the use of a separate __kernel_itimerspec64 for the user interface
here, which I think will be needed to hide the differences between the
normal itimerspec on 64-bit machines, and the new itimerspec on 32-bit
platforms that will be defined differently (using 'long long').

My plan for migration here is to temporarily #define __kernel_itimerspec64
as itimerspec on 32-bit architectures (which may be a bit confusing),
and then introduce a new CONFIG_COMPAT_TIME option that is set by
each architecture that has been converted over to use the new syscalls.
This way we can do one syscall at a time at first, followed by one
architecture at a time.

Unless you have serious objections to that plan, I'd like to work
on a first version of this myself and send that out for clarifications.

I would also prefer not too many people to work on the syscalls, and
would rather have Baolin not touch any of the syscall prototypes for
the moment.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/