Re: [PATCH v2] posix-timers: add multi_clock_gettime system call

From: Thomas Gleixner
Date: Fri Dec 15 2023 - 13:05:14 EST


On Mon, Nov 27 2023 at 17:39, Sagi Maimon wrote:
> Some user space applications need to read some clocks.
> Each read requires moving from user space to kernel space.
> This asymmetry causes the measured offset to have a significant
> error.

I can't figure out what you want to tell me here. Where is an asymmetry?

> Introduce a new system call multi_clock_gettime, which can be used to measure
> the offset between multiple clocks, from variety of types: PHC, virtual PHC
> and various system clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, etc).
> The offset includes the total time that the driver needs to read the clock
> timestamp.

What for? You still fail to explain the problem this is trying to solve.

> --- a/include/linux/posix-timers.h
> +++ b/include/linux/posix-timers.h
> @@ -260,4 +260,28 @@ void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
> int update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new);
>
> void posixtimer_rearm(struct kernel_siginfo *info);
> +
> +#define MULTI_PTP_MAX_CLOCKS 12 /* Max number of clocks */
> +#define MULTI_PTP_MAX_SAMPLES 10 /* Max allowed offset measurement samples. */
> +
> +struct __ptp_multi_clock_get {
> + unsigned int n_clocks; /* Desired number of clocks. */
> + unsigned int n_samples; /* Desired number of measurements per clock. */
> + const clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
> + /*
> + * Array of list of n_clocks clocks time samples n_samples times.
> + */
> + struct __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS];
> +};
> +
> +struct __ptp_multi_clock_get32 {
> + unsigned int n_clocks; /* Desired number of clocks. */
> + unsigned int n_samples; /* Desired number of measurements per clock. */
> + const clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
> + /*
> + * Array of list of n_clocks clocks time samples n_samples times.
> + */
> + struct old_timespec32
> ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS];

Seriously now. We are not adding new syscalls which take compat
timespecs. Any user space application which wants to use a new syscall
which takes a timespec needs to use the Y2038 safe variant.

Aside of that you define a data structure for a syscall in a kernel only
header. How is user space supposed to know the struct?

>
> +SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get __user *, ptp_multi_clk_get)
> +{
> + const struct k_clock *kc;
> + struct timespec64 kernel_tp;
> + struct __ptp_multi_clock_get multi_clk_get;
> + int error;
> + unsigned int i, j;

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#variable-declarations

> +
> + if (copy_from_user(&multi_clk_get, ptp_multi_clk_get, sizeof(multi_clk_get)))
> + return -EFAULT;
> +
> + if (multi_clk_get.n_samples > MULTI_PTP_MAX_SAMPLES)
> + return -EINVAL;
> + if (multi_clk_get.n_clocks > MULTI_PTP_MAX_CLOCKS)
> + return -EINVAL;
> +
> + for (j = 0; j < multi_clk_get.n_samples; j++) {
> + for (i = 0; i < multi_clk_get.n_clocks; i++) {
> + kc = clockid_to_kclock(multi_clk_get.clkid_arr[i]);
> + if (!kc)
> + return -EINVAL;
> + error = kc->clock_get_timespec(multi_clk_get.clkid_arr[i], &kernel_tp);
> + if (!error && put_timespec64(&kernel_tp, (struct __kernel_timespec __user *)
> + &ptp_multi_clk_get->ts[j][i]))
> + error = -EFAULT;

So this reads a clock from a specific clock id and stores the timestamp
in that user space array.

And how is this solving any of the claims you make in the changelog:

> Introduce a new system call multi_clock_gettime, which can be used to measure
> the offset between multiple clocks, from variety of types: PHC, virtual PHC
> and various system clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, etc).
> The offset includes the total time that the driver needs to read the clock
> timestamp.

That whole thing is not really different from N consecutive syscalls as
it does not provide and guarantee vs. the gaps between the readouts.

The common case might be closer to what you try to measure, as it avoids
the syscall overhead (which is marginal) but other than that it's
subject to be interrupted and preempted. So the worst case gaps between
the indiviual clock reads is unspecified.

IOW, this is nothing else than wishful thinking and does not solve any real
world problem at all.

Thanks,

tglx