Re: [PATCH v2] posix-timers: add multi_clock_gettime system call

From: Sagi Maimon
Date: Wed Dec 27 2023 - 10:10:05 EST


On Fri, Dec 15, 2023 at 8:05 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
Hi Thomas,
Thanks for your notes.
> On Mon, Nov 27 2023 at 17:39, Sagi Maimon wrote:
> > Some user space applications need to read some clocks.
> > Each read requires moving from user space to kernel space.
> > This asymmetry causes the measured offset to have a significant
> > error.
>
> I can't figure out what you want to tell me here. Where is an asymmetry?
>
You are right, the description is not clear enough.
Some user space applications need to read several clocks and compare them.
Each read requires a transition from user space to kernel space.
The syscall overhead adds an unpredictable delay between the N clock reads.
Removing this delay gives better synchronization between the N clock readings.
> > Introduce a new system call multi_clock_gettime, which can be used to measure
> > the offset between multiple clocks, from variety of types: PHC, virtual PHC
> > and various system clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, etc).
> > The offset includes the total time that the driver needs to read the clock
> > timestamp.
>
> What for? You still fail to explain the problem this is trying to solve.
>
See the explanation above.
> > --- a/include/linux/posix-timers.h
> > +++ b/include/linux/posix-timers.h
> > @@ -260,4 +260,28 @@ void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
> > int update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new);
> >
> > void posixtimer_rearm(struct kernel_siginfo *info);
> > +
> > +#define MULTI_PTP_MAX_CLOCKS 12 /* Max number of clocks */
> > +#define MULTI_PTP_MAX_SAMPLES 10 /* Max allowed offset measurement samples. */
> > +
> > +struct __ptp_multi_clock_get {
> > + unsigned int n_clocks; /* Desired number of clocks. */
> > + unsigned int n_samples; /* Desired number of measurements per clock. */
> > + const clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
> > + /*
> > + * Array of list of n_clocks clocks time samples n_samples times.
> > + */
> > + struct __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS];
> > +};
> > +
> > +struct __ptp_multi_clock_get32 {
> > + unsigned int n_clocks; /* Desired number of clocks. */
> > + unsigned int n_samples; /* Desired number of measurements per clock. */
> > + const clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
> > + /*
> > + * Array of list of n_clocks clocks time samples n_samples times.
> > + */
> > + struct old_timespec32
> > ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS];
>
> Seriously now. We are not adding new syscalls which take compat
> timespecs. Any user space application which wants to use a new syscall
> which takes a timespec needs to use the Y2038 safe variant.
>
You are right. This will be fixed in patch v3.
> Aside of that you define a data structure for a syscall in a kernel only
> header. How is user space supposed to know the struct?
>
You are right. This will be fixed in patch v3.
> >
> > +SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get __user *, ptp_multi_clk_get)
> > +{
> > + const struct k_clock *kc;
> > + struct timespec64 kernel_tp;
> > + struct __ptp_multi_clock_get multi_clk_get;
> > + int error;
> > + unsigned int i, j;
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#variable-declarations
>
You are right. This will be fixed in patch v3.
> > +
> > + if (copy_from_user(&multi_clk_get, ptp_multi_clk_get, sizeof(multi_clk_get)))
> > + return -EFAULT;
> > +
> > + if (multi_clk_get.n_samples > MULTI_PTP_MAX_SAMPLES)
> > + return -EINVAL;
> > + if (multi_clk_get.n_clocks > MULTI_PTP_MAX_CLOCKS)
> > + return -EINVAL;
> > +
> > + for (j = 0; j < multi_clk_get.n_samples; j++) {
> > + for (i = 0; i < multi_clk_get.n_clocks; i++) {
> > + kc = clockid_to_kclock(multi_clk_get.clkid_arr[i]);
> > + if (!kc)
> > + return -EINVAL;
> > + error = kc->clock_get_timespec(multi_clk_get.clkid_arr[i], &kernel_tp);
> > + if (!error && put_timespec64(&kernel_tp, (struct __kernel_timespec __user *)
> > + &ptp_multi_clk_get->ts[j][i]))
> > + error = -EFAULT;
>
> So this reads a clock from a specific clock id and stores the timestamp
> in that user space array.
>
> And how is this solving any of the claims you make in the changelog:
>
> > Introduce a new system call multi_clock_gettime, which can be used to measure
> > the offset between multiple clocks, from variety of types: PHC, virtual PHC
> > and various system clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, etc).
> > The offset includes the total time that the driver needs to read the clock
> > timestamp.
>
> That whole thing is not really different from N consecutive syscalls as
> it does not provide any guarantee vs. the gaps between the readouts.
>
> The common case might be closer to what you try to measure, as it avoids
> the syscall overhead (which is marginal) but other than that it's
> subject to be interrupted and preempted. So the worst case gaps between
> the individual clock reads is unspecified.
>
> IOW, this is nothing else than wishful thinking and does not solve any real
> world problem at all.
>
Preemption and interruption delays can still occur, but at least we
are removing the syscall overhead.
In addition, the preemption issue can be reduced by calling this system
call from a task running at RT priority 99.
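
To illustrate the RT priority point, here is a minimal user space sketch.
It is not part of the patch. The helper name raise_to_max_rt_priority is
hypothetical, and the choice of SCHED_FIFO is an assumption. The call
normally needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <string.h>

/*
 * Hypothetical helper: switch the calling thread to SCHED_FIFO at the
 * highest priority the policy allows (99 on Linux).
 * Returns 0 on success or a negative errno value, typically -EPERM
 * when the caller lacks the required privilege.
 */
static int raise_to_max_rt_priority(void)
{
	struct sched_param sp;
	int max_prio;

	max_prio = sched_get_priority_max(SCHED_FIFO);
	if (max_prio < 0)
		return -errno;

	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = max_prio;

	/* pid 0 means "the calling thread". */
	if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0)
		return -errno;

	return 0;
}
```

A caller would invoke this once before the measurement loop and fall back
to the default policy if it returns an error. This only narrows the
preemption window; interrupts can still delay the reads.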
We have conducted an experiment which shows that the system call
overhead is not marginal at all.
A process with nice 0 priority reads a PHC twice and measures the
delay between the two reads, repeated 1000 times.
In the first setup the two reads are done with two consecutive
clock_gettime system calls, in the second with a single
multi_clock_gettime system call.
With multi_clock_gettime, the delay was under 100 ns in 990 of the
1000 runs.
With clock_gettime, the delay was under 100 ns in 580 runs,
between 100 and 500 ns in 72 runs, between 500 and 1000 ns in 322 runs,
and between 1000 and 5000 ns in some runs.
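
The clock_gettime side of such a measurement can be sketched roughly as
below. This is not the exact test program. The function and bucket names
are hypothetical, and the bucket boundaries simply mirror the ranges
quoted above. The clock id is left to the caller: for a PHC it would be a
dynamic clock id obtained from an open /dev/ptpN descriptor, while
CLOCK_MONOTONIC is only a stand-in and is usually served by the vDSO
without a real system call.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

/* Buckets for the gap between two reads, in nanoseconds. */
enum {
	B_LT_100,	/* below 100 ns */
	B_100_500,	/* 100 ns up to, but not including, 500 ns */
	B_500_1000,	/* 500 ns up to, but not including, 1000 ns */
	B_1000_5000,	/* 1000 ns up to, but not including, 5000 ns */
	B_GE_5000,	/* 5000 ns and above */
	N_BUCKETS
};

static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

static int bucket_for(int64_t delta_ns)
{
	if (delta_ns < 100)
		return B_LT_100;
	if (delta_ns < 500)
		return B_100_500;
	if (delta_ns < 1000)
		return B_500_1000;
	if (delta_ns < 5000)
		return B_1000_5000;
	return B_GE_5000;
}

/*
 * Read 'clk' twice back to back, 'iterations' times, and count the gaps
 * per bucket. Returns 0 on success, -1 if clock_gettime() fails.
 */
static int measure_read_gaps(clockid_t clk, unsigned int iterations,
			     unsigned int hist[N_BUCKETS])
{
	struct timespec first, second;
	unsigned int i;

	for (i = 0; i < N_BUCKETS; i++)
		hist[i] = 0;

	for (i = 0; i < iterations; i++) {
		if (clock_gettime(clk, &first) || clock_gettime(clk, &second))
			return -1;
		hist[bucket_for(ts_to_ns(&second) - ts_to_ns(&first))]++;
	}

	return 0;
}
```

Calling measure_read_gaps(clk, 1000, hist) and printing hist gives a
distribution in the same form as the numbers above. The absolute values
depend on the hardware, the clock driver and the system load.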
> Thanks,
>
> tglx