Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.

From: Arul Jeniston
Date: Mon Aug 19 2019 - 02:08:18 EST


Hi tgls,


> BIOS is the more likely candidate.

We would check BIOS and update.


> We expect TSC not to go backwards. If it does it's broken and not usable as
> a clocksource for the kernel. The problem is that this is not necessarily
> easy to detect.

We used relative timer with CLOCK_REALTIME which behaves similar to
CLOCK_MONOTONIC.
We are unable to recreate this problem. So, we have instrumented
kernel code to find the possibility.
During normal run, we do see small amount (~1000 cycles) of backward
time drifts one in a while.
This is likely happens due to the race between multiple processors and
ISR routines.
We have added a hook to read_tsc() and observed backward time drift
when isr comes between reading tsc register and returning the value.
This drifting time differs based on the number of isr handled and the
time taken to service each isr.
During our testing, we observed, if the drifting time goes beyond 5000
cycles, timerfd_read() returns 0 sometimes.
We don't see any problem, if the drifting time is lesser than 5000 cycles.


> Again:
>
> You cannot fix a hardware problem by hacking around it at exactly one place
> where you can observe it. If that problem exists on your machine, then any
> other time related function is affected as well.


The system is up and running more than 6 months.
We don't see any log in dmesg to comment on whether it is a hardware issue.
Other than timerfd_read(), we see no functionality issue . Do we have
any data collected in the kernel to help us in analyzing in the
direction of BIOS or hardware?

> Are you going to submit patches against _ALL_ time{r} related syscalls to
> fix^Wpaper over this? Either against the kernel or against the man pages
>
> i.e. time went backwards since the timer was expired. That's absolutely
> unexpected behaviour and no, we are not papering over it.


Agreed. Our intention is not to put a workaround. Intention is to
write a reliable application that handles all values returned by a
system call.
At present, the application doesn't know whether 0 return value is a
bug or valid case.


> Did you ever quantify how much time goes backwards in that case?

It should be more than 5000 cycles. Found it through kernel instrumentation.


> Is the timer expiry and the timerfd_read() on the same CPU or on different
> ones?


We don't have data to answer this. However, the kernel is configured
to allow timer migration.
So, we believe, the timer expiry and timerfd_read happens on different CPUs.

> Can you please provide a full dmesg from boot to after the point where this
> failure happens?

We don't see any logs in dmesg during the occurrence of this problem.
We may not be able to share complete dmesg logs due to security reasons.
We haven't seen any time drifting related messages too.
Let us know, if you are looking for any specific log message.

Thanks,
Arul