Re: [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot

From: Mark Salyzyn
Date: Tue Nov 28 2017 - 14:10:13 EST


On 11/28/2017 10:43 AM, Prarit Bhargava wrote:

On 11/23/2017 07:58 AM, Petr Mladek wrote:
On Wed 2017-11-15 19:15:38, Thomas Gleixner wrote:
For demonstration purposes only.

Add a disgusting hack to work around the fact that high resolution clock
MONOTONIC accessors are not available during early boot and return stale
time stamps accross suspend/resume when the current clocksource is not
flagged with CLOCK_SOURCE_SUSPEND_ACCESS_OK.

Use local_clock() to provide timestamps in early boot and when the
clocksource is not accessible after timekeeping_suspend(). In the
suspend/resume case this might cause non monotonic timestamps.
I get the non-monotonic times even during boot:

[ 0.026709] smp: Bringing up secondary CPUs ...
[ 0.027973] x86: Booting SMP configuration:
[ 0.028006] .... node #0, CPUs: #1
[ 0.004000] kvm-clock: cpu 1, msr 1:3ff51041, secondary cpu clock
^^^^^^^^
[ 0.032097] KVM setup async PF for cpu 1
[ 0.032766] kvm-stealtime: cpu 1, msr 13b00dc40
[ 0.036502] #2
[ 0.004000] kvm-clock: cpu 2, msr 1:3ff51081, secondary cpu clock
^^^^^^^^
[ 0.040062] KVM setup async PF for cpu 2
[ 0.040576] kvm-stealtime: cpu 2, msr 13b20dc40
[ 0.041304] #3
[ 0.004000] kvm-clock: cpu 3, msr 1:3ff510c1, secondary cpu clock
^^^^^^^^
[ 0.048051] KVM setup async PF for cpu 3
[ 0.048554] kvm-stealtime: cpu 3, msr 13b40dc40


To be honest, I do not feel experienced enough to decide which
solution is acceptable. I would say that only few people care
about timestamps during boot. On the other hand, some tools
It is extremely important to know what happened and how long it took. I agree
with Petr, we should figure out a way to guarantee that the timestamp is monotonic.

P.

There is no guarantee of sequential order of delivery for kernel prints.

[TL;DR]

The timestamp may very well be monotonic in the general case, and yet it may take longer to _compose_ or _deliver_ the message (multiprocessor, via vsprint, etc). To get sequential order the timestamp _has_ to be reliably monotonic and _collected_ at incident _before_ anything else, before message is composed, and then (land in-place, post-sorted, what evz?) be presented in time sorted order.

User space can also inject into the kernel print buffer, it certainly does not support at-incident timestamp collection as it is delayed already by a syscall or two. We would need to allow user space (via fast vdso) collect timestamp(s), then make it part of the payload of the report to even get close.

It is out-of-scope for this RFC to consider rewriting the timestamp acquisition and delivery infrastructure for kernel prints IMHO. For now we all have accepted they will be _somewhat_ sequential within the confines of races and composition.

Sincerely -- Mark Salyzyn