Or did you have something else in mind?
I understand Waiman's concern here, and in our test patch, the 2Thank you both for the data!I'm not sure the detail in Waiman's cases, and in our cases (stress-ng)On one of the test system, I had measured that normal delay
the delay between watchdog's (HPET here) read were not linear, that
from debug data, sometimes the 3-2 difference could be bigger or much
bigger than the 2-1 difference.
The reason could be the gap between 2 reads depends hugely on the system
pressure at that time that 3 HPET read happens. On our test box (a
2-Socket Cascade Lake AP server), the 2-1 and 3-2 difference are stably
about 2.5 us, while under the stress it could be bumped to from 6 us
to 2800 us.
So I think checking the 3-2 difference plus increasing the max retries
to 10 may be a simple way, if the watchdog read is found to be
abnormally long, we skip this round of check.
(hpet->tsc->hpet) was normally a bit over 2us. It was a bit more than 4us at
bootup time. However, the same system under stress could have a delay of
over 200us at bootup time. When I measured the consecutive hpet delay, it
was about 180us. So hpet read did dominate the total clocksource read delay.
I would not suggest increasing the max retries as it may still fail in mostIf anything, adding that extra read would cause me to -reduce- the number
cases because the system stress will likely not be going away within a short
time. So we are likely just wasting cpu times. I believe we should just skip
it if it is the watchdog read that is causing most of the delay.
of retries to avoid increasing the per-watchdog overhead.
consecutive watchdog read delay check is done inside this retrying
loop accompanying the 'cs' read, and once an abnormal delay is found,
the watchdog check is skipped without waiting for the max-retries to
complete.
Our test data shows the consecutive delay is not always big even when
the system is much stressed, that's why I suggest to increase the
retries.