Re: [PATCH] x86/PCI: Convert force_disable_hpet() to standard quirk

From: Xiongfeng Wang
Date: Thu Sep 29 2022 - 21:06:03 EST




On 2022/9/30 8:38, Feng Tang wrote:
> On Thu, Sep 29, 2022 at 11:52:28PM +0800, Yu Liao wrote:
>> On 2020/12/2 15:28, Zhang Rui wrote:
>>> On Mon, 2020-11-30 at 20:21 +0100, Thomas Gleixner wrote:
>>>> Feng,
>>>>
>>>> On Fri, Nov 27 2020 at 14:11, Feng Tang wrote:
>>>>> On Fri, Nov 27, 2020 at 12:27:34AM +0100, Thomas Gleixner wrote:
>>>>>> On Thu, Nov 26 2020 at 09:24, Feng Tang wrote:
>>>>>> Yes, that can happen. But OTOH, we should start to think about the
>>>>>> requirements for using the TSC watchdog.
>>>
>>> My original proposal is to disable jiffies and refined-jiffies as the
>>> clocksource watchdog, because they are not reliable and it's better to
>>> use clocksource that has a hardware counter as watchdog, like the patch
>>> below, which I didn't send upstream.
>>>
>>> From cf9ce0ecab8851a3745edcad92e072022af3dbd9 Mon Sep 17 00:00:00 2001
>>> From: Zhang Rui <rui.zhang@xxxxxxxxx>
>>> Date: Fri, 19 Jun 2020 22:03:23 +0800
>>> Subject: [RFC PATCH] time/clocksource: do not use refined-jiffies as watchdog
>>>
>>> On IA platforms, if HPET is disabled, either via x86 early-quirks or
>>> via the kernel command line, refined-jiffies will be used as the
>>> clocksource watchdog in the early boot phase, before the acpi_pm timer
>>> is registered.
>>>
>>> This is not a problem if jiffies are accurate.
>>> But in some cases, for example when a serial console is enabled, writes
>>> to the console may frequently take several milliseconds with irqs
>>> disabled. Thus many ticks may become longer than they should be.
>>>
>>> Using refined-jiffies as watchdog in this case breaks the system because
>>> a) duration calculated by refined-jiffies watchdog is always consistent
>>> with the watchdog timeout issued using add_timer(), say, around 500ms.
>>> b) duration calculated by the running clocksource, usually TSC on IA
>>> platforms, reflects the real time cost, which may be much larger.
>>> This results in the running clocksource being disabled erroneously.
>>>
>>> This is reproduced on ICL because HPET is disabled in x86 early-quirks,
>>> and also reproduced on a KBL and a WHL platform when HPET is disabled
>>> via command line.
>>>
>>> BTW, commit fd329f276eca
>>> ("x86/mtrr: Skip cache flushes on CPUs with cache self-snooping") is
>>> another example that refined-jiffies causes the same problem when ticks
>>> become slow for some other reason.
>>
>> Hi Zhang Rui, we have hit the same problem you describe above. I have tested
>> the following modification and it solves the problem. Do you plan to push it
>> upstream?
>
> Hi Liao Yu,
>
> Could you provide more details? Like, what ARCH is the platform (x86
> or others), client or server, and if server, how many sockets (2S/4S/8S)?
>
> The error kernel log will also be helpful.

Hi, Feng Tang,

It's an x86 server. lscpu prints the following information:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 224
On-line CPU(s) list: 0-223
Thread(s) per core: 2
Core(s) per socket: 28
Socket(s): 4
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
Stepping: 4
CPU MHz: 3199.379
CPU max MHz: 3800.0000
CPU min MHz: 1000.0000
BogoMIPS: 5000.00
Virtualization: VT-x
L1d cache: 3.5 MiB
L1i cache: 3.5 MiB
L2 cache: 112 MiB
L3 cache: 154 MiB
NUMA node0 CPU(s): 0-27,112-139
NUMA node1 CPU(s): 28-55,140-167
NUMA node2 CPU(s): 56-83,168-195
NUMA node3 CPU(s): 84-111,196-223

Part of the kernel log is as follows.

[ 1.144402] smp: Brought up 4 nodes, 224 CPUs
[ 1.144402] smpboot: Max logical packages: 4
[ 1.144402] smpboot: Total of 224 processors activated (1121097.93 BogoMIPS)
[ 1.520003] clocksource: timekeeping watchdog on CPU2: Marking clocksource 'tsc-early' as unstable because the skew is too large:
[ 1.520010] clocksource: 'refined-jiffies' wd_now: fffb7210 wd_last: fffb7018 mask: ffffffff
[ 1.520013] clocksource: 'tsc-early' cs_now: 6606717afddd0 cs_last: 66065eff88ad4 mask: ffffffffffffffff
[ 1.520015] tsc: Marking TSC unstable due to clocksource watchdog
[ 5.164635] node 0 initialised, 98233092 pages in 4013ms
[ 5.209294] node 3 initialised, 98923232 pages in 4057ms
[ 5.220001] node 2 initialised, 99054870 pages in 4068ms
[ 5.222282] node 1 initialised, 99054870 pages in 4070ms
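
For reference, the two intervals in the watchdog messages above can be decoded with a small user-space sketch. This is only an illustration, not kernel code: the tick rate (ASSUMED_HZ) and the TSC frequency (ASSUMED_TSC_HZ, taken from the nominal 2.50GHz in the model name) are assumptions that the log itself does not state, and the helper names are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

/* Assumptions, not taken from the log above. */
#define ASSUMED_HZ     1000ULL        /* tick rate (CONFIG_HZ) is a guess */
#define ASSUMED_TSC_HZ 2500000000ULL  /* nominal 2.50GHz from the model name */

/* Difference between two counter reads, masked to the counter width. */
static uint64_t masked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/*
 * Convert counter ticks to nanoseconds. The division is split so the
 * intermediate product stays within 64 bits for freq_hz below ~18 GHz.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
	return (ticks / freq_hz) * NSEC_PER_SEC +
	       (ticks % freq_hz) * NSEC_PER_SEC / freq_hz;
}

/* Print both intervals and return the absolute skew in nanoseconds. */
static uint64_t report_skew(void)
{
	/* Values copied from the 'refined-jiffies' and 'tsc-early' lines. */
	uint64_t wd = masked_delta(0xfffb7210ULL, 0xfffb7018ULL,
				   0xffffffffULL);
	uint64_t cs = masked_delta(0x6606717afddd0ULL, 0x66065eff88ad4ULL,
				   0xffffffffffffffffULL);
	uint64_t wd_ns = ticks_to_ns(wd, ASSUMED_HZ);
	uint64_t cs_ns = ticks_to_ns(cs, ASSUMED_TSC_HZ);
	uint64_t skew = cs_ns > wd_ns ? cs_ns - wd_ns : wd_ns - cs_ns;

	printf("watchdog: %llu ticks, %llu ns\n",
	       (unsigned long long)wd, (unsigned long long)wd_ns);
	printf("tsc:      %llu cycles, %llu ns\n",
	       (unsigned long long)cs, (unsigned long long)cs_ns);
	printf("skew:     %llu ns\n", (unsigned long long)skew);
	return skew;
}
```

Calling report_skew() from a main() shows that refined-jiffies advanced by 504 ticks between the two watchdog reads. If the assumed rates are roughly right, that is about half a second by jiffies while the TSC advanced by about two seconds, which would be consistent with ticks being stretched during early boot rather than with a bad TSC.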

Thanks,
Xiongfeng

>
> Thanks,
> Feng
>
>> Thanks,
>> Liao Yu
>>
>>>
>>> IMO, the right solution is to only use a hardware clocksource as watchdog.
>>> Then even if ticks are slow, both the running clocksource and the watchdog
>>> return the real time cost, and they still match.
>>>
>>> Signed-off-by: Zhang Rui <rui.zhang@xxxxxxxxx>
>>> ---
>>> kernel/time/clocksource.c | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>>> index 02441ead3c3b..e7e703858fa6 100644
>>> --- a/kernel/time/clocksource.c
>>> +++ b/kernel/time/clocksource.c
>>> @@ -364,6 +364,10 @@ static void clocksource_select_watchdog(bool fallback)
>>>  		watchdog = NULL;
>>>  
>>>  	list_for_each_entry(cs, &clocksource_list, list) {
>>> +		/* Do not use refined-jiffies as clocksource watchdog */
>>> +		if (cs->rating <= 2)
>>> +			continue;
>>> +
>>>  		/* cs is a clocksource to be watched. */
>>>  		if (cs->flags & CLOCK_SOURCE_MUST_VERIFY)
>>>  			continue;
>>