Re: [PATCH v2] x86/tsc: Use topology_max_packages() to get package number

From: Feng Tang
Date: Mon Mar 25 2024 - 21:28:24 EST


On Mon, Mar 25, 2024 at 09:17:38PM -0400, Waiman Long wrote:
> On 3/24/24 23:09, Feng Tang wrote:
> > Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> > on qualified platorms") was introduced to solve problem that
> > sometimes TSC clocksource is wrongly judged as unstable by watchdog
> > like 'jiffies', HPET, etc.
> >
> > In it, the hardware package number is a key factor for judging whether
> > to disable the watchdog for TSC, and 'nr_online_nodes' was chosen due
> > to it is available in early boot phase before registering 'tsc-early'
> > clocksource, where all non-boot CPUs are not brought up yet.
> >
> > Dave and Rui pointed out there are many cases in which 'nr_online_nodes'
> > is cheated and not accurate, like:
> >
> > * numa emulation (numa=fake=8 etc.)
> > * numa=off
> > * platforms with CPU-less HBM nodes, CPU-less Optane memory nodes.
> > * SNC (sub-numa cluster) mode enabled
> > * 'maxcpus=' cmdline setup, where chopped CPUs could be onlined later
> > * 'nr_cpus=', 'possible_cpus=' cmdline setup, where chopped CPUs can
> > not be onlined after boot
> >
> > Thomas' recent patchset of refactoring x86 topology code improves
> > topology_max_packages(), by making it more accurate and available in
> > early boot phase, which works well in most of the above cases.
> >
> > The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup. And
> > the reason is, during topology setup, the boot CPU iterates through
> > all enumerated APIC ids and either accepts or rejects the APIC id.
> > For accepted ids, it figures out which bits of the id map to the
> > package number. It tracks which package numbers have been seen in a
> > bitmap. topology_max_packages() just returns the number of bits set
> > in that bitmap.
> >
> > 'nr_cpus=' and 'possible_cpus=' can cause more APIC ids to be rejected
> > and can artificially lower the number of bits in the package bitmap
> > and thus topology_max_packages(). This means that, for example, a
> > system with 8 physical packages might reject all the CPUs on 6 of those
> > packages and be left with only 2 packages and 2 bits set in the package
> > bitmap. It needs the TSC watchdog, but would disable it anyway. This
> > isn't ideal, but this only happens for debug-oriented options. This is
> > fixable by tracking the package numbers for rejected CPUs. But it's
> > not worth the trouble for debugging.
> >
> > So use topology_max_packages() to replace nr_online_nodes().
> >
> > Reported-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> > Closes: https://lore.kernel.org/lkml/a4860054-0f16-6513-f121-501048431086@xxxxxxxxx/
> > Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
> > ---
> >
> > Hi all,
> >
> > For warning about possible compromise due to 'nr_cpus=' and 'possible_cpus=',
> > another alternative could be checking whether these has been setup in cmdline
> > inside tsc.c and warn there.
> >
> > Changelog:
> >
> > Since v1:
> >
> > * Use Dave's detailed elaboration about 'nr_cpus=', 'possible_cpus='
> > possibly compromising '__max_logical_packages' in commit log
> > * Fix typos and inaccuracy pointed out by Rui and Longman
> >
> > arch/x86/kernel/cpu/topology.c | 5 ++++-
> > arch/x86/kernel/tsc.c | 7 ++-----
> > 2 files changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> > index 3259b1d4fefe..2db03b00e29b 100644
> > --- a/arch/x86/kernel/cpu/topology.c
> > +++ b/arch/x86/kernel/cpu/topology.c
> > @@ -460,8 +460,11 @@ void __init topology_init_possible_cpus(void)
> > pr_info("Num. threads per package: %3u\n", __num_threads_per_package);
> > pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
> > - if (topo_info.nr_rejected_cpus)
> > + if (topo_info.nr_rejected_cpus) {
> > pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);
> > + if (__max_logical_packages <= 4)
> > + pr_warn("TSC might be buggered due to the rejected CPUs\n");
> > + }
>
> People may sometimes use kernel option like "panic_on_warn=1" to cause a
> crash dump on warning. Maybe we should just use pr_info() as the presence of
> rejected CPUs are unlikely to be a real problem from the TSC stability point
> of view. To emphasize it a bit more, we could add a "WARNING: " prefix, for
> example.

Good catch! will chagne it to pr_info. The possible hurt of these
debug options are relatively small. In the past several years, I
haven't got a real case that TSC is really broken.

Thanks,
Feng

>
> Cheers,
> Longman
>