Re: [PATCH v9 05/14] mm: multi-gen LRU: groundwork

From: Prarit Bhargava
Date: Mon Mar 21 2022 - 15:18:13 EST


On 3/21/22 14:58, Justin Forbes wrote:
On Mon, Mar 14, 2022 at 4:30 AM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:

On Mon, Mar 14, 2022 at 2:09 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote:

Hi, Yu,

Yu Zhao <yuzhao@xxxxxxxxxx> writes:
diff --git a/mm/Kconfig b/mm/Kconfig
index 3326ee3903f3..747ab1690bcf 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -892,6 +892,16 @@ config ANON_VMA_NAME
area from being merged with adjacent virtual memory areas due to the
difference in their name.

+# the multi-gen LRU {
+config LRU_GEN
+ bool "Multi-Gen LRU"
+ depends on MMU
+ # the following options can use up the spare bits in page flags
+ depends on !MAXSMP && (64BIT || !SPARSEMEM || SPARSEMEM_VMEMMAP)

LRU_GEN depends on !MAXSMP. So, what is the maximum NR_CPUS supported
by LRU_GEN?

LRU_GEN doesn't really care about NR_CPUS. IOW, it doesn't impose a
max number. The dependency is on NODES_SHIFT, whose default is raised by MAXSMP:
default "10" if MAXSMP
This combined with LAST_CPUPID_SHIFT can exhaust the spare bits in page flags.
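
To make the bit accounting concrete, here is a rough sketch of the arithmetic, not the kernel's actual layout code. It assumes page flags are a single word shared by the ordinary PG_* flags, the zone and node fields, the last-CPU/PID field (PID bits plus enough bits to index NR_CPUS), and whatever extra bits MGLRU needs. The struct, the helper names and every number in the usage note are hypothetical placeholders, not values taken from any real config.

```c
#include <assert.h>
#include <stdio.h>

/* Number of bits needed to index n distinct values (n >= 1). */
static unsigned int bits_for(unsigned long n)
{
	unsigned int bits = 0;

	assert(n >= 1);
	while ((1UL << bits) < n)
		bits++;
	return bits;
}

/* Hypothetical description of what has to fit in one page-flags word. */
struct page_flags_budget {
	unsigned int word_bits;     /* width of the flags word, e.g. 64 */
	unsigned int nr_pageflags;  /* ordinary PG_* flags (placeholder) */
	unsigned int zones_shift;   /* bits for the zone index */
	unsigned int nodes_shift;   /* bits for the NUMA node index */
	unsigned long nr_cpus;      /* NR_CPUS; sizes the last-CPU field */
	unsigned int last_pid_bits; /* PID part of the last-CPU/PID field */
	unsigned int lru_gen_bits;  /* extra bits wanted by MGLRU */
};

/* Spare bits left in the word; a negative result means it does not fit. */
static int spare_bits(const struct page_flags_budget *b)
{
	unsigned int used = b->nr_pageflags + b->zones_shift +
			    b->nodes_shift + b->last_pid_bits +
			    bits_for(b->nr_cpus) + b->lru_gen_bits;

	return (int)b->word_bits - (int)used;
}

/* Print the budget so two configurations can be compared side by side. */
static void report(const char *name, const struct page_flags_budget *b)
{
	printf("%s: %d spare bit(s)\n", name, spare_bits(b));
}
```

With placeholder inputs such as word_bits = 64, nr_pageflags = 24, zones_shift = 3, last_pid_bits = 8 and lru_gen_bits = 5, going from nodes_shift = 6 and nr_cpus = 512 to nodes_shift = 10 and nr_cpus = 8192 takes the result from 9 spare bits to 1. The point is only that the node and CPU fields grow together under MAXSMP; the real numbers depend on the config.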

MAXSMP is meant for kernel developers to test their code, and it
should not be used in production [1]. But some distros unfortunately
ship kernels built with this option, e.g., Fedora and Ubuntu. And
their users reported build errors to me after they applied MGLRU on
those kernels ("Not enough bits in page flags"). Let me add Fedora and
Ubuntu to this thread.

Fedora and Ubuntu,

Could you please clarify if there is a reason to ship kernels built
with MAXSMP? Otherwise, please consider disabling this option. Thanks.

As per above, MAXSMP enables ridiculously large numbers of CPUs and
NUMA nodes for testing purposes. It is detrimental to performance,
e.g., CPUMASK_OFFSTACK.

It was enabled for Fedora and RHEL because we did need more than 512
CPUs, originally only in RHEL until SGI (years ago) complained that
they were testing very large machines with Fedora. The testing done
on RHEL showed that the performance impact was minimal. For a very
long time we had MAXSMP off and carried a patch which allowed us to
turn on CPUMASK_OFFSTACK without debugging because there was supposed
to be "something else" coming. In 2019 we gave up, dropped that patch
and just turned on MAXSMP.

I do not have any metrics for how often someone runs Fedora on a
ridiculously large machine these days, but I would guess that number
is not 0.

It is not 0. I've seen data from large systems (1000+ logical threads) that are running Fedora, albeit with a modified Fedora kernel.

Additionally, the maximum CPU limit in RHEL is 1792; however, we have recently had a request to *double* that to 3584. You should just assume that number will continue to increase.

P.



Justin

[1] https://lore.kernel.org/lkml/20131106055634.GA24044@xxxxxxxxx/
