Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

From: Yang Shi
Date: Wed Apr 17 2019 - 16:43:59 EST




> I would also not touch the numa balancing logic at this stage and rather
> see how the current implementation behaves.
I agree, we would prefer to start from something simpler and see how it works.

>> The "twice access" optimization is aimed at reducing the PMEM bandwidth
>> burden, since PMEM bandwidth is a scarce resource. I did compare "twice
>> access" to "no twice access"; it does save a lot of bandwidth for some
>> once-off access patterns. For example, when running a stress test with
>> mmtest's usemem-stress-numa-compact, the kernel would promote ~600,000
>> pages with "twice access" in 4 hours, but it would promote ~80,000,000
>> pages without "twice access".
> I presume this is a result of a synthetic workload, right? Or do you
> have any numbers for a real life usecase?

Yes, that test just uses usemem.

I also tried some more realistic use cases. The numbers below are the promotion counts from running mmtest's db-sysbench-mariadb-oltp-rw-medium test, which is a typical database workload, with and w/o the "twice access" optimization.

                     w/            w/o
promotion          32771         312250

We can see the kernel did roughly 10x as many promotions (about 9.5x) w/o the "twice access" optimization.
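
For anyone not following the earlier patches, the idea behind "twice access" can be sketched roughly as below. This is only a userspace illustration, not the code from the series: struct pmem_page, hint_fault(), reset_pages(), NR_PAGES and the nr_promoted counter are all made-up names, and the sketch assumes one bit of state per page to remember the first hinting fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8	/* hypothetical number of PMEM pages being tracked */

/* Hypothetical per-page state; this is not a kernel structure. */
struct pmem_page {
	bool seen_once;	/* set by the first hinting fault */
	bool promoted;	/* set once the page has been moved to DRAM */
};

static struct pmem_page pages[NR_PAGES];
static unsigned long nr_promoted;

/*
 * Simulate one NUMA hinting fault on PMEM page @idx.
 * With @twice_access set, the first fault only records the access and
 * the page is promoted on a later fault. Without it, the first fault
 * promotes the page right away.
 * Returns true if this fault promoted the page.
 */
static bool hint_fault(size_t idx, bool twice_access)
{
	struct pmem_page *p;

	assert(idx < NR_PAGES);
	p = &pages[idx];

	if (p->promoted)
		return false;		/* already in DRAM, nothing to do */

	if (twice_access && !p->seen_once) {
		p->seen_once = true;	/* remember the access, do not migrate */
		return false;
	}

	p->promoted = true;
	nr_promoted++;
	return true;
}

/* Clear all simulated state between runs. */
static void reset_pages(void)
{
	size_t i;

	for (i = 0; i < NR_PAGES; i++) {
		pages[i].seen_once = false;
		pages[i].promoted = false;
	}
	nr_promoted = 0;
}
```

In this sketch, touching every page exactly once promotes nothing when twice_access is true and promotes every page when it is false, which is the once-off access pattern where the optimization saves PMEM bandwidth.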

I also tried the kernel-devel and redis tests in mmtest, but they can't generate enough memory pressure by themselves, so I had to run the usemem test to generate memory pressure. However, this introduced a lot of noise, particularly for the w/o "twice access" case. Still, the mysql test should be able to demonstrate the improvement achieved by this optimization.

I'm also wondering whether this optimization would be suitable for general NUMA balancing as well.