On Wed, Apr 26, 2023 at 05:10:14PM +0200, Vlastimil Babka wrote:
On 4/26/23 17:03, Baolin Wang wrote:
This reverts commit 95e7a450b8190673675836bfef236262ceff084a.
When I tested thpscale with the v6.3 kernel, I found that compaction
efficiency had regressed badly compared to the v6.2-rc1 kernel. See the
numbers below:
                                  v6.2-rc                   v6.3
Percentage huge-3        81.35 (   0.00%)       32.97 ( -59.47%)
Percentage huge-5        89.92 (   0.00%)       41.70 ( -53.63%)
Percentage huge-7        92.41 (   0.00%)       34.08 ( -63.12%)
Percentage huge-12       90.29 (   0.00%)       41.10 ( -54.49%)
Percentage huge-18       82.38 (   0.00%)       41.24 ( -49.95%)
Percentage huge-24       80.34 (   0.00%)       35.99 ( -55.20%)
Percentage huge-30       88.90 (   0.00%)       44.20 ( -50.28%)
Percentage huge-32       90.69 (   0.00%)       79.57 ( -12.25%)

Ops Compaction stalls           113790.00              207099.00
Ops Compaction success           33983.00               19488.00
Ops Compaction failures          79807.00              187611.00
Ops Compaction efficiency           29.86                   9.41
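
The derived figures in the table can be cross-checked from the raw counters. The following is a minimal Python sketch, not part of the original report. It assumes that "efficiency" is successes divided by stalls, expressed as a percentage, and that the bracketed figures are the relative change against the v6.2-rc column. Both assumptions are inferred from the numbers themselves, and the function names are hypothetical.

```python
# Sanity-check sketch for the thpscale table above.
# Assumptions (not stated in the report):
#   - efficiency = 100 * success / stalls
#   - bracketed figure = relative change versus the baseline column
# Function names are illustrative only.

def efficiency(success, stalls):
    """Compaction successes as a percentage of compaction stalls."""
    return 100.0 * success / stalls

def relative_change(baseline, value):
    """Percentage change of value relative to baseline."""
    return 100.0 * (value - baseline) / baseline

# Raw counters copied from the table: (stalls, success, failures).
counters = {
    "v6.2-rc": (113790, 33983, 79807),
    "v6.3": (207099, 19488, 187611),
}

for kernel, (stalls, success, failures) in counters.items():
    # In both columns the stalls equal successes plus failures.
    assert success + failures == stalls
    print(f"{kernel}: efficiency {efficiency(success, stalls):.2f}%")

# huge-3 row: 81.35% on the baseline versus 32.97% on v6.3.
print(f"huge-3 change: {relative_change(81.35, 32.97):.2f}%")
```

Because the table was presumably generated from unrounded data, a value recomputed from the rounded columns may differ from the printed one in the last digit.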
After some investigation, I found that commit 95e7a450b819
("Revert mm/compaction: fix set skip in fast_find_migrateblock") caused
the regression. That commit reverted commit 7efc3b726103 ("mm/compaction:
fix set skip in fast_find_migrateblock") to fix a CPU stall caused by
compaction getting stuck repeating fast_find_migrateblock(). That stall
is now addressed by commit cfccd2e63e7e ("mm, compaction: finish
pageblocks on complete migration failure"). So
IIRC at that time I was pointing out some scenarios that could make the
problem appear even after that commit, and we wanted to revisit that
when Mel is back.
Yes, I've prototyped the fix against 6.3-rc7, with the revert at the
end of the series, but the revert on its own has the potential to cause
problems. The series needs to be rebased, retested and posted. What I
last tested should show up shortly at
https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/ mm-follupfastmigrate-v1r1