Re: [PATCH] Avoiding external fragmentation with a placement policy Version 10

From: Joel Schopp
Date: Tue May 03 2005 - 14:31:52 EST


Comments inline below.

o Tightened what pools are used for fallbacks, less likely to fragment
o Many micro-optimisations to have the same performance as the standard allocator. Modified allocator now faster than standard allocator using
gcc 3.3.5

Nice.

o Increased the size of reserve for fallbacks from 10% to 12.5%.

This just screams out for a tunable. Systems with different workloads and different amounts of memory will behave better with different values. It would be even better if it would self tune, but that might prove difficult.
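
To make the idea concrete, a tunable could be as simple as exposing the reserve fraction through /proc/sys/vm. The sketch below is completely untested and the names (sysctl_fallback_reserve_ratio, VM_FALLBACK_RESERVE_RATIO) are made up; it just expresses the reserve as 1/N of the zone so the current 12.5% becomes a default of 8:

/*
 * Hypothetical knob: the fallback reserve expressed as 1/N of each zone,
 * so the hardcoded 12.5% corresponds to a default of 8.  The allocator
 * would read this when it sizes the per-zone fallback reserve.
 */
int sysctl_fallback_reserve_ratio = 8;

static int fallback_reserve_ratio_min = 2;	/* no more than 50% */
static int fallback_reserve_ratio_max = 64;	/* no less than ~1.5% */

/* entry to add to vm_table in kernel/sysctl.c, giving
 * /proc/sys/vm/fallback_reserve_ratio */
{
	.ctl_name	= VM_FALLBACK_RESERVE_RATIO,	/* new constant needed */
	.procname	= "fallback_reserve_ratio",
	.data		= &sysctl_fallback_reserve_ratio,
	.maxlen		= sizeof(int),
	.mode		= 0644,
	.proc_handler	= &proc_dointvec_minmax,
	.strategy	= &sysctl_intvec,
	.extra1		= &fallback_reserve_ratio_min,
	.extra2		= &fallback_reserve_ratio_max,
},

Self tuning could then come later by having the kernel adjust the same variable based on the fallback_count beancounters below.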

Difference in performance operations report generated by diff-aim9.sh from VMRegress 0.14
 N Test          Standard MBuddy V10     Diff % diff Test description
                  Ops/sec    Ops/sec  Ops/sec
-- ----------- ---------- ---------- -------- ------ ----------------
 1 add_double   460569.72  465222.46  4652.74  1.01% Thousand Double Precision Additions/second
 2 add_float    460523.25  465322.45  4799.20  1.04% Thousand Single Precision Additions/second
 3 add_long    1421763.04 1436042.64 14279.60  1.00% Thousand Long Integer Additions/second
 4 add_int     1421763.04 1436042.64 14279.60  1.00% Thousand Integer Additions/second
 5 add_short   1421363.11 1435760.71 14397.60  1.01% Thousand Short Integer Additions/second
 7 page_test    121048.16  123059.49  2011.33  1.66% System Allocations & Pages/second
 8 brk_test     445743.79  452407.93  6664.14  1.50% System Memory Allocations/second
 9 jmp_test    4158416.67 4232083.33 73666.66  1.77% Non-local gotos/second
10 signal_test   94417.60   94584.24   166.64  0.18% Signal Traps/second
11 exec_test        65.04      66.69     1.65  2.54% Program Loads/second
12 fork_test      1537.82    1730.51   192.69 12.53% Task Creations/second
13 link_test      6411.28    6477.45    66.17  1.03% Link/Unlink Pairs/second

The aim9 results show that there are consistent improvements for common
page-related operations. The results are compiler dependent and there are
variances of 1-2% between versions.

Any explanation for why fork_test shows a markedly better improvement than the others?

-#define __GFP_BITS_SHIFT 16 /* Room for 16 __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 18 /* Room for 16 __GFP_FOO bits */

Comment should have the new 18, not the old 16.
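
That is, the line would read:

+#define __GFP_BITS_SHIFT 18 /* Room for 18 __GFP_FOO bits */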

+#ifdef CONFIG_ALLOCSTATS
+	/*
+	 * These are beancounters that track how the placement policy
+	 * of the buddy allocator is performing
+	 */
+	unsigned long fallback_count[ALLOC_TYPES];
+	unsigned long alloc_count[ALLOC_TYPES];
+	unsigned long reserve_count[ALLOC_TYPES];
+	unsigned long kernnorclm_full_steal;
+	unsigned long kernnorclm_partial_steal;
+	unsigned long bulk_requests[MAX_ORDER];
+	unsigned long bulk_alloced[MAX_ORDER];
+#endif

It would be nice if all of the CONFIG_ALLOCSTATS stuff was broken out as a second patch. It would make this patch much smaller and more readable.

+int fallback_allocs[ALLOC_TYPES][ALLOC_TYPES] = {
+	{ALLOC_KERNNORCLM, ALLOC_FALLBACK, ALLOC_KERNRCLM, ALLOC_USERRCLM},
+	{ALLOC_KERNRCLM, ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_USERRCLM},

I would have thought that KernRclm would want to choose USERRCLM over KERNNORCLM.

+	{ALLOC_USERRCLM, ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_KERNRCLM},

I'm almost certain the UserRclm type should prefer KERNRCLM over KERNNORCLM.
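
To make that concrete, the table with both of those orderings swapped would look something like this (completely untested, just reusing the identifiers from the patch; any rows not quoted above are left as they are):

int fallback_allocs[ALLOC_TYPES][ALLOC_TYPES] = {
	{ALLOC_KERNNORCLM, ALLOC_FALLBACK, ALLOC_KERNRCLM, ALLOC_USERRCLM},
	/* KernRclm falls back to the other reclaimable pool first */
	{ALLOC_KERNRCLM, ALLOC_FALLBACK, ALLOC_USERRCLM, ALLOC_KERNNORCLM},
	/* UserRclm likewise prefers KernRclm over KernNoRclm */
	{ALLOC_USERRCLM, ALLOC_FALLBACK, ALLOC_KERNRCLM, ALLOC_KERNNORCLM},
	/* remaining rows as in the patch */
};

That way the two reclaimable pools steal from each other before polluting the non-reclaimable pool.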


+	 * Here, the alloc type lists have been depleted as well as the global
+	 * pool, so fallback. When falling back, the largest possible block
+	 * will be taken to keep the fallbacks clustered if possible
+	 */

I was curious if you had tried taking the smallest possible block. I would think that it would reduce the amount of fallback needed, and thus increase the amount available for the 3 allocation types. I would expect a net win, despite not clustering fallbacks particularly well.
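
Roughly what I mean is scanning up from the requested order and taking the first block that is big enough, instead of starting at MAX_ORDER-1 and walking down. Untested sketch, using the same variables as the hunk below:

	/* Take the smallest free block that satisfies the request */
	area = zone->free_area_lists[alloctype] + order;
	for (current_order = order; current_order < MAX_ORDER; current_order++) {
		if (!list_empty(&area->free_list))
			goto remove_page;
		area++;
	}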

+	alloctype = fallback_list[retry_count];
+
+	/* Find a block to allocate */
+	area = zone->free_area_lists[alloctype] + (MAX_ORDER-1);
+	current_order = MAX_ORDER;
+	do {
+		current_order--;
+		if (list_empty(&area->free_list)) {
+			area--;
+			continue;
+		}
+
+		goto remove_page;
+	} while (current_order != order);
+	}

This loop is a bit hard to follow. I think it would be easier to understand if it looked something like this (totally untested):

+	current_order = MAX_ORDER - 1;
+	do {
+		if (!list_empty(&area->free_list)) {
+			goto remove_page;
+		}
+
+		area--;
+		current_order--;
+	} while (current_order >= order);


