Re: PATCH: Further aacraid work

From: William Lee Irwin III
Date: Fri Jun 18 2004 - 10:08:32 EST


On Thu, Jun 17, 2004 at 04:38:42PM -0400, Alan Cox wrote:
>> What do the stats look like with the patch Andrew Morton (I think) posted
>> to reverse the page order from the allocator ?

On Thu, Jun 17, 2004 at 01:48:28PM -0700, William Lee Irwin III wrote:
> Say, could you guys try this? jejb seemed to get decent results with it.

Proper changelog this time, and comments, too. Adaptec et al, please
verify this resolves the issues you've been having.
Someone say _something_.

---

Based on Arjan van de Ven's idea, with guidance and testing from
James Bottomley.

The physical ordering of pages delivered to the IO subsystem is
strongly related to the order in which fragments are subdivided from
larger blocks of memory tracked by the page allocator. Consider a
single MAX_ORDER block of memory in isolation acted on by a sequence of
order 0 allocations in an otherwise empty buddy system. Subdividing
the block beginning at the highest addresses will yield all the pages
of the block in reverse, and subdividing the block beginning at the
lowest addresses will yield all the pages of the block in physical
address order. Empirical tests demonstrate this ordering is preserved,
and that changing the order of subdivision so that the lowest page is
split off first resolves the sglist merging difficulties encountered by
driver authors at Adaptec and others in James Bottomley's testing.
James found that before this patch there were about 40 merges out of
roughly 32K segments. Afterward there were 24007 merges, leaving 19513
segments, a merge rate of about 55%. Merges of 128 segments, the
maximum allowed, were observed after the patch, whereas beforehand they
never occurred. It also improves dbench results on my workstation and
works fine there.
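
To illustrate the ordering claim above, here is a minimal userspace
sketch (toy code, not kernel code; the MAX_ORDER value, the array-based
free lists, and the LIFO pop are simplifying assumptions, not the
kernel's data structures). It simulates repeated order-0 allocations
from a single order-3 block, once splitting the high buddy off first as
the old expand() does and once splitting the low buddy off first as the
patched expand() does, and prints the pfn order each produces:

#include <stdio.h>

#define MAX_ORDER 3                    /* toy value for illustration */
#define NPAGES (1 << MAX_ORDER)

/* one free list per order, holding pfns; a stand-in for struct free_area */
struct toy_area { int pfn[NPAGES]; int nr; };
static struct toy_area area[MAX_ORDER + 1];

static void free_block(int pfn, int order)
{
	area[order].pfn[area[order].nr++] = pfn;
}

/* split a block of size 1 << high down to order low */
static int expand(int pfn, int low, int high, int low_first)
{
	int size = 1 << high;

	while (high > low) {
		high--;
		size >>= 1;
		if (low_first) {
			/* patched behaviour: keep the low half, free the high half */
			free_block(pfn + size, high);
		} else {
			/* old behaviour: free the low half, keep the high half */
			free_block(pfn, high);
			pfn += size;
		}
	}
	return pfn;
}

static int alloc_order0(int low_first)
{
	int order;

	for (order = 0; order <= MAX_ORDER; order++) {
		if (area[order].nr) {
			int pfn = area[order].pfn[--area[order].nr];
			return expand(pfn, 0, order, low_first);
		}
	}
	return -1;	/* out of memory */
}

static void run(const char *label, int low_first)
{
	int i;

	for (i = 0; i <= MAX_ORDER; i++)
		area[i].nr = 0;
	free_block(0, MAX_ORDER);	/* one MAX_ORDER block at pfn 0 */

	printf("%s:", label);
	for (i = 0; i < NPAGES; i++)
		printf(" %d", alloc_order0(low_first));
	printf("\n");
}

int main(void)
{
	run("high-first (old)  ", 0);	/* prints 7 6 5 4 3 2 1 0 */
	run("low-first (patched)", 1);	/* prints 0 1 2 3 4 5 6 7 */
	return 0;
}

The high-first run hands pages back as pfns 7 6 5 4 3 2 1 0; the
low-first run hands back 0 1 2 3 4 5 6 7, which is the physically
ascending ordering that block IO can merge.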

Signed-off-by: William Lee Irwin III <wli@xxxxxxxxxxxxxx>


diff -prauN linux-2.6.7/mm/page_alloc.c linux-2.6.7/mm/page_alloc.c
--- linux-2.6.7/mm/page_alloc.c Sat Jun 12 20:52:26 2004
+++ linux-2.6.7/mm/page_alloc.c Fri Jun 18 07:45:05 2004
@@ -290,6 +290,20 @@
#define MARK_USED(index, order, area) \
__change_bit((index) >> (1+(order)), (area)->map)

+/*
+ * The order of subdivision here is critical for the IO subsystem.
+ * Please do not alter this order without good reasons and regression
+ * testing. Specifically, as large blocks of memory are subdivided,
+ * the order in which smaller blocks are delivered depends on the order
+ * they're subdivided in this function. This is the primary factor
+ * influencing the order in which pages are delivered to the IO
+ * subsystem according to empirical testing, and this is also justified
+ * by considering the behavior of a buddy system containing a single
+ * large block of memory acted on by a series of small allocations.
+ * This behavior is a critical factor in sglist merging's success.
+ *
+ * -- wli
+ */
static inline struct page *
expand(struct zone *zone, struct page *page,
unsigned long index, int low, int high, struct free_area *area)
@@ -297,14 +311,12 @@
unsigned long size = 1 << high;

while (high > low) {
- BUG_ON(bad_range(zone, page));
area--;
high--;
size >>= 1;
- list_add(&page->lru, &area->free_list);
- MARK_USED(index, high, area);
- index += size;
- page += size;
+ BUG_ON(bad_range(zone, &page[size]));
+ list_add(&page[size].lru, &area->free_list);
+ MARK_USED(index + size, high, area);
}
return page;
}
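
For concreteness, a worked trace of the patched loop (derived directly
from the hunk above, not from separate testing): satisfying an order-0
allocation from an order-3 block at pfn P now frees the buddies at
P+4 (order 2), P+2 (order 1), and P+1 (order 0) and returns pfn P,
whereas the old loop freed P (order 2), P+4 (order 1), and P+6
(order 0) and returned P+7, so successive allocations walked backwards
through the block.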