Re: 2.1.76, nfs client, and memory fragmentation

kwrohrer@enteract.com
Sat, 3 Jan 1998 21:45:39 -0600 (CST)


And lo, Dean Gaudet saith unto me:
> On Fri, 2 Jan 1998, Linus Torvalds wrote:
> > One approach is to have just a "hint-map", which might indeed be good
> > enough. The hint-map would contain entries only for private un-shared
> > pages - which is often a large fraction of the pages. HOWEVER, I suspect
> > that the UNIX fork()/exec() semantics would make even this be impractical:
> > on average a lot of pages are not shared, but most pages are shared at
> > least for a short while every once in a while. And even if they become
> > unshared quickly after being shared, they will have lost the information
> > about what single mapping they had.
> On a system running Apache there is a small set of pages that is shared
> between potentially hundreds of processes. I've seen httpds with SZ of
> 800k and SHARED of 720k on a particularly busy server with 500 httpd
> children. This is a static-only server, on servers with a more dynamic
> mix the SZ will increase. On servers with mod_perl both SZ and SHARED can
> increase (there's more stuff which is constant after config time). In
> these cases the sharing stays in effect long after the fork().
For these situations, you want a good fragmentation-avoidance strategy
anyway; order-0 and order-1 (4k, 8k) allocations really ought to gravitate
as far away from DMA-capable space (or, failing that, towards one end of
memory) as we can manage, regardless of how we attempt to cope with
fragmentation after the fact. Frankly, even if I write the best
defragmenter ever written, the less it gets used the happier I'll be.
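
Purely as an illustration of the discipline I have in mind (not kernel
code; every name below is made up for the example), here's a little
user-space model. DMA-capable pages queue at the tail of the free list
and everything else at the head; ordinary requests scan forward from the
head while DMA (or, in the patch below, large) requests scan backward
from the tail, so the two kinds of traffic eat opposite ends of the list:

#include <stdio.h>

struct fake_page {
    int nr;                         /* page frame number */
    int dma;                        /* nonzero if DMA-capable (low memory) */
    struct fake_page *next, *prev;
};

static struct fake_page head = { -1, 0, &head, &head };

static void add_page(struct fake_page *p)
{
    if (!p->dma) {                  /* non-DMA pages go in at the head... */
        p->prev = &head;
        p->next = head.next;
        head.next->prev = p;
        head.next = p;
    } else {                        /* ...DMA-capable pages at the tail */
        p->next = &head;
        p->prev = head.prev;
        head.prev->next = p;
        head.prev = p;
    }
}

static struct fake_page *get_page(int want_dma)
{
    struct fake_page *p = want_dma ? head.prev : head.next;

    for (; p != &head; p = want_dma ? p->prev : p->next) {
        if (want_dma && !p->dma)
            continue;
        p->prev->next = p->next;    /* unlink and hand it out */
        p->next->prev = p->prev;
        return p;
    }
    return NULL;                    /* nothing suitable left */
}

int main(void)
{
    struct fake_page pages[8];
    int i;

    for (i = 0; i < 8; i++) {
        pages[i].nr = i;
        pages[i].dma = (i < 4);     /* pretend the low half is DMA-capable */
        add_page(&pages[i]);
    }
    printf("ordinary alloc -> page %d\n", get_page(0)->nr);  /* a high page */
    printf("DMA alloc      -> page %d\n", get_page(1)->nr);  /* a low page */
    return 0;
}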

What about a patch like the following? It puts DMA-capable pages last on
the free lists, then searches backwards from the tail when satisfying a
large-order or DMA request. Obviously this doesn't make sense (or do
anything useful) on all architectures, and I haven't studied its effect on
things that need DMA-capable memory; it's proposed more as food for
thought. Perhaps the halfway point of physical memory ought to be the
breakpoint on all architectures, or the free-area lists could be turned
into binary search trees...
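
For the former, the KeepTogether() test in the patch could just as easily
key off the middle of physical memory instead of PageDMA(); a rough,
untested sketch, assuming map_nr counts page frames from zero and that
something like max_mapnr (or MAP_NR(high_memory)) gives the total frame
count:

/* hypothetical: split the free lists at the midpoint of physical memory */
#define MEM_SPLIT_NR        (max_mapnr >> 1)
#define KeepTogether(entry) ((entry)->map_nr < MEM_SPLIT_NR)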

(Side note: are the macros, e.g. RMQUEUE, still better than inline
functions, or are they just not-broken-don't-fix? Also, on platforms
that don't have the DMA distinction, should CAN_DMA(x) simply be 1,
so the test gets optimized out entirely?)
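
Something like this is what I have in mind for the latter;
CONFIG_NO_DMA_ZONE is an invented name, purely for illustration:

/* let the DMA test vanish on architectures with no DMA zone */
#ifdef CONFIG_NO_DMA_ZONE               /* invented config symbol */
#define CAN_DMA(x) (1)                  /* (!dma || 1) is always true; gcc drops the test */
#else
#define CAN_DMA(x) (PageDMA(x))
#endif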

[patch relative to 2.1.76 plus Linus's ration-fragments patch]
```````````````````````````````````````````````````````````````````````````````
*** mm/page_alloc.c.old Thu Jan 1 17:30:55 1998
--- mm/page_alloc.c Sat Jan 3 21:33:00 1998
***************
*** 62,75 ****
head->prev = memory_head(head);
}

static inline void add_mem_queue(struct free_area_struct * head, struct page * entry)
{
! struct page * next = head->next;

! entry->prev = memory_head(head);
! entry->next = next;
! next->prev = entry;
! head->next = entry;
}

static inline void remove_mem_queue(struct page * entry)
--- 62,87 ----
head->prev = memory_head(head);
}

+
+ /* NOTE: this should be #ifdefed to make sense on all architectures */
+ #define KeepTogether(entry) PageDMA(entry)
static inline void add_mem_queue(struct free_area_struct * head, struct page * entry)
{
! if (!KeepTogether(entry)) {
! struct page * next = head->next;

! entry->prev = memory_head(head);
! entry->next = next;
! next->prev = entry;
! head->next = entry;
! } else {
! struct page * prev = head->prev;
!
! entry->next = memory_head(head);
! entry->prev = prev;
! prev->next = entry;
! head->prev = entry;
! }
}

static inline void remove_mem_queue(struct page * entry)
***************
*** 161,173 ****
change_bit((index) >> (1+(order)), (area)->map)
#define CAN_DMA(x) (PageDMA(x))
#define ADDRESS(x) (PAGE_OFFSET + ((x) << PAGE_SHIFT))
! #define RMQUEUE(order, maxorder, dma) \
! do { struct free_area_struct * area = free_area+order; \
! unsigned long new_order = order; \
do { struct page *prev = memory_head(area), *ret = prev->next; \
while (memory_head(area) != ret) { \
if (new_order >= maxorder && ret->next == prev) \
break; \
if (!dma || CAN_DMA(ret)) { \
unsigned long map_nr = ret->map_nr; \
(prev->next = ret->next)->prev = prev; \
--- 173,201 ----
change_bit((index) >> (1+(order)), (area)->map)
#define CAN_DMA(x) (PageDMA(x))
#define ADDRESS(x) (PAGE_OFFSET + ((x) << PAGE_SHIFT))
! #define RMQUEUE(order, maxorder, dma) { \
! struct free_area_struct * area = free_area+order; \
! unsigned long new_order = order; \
! if (order<2 && !dma) { \
do { struct page *prev = memory_head(area), *ret = prev->next; \
while (memory_head(area) != ret) { \
if (new_order >= maxorder && ret->next == prev) \
break; \
+ { unsigned long map_nr = ret->map_nr; \
+ (prev->next = ret->next)->prev = prev; \
+ MARK_USED(map_nr, new_order, area); \
+ nr_free_pages -= 1 << order; \
+ EXPAND(ret, map_nr, order, new_order, area); \
+ spin_unlock_irqrestore(&page_alloc_lock, flags); \
+ return ADDRESS(map_nr); } \
+ } \
+ new_order++; area++; \
+ } while (new_order < NR_MEM_LISTS); \
+ } else { \
+ do { struct page *prev = memory_head(area), *ret = prev->prev; \
+ while (memory_head(area) != ret) { \
+ if (new_order >= maxorder && ret->prev == prev) \
+ break; \
if (!dma || CAN_DMA(ret)) { \
unsigned long map_nr = ret->map_nr; \
(prev->next = ret->next)->prev = prev; \
***************
*** 178,188 ****
return ADDRESS(map_nr); \
} \
prev = ret; \
! ret = ret->next; \
} \
new_order++; area++; \
} while (new_order < NR_MEM_LISTS); \
! } while (0)

#define EXPAND(map,index,low,high,area) \
do { unsigned long size = 1 << high; \
--- 206,217 ----
return ADDRESS(map_nr); \
} \
prev = ret; \
! ret = ret->prev; \
} \
new_order++; area++; \
} while (new_order < NR_MEM_LISTS); \
! } }
!

#define EXPAND(map,index,low,high,area) \
do { unsigned long size = 1 << high; \

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Keith