Re: [RFC] memory defragmentation to satisfy high order allocations

From: Hirokazu Takahashi
Date: Mon Oct 04 2004 - 21:55:28 EST


Hi, Marcelo

> > > > > Why's that? You can copy anonymous pages without adding them to swap (thats
> > > > > what the patch I posted does).
> > > >
> > > > The reason is to guarantee that any anonymous page can be migrated anytime.
> > > > I want to block newly occurred accesses to the page during the migration
> > > > because it can't be migrated if there remain some references on it by
> > > > system calls, direct I/O and page faults.
> > >
> > > It would be nice if we could block pte faults in a way such to not need
> > > adding each anonymous page to swap. It can be too costly if you have a lot memory
> > > and it makes the whole operation dependable on swap size (if you dont have enough
> > > swap, you're dead).
> > >
> > > Maybe hold mm->page_table_lock (might be too costly in terms of CPU time, but since
> > > migration is not a common operation anyway), or create a semaphore?
> >
> > I think the problem of the holding mm->page_table_lock approach is
> > that it doesn't allow the migration code blocked. the semaphore
> > approach would be better.
>
> OK, I think the problem is that can be more than one thread with different address
> spaces (ie different "current->mm", after fork) accessing the page.

Yes.

> Adding a waitqueue to "anon_vma" structure (to be slept at do_swap_page time,
> and awake after copy-page-and-flags/unlock), can be do the job I think.
>
> > I have another idea that each anonymous page can detach its swap entry
> > after its migration.
>
> Yes thats a nice idea.

I'll do it in a week.

> > It can be done by remove_exclusive_swap_page()
> > if the page is remapped to the same spaces forcibly by
> > touch_unmapped_address() I made.
>
> touch_unmapped_address() ?

I've made two functions.

record_unmapped_address() holds mm and address where a target page
has been mapped. touch_unmapped_address() remaps the page again.

> > > > Your approach will work fine on most of anonymous pages, which aren't
> > > > heavily accessed. I think it will be enough for memory defragmentation.
> > >
> > > Yes...
> > >
> > > > > 3) At migrate_page_common you assume additional page references
> > > > > (page_migratable returning -EAGAIN) means the code should try to writeout
> > > > > the page.
> > > > >
> > > > > Is that assumption always valid?
> > > >
> > > > -EAGAIN means that the page may require to be written back
> > >
> > > But why is it needed to writeout pages? We shouldnt need to. At least
> > > from what I can understand.
> >
> > The migration code allows each filesystem to implement its own
> > migration code or just use migrate_page_buffer() or
> > migrate_page_common().
> >
> > migrate_page_common() is a default function if filesystem doesn't
> > implement anything. The function is the most generic and it tries
> > to writeback pages only if they are dirty and have buffers.
>
> The thing is: What is the point of writing out pages?

It was the easiest way to handle pages with buffers when Iwamoto
and I started to implement it. We thought it was slow but it would
work for all kinds of filesystems.

> We're just trying to migrate pages to another zone.
>
> If its under writeout, wait, if its dirty, just move it to the other
> zone.
>
> Can you enlight me?

Yes, I also realize that.
migrate_page_buffer() will do this, but I'm not certain it will work
for all kinds of filesystems. I guess there might be some exceptions.
We may need a special operation to handle pages on a filesystem,
which has releasepage method.


Thanks,
Hirokazu Takahashi.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/