Re: [ANNOUNCE] Minneapolis Cluster Summit, July 29-30

From: Daniel Phillips
Date: Mon Jul 26 2004 - 22:31:50 EST


On Monday 12 July 2004 22:31, Nick Piggin wrote:
> Daniel Phillips wrote:
> > On Monday 12 July 2004 16:54, Andrew Morton wrote:
> >>Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:
> >>>I don't see why it would be a problem to implement a "this task
> >>>facilitates page reclaim" flag for userspace tasks that would take
> >>>care of this as well as the kernel does.
> >>
> >>Yes, that has been done before, and it works - userspace "block drivers"
> >>which permanently mark themselves as PF_MEMALLOC to avoid the obvious
> >>deadlocks.
> >>
> >>Note that you can achieve a similar thing in current 2.6 by acquiring
> >>realtime scheduling policy, but that's an artifact of some brainwave
> >> which a VM hacker happened to have and isn't a thing which should be
> >> relied upon.
> >
> > Do you have a pointer to the brainwave?
>
> Search for rt_task in mm/page_alloc.c

Ah, interesting idea: realtime tasks get to dip into the PF_MEMALLOC reserve,
until it gets down to some threshold, then they have to give up and wait like
any other unwashed nobody of a process. _But_ if there's a user space
process sitting in the writeout path and some other realtime process eats the
entire realtime reserve, everything can still grind to a halt.

So it's interesting for realtime, but does not solve the userspace PF_MEMALLOC
inversion.

> >>A privileged syscall which allows a task to mark itself as one which
> >>cleans memory would make sense.
> >
> > For now we can do it with an ioctl, and we pretty much have to do it for
> > pvmove. But that's when user space drives the kernel by syscalls; there
> > is also the nasty (and common) case where the kernel needs userspace to
> > do something for it while it's in PF_MEMALLOC. I'm playing with ideas
> > there, but nothing I'm proud of yet. For now I see the in-kernel
> > approach as the conservative one, for anything that could possibly find
> > itself on the VM writeout path.
>
> You'd obviously want to make the PF_MEMALLOC task as tight as possible,
> and running mlocked:

Not just tight, but bounded. And tight too, of course.

> I don't particularly see why such a task would be any safer in-kernel.

The PF_MEMALLOC flag is inherited down a call chain, not across a pipe or
similar IPC to user space.

> PF_MEMALLOC tasks won't enter page reclaim at all. The only way they
> will reach the writeout path is if you are write(2)ing stuff (you may
> hit synch writeout).

That's the problem.

Regards,

Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/