Re: [GIT PULL] kdbus for 4.1-rc1

From: Michal Hocko
Date: Mon Apr 20 2015 - 08:43:18 EST


On Fri 17-04-15 11:54:42, Andy Lutomirski wrote:
> On Fri, Apr 17, 2015 at 2:19 AM, Michal Hocko <mhocko@xxxxxxx> wrote:
> > On Thu 16-04-15 10:04:17, Andy Lutomirski wrote:
> >> On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
> >> > Hi
> >> >
> >> > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> >> >> Whose memcg does the pool use?
> >> >
> >> > The pool-owner's (i.e., the receiver's).
> >> >
> >> >> If it's the receiver's, and if the
> >> >> receiver can configure a memcg, then it seems that even a single
> >> >> receiver could probably cause the sender to block for an unlimited
> >> >> amount of time.
> >> >
> >> > How? Which of those calls can block? I don't see how that can happen.
> >>
> >> I admit I don't fully understand memcg, but vfs_iter_write is
> >> presumably going to need to get write access to the target pool page,
> >> and that, in turn, will need that page to exist in memory and to be
> >> writable, which may need to page it in and/or allocate a page. If
> >> that uses the receiver's memcg (as it should), then the receiver can
> >> make it block. Even if it doesn't use the receiver's memcg, it can
> >> trigger direct reclaim, I think.
> >
> > Yes, memcg direct reclaim might trigger but we are no longer waiting for
> > the OOM victim from non page fault paths so the time is bounded. It
> > still might take quite some time, though, depending on the amount of work
> > done in the direct reclaim.
>
> Is that still true if OOM notifiers are involved? I've lost track of
> what changed there.

The memcg OOM killer is not triggered from get_user_pages(). See
519e52473ebe (mm: memcg: enable memcg OOM killer only for user faults).

> In any event, I'm not entirely convinced that having a broadcast send
> cause, say, PID 1 to block until an unbounded number of pages in a
> potentially unbounded number of memcgs are reclaimed is a good idea.

This deserves a clarification, I guess. Normally, it is the memcg of the
current task that gets charged during a page fault. So if PID1 faults
the memory in, its own (most probably root) memcg gets charged. If the
memory was already charged to a different task's memcg and then got
swapped out, though, PID1 would indeed wait for reclaim in that target
memcg before the page can be swapped back in.

Both cases sound like a potential problem: in the first, tasks could
hide their memory charges from their own limit; in the second, the PID1
context could be blocked. But maybe I have just misunderstood, and
uncharged memory cannot be used for the buffer.
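
A toy model of the charging rule described above may make the two cases
easier to compare. This is not kernel code: `struct toy_memcg`,
`struct toy_page`, `charge_target()` and `needs_reclaim()` are
hypothetical names, and real memcg charging has many more cases (for
example, pages that are already resident are not modeled at all).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified memory cgroup: a counter and a limit. */
struct toy_memcg {
    const char *name;
    long usage_pages;
    long limit_pages;
};

/* Hypothetical page descriptor for this model only. */
struct toy_page {
    bool swapped_out;
    struct toy_memcg *owner;  /* memcg the page was charged to, if any */
};

/*
 * Pick the memcg that pays when a task in @faulting faults @page in.
 * A fresh page is charged to the faulting task's memcg; a page that was
 * charged to another memcg and then swapped out goes back to that owner.
 */
static struct toy_memcg *charge_target(struct toy_memcg *faulting,
                                       const struct toy_page *page)
{
    if (page->swapped_out && page->owner != NULL)
        return page->owner;
    return faulting;
}

/*
 * True if charging one more page would exceed the target memcg's limit,
 * i.e. the faulting task would first have to wait for reclaim in the
 * *target* memcg, which need not be its own.
 */
static bool needs_reclaim(struct toy_memcg *faulting,
                          const struct toy_page *page)
{
    const struct toy_memcg *target = charge_target(faulting, page);

    return target->usage_pages + 1 > target->limit_pages;
}
```

With a receiver memcg sitting at its limit, `needs_reclaim()` reports
true for a swapped-out page owned by that receiver even when the
faulting task lives in an effectively unlimited root memcg, which is the
blocking case for PID1.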

> In the kdbus model's favor, I think that allowing pages of data in the
> receive queue to be swapped out is potentially quite nice, but I'm
> less convinced about non-full pages in the receive queue. There's a
> resource management tradeoff here, and one nice thing about AF_UNIX is
> that sends are genuinely non-blocking.
>
> --Andy
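
For comparison with the AF_UNIX point above, a minimal Linux userspace
sketch (assuming a plain SOCK_STREAM socketpair whose other end is never
read; `fill_until_would_block()` and `struct fill_result` are
hypothetical names) shows a sender that is never put to sleep: once the
socket's buffer limit is reached, send() with MSG_DONTWAIT fails with
EAGAIN instead of blocking. It illustrates AF_UNIX semantics only and
says nothing about how kdbus behaves.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Outcome of filling one end of an AF_UNIX stream pair with no reader. */
struct fill_result {
    long bytes_queued;  /* bytes accepted before the kernel refused more */
    int last_errno;     /* errno from the first failing call */
};

/*
 * Keep sending on sv[0] while nobody reads sv[1]. With MSG_DONTWAIT the
 * sender never sleeps: when the buffer limit is hit, send() returns -1
 * with errno set to EAGAIN/EWOULDBLOCK.
 */
static struct fill_result fill_until_would_block(void)
{
    struct fill_result res = { 0, 0 };
    char chunk[4096] = { 0 };
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        res.last_errno = errno;
        return res;
    }

    for (;;) {
        ssize_t n = send(sv[0], chunk, sizeof(chunk), MSG_DONTWAIT);

        if (n < 0) {
            res.last_errno = errno;  /* expected: EAGAIN or EWOULDBLOCK */
            break;
        }
        res.bytes_queued += n;  /* may be a partial write near the limit */
    }

    close(sv[0]);
    close(sv[1]);
    return res;
}
```

Without MSG_DONTWAIT, a blocking socket would instead sleep at the point
where this loop stops, until the peer reads some data.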

--
Michal Hocko
SUSE Labs