Re: [PATCH 0/2] memcgroup: work better with tmpfs

From: Hugh Dickins
Date: Thu Dec 20 2007 - 11:38:27 EST


On Wed, 19 Dec 2007, Balbir Singh wrote:
> Hugh Dickins wrote:
>
> We always call mem_cgroup_isolate_pages() from shrink_(in)active_pages
> under spin_lock_irq of the zone's lru lock. That's the reason that we
> don't explicitly use it in the routine.

Indeed, thanks.

>
> > 2. There's mem_cgroup_charge and mem_cgroup_cache_charge (wouldn't the
> > former be better called mem_cgroup_charge_mapped? why does the latter
>
> Yes, it would be. After we've refactored the code, the new name makes sense.
>
> > test MEM_CGROUP_TYPE_ALL instead of MEM_CGROUP_TYPE_CACHED? I still don't
> > understand your enums there).
>
> We do that to ensure that we charge page cache pages only when the
> accounting type is set to MEM_CGROUP_TYPE_ALL. If the type is anything
> else, we ignore cached pages, we did not have MEM_CGROUP_TYPE_CACHED
> initially when the patches went in.

I think you've given yourself far too many degrees of freedom here.

Please explain to me how the system is supposed to behave when I
echo -n 2 >/cg/0/memory.control_type

>From the name in the enum (MEM_CGROUP_TYPE_CACHED, doesn't even have
an "= 2", yet this is a part of the API?), I think it's supposed to
account pages into and out of the page cache, but not as they're
mapped into and out of userspace. But from the code, no references
whatever to MEM_CGROUP_TYPE_CACHED or MEM_CGROUP_TYPE_MAPPED,
it looks as if it'll behave just like MEM_CGROUP_TYPE_MAPPED
(which we hope = 1).

As you say, you didn't have MEM_CGROUP_TYPE_CACHED originally:
it looks like it got added to the straightforward case of MAPPED,
in an apparently flexible but not fully thought out way.

>
> But there's only mem_cgroup_uncharge.
> > So when, for example, an add_to_page_cache fails, the uncharge may not
> > balance the charge?
> >
>
> We use mem_cgroup_uncharge() everywhere. The reason being, we might
> switch control type, we uncharge pages that have a page_cgroup
> associated with them, hence once we;ve charged, uncharge does not
> distinguish between charge types.

Ah, so this is the meaning of that
/*
* This can handle cases when a page is not charged at all and we
* are switching between handling the control_type.
*/
if (!pc)
return;

if (atomic_dec_and_test(&pc->ref_cnt)) {
at the beginning of mem_cgroup_uncharge.

Sorry, that doesn't fly. You've no locking between acquiring pc from
the page->page_cgroup, testing pc here, and decrementing its ref_cnt.
When that ref_cnt goes down to zero, clear_page_cgroup may NULLify
page->page_cgroup and kfree the pc.

So if you cannot really keep track of the ref_cnt (because of
changing control_type), that atomic_dec_and_test is in danger
of corrupting someone else's memory - isn't it?

I can just about imagine that some admins will want control_type
MAPPED and others CACHED and others MAPPED+CACHED. But is there
actually a need for one cgroup to be controlling MAPPED while
another on the same machine is controlling MAPPED+CACHED? And
does that make sense - there'll be weirdnesses, won't there?
And is there actually a need to change a cgroup's control_type
on the fly while it's alive?

Of course it's nice to be flexible and allow for such possibilities;
but not if that just opens windows for corruption. My own view is
that at present you should just account mapped and cached for all,
and strip out these extra degrees of freedom: add them back in some
future when the locking is worked out.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/