Re: [BUG] 4.4.x-rt - memcg: refill_stock() use get_cpu_light() has data corruption issue

From: Steven Rostedt
Date: Wed Nov 22 2017 - 07:00:05 EST


On Wed, 22 Nov 2017 06:36:45 +0100
Mike Galbraith <efault@xxxxxx> wrote:

> On Tue, 2017-11-21 at 22:50 -0500, Steven Rostedt wrote:
> >
> > Does it work if you revert the patch?
>
> That would restore the gripe. How about this..

Would it?

The gripe you report is:

refill_stock()
   get_cpu_var()
   drain_stock()
      res_counter_uncharge()
         res_counter_uncharge_until()
            spin_lock() <== boom


But commit 3e32cb2e0a1 ("mm: memcontrol: lockless page counters")
changed that code to this:

 static void drain_stock(struct memcg_stock_pcp *stock)
 {
 	struct mem_cgroup *old = stock->cached;

 	if (stock->nr_pages) {
-		unsigned long bytes = stock->nr_pages * PAGE_SIZE;
-
-		res_counter_uncharge(&old->res, bytes);
+		page_counter_uncharge(&old->memory, stock->nr_pages);
 		if (do_swap_account)
-			res_counter_uncharge(&old->memsw, bytes);
+			page_counter_uncharge(&old->memsw, stock->nr_pages);
 		stock->nr_pages = 0;
 	}

Where we replaced res_counter_uncharge() which is this:

u64 res_counter_uncharge_until(struct res_counter *counter,
			       struct res_counter *top,
			       unsigned long val)
{
	unsigned long flags;
	struct res_counter *c;
	u64 ret = 0;

	local_irq_save(flags);
	for (c = counter; c != top; c = c->parent) {
		u64 r;
		spin_lock(&c->lock);
		r = res_counter_uncharge_locked(c, val);
		if (c == counter)
			ret = r;
		spin_unlock(&c->lock);
	}
	local_irq_restore(flags);
	return ret;
}

u64 res_counter_uncharge(struct res_counter *counter, unsigned long val)
{
	return res_counter_uncharge_until(counter, NULL, val);
}

and has that spin lock, to this:

void page_counter_cancel(struct page_counter *counter, unsigned long nr_pages)
{
	long new;

	new = atomic_long_sub_return(nr_pages, &counter->count);
	/* More uncharges than charges? */
	WARN_ON_ONCE(new < 0);
}

void page_counter_uncharge(struct page_counter *counter, unsigned long nr_pages)
{
	struct page_counter *c;

	for (c = counter; c; c = c->parent)
		page_counter_cancel(c, nr_pages);
}

You see. No more spin lock to gripe about. No boom in your scenario.
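
For anyone who wants to poke at the lockless pattern outside the kernel, here is a rough userspace sketch using C11 atomics. It is not the kernel code: struct demo_counter, demo_counter_cancel() and demo_counter_uncharge() are made-up names for illustration, and atomic_fetch_sub() stands in for atomic_long_sub_return(). It only shows that walking the parent chain needs no lock when each level is a single atomic subtract.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page_counter. */
struct demo_counter {
	atomic_long count;
	struct demo_counter *parent;
};

/*
 * Subtract nr_pages from one level and return the new value.
 * atomic_fetch_sub() returns the old value, so subtract once more
 * to get what atomic_long_sub_return() would hand back.
 * A negative result means more uncharges than charges; the kernel
 * warns there, this sketch just returns it to the caller.
 */
static long demo_counter_cancel(struct demo_counter *c, unsigned long nr_pages)
{
	return atomic_fetch_sub(&c->count, (long)nr_pages) - (long)nr_pages;
}

/* Uncharge every level from counter up to the root, no lock taken. */
static void demo_counter_uncharge(struct demo_counter *counter,
				  unsigned long nr_pages)
{
	struct demo_counter *c;

	for (c = counter; c; c = c->parent)
		demo_counter_cancel(c, nr_pages);
}
```

Uncharging 3 pages from a child holding 4, whose parent holds 10, leaves 1 and 7, with nothing in that path that could sleep or spin.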

-- Steve