Re: [PATCH v6 7/8] mm: only IPI CPUs to drain local pages if they exist

From: Gilad Ben-Yossef
Date: Wed Jan 11 2012 - 11:10:25 EST


On Wed, Jan 11, 2012 at 9:04 AM, Milton Miller <miltonm@xxxxxxx> wrote:

>> > > @@ -1097,7 +1105,19 @@ void drain_local_pages(void *arg)
>> > >   */
>> > >  void drain_all_pages(void)
>> > >  {
>> > > - on_each_cpu(drain_local_pages, NULL, 1);
>> > > + int cpu;
>> > > + struct per_cpu_pageset *pcp;
>> > > + struct zone *zone;
>> > > +
>> > > + for_each_online_cpu(cpu)
>> > > +         for_each_populated_zone(zone) {
>> > > +                 pcp = per_cpu_ptr(zone->pageset, cpu);
>> > > +                 if (pcp->pcp.count)
>> > > +                         cpumask_set_cpu(cpu, cpus_with_pcps);
>> > > +                 else
>> > > +                         cpumask_clear_cpu(cpu, cpus_with_pcps);
>> > > +         }
>> > > + on_each_cpu_mask(cpus_with_pcps, drain_local_pages, NULL, 1);
>> >
>
> This will only select cpus that have pcp pages in the last zone,
> not pages in any populated zone.

Oy! you are very right.

>
> Call cpu_mask_clear before the for_each_online_cpu loop, and only
> set cpus in the loop.
>
> Hmm, that means we actually need a lock for using the static mask.

We don't have to clear before the loop or lock. We can do something like this:

int cpu;
struct per_cpu_pageset *pcp;
struct zone *zone;

for_each_online_cpu(cpu) {
bool has_pcps = false;
for_each_populated_zone(zone) {
pcp = per_cpu_ptr(zone->pageset, cpu);
if (pcp->pcp.count) {
has_pcps = true;
break;
}
}
if (has_pcps)
cpumask_set_cpu(cpu, cpus_with_pcps);
else
cpumask_clear_cpu(cpu, cpus_with_pcps);
}
on_each_cpu_mask(cpus_with_pcps, drain_local_pages, NULL, 1);


>
> -- what would happen without any lock?
> [The last to store would likely see all the cpus and send them IPIs
> but the earlier, racing callers would miss some cpus (It will test
> our smp_call_function_many locking!), and would retry before all the
> pages were drained from pcp pools].  The locking should be better
> than racing -- one cpu will peek and pull all per-cpu data to itself,
> then copy the mask to its smp_call_function_many element, then wait
> for all the cpus it found cpus to process their list pulling their
> pcp data back to the owners.   Not much sense in the racing cpus
> figighting to bring the per-cpu data to themselves to write to the
> now contended static mask while pulling the zone pcp data from the
> owning cpus that are trying to push to the buddy lists.
>

That is a very good point and I guess you are right that the locking
saves cache bounces and probably also some IPIs.

I am not 100% it is a win though. The lockless approach is simpler,
has zero risk of dead locks. I also fear that any non interruptable lock
(be it spinlock or interruptable lock non mutex) might delay an OOM
in progress.

Having the lock there is not a correctness issue - it is a performance
enhancement. Because this is not a fast path, i would go for simple
and less risk and not have the lock there at all.

> The benefit numbers in the changelog need to be measured again
> after this correction.
>

Yes, of course. Preliminary numbers with the lockless version above
still looks good.

I am guessing though that Mel's patch will make all this moot anyway
since if we're not doing a drain_all in the direct reclaim we're left with
very rare code paths (memory error and memory migration) that call
the drain_all and they wont tend to do that concurrently like the direct
reclaim.

>
>
> Disabling preemption around online loop and the call will prevent
> races with cpus going offline, but we do not that we care as the
> offline notification will cause the notified cpu to drain that pool.
> on_each_cpu_mask disables preemption as part of its processing.
>
> If we document the o-e-c-m behaviour then we should consider putting a
> comment here, or at least put the previous paragraph in the change log.

I noticed that Mel's patch already added get_online_cpus() in drain_all.


>
>> > >  }
>> > >
>> > >  #ifdef CONFIG_HIBERNATION
>> > > @@ -3601,6 +3621,10 @@ static void setup_zone_pageset(struct zone *zone)
>> > >  void __init setup_per_cpu_pageset(void)
>> > >  {
>> > >   struct zone *zone;
>> > > + int ret;
>> > > +
>> > > + ret = zalloc_cpumask_var(&cpus_with_pcps, GFP_KERNEL);
>> > > + BUG_ON(!ret);
>> > >
>
> Switching to cpumask_t will eliminate this hunk.   Even if we decide
> to keep it a cpumask_var_t we don't need to pre-zero it as we set
> the entire mask so alloc_cpumask_var would be sufficient.
>

hmm... then we can make it a static local variable and it doesn't even
make the kernel image really bigger since it's in the BSS. Cool.

Thanks,

Gilad



--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@xxxxxxxxxxxxx
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/