Re: [PATCH 0/3] TLB flush multiple pages per IPI v5

From: Ingo Molnar
Date: Thu Jun 11 2015 - 11:26:12 EST



* Mel Gorman <mgorman@xxxxxxx> wrote:

> > > I made a really clear and unambiguous chain of arguments:
> > >
> > > - I'm unconvinced about the benefits of INVLPG in general, and your patches adds
> > > a whole new bunch of them. [...]
> >
> > ... and note that your claim that 'we were doing them before, this is just an
> > equivalent transformation' is utter bullsh*t technically: what we were doing
> > previously was a hideously expensive IPI combined with an INVLPG.
>
> And replacing it with an INVLPG without excessive IPI transmission is changing
> one major variable. Going straight to a full TLB flush is changing two major
> variables. [...]

But this argument employs the fallacy that the transition to 'batching with PFN
tracking' is not a major variable in itself. In reality it is a major variable: it
adds extra complexity, such as the cross CPU data flow (the pfn tracking), and it
also changes the distribution of the flushes and related caching patterns.

> > [...]
> >
> > The batching limit (which you set to 32) should then be tuned by comparing it
> > to a working full-flushing batching logic, not by comparing it to the previous
> > single IPI per single flush approach!
>
> We can decrease it easily but increasing it means we also have to change
> SWAP_CLUSTER_MAX because otherwise enough pages are not unmapped for flushes and
> it is a requirement that we flush before freeing the pages. That changes another
> complex variable because at the very least, it alters LRU lock hold times.

('should then be' implied 'as a separate patch/series', obviously.)

I.e. all I wanted to observe is that I think the series did not explore the
performance impact of the batching limit, because it was too focused on the INVLPG
approach which has an internal API limit of 33.

Now that the TLB flushing side is essentially limit-less, a future enhancement
would be to further increase batching.

My suspicion is that say doubling SWAP_CLUSTER_MAX would possibly further reduce
the IPI rate, at a negligible cost.

But this observation does not impact the current series in any case.

> > ... and if the benefits of a complex algorithm are not measurable and if there
> > are doubts about the cost/benefit tradeoff then frankly it should not exist in
> > the kernel in the first place. It's not like the Linux TLB flushing code is
> > too boring due to overwhelming simplicity.
> >
> > and yes, it's my job as a maintainer to request measurements justifying
> > complexity and your ad hominem attacks against me are disgusting - you should
> > know better.
>
> It was not intended as an ad hominem attack and my apologies for that. I wanted
> to express my frustration that a series that adjusted one variable with known
> benefit will be rejected for a series that adjusts two major variables instead
> with the second variable being very sensitive to workload and CPU.

... but that's not what it did: it adjusted multiple complex variables already,
with a questionable rationale for more complexity.

And my argument was and continues to be: start with the simplest variant and
iterate from there. Which you seem to have adapted in your latest series, so my
concerns are addressed:

Acked-by: Ingo Molnar <mingo@xxxxxxxxxx>

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/