Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}

From: David Rientjes
Date: Mon Apr 16 2007 - 14:52:33 EST


On Mon, 16 Apr 2007, Hugh Dickins wrote:

> When, as now, core mm people make some change, failing to take your
> case into account.
>

That's exactly what happened here: include/asm-i386/pgtable.h is
advertising both __HAVE_ARCH_PTEP_TEST_AND_CLEAR_{DIRTY,YOUNG} without
actually having them. No consideration is going to be given to such a
hack when clearing A/D bits in generic code. If the optimization for a
specific architecture is not possible for an atomic test and clear of A/D
bits generically, then it's not an optimization at all. It's a bug.

> You're right to want to defer your pte_updates, David is right to want
> to batch his TLB flushes. It bothers me that you have a surprising case,
> and that unless you abandon your optimization, it imposes a new constraint
> on how to proceed in common code (without #ifdef'ing around).
>

Batching the TLB flushes was really a no brainer: there was a significant
speed-up in /proc/pid/clear_refs by using flush_tlb_mm() over
flush_tlb_page(). If there's an "optimization" for paravirt that gets in
the way by defering the updated A/D bit, then it needs to be eliminated.
It was merged because there were no generic uses of
ptep_test_and_clear_{dirty,young}, but now there are.

If it was really considered to be an "optimization," which is was
advertised to be in the changelog associated with a very trivial patch,
then removing it by definition would not introduce new constraints that
would need to be patched. I'd like to see a patch without the defer and
see how invasive it truly is on generic code because it was not originally
merged that invasively.

> Compromise patch below: would that be satisfactory to you, David?
>

I really like the patch, but for perhaps a slightly different reason:
we're only flushing ranges that have been shown to need it. We aren't
completely flushing the entire mm which is likely to be excessive in
situations where we're actually using /proc/pid/clear_refs in combination
with /proc/pid/smaps for memory footprint approximation (i.e. it's on a
fine granularity).

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/