Re: [PATCH 2/9] mm: arm ready for split ptlock

From: Hugh Dickins
Date: Sun Oct 23 2005 - 03:36:47 EST


Many thanks for your rapid response...

On Sat, 22 Oct 2005, Russell King wrote:
> On Sat, Oct 22, 2005 at 05:22:20PM +0100, Hugh Dickins wrote:
> > Signal handling's preserve and restore of iwmmxt context currently
> > involves reading and writing that context to and from user space, while
> > holding page_table_lock to secure the user page(s) against kswapd. If
> > we split the lock, then the structure might span two pages, secured by
> > different locks. That would be manageable; but it seems simpler just
> > to read into and write from a kernel stack buffer, copying that out and
> > in without locking (the structure is 160 bytes in size, and here we're
> > near the top of the kernel stack). Or would the overhead be noticeable?
>
> Please contact Nicolas Pitre about that - that was my suggestion,
> but ISTR apparantly the overhead is too high.

Right, I've CC'ed him, and will forward the original patch in a moment.

It'd be perfectly possible to lock two extents in there, just seems like
over-engineering, and yet more strange mm-dependent code down in the arch
than I'd like to put there.

If Nicolas insists on avoiding the copies, then the simplest answer is
to disable split ptlock in the CONFIG_IWMMXT case too, and replace my
signal.c mods by just a comment highlighting the issue.

ARM is not the architecture (ia64) which prompted this page fault
scalability business. I've no idea of its benefit or otherwise on ARM.
Just didn't want ARM to be the only one not invited to the party.

> > arm_syscall's cmpxchg emulation use pte_offset_map_lock, instead of
> > pte_offset_map and mm-wide page_table_lock; and strictly, it should now
> > also take mmap_sem before descending to pmd, to guard against another
> > thread munmapping, and the page table pulled out beneath this thread.
>
> Now that I look at it, it's probably buggy - if the page isn't already
> dirty, it will modify without the COW action. Again, please contact
> Nicolas about this.

That hadn't crossed my mind, good catch. (Though bad both before and
after my change.) I did wonder about the alignment, the assumption that
*(unsigned long *)addr all fits within the same page - but assumed ARM's
instructions are suitably aligned?

It does look like that cmpxchg emulation needs to be done another way,
quite unrelated to my changes.

> > Updated two comments in fault-armv.c. adjust_pte is interesting, since
> > its modification of a pte in one part of the mm depends on the lock held
> > when calling update_mmu_cache for a pte in some other part of that mm.
> > This can't be done with a split page_table_lock (and we've already taken
> > the lowest lock in the hierarchy here): so we'll have to disable split
> > on arm, unless CONFIG_CPU_CACHE_VIPT to ensures adjust_pte never used.
>
> Well, adjust_pte is extremely critical to ensure correct cache behaviour
> (and therefore data integrity) so if split ptlock is incompatible with
> this, split ptlock loses.

Unquestionably.

> As far as adjust_pte being called, it's only called for VIVT caches,
> which means the configuration has to do if VIVT, disable split ptlock.

Yes, patch 7/9 disables it on ARM unless CPU_CACHE_VIPT
(since in other cases cache_is_vivt may be determined at runtime).

config SPLIT_PTLOCK_CPUS
int
default "4096" if ARM && !CPU_CACHE_VIPT
default "4096" if PARISC && DEBUG_SPINLOCK && !64BIT
default "4"

Please let me know when ARM supports 4096 cpus, I'll change that then!

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/