Re: Question about PRIVATE_FUTEX

From: Darren Hart
Date: Fri Mar 27 2009 - 11:43:33 EST


Minchan Kim wrote:
On Fri, Mar 27, 2009 at 8:14 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
On Fri, 2009-03-27 at 19:56 +0900, Minchan Kim wrote:

Then, get_futex_value_locked calls __copy_from_user_inatomic with
pagefault_disable.

Who makes sure the user page is mapped in the app's page table?
Nobody, all uses of get_futex_value_locked() have to deal with it
returning -EFAULT.
Does it mean that __copy_from_user_inatomic in get_futex_value_locked
would fail rather than sleep?
Correct.
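
To make the "fail rather than sleep" behaviour concrete, here is a minimal userspace model of the caller pattern, not kernel code. All names in it (fake_user_word, read_atomic, read_faulting, read_futex_word) are hypothetical and only stand in for the non-sleeping copy, a sleeping access such as get_user(), and a caller that retries after -EFAULT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a user-space word whose page may be unmapped. */
struct fake_user_word {
    bool mapped;      /* models "page is present in the page table" */
    uint32_t value;
};

/* Models the non-sleeping read: returns -EFAULT instead of faulting in. */
static int read_atomic(uint32_t *dest, const struct fake_user_word *src)
{
    if (!src->mapped)
        return -EFAULT;
    *dest = src->value;
    return 0;
}

/* Models a sleeping access: the "page fault" maps the page first. */
static int read_faulting(uint32_t *dest, struct fake_user_word *src)
{
    src->mapped = true;
    *dest = src->value;
    return 0;
}

/* Caller pattern: try atomically, on -EFAULT fault the page in and retry. */
static int read_futex_word(uint32_t *dest, struct fake_user_word *src,
                           int *retries)
{
    int ret;

    assert(dest != NULL && src != NULL && retries != NULL);
    *retries = 0;
    for (;;) {
        /* a lock would be held around this call only */
        ret = read_atomic(dest, src);
        if (ret != -EFAULT)
            return ret;

        /* lock dropped here, so sleeping is allowed */
        ret = read_faulting(dest, src);
        if (ret)
            return ret;
        (*retries)++;
    }
}
```

The point of the split is that the atomic attempt never blocks while the lock is held; the only place that may sleep is the faulting access, which runs with the lock released.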

In fact, I am not sure what __copy_from_user_inatomic really means.
As far as I understand, it never sleeps. It can only fail if the
user page isn't mapped. Is that right?
Correct.

Otherwise, it could be scheduled with pagefault_disable in effect, which
increments preempt_count. That would be a scheduling-while-atomic bug.
If my assumption is right, it fails rather than sleeps.
In that case, other architectures implement __copy_from_user_inatomic
with __copy_from_user, which can be scheduled. That could also be a bug.

Hmm, Now I am confusing.
Confused I guess ;-)
The trick is in the in_atomic() check in the pagefault handler and the
fixup section of the copy routines.

Whew, that is a well-hidden trick.
I will dive into this assembly.
Thanks, as always, for your kindness. :)

#define __copy_user(to, from, size) \
do { \
int __d0, __d1, __d2; \
__asm__ __volatile__( \
" cmp $7,%0\n" \
" jbe 1f\n" \
" movl %1,%0\n" \
" negl %0\n" \
" andl $7,%0\n" \
" subl %0,%3\n" \
"4: rep; movsb\n" \
" movl %3,%0\n" \
" shrl $2,%0\n" \
" andl $3,%3\n" \
" .align 2,0x90\n" \
"0: rep; movsl\n" \
" movl %3,%0\n" \
"1: rep; movsb\n" \
"2:\n" \
".section .fixup,\"ax\"\n" \
"5: addl %3,%0\n" \
" jmp 2b\n" \
"3: lea 0(%3,%0,4),%0\n" \
" jmp 2b\n" \
".previous\n" \
".section __ex_table,\"a\"\n" \
" .align 4\n" \
" .long 4b,5b\n" \
" .long 0b,3b\n" \
" .long 1b,2b\n" \
".previous" \
: "=&c"(size), "=&D" (__d0), "=&S" (__d1), "=r"(__d2) \
: "3"(size), "0"(size), "1"(to), "2"(from) \
: "memory"); \
} while (0)

see that __ex_table section, it tells the fault handler where to
continue in case of an atomic fault.
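
As a rough illustration of that lookup, here is a simplified userspace model. The label numbers mirror the three ".long" pairs in the macro above; everything else (ex_entry, find_fixup, on_kernel_fault, the fault_action values) is a hypothetical name for this sketch, and the real fault handler does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Labels from the quoted macro, used here as stand-ins for addresses. */
struct ex_entry {
    int insn;   /* label of an instruction that may fault */
    int fixup;  /* label where execution resumes after the fault */
};

/* Mirrors ".long 4b,5b", ".long 0b,3b" and ".long 1b,2b". */
static const struct ex_entry ex_table[] = {
    { 4, 5 },
    { 0, 3 },
    { 1, 2 },
};

enum fault_action { FAULT_IN_PAGE, JUMP_TO_FIXUP, KERNEL_OOPS };

/* Look up the faulting instruction; report the fixup label if covered. */
static bool find_fixup(int fault_label, int *fixup)
{
    for (size_t i = 0; i < sizeof(ex_table) / sizeof(ex_table[0]); i++) {
        if (ex_table[i].insn == fault_label) {
            *fixup = ex_table[i].fixup;
            return true;
        }
    }
    return false;
}

/* Simplified decision: in atomic context never sleep, use the fixup. */
static enum fault_action on_kernel_fault(bool atomic, int fault_label,
                                         int *resume)
{
    assert(resume != NULL);
    if (!atomic)
        return FAULT_IN_PAGE;   /* normal path, may sleep */
    return find_fixup(fault_label, resume) ? JUMP_TO_FIXUP : KERNEL_OOPS;
}
```

In the macro, the fixup code at labels 5 and 3 recomputes the number of bytes left to copy before jumping to label 2, so the caller sees a nonzero remainder and can turn it into -EFAULT.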

Most of this is legacy btw, from when futex ops were done under the
mmap_sem. Back then we couldn't fault because that would cause mmap_sem
recursion. However, now that we don't hold mmap_sem anymore we could use
a faulting user access like get_user().
Darren has been working on patches to clean that up, some of those are
already merged in the -tip tree.

I'm a little late to the party I guess. Minchan, a lot of the fault logic has been cleaned up in the tip tree, core/futexes branch. That removes a lot of the legacy complication from the faulting paths. However, the get_futex_key code remains the same if I remember correctly.

Thanks for the good information.
That would be a very desirable way to improve kernel performance.
I doubt it'll make a measurable difference: if you need to fault,
performance sucks anyway. If you don't, the current code is just as
fast.


Agreed. If you are suffering performance hits from excessive paging, consider locking your memory.
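
For reference, a minimal sketch of locking memory from userspace with mlockall(2). The wrapper name lock_all_memory is hypothetical; the call can fail (for example with EPERM or ENOMEM when RLIMIT_MEMLOCK is too low), so the error is reported rather than ignored.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Lock all currently mapped pages and all future mappings into RAM.
 * Returns 0 on success or a negative errno value on failure.
 */
static int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        int err = errno;

        fprintf(stderr, "mlockall failed: %s\n", strerror(err));
        return -err;
    }
    return 0;
}
```

Calling this once early in startup, before the latency-sensitive threads are created, is the usual approach; munlockall() undoes it.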


--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team