Re: [RFC v1][PATCH] page_fault retry with NOPAGE_RETRY

From: Ying Han
Date: Thu Dec 04 2008 - 17:28:20 EST


I am trying your test program (scalability) in house, but somehow I got
different results from what you saw. I created 8 files, each 1G in size,
on separate drives (to avoid disturbance from disk-seek latency). I got
these numbers on 2.6.26, without the patch applied. May I ask how to
reproduce the mmap issue you mentioned?

8 CPUs
read_worker
1 threads Real time: 101.058262 s (since task start)
2 threads Real time: 50.670456 s (since task start)
4 threads Real time: 25.904657 s (since task start)
8 threads Real time: 20.090677 s (since task start)
--------------------------------------------------------------------------------
mmap_worker
1 threads Real time: 101.340662 s (since task start)
2 threads Real time: 51.484646 s (since task start)
4 threads Real time: 28.414534 s (since task start)
8 threads Real time: 21.785818 s (since task start)
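
For reference, the two workers are essentially the following (a minimal
sketch, not the actual test program; the 64K chunk size and the per-page
touch are illustrative, and error handling is trimmed):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK (64 * 1024)

/* read() the whole file through a small private buffer */
static void read_worker(const char *path)
{
        char buf[CHUNK];
        int fd = open(path, O_RDONLY);

        while (read(fd, buf, sizeof(buf)) > 0)
                ;       /* just pull the data through the page cache */
        close(fd);
}

/* mmap() the file and touch one byte per page, faulting each page in */
static void mmap_worker(const char *path)
{
        struct stat st;
        volatile char sum = 0;
        char *p;
        off_t off;
        int fd = open(path, O_RDONLY);

        fstat(fd, &st);
        p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        for (off = 0; off < st.st_size; off += 4096)
                sum += p[off];
        munmap(p, st.st_size);
        close(fd);
}

Each thread runs one worker on its own 1G file; the real times above are
measured around these loops.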

--Ying

On Thu, Nov 27, 2008 at 4:52 AM, Török Edwin <edwintorok@xxxxxxxxx> wrote:
> On 2008-11-27 14:39, Nick Piggin wrote:
>> On Thu, Nov 27, 2008 at 02:21:16PM +0200, Török Edwin wrote:
>>
>>> On 2008-11-27 14:03, Nick Piggin wrote:
>>>
>>>>> Running my testcase shows no significant performance difference. What am
>>>>> I doing wrong?
>>>>>
>>>>>
>>>>
>>>> The software may just be doing a lot of mmap/munmap activity. Threads +
>>>> mmap are never going to be pretty, because they always involve
>>>> broadcasting TLB flushes to the other cores... Software writers shouldn't
>>>> be scared of using processes (possibly with some shared memory).
>>>>
>>>>
>>> It would be interesting to compare the performance of a threaded clamd
>>> with that of a clamd that uses multiple processes.
>>> Distributing tasks would be a bit trickier, since it would need to use
>>> IPC instead of mutexes and condition variables.
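>>>
>>> For example (an untested sketch; scan_file() and the record size are
>>> made up for illustration), the dispatcher could push fixed-size task
>>> records down a single pipe that forked workers share:
>>>
>>> #include <string.h>
>>> #include <sys/wait.h>
>>> #include <unistd.h>
>>>
>>> struct task { char path[256]; };        /* one record per write/read */
>>>
>>> static void scan_file(const char *path) { (void)path; /* hypothetical */ }
>>>
>>> int main(void)
>>> {
>>>         int pfd[2], i;
>>>         struct task t;
>>>
>>>         pipe(pfd);
>>>         for (i = 0; i < 4; i++) {
>>>                 if (fork() == 0) {      /* worker process */
>>>                         close(pfd[1]);  /* see EOF when dispatcher closes */
>>>                         /* writes <= PIPE_BUF are atomic, and same-sized
>>>                          * reads stay record-aligned across readers */
>>>                         while (read(pfd[0], &t, sizeof(t)) == sizeof(t))
>>>                                 scan_file(t.path);
>>>                         _exit(0);
>>>                 }
>>>         }
>>>         /* dispatcher: one record per file to scan */
>>>         strcpy(t.path, "/some/file");
>>>         write(pfd[1], &t, sizeof(t));
>>>         close(pfd[1]);
>>>         while (wait(NULL) > 0)          /* reap the workers */
>>>                 ;
>>>         return 0;
>>> }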
>>>
>>
>> Yes, although you could use PTHREAD_PROCESS_SHARED pthread mutexes on
>> the shared memory I believe (having never tried it myself).
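>>
>> Something like this, I'd guess (a minimal, untested sketch; error
>> checks are left out and the layout is only illustrative):
>>
>> #include <pthread.h>
>> #include <sys/mman.h>
>> #include <unistd.h>
>>
>> int main(void)
>> {
>>         /* the mutex lives in a MAP_SHARED anonymous mapping, which
>>          * parent and child both see after fork() */
>>         pthread_mutex_t *m = mmap(NULL, sizeof(*m),
>>                                   PROT_READ | PROT_WRITE,
>>                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
>>         pthread_mutexattr_t a;
>>
>>         pthread_mutexattr_init(&a);
>>         pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
>>         pthread_mutex_init(m, &a);
>>
>>         if (fork() == 0) {              /* child process */
>>                 pthread_mutex_lock(m);
>>                 /* ... touch the shared data ... */
>>                 pthread_mutex_unlock(m);
>>                 _exit(0);
>>         }
>>         pthread_mutex_lock(m);          /* parent contends on the same lock */
>>         pthread_mutex_unlock(m);
>>         return 0;
>> }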
>>
>>
>>
>>>> Actually, a lot of things get faster (like malloc, or file descriptor
>>>> operations) because locks aren't needed.
>>>>
>>>> Despite common perception, processes are actually much *faster* than
>>>> threads when doing common operations like these. They are slightly slower
>>>> sometimes with things like creation and exit, or context switching, but
>>>> if you're doing huge numbers of those operations, then it is unlikely
>>>> to be a performance critical app... :)
>>>>
>>>>
>>> How about distributing tasks to a set of worker processes: is the
>>> overhead of using IPC instead of mutexes/condition variables
>>> acceptable?
>>>
>>
>> It is really going to depend on a lot of things. What is involved in
>> distributing tasks, how many cores and cache/TLB architecture of the
>> system running on, etc.
>>
>> You want to distribute as much work as possible while touching as
>> little memory as possible, in general.
>>
>> But if you're distributing threads over cores, and the shared caches are
>> physically tagged (which I think they are on all x86 CPUs), then you
>> should be able to have multiple processes operate on shared memory just
>> as efficiently as multiple threads.
>>
>> And then you also get the advantages of reduced contention on other
>> shared locks and resources.
>>
>
> Thanks for the tips, but let's get back to the original question:
> why don't I see any performance improvement with the fault-retry patches?
>
> My testcase only compares reading files with mmap vs. reading files with
> read, with different numbers of threads.
> Leaving aside other reasons why mmap is slower, there should be some
> speedup from running 4 threads vs. 1 thread, but:
>
> 1 thread: read: 27.18, 28.76
> 1 thread: mmap: 25.45, 25.24
> 2 thread: read: 16.03, 15.66
> 2 thread: mmap: 22.20, 20.99
> 4 thread: read: 9.15, 9.12
> 4 thread: mmap: 20.38, 20.47
>
> With mmap, 4 threads run at about the same speed as 2 threads, yet
> with read it scales nicely.
> And the patch doesn't seem to improve scalability.
> How can I find out whether the patch works as expected? [i.e. verify that
> faults are actually retried, and that they don't keep the semaphore locked]
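>
> One check I can think of (an untested sketch; the threshold and loop
> count are arbitrary): have one thread take major faults on a cold,
> file-backed mapping while another thread times mmap()/munmap(), which
> need the semaphore for writing. If faults are retried and drop
> mmap_sem during the disk I/O, the mmap/munmap loop should not stall
> behind the faulting thread:
>
> #include <fcntl.h>
> #include <pthread.h>
> #include <stdio.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <sys/time.h>
> #include <unistd.h>
>
> static char *filemap;           /* big file, caches dropped beforehand */
> static size_t filesize;
>
> /* take a major fault (sleeping in disk I/O) on every page */
> static void *faulter(void *arg)
> {
>         volatile char sum = 0;
>         size_t off;
>
>         for (off = 0; off < filesize; off += 4096)
>                 sum += filemap[off];
>         return NULL;
> }
>
> /* time mmap()+munmap(), which take mmap_sem for writing */
> static void *mapper(void *arg)
> {
>         struct timeval t0, t1;
>         int i;
>
>         for (i = 0; i < 1000; i++) {
>                 gettimeofday(&t0, NULL);
>                 void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
>                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>                 munmap(p, 4096);
>                 gettimeofday(&t1, NULL);
>                 long us = (t1.tv_sec - t0.tv_sec) * 1000000L
>                         + (t1.tv_usec - t0.tv_usec);
>                 if (us > 1000)  /* stalled behind a fault's I/O? */
>                         printf("mmap+munmap took %ld us\n", us);
>         }
>         return NULL;
> }
>
> int main(int argc, char **argv)
> {
>         struct stat st;
>         int fd = open(argv[1], O_RDONLY);
>         pthread_t a, b;
>
>         fstat(fd, &st);
>         filesize = st.st_size;
>         filemap = mmap(NULL, filesize, PROT_READ, MAP_PRIVATE, fd, 0);
>
>         pthread_create(&a, NULL, faulter, NULL);
>         pthread_create(&b, NULL, mapper, NULL);
>         pthread_join(a, NULL);
>         pthread_join(b, NULL);
>         return 0;
> }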
>
>> OK, I'll see if I can find them (am overseas at the moment, and I suspect
>> they are stranded on some stationary rust back home, but I might be able
>> to find them on the web).
>
> Ok.
>
> Best regards,
> --Edwin
>