Re: [PATCH] cfq-iosched: non-rot devices do not need read queue merging

From: Corrado Zoccolo
Date: Sun Jan 10 2010 - 07:56:00 EST


Hi,

On Tue, Jan 5, 2010 at 10:48 PM, Corrado Zoccolo <czoccolo@xxxxxxxxx> wrote:
> On Tue, Jan 5, 2010 at 10:19 PM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>> Vivek Goyal <vgoyal@xxxxxxxxxx> writes:
>>
>>> Thanks Jeff, one thing comes to mind. With the recent changes, we drive deeper
>>> queue depths on SSDs with NCQ, and there are not many pending cfqqs on the
>>> service tree unless the number of parallel threads exceeds the NCQ depth (32).
>>> If that's the case, then I think we might not be seeing a lot of queue merging
>>> in this test case unless the dump utility is creating more than 32 threads.
>>>
>>> If time permits, it might also be interesting to run the same test with queue
>>> depth 1 and see if SSDs without NCQ will suffer or not.
>>
>> Corrado, I think what Vivek is getting at is that you should check for
>> both blk_queue_nonrot and cfqd->hw_tag (like in cfq_arm_slice_timer).
>> Do you agree?
> Well, actually I didn't want to distinguish on hw_tag here. I still had
> to allow merging of writes, exactly because a write merge can save
> hundreds of ms on a non-NCQ SSD.
>
> Vivek is right that on non-NCQ SSDs a successful merge would increase
> performance, but I still think the likelihood of a merge is so low
> that maintaining the RB-tree is superfluous. Usually those devices are
> coupled with low-end CPUs, so skipping that code could be a win there
> too. I'll run some tests on my netbook.
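For concreteness, the alternative check Jeff mentions would look roughly like
this (just a sketch mirroring the blk_queue_nonrot()/hw_tag test in
cfq_arm_slice_timer(); the helper name and where it would be called from are
made up, this is not the actual patch):

    static inline bool cfq_skip_read_merging(struct cfq_data *cfqd)
    {
            /*
             * Treat the device as seek-free only if it is flagged
             * non-rotational AND is actually keeping multiple requests
             * in flight (NCQ) - the same condition used in
             * cfq_arm_slice_timer().
             */
            return blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag;
    }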
>
> BTW, I'm looking at read-test2 right now. I see it doesn't use direct
> I/O, so it also relies on the page cache. I think the page cache can
> detect the hidden sequential pattern and thus send big readahead
> requests to the device, making merging impossible (on my SSD, the
> readahead size and the max hw request size match).
>
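(To make sure the interleaved reads actually reach the I/O scheduler as
separate small requests, a test would have to either bypass the page cache or
disable readahead; a minimal sketch of both options, with a made-up helper
name:)

    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <unistd.h>

    /* Open a file so that readahead cannot coalesce the interleaved
     * pattern before it reaches the I/O scheduler. */
    int open_no_readahead(const char *path)
    {
            /* Option 1: bypass the page cache entirely; read buffers
             * must then be suitably aligned (typically 512B or 4KB). */
            int fd = open(path, O_RDONLY | O_DIRECT);

            if (fd < 0) {
                    /* Option 2: stay buffered, but tell the kernel not
                     * to read ahead on this descriptor. */
                    fd = open(path, O_RDONLY);
                    if (fd >= 0)
                            posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
            }
            return fd;
    }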
I did some tests and found something surprising.
Simply running the test script, the bandwidth levels off at a high
value regardless of whether queue merging in CFQ is enabled or disabled.
I suspected something odd was going on, so I modified the script to
drop caches before each run, and now the run with queue merging is
about 3 times faster than the one without, so on a non-NCQ SSD it is
better to have queue merging enabled after all.
I'm still wondering why a full page cache can make such a large
difference: my disk is 4 times larger than the available RAM, so not
clearing the cache should give at most a 1/4 boost.
I have to do more tests to understand what's going on...
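
(For reference, dropping caches between runs is just the usual /proc knob; a
minimal C equivalent of that step, assuming the standard sync + drop_caches
sequence, would be:)

    /* Flush dirty data and drop the clean page cache before each run,
     * via the standard /proc/sys/vm/drop_caches knob (needs root). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
            FILE *f;

            sync();                 /* write back dirty pages first */
            f = fopen("/proc/sys/vm/drop_caches", "w");
            if (!f) {
                    perror("drop_caches");
                    return EXIT_FAILURE;
            }
            fputs("3\n", f);        /* 3 = page cache + dentries/inodes */
            return fclose(f) ? EXIT_FAILURE : EXIT_SUCCESS;
    }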

Thanks,
Corrado
>
>>
>> Cheers,
>> Jeff
>>
>



--
__________________________________________________________________________

dott. Corrado Zoccolo mailto:czoccolo@xxxxxxxxx
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda