Re: [PATCH v3 1/2] writeback: add dirty_background_centisecs per bdi variable

From: Namjae Jeon
Date: Mon Nov 19 2012 - 18:18:50 EST


2012/10/22, Dave Chinner <david@xxxxxxxxxxxxx>:
> On Fri, Oct 19, 2012 at 04:51:05PM +0900, Namjae Jeon wrote:
>> Hi Dave.
>>
>> Test Procedure:
>>
>> 1) Local USB disk WRITE speed on NFS server is ~25 MB/s
>>
>> 2) Run a WRITE test (create a 1 GB file) on the NFS Client with default
>> writeback settings on the NFS Server. By default
>> bdi->dirty_background_bytes = 0, which means no change in the default
>> writeback behaviour
>>
>> 3) Next we change bdi->dirty_background_bytes = 25 MB (almost equal to
>> local USB disk write speed on NFS Server)
>> *** only on NFS Server - not on NFS Client ***
>
> Ok, so the results look good, but it's not really addressing what I
> was asking, though. A typical desktop PC has a disk that can do
> 100MB/s and GbE, so I was expecting a test that showed throughput
> close to GbE maximums at least (ie. around that 100MB/s). I have 3
> year old, low end, low power hardware (atom) that handles twice the
> throughput you are testing here, and most current consumer NAS
> devices are more powerful than this. IOWs, I think the rates you are
> testing at are probably too low even for the consumer NAS market to
> consider relevant...
>
>> ----------------------------------------------------------------------------------
>> Multiple NFS Client test:
>> -----------------------------------------------------------------------------------
>> Sorry - We could not arrange multiple PCs to verify this.
>> So, we tried 1 NFS Server + 2 NFS Clients using 3 target boards:
>> ARM Target + 512 MB RAM + ethernet - 100 Mbits/s, create 1 GB File
>
> But this really doesn't tell us anything - it's still only 100Mb/s,
> which we'd expect is already getting very close to line rate even
> with low powered client hardware.
>
> What I'm concerned about is the NFS server "sweet spot" - a $10k server
> that exports 20TB of storage and can sustain close to a GB/s of NFS
> traffic over a single 10GbE link with tens to hundreds of clients.
> 100MB/s and 10 clients is about the minimum needed to be able to
> extrapolate a little and make an informed guess of how it will scale
> up....
>
>> > 1. what's the comparison in performance to typical NFS
>> > server writeback parameter tuning? i.e. dirty_background_ratio=5,
>> > dirty_ratio=10, dirty_expire_centisecs=1000,
>> > dirty_writeback_centisecs=1? i.e. does this change give any
>> > benefit over the current common practice for configuring NFS
>> > servers?
>>
>> Agreed, the above improvement in write speed can be achieved by
>> tuning the above write-back parameters.
>> But if we change those settings, it will change write-back behavior
>> system-wide.
>> On the other hand, if we change the proposed per-bdi setting,
>> bdi->dirty_background_bytes, it will change write-back behavior only
>> for the block device exported on the NFS server.
>
> I already know what the difference between global vs per-bdi tuning
> means. What I want to know is how your results compare
> *numerically* to just having a tweaked global setting on a vanilla
> kernel. i.e. is there really any performance benefit to per-bdi
> configuration that cannot be gained by existing methods?
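
(For reference, the "existing methods" here amount to writing the values
quoted above into the stock /proc/sys/vm knobs, system-wide. A minimal C
sketch of that baseline - it touches nothing from the per-bdi patch and
needs root:)

#include <stdio.h>
#include <stdlib.h>

/* Write one string value into a /proc/sys/vm tunable. */
static void set_knob(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		exit(1);
	}
	fprintf(f, "%s\n", val);
	fclose(f);
}

int main(void)
{
	/* The global settings quoted in the question above. */
	set_knob("/proc/sys/vm/dirty_background_ratio",    "5");
	set_knob("/proc/sys/vm/dirty_ratio",               "10");
	set_knob("/proc/sys/vm/dirty_expire_centisecs",    "1000");
	set_knob("/proc/sys/vm/dirty_writeback_centisecs", "1");
	return 0;
}
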
>
>> > 2. what happens when you have 10 clients all writing to the server
>> > at once? Or 100? NFS servers rarely have a single writer to a
>> > single file at a time, so what impact does this change have on
>> > multiple concurrent file write performance from multiple clients
>>
>> Sorry, we could not arrange more than 2 PCs for verifying this.
>
> Really? Well, perhaps there are some tools that might be useful for
> you here:
>
> http://oss.sgi.com/projects/nfs/testtools/
>
> "Weber
>
> Test load generator for NFS. Uses multiple threads, multiple
> sockets and multiple IP addresses to simulate loads from many
> machines, thus enabling testing of NFS server setups with larger
> client counts than can be tested with physical infrastructure (or
> Virtual Machine clients). Has been useful in automated NFS testing
> and as a pinpoint NFS load generator tool for performance
> development."
>

Hi Dave,
We ran "weber" test on below setup:
1) SATA HDD - Local WRITE speed ~120 MB/s, NFS WRITE speed ~90 MB/s
2) Used a 10GbE network interface to mount NFS

We ran "weber" test with NFS clients ranging from 1 to 100,
below is the % GAIN in NFS WRITE speed with
bdi->dirty_background_bytes = 100 MB at NFS server

-------------------------------------------------
| Number of NFS Clients | % GAIN in WRITE Speed |
|-----------------------|-----------------------|
|           1           |        19.83 %        |
|           2           |         2.97 %        |
|           3           |         2.01 %        |
|          10           |         0.25 %        |
|          20           |         0.23 %        |
|          30           |         0.13 %        |
|         100           |        -0.60 %        |
-------------------------------------------------

With the bdi->dirty_background_bytes setting at the NFS server, we observed
that the NFS WRITE speed improvement is largest with a single NFS client,
and that the improvement drops as the number of NFS clients increases
from 1 to 100.

So, the bdi->dirty_background_bytes setting might be useful where there is
only one NFS client (a scenario like ours), but it is not useful for big
NFS servers which host hundreds of NFS clients.
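
(For completeness, a minimal sketch of how the 100 MB per-bdi value above
could be set from userspace. The attribute name and path are only an
assumption about how the patch might expose bdi->dirty_background_bytes -
the /sys/class/bdi/<major:minor>/ directory itself already exists for knobs
like read_ahead_kb, but dirty_background_bytes there is hypothetical:)

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	/* e.g. "8:16" - the major:minor of the exported disk's bdi */
	const char *bdi = argc > 1 ? argv[1] : "8:16";
	char path[256];
	FILE *f;

	/* Hypothetical sysfs attribute; exact name depends on the patch. */
	snprintf(path, sizeof(path),
		 "/sys/class/bdi/%s/dirty_background_bytes", bdi);

	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return 1;
	}
	fprintf(f, "%llu\n", 100ULL << 20);	/* 100 MB, as in the test above */
	fclose(f);
	return 0;
}
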

Let me know your opinion.

Thanks.

>> > 3. Following on from the multiple client test, what difference does it
>> > make to file fragmentation rates? Writing more frequently means
>> > smaller allocations and writes, and that tends to lead to higher
>> > fragmentation rates, especially when multiple files are being
>> > written concurrently. Higher fragmentation also means lower
>> > performance over time as fragmentation accelerates filesystem aging
>> > effects on performance. IOWs, it may be faster when new, but it
>> > will be slower 3 months down the track and that's a bad tradeoff to
>> > make.
>>
>> We agree that there could be a bit more fragmentation. But as you know,
>> we are not changing the writeback settings at the NFS clients.
>> So, write-back behavior on the NFS client will not change - IO requests
>> will be buffered at the NFS client as per the existing write-back behavior.
>
> I think you misunderstand - writeback settings on the server greatly
> impact the way the server writes data and therefore the way files
> are fragmented. It has nothing to do with client side tuning.
>
> Effectively, what you are presenting is best case numbers - empty
> filesystem, single client, streaming write, no fragmentation, no
> allocation contention, no competing IO load causing write
> latency. Testing with lots of clients introduces all of
> these things, and that will greatly impact server behaviour.
> Aggregation in memory isolates a lot of this variation from
> writeback and hence smooths out a lot of the variability that leads
> to fragmentation, seeks, latency spikes and premature filesystem
> aging.
>
> That is, if you set a 100MB dirty_bytes limit on a bdi it will give
> really good buffering for a single client doing a streaming write.
> If you've got 10 clients, then assuming fair distribution of server
> resources, then that is 10MB per client per writeback trigger.
> That's line ball as to whether it will cause fragmentation severe
> enough to impact server throughput. If you've got 100 clients, then
> that's only 1MB per client per writeback trigger, and that's
> definitely too low to maintain decent writeback behaviour. i.e.
> you're now writing 100 files 1MB at a time, and that tends towards
> random IO patterns rather than sequential IO patterns. Seek time
> determines throughput, not IO bandwidth limits.
>
> IOWs, as the client count goes up, the writeback patterns will tend
> more towards random IO than sequential IO unless the amount of
> buffering allowed before writeback triggers also grows. That's
> important, because random IO is much slower than sequential IO.
> What I'd like to have is some insight into whether this patch
> changes that inflection point, for better or for worse. The only way
> to find that is to run multi-client testing....
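
(Making the arithmetic above explicit, assuming a fair per-client share of
a fixed 100 MB bdi limit - just an illustration of the scaling, not a claim
about how writeback actually apportions the budget:)

#include <stdio.h>

int main(void)
{
	const unsigned long limit_mb = 100;	/* fixed per-bdi dirty limit */
	const int clients[] = { 1, 10, 100 };
	unsigned int i;

	/* 1 client -> ~100 MB, 10 -> ~10 MB, 100 -> ~1 MB per trigger */
	for (i = 0; i < sizeof(clients) / sizeof(clients[0]); i++)
		printf("%3d clients -> ~%3lu MB per file per writeback trigger\n",
		       clients[i], limit_mb / clients[i]);
	return 0;
}
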
>
>> > 5. Are the improvements consistent across different filesystem
>> > types? We've had writeback changes in the past cause improvements
>> > on one filesystem but significant regressions on others. I'd
>> > suggest that you need to present results for ext4, XFS and btrfs so
>> > that we have a decent idea of what we can expect from the change to
>> > the generic code.
>>
>> As mentioned in Tables 1 & 2 above, the performance gain in WRITE speed
>> differs across file systems, i.e. it is different on the NFS client
>> over XFS and EXT4.
>> We also tried BTRFS over NFS, but we could not see any WRITE speed
>> gain or degradation on BTRFS over NFS, so we are not posting the
>> BTRFS results here.
>
> You should post btrfs numbers even if they show no change. It wasn't
> until I got this far that I even realised that you'd tested
> BTRFS. I don't know what to make of this, because I don't know how
> the throughput rates compare to XFS and EXT4....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>