Re: [dm-devel] [PATCH 0/2] Introduce the request handling for dm-crypt

From: Zdenek Kabelac
Date: Thu Dec 03 2015 - 06:07:23 EST


On 3.12.2015 at 11:36, Baolin Wang wrote:
On 3 December 2015 at 10:56, Baolin Wang <baolin.wang@xxxxxxxxxx> wrote:
On 3 December 2015 at 03:56, Alasdair G Kergon <agk@xxxxxxxxxx> wrote:
On Wed, Dec 02, 2015 at 08:46:54PM +0800, Baolin Wang wrote:
These are the benchmarks for request-based dm-crypt. Please check them.

Now please put request-based dm-crypt completely to one side and focus
just on the existing bio-based code. Why is it slower and what can be
adjusted to improve this?


OK. I think I found something that needs to be pointed out.
1. From the IO block size test in the performance report, we can see
that request based does not gain a corresponding amount of performance
if we just increase the IO size. In dm-crypt, the data buffer of one
request is mapped with scatterlists, and all the scatterlists of that
request are sent to the encryption engine to encrypt or decrypt. I
found that if the number of scatterlists is small and each scatterlist
is longer, encryption is faster, which helps the engine reach its best
performance. But a big IO size does not mean bigger scatterlists (it
may be many scatterlists of small length). I think that is why we
cannot get the corresponding performance just by increasing the IO
size.

2. Why is bio based slower?
Following from 1, the crypto engine prefers bigger scatterlists for
better performance. But bio based sends only one scatterlist (whose
length is always '1 << SECTOR_SHIFT' = 512) to the crypto engine at a
time. That means if the bio size is 1M, bio based will submit to the
crypto engine 2048 times (each time with a single 512-byte
scatterlist), which is more time-consuming and inefficient for the
crypto engine. Request based, however, can map the whole request with
many scatterlists (not just one) and send all of them to the crypto
engine, which can improve the performance. Is that right?
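
To put numbers on point 2, here is a small sketch of the arithmetic. It
is not dm-crypt code: the function names are made up for illustration,
and the 4K page size used for the multi-entry case is an assumption,
not something measured in this thread.

```python
SECTOR_SHIFT = 9                    # 512-byte sectors, as quoted above
SECTOR_SIZE = 1 << SECTOR_SHIFT     # 512 bytes


def per_sector_submissions(io_bytes: int) -> int:
    """Crypto submissions when each one carries a single 512-byte scatterlist."""
    return io_bytes // SECTOR_SIZE


def sg_entries_for_pages(io_bytes: int, page_size: int = 4096) -> int:
    """Scatterlist entries if the same I/O is described one page per entry.

    page_size is an assumption made only for this illustration.
    """
    return -(-io_bytes // page_size)  # ceiling division


if __name__ == "__main__":
    one_mib = 1 << 20
    print(per_sector_submissions(one_mib))   # 2048
    print(sg_entries_for_pages(one_mib))     # 256
```

So under these assumptions a 1M bio costs 2048 separate submissions in
the one-scatterlist-per-sector scheme, while a single submission with
one entry per page would carry 256 entries.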

Another possible optimization, I think, is to increase the number of
scatterlist entries for bio based.
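
As a rough illustration of why a big IO does not automatically mean big
scatterlist entries (point 1 above), the sketch below merges physically
contiguous pages into entries. It is a toy model, not the kernel's
mapping code: `coalesce`, the page frame numbers, and the 4K page size
are all hypothetical.

```python
from typing import List, Tuple

PAGE_SIZE = 4096  # assumed page size for this illustration


def coalesce(pfns: List[int]) -> List[Tuple[int, int]]:
    """Merge runs of consecutive page frame numbers.

    Returns a list of (start_pfn, length_in_bytes) entries, one per run.
    """
    entries: List[Tuple[int, int]] = []
    for pfn in pfns:
        if entries:
            start, length = entries[-1]
            # Extend the last entry if this page directly follows it.
            if start + length // PAGE_SIZE == pfn:
                entries[-1] = (start, length + PAGE_SIZE)
                continue
        entries.append((pfn, PAGE_SIZE))
    return entries


if __name__ == "__main__":
    contiguous = list(range(100, 116))      # 16 adjacent pages, 64K total
    scattered = list(range(100, 132, 2))    # 16 non-adjacent pages, 64K total
    print(len(coalesce(contiguous)))        # 1 entry of 64K
    print(len(coalesce(scattered)))         # 16 entries of 4K each
```

Both inputs describe the same 64K of data, but only the contiguous one
collapses into a single long entry.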


I did some testing of my assumption about increasing the number of
scatterlist entries for bio based. I modified the bio-based code to
support multiple scatterlists, and it then gets the same performance
as request based.

1. Sequential read 1G with bio based, scatterlist entries expanded:
time dd if=/dev/dm-0 of=/dev/null bs=64K count=16384 iflag=direct
1073741824 bytes (1.1 GB) copied, 94.5458 s, 11.4 MB/s
real 1m34.562s
user 0m0.030s
sys 0m3.850s

2. Sequential read 1G with request based:
time dd if=/dev/dm-0 of=/dev/null bs=64K count=16384 iflag=direct
1073741824 bytes (1.1 GB) copied, 94.8922 s, 11.3 MB/s
real 1m34.908s
user 0m0.030s
sys 0m4.000s

From the data, we can see that bio based can also get the same
performance as request based. So if someone still doesn't like the
request-based approach, I think we can optimize bio based by
increasing the number of scatterlists. Thanks.
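
The two dd results above can be cross-checked with a few lines of
arithmetic. This is only a sanity check on the reported figures; the
helper name is made up, and MB is taken as 10**6 bytes, which is how
dd's summary line counts it.

```python
def throughput_mb_s(total_bytes: int, seconds: float) -> float:
    """Throughput in MB/s, with MB = 10**6 bytes."""
    return total_bytes / seconds / 1e6


if __name__ == "__main__":
    total = 64 * 1024 * 16384                # bs=64K count=16384 -> 1073741824 bytes
    bio = throughput_mb_s(total, 94.5458)    # bio based, expanded scatterlists
    req = throughput_mb_s(total, 94.8922)    # request based
    print(f"{bio:.1f} {req:.1f}")            # 11.4 11.3
    print(abs(bio - req) / req < 0.01)       # True: the two runs differ by under 1%
```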



Hi

Do you see any performance impact if you use these cryptsetup options:

--perf-same_cpu_crypt
--perf-submit_from_crypt_cpus

with your regular unpatched kernel?

Zdenek
