Re: [PATCH] zram: add zstd to the supported algorithms list

From: Minchan Kim
Date: Fri Aug 25 2017 - 00:50:47 EST


Hi Sergey,

On Thu, Aug 24, 2017 at 11:04:40PM +0900, Sergey Senozhatsky wrote:
> Hi,
>
> On (08/24/17 13:30), Minchan Kim wrote:
> > Hello Sergey,
> >
> > On Thu, Aug 24, 2017 at 10:49:36AM +0900, Sergey Senozhatsky wrote:
> > > Add ZSTD to the list of supported compression algorithms.
> > >
> > > Official benchmarks [1]:
> >
> > First of all, thanks for the work!
> >
> > I want to ask one thing.
> >
> > Could you add some benchmark results (e.g., comp ratio and speed)
> > compared to (deflate, lzo, lz4)?
> >
> > I want to see how good it is on small data, since ours is 4K.
>
>
> so on my synthetic fio test (with a static buffer):
>
>
> LZO DEFLATE ZSTD
>
> #jobs1
> WRITE: (2180MB/s) (77.2MB/s) (1429MB/s)
> WRITE: (1617MB/s) (77.7MB/s) (1202MB/s)
> READ: (426MB/s) (595MB/s) (1181MB/s)
> READ: (422MB/s) (572MB/s) (1020MB/s)
> READ: (318MB/s) (67.8MB/s) (563MB/s)
> WRITE: (318MB/s) (67.9MB/s) (564MB/s)
> READ: (336MB/s) (68.3MB/s) (583MB/s)
> WRITE: (335MB/s) (68.2MB/s) (582MB/s)
> #jobs2
> WRITE: (3441MB/s) (152MB/s) (2141MB/s)
> WRITE: (2507MB/s) (147MB/s) (1888MB/s)
> READ: (801MB/s) (1146MB/s) (1890MB/s)
> READ: (767MB/s) (1096MB/s) (2073MB/s)
> READ: (621MB/s) (126MB/s) (1009MB/s)
> WRITE: (621MB/s) (126MB/s) (1009MB/s)
> READ: (656MB/s) (125MB/s) (1075MB/s)
> WRITE: (657MB/s) (126MB/s) (1077MB/s)
> #jobs3
> WRITE: (4772MB/s) (225MB/s) (3394MB/s)
> WRITE: (3905MB/s) (211MB/s) (2939MB/s)
> READ: (1216MB/s) (1608MB/s) (3218MB/s)
> READ: (1159MB/s) (1431MB/s) (2981MB/s)
> READ: (906MB/s) (156MB/s) (1457MB/s)
> WRITE: (907MB/s) (156MB/s) (1458MB/s)
> READ: (953MB/s) (158MB/s) (1595MB/s)
> WRITE: (952MB/s) (157MB/s) (1593MB/s)
> #jobs4
> WRITE: (6036MB/s) (265MB/s) (4469MB/s)
> WRITE: (5059MB/s) (263MB/s) (3951MB/s)
> READ: (1618MB/s) (2066MB/s) (4276MB/s)
> READ: (1573MB/s) (1942MB/s) (3830MB/s)
> READ: (1202MB/s) (227MB/s) (1971MB/s)
> WRITE: (1200MB/s) (227MB/s) (1968MB/s)
> READ: (1265MB/s) (226MB/s) (2116MB/s)
> WRITE: (1264MB/s) (226MB/s) (2114MB/s)
> #jobs5
> WRITE: (5339MB/s) (233MB/s) (3781MB/s)
> WRITE: (4298MB/s) (234MB/s) (3276MB/s)
> READ: (1626MB/s) (2048MB/s) (4081MB/s)
> READ: (1567MB/s) (1929MB/s) (3758MB/s)
> READ: (1174MB/s) (205MB/s) (1747MB/s)
> WRITE: (1173MB/s) (204MB/s) (1746MB/s)
> READ: (1214MB/s) (208MB/s) (1890MB/s)
> WRITE: (1215MB/s) (208MB/s) (1892MB/s)
> #jobs6
> WRITE: (5666MB/s) (270MB/s) (4338MB/s)
> WRITE: (4828MB/s) (267MB/s) (3772MB/s)
> READ: (1803MB/s) (2058MB/s) (4946MB/s)
> READ: (1805MB/s) (2156MB/s) (4711MB/s)
> READ: (1334MB/s) (235MB/s) (2135MB/s)
> WRITE: (1335MB/s) (235MB/s) (2137MB/s)
> READ: (1364MB/s) (236MB/s) (2268MB/s)
> WRITE: (1365MB/s) (237MB/s) (2270MB/s)
> #jobs7
> WRITE: (5474MB/s) (270MB/s) (4300MB/s)
> WRITE: (4666MB/s) (266MB/s) (3817MB/s)
> READ: (2022MB/s) (2319MB/s) (5472MB/s)
> READ: (1924MB/s) (2260MB/s) (5031MB/s)
> READ: (1369MB/s) (242MB/s) (2153MB/s)
> WRITE: (1370MB/s) (242MB/s) (2155MB/s)
> READ: (1499MB/s) (246MB/s) (2310MB/s)
> WRITE: (1497MB/s) (246MB/s) (2307MB/s)
> #jobs8
> WRITE: (5558MB/s) (273MB/s) (4439MB/s)
> WRITE: (4763MB/s) (271MB/s) (3918MB/s)
> READ: (2201MB/s) (2599MB/s) (6062MB/s)
> READ: (2105MB/s) (2463MB/s) (5413MB/s)
> READ: (1490MB/s) (252MB/s) (2238MB/s)
> WRITE: (1488MB/s) (252MB/s) (2236MB/s)
> READ: (1566MB/s) (254MB/s) (2434MB/s)
> WRITE: (1568MB/s) (254MB/s) (2437MB/s)
> #jobs9
> WRITE: (5120MB/s) (264MB/s) (4035MB/s)
> WRITE: (4531MB/s) (267MB/s) (3740MB/s)
> READ: (1940MB/s) (2258MB/s) (4986MB/s)
> READ: (2024MB/s) (2387MB/s) (4871MB/s)
> READ: (1343MB/s) (246MB/s) (2038MB/s)
> WRITE: (1342MB/s) (246MB/s) (2037MB/s)
> READ: (1553MB/s) (238MB/s) (2243MB/s)
> WRITE: (1552MB/s) (238MB/s) (2242MB/s)
> #jobs10
> WRITE: (5345MB/s) (271MB/s) (3988MB/s)
> WRITE: (4750MB/s) (254MB/s) (3668MB/s)
> READ: (1876MB/s) (2363MB/s) (5150MB/s)
> READ: (1990MB/s) (2256MB/s) (5080MB/s)
> READ: (1355MB/s) (250MB/s) (2019MB/s)
> WRITE: (1356MB/s) (251MB/s) (2020MB/s)
> READ: (1490MB/s) (252MB/s) (2202MB/s)
> WRITE: (1488MB/s) (252MB/s) (2199MB/s)
>
> jobs1 perfstat
> instructions 52,065,555,710 ( 0.79) 855,731,114,587 ( 2.64) 54,280,709,944 ( 1.40)
> branches 14,020,427,116 ( 725.847) 101,733,449,582 (1074.521) 11,170,591,067 ( 992.869)
> branch-misses 22,626,174 ( 0.16%) 274,197,885 ( 0.27%) 25,915,805 ( 0.23%)
> jobs2 perfstat
> instructions 103,633,110,402 ( 0.75) 1,710,822,100,914 ( 2.59) 107,879,874,104 ( 1.28)
> branches 27,931,237,282 ( 679.203) 203,298,267,479 (1037.326) 22,185,350,842 ( 884.427)
> branch-misses 46,103,811 ( 0.17%) 533,747,204 ( 0.26%) 49,682,483 ( 0.22%)
> jobs3 perfstat
> instructions 154,857,283,657 ( 0.76) 2,565,748,974,197 ( 2.57) 161,515,435,813 ( 1.31)
> branches 41,759,490,355 ( 670.529) 304,905,605,277 ( 978.765) 33,215,805,907 ( 888.003)
> branch-misses 74,263,293 ( 0.18%) 759,746,240 ( 0.25%) 76,841,196 ( 0.23%)
> jobs4 perfstat
> instructions 206,215,849,076 ( 0.75) 3,420,169,460,897 ( 2.60) 215,003,061,664 ( 1.31)
> branches 55,632,141,739 ( 666.501) 406,394,977,433 ( 927.241) 44,214,322,251 ( 883.532)
> branch-misses 102,287,788 ( 0.18%) 1,098,617,314 ( 0.27%) 103,891,040 ( 0.23%)
> jobs5 perfstat
> instructions 258,711,315,588 ( 0.67) 4,275,657,533,244 ( 2.23) 269,332,235,685 ( 1.08)
> branches 69,802,821,166 ( 588.823) 507,996,211,252 ( 797.036) 55,450,846,129 ( 735.095)
> branch-misses 129,217,214 ( 0.19%) 1,243,284,991 ( 0.24%) 173,512,278 ( 0.31%)
> jobs6 perfstat
> instructions 312,796,166,008 ( 0.61) 5,133,896,344,660 ( 2.02) 323,658,769,588 ( 1.04)
> branches 84,372,488,583 ( 520.541) 610,310,494,402 ( 697.642) 66,683,292,992 ( 693.939)
> branch-misses 159,438,978 ( 0.19%) 1,396,368,563 ( 0.23%) 174,406,934 ( 0.26%)
> jobs7 perfstat
> instructions 363,211,372,930 ( 0.56) 5,988,205,600,879 ( 1.75) 377,824,674,156 ( 0.93)
> branches 98,057,013,765 ( 463.117) 711,841,255,974 ( 598.762) 77,879,009,954 ( 600.443)
> branch-misses 199,513,153 ( 0.20%) 1,507,651,077 ( 0.21%) 248,203,369 ( 0.32%)
> jobs8 perfstat
> instructions 413,960,354,615 ( 0.52) 6,842,918,558,378 ( 1.45) 431,938,486,581 ( 0.83)
> branches 111,812,574,884 ( 414.224) 813,299,084,518 ( 491.173) 89,062,699,827 ( 517.795)
> branch-misses 233,584,845 ( 0.21%) 1,531,593,921 ( 0.19%) 286,818,489 ( 0.32%)
> jobs9 perfstat
> instructions 465,976,220,300 ( 0.53) 7,698,467,237,372 ( 1.47) 486,352,600,321 ( 0.84)
> branches 125,931,456,162 ( 424.063) 915,207,005,715 ( 498.192) 100,370,404,090 ( 517.439)
> branch-misses 256,992,445 ( 0.20%) 1,782,809,816 ( 0.19%) 345,239,380 ( 0.34%)
> jobs10 perfstat
> instructions 517,406,372,715 ( 0.53) 8,553,527,312,900 ( 1.48) 540,732,653,094 ( 0.84)
> branches 139,839,780,676 ( 427.732) 1,016,737,699,389 ( 503.172) 111,696,557,638 ( 516.750)
> branch-misses 259,595,561 ( 0.19%) 1,952,570,279 ( 0.19%) 357,818,661 ( 0.32%)
>
>
> seconds elapsed 20.630411534 96.084546565 12.743373571
> seconds elapsed 22.292627625 100.984155001 14.407413560
> seconds elapsed 22.396016966 110.344880848 14.032201392
> seconds elapsed 22.517330949 113.351459170 14.243074935
> seconds elapsed 28.548305104 156.515193765 19.159286861
> seconds elapsed 30.453538116 164.559937678 19.362492717
> seconds elapsed 33.467108086 188.486827481 21.492612173
> seconds elapsed 35.617727591 209.602677783 23.256422492
> seconds elapsed 42.584239509 243.959902566 28.458540338
> seconds elapsed 47.683632526 269.635248851 31.542404137
>
>
> overall, ZSTD has slower WRITE, but much faster READ (perhaps the static
> compression buffer helps ZSTD a lot), which results in faster total test times.
>
> now, memory consumption (zram mm_stat file)
>
> zram-LZO-mm_stat
> mm_stat (jobs1): 2147483648 23068672 33558528 0 33558528 0 0
> mm_stat (jobs2): 2147483648 23068672 33558528 0 33558528 0 0
> mm_stat (jobs3): 2147483648 23068672 33558528 0 33562624 0 0
> mm_stat (jobs4): 2147483648 23068672 33558528 0 33558528 0 0
> mm_stat (jobs5): 2147483648 23068672 33558528 0 33558528 0 0
> mm_stat (jobs6): 2147483648 23068672 33558528 0 33562624 0 0
> mm_stat (jobs7): 2147483648 23068672 33558528 0 33566720 0 0
> mm_stat (jobs8): 2147483648 23068672 33558528 0 33558528 0 0
> mm_stat (jobs9): 2147483648 23068672 33558528 0 33558528 0 0
> mm_stat (jobs10): 2147483648 23068672 33558528 0 33562624 0 0
>
> zram-DEFLATE-mm_stat
> mm_stat (jobs1): 2147483648 16252928 25178112 0 25178112 0 0
> mm_stat (jobs2): 2147483648 16252928 25178112 0 25178112 0 0
> mm_stat (jobs3): 2147483648 16252928 25178112 0 25178112 0 0
> mm_stat (jobs4): 2147483648 16252928 25178112 0 25178112 0 0
> mm_stat (jobs5): 2147483648 16252928 25178112 0 25178112 0 0
> mm_stat (jobs6): 2147483648 16252928 25178112 0 25178112 0 0
> mm_stat (jobs7): 2147483648 16252928 25178112 0 25190400 0 0
> mm_stat (jobs8): 2147483648 16252928 25178112 0 25190400 0 0
> mm_stat (jobs9): 2147483648 16252928 25178112 0 25178112 0 0
> mm_stat (jobs10): 2147483648 16252928 25178112 0 25178112 0 0
>
> zram-ZSTD-mm_stat
> mm_stat (jobs1): 2147483648 11010048 16781312 0 16781312 0 0
> mm_stat (jobs2): 2147483648 11010048 16781312 0 16781312 0 0
> mm_stat (jobs3): 2147483648 11010048 16781312 0 16785408 0 0
> mm_stat (jobs4): 2147483648 11010048 16781312 0 16781312 0 0
> mm_stat (jobs5): 2147483648 11010048 16781312 0 16781312 0 0
> mm_stat (jobs6): 2147483648 11010048 16781312 0 16781312 0 0
> mm_stat (jobs7): 2147483648 11010048 16781312 0 16781312 0 0
> mm_stat (jobs8): 2147483648 11010048 16781312 0 16781312 0 0
> mm_stat (jobs9): 2147483648 11010048 16781312 0 16785408 0 0
> mm_stat (jobs10): 2147483648 11010048 16781312 0 16781312 0 0

Thanks for testing.
Could you resend the patch with these test results and my acked-by?

Acked-by: Minchan Kim <minchan@xxxxxxxxxx>

Off-topic:

zstd beats deflate in every way. Nick, right?

With zstd, I doubt we should keep showing "deflate" to users. Too many
options just confuse them.
Deflate has been there to offer a higher compression ratio at slower
speed. However, zstd is unconditionally better than deflate, so how
about replacing deflate with zstd?
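
To make it concrete, here is roughly the change I have in mind in the
backends[] list in drivers/block/zram/zcomp.c (a sketch from memory,
not an actual patch; entry order and config symbols may be off):

	static const char * const backends[] = {
		"lzo",
		/* ... lz4 / lz4hc / 842 entries, each behind IS_ENABLED() ... */
	#if IS_ENABLED(CONFIG_CRYPTO_DEFLATE)
		"deflate",	/* <- the entry I would drop */
	#endif
	#if IS_ENABLED(CONFIG_CRYPTO_ZSTD)
		"zstd",
	#endif
		NULL
	};

This list is what comp_algorithm shows to the user, so dropping
"deflate" here only removes it from the menu; if I remember correctly,
zcomp_available_algorithm() still falls back to crypto_has_comp(), so
anyone who really wants deflate could still select it explicitly.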

Sergey, what do you think about it?