Re: [PATCH v9 0/9] Add latency priority for CFS class

From: Vincent Guittot
Date: Mon Nov 28 2022 - 12:19:53 EST


Hi Prateek,

On Mon, 28 Nov 2022 at 12:52, K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>
> Hello Vincent,
>
> Following are the test results on a dual-socket Zen3 machine (2 x 64C/128T).
>
> tl;dr
>
> o All benchmarks with the DEFAULT_LATENCY_NICE value are comparable to tip.
> There is, however, a noticeable dip for the unixbench-spawn test case.
>
> o With the 2 rbtree approach, I do not see much difference in the
> hackbench results with varying latency nice values. Tests on v5 did
> yield noticeable improvements for hackbench.
> (https://lore.kernel.org/lkml/cd48ebbb-9724-985f-28e3-e558dea07827@xxxxxxx/)

The 2 rbtree approach is the one that was already used in v5. I just
reran the hackbench tests with the latest tip and v6.2-rc7, and I can
see a large performance improvement for the pipe tests on my system
(8-core system). Could you try with a larger number of groups, like
64, 128, and 256?
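
For reference, something like the following should do it (assuming the
same invocation you use further down, with only the group count
changing), plus -g 128 and -g 256 variants:

perf bench sched messaging -p -l 100000 -g 64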

>
> o For hackbench + cyclictest and hackbench + schbench, I see the
> expected behavior with different latency nice values.
>
> o There are a few cases with hackbench and hackbench + cyclictest where
> the results are non-monotonic with different latency nice values.
> (Marked with "^").
>
> I'll leave the detailed results below:
>
> On 11/15/2022 10:48 PM, Vincent Guittot wrote:
> > This patchset restarts the work on adding a latency priority to describe
> > the latency tolerance of cfs tasks.
> >
> > Patch [1] is a new one that was added in v6. It fixes an
> > unfairness for low-prio tasks caused by wakeup_gran() being bigger
> > than the maximum vruntime credit that a waking task can keep after
> > sleeping.
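> >
> > To make the imbalance concrete, here is a rough, self-contained
> > userspace sketch of the arithmetic (the constants are typical scaled
> > defaults and the weight scaling is a simplification of
> > calc_delta_fair(), not the kernel code):
> >
> >   #include <stdio.h>
> >
> >   int main(void)
> >   {
> >           /* assumed scaled defaults for an 8-core system */
> >           double wakeup_gran_ns   = 4000000.0;  /* sysctl_sched_wakeup_granularity */
> >           double sched_latency_ns = 24000000.0; /* sysctl_sched_latency */
> >           /* endpoints of sched_prio_to_weight[] */
> >           double w_nice0 = 1024.0, w_nice19 = 15.0;
> >
> >           /* wakeup_gran() is converted to vruntime using the waking
> >            * task's weight, so a nice 19 task sees a much larger
> >            * granularity... */
> >           double vgran = wakeup_gran_ns * (w_nice0 / w_nice19);
> >
> >           /* ...while the vruntime credit kept after sleeping is bounded
> >            * by sysctl_sched_latency (halved with GENTLE_FAIR_SLEEPERS) */
> >           double max_credit = sched_latency_ns / 2.0;
> >
> >           printf("nice 19 vruntime gran: %.0f ns, max sleeper credit: %.0f ns\n",
> >                  vgran, max_credit);
> >           return 0;
> >   }
> >
> > With these numbers the granularity (~273 ms of vruntime) dwarfs the
> > credit (~12 ms), so a nice 19 waking task can essentially never preempt.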
> >
> > The patches [2-4] have been done by Parth:
> > https://lore.kernel.org/lkml/20200228090755.22829-1-parth@xxxxxxxxxxxxx/
> >
> > I have just rebased them and moved the setting of the latency priority
> > outside the priority update. I have removed the Reviewed-by tags because
> > the patches are 2 years old.
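> >
> > As a usage sketch (not part of the series): with the extended
> > sched_setattr() from these patches, a task would set its own latency
> > nice roughly as below. The sched_latency_nice field and the
> > SCHED_FLAG_LATENCY_NICE flag come from the series and are not in
> > mainline (or glibc) headers, so they are declared locally and their
> > exact values are assumptions:
> >
> >   #define _GNU_SOURCE
> >   #include <stdint.h>
> >   #include <stdio.h>
> >   #include <string.h>
> >   #include <sys/syscall.h>
> >   #include <unistd.h>
> >
> >   #define SCHED_FLAG_LATENCY_NICE 0x80 /* value assumed from the series */
> >
> >   struct sched_attr {                  /* declared locally, see above */
> >           uint32_t size;
> >           uint32_t sched_policy;
> >           uint64_t sched_flags;
> >           int32_t  sched_nice;
> >           uint32_t sched_priority;
> >           uint64_t sched_runtime;
> >           uint64_t sched_deadline;
> >           uint64_t sched_period;
> >           uint32_t sched_util_min;
> >           uint32_t sched_util_max;
> >           int32_t  sched_latency_nice; /* new field from patches [2-4] */
> >   };
> >
> >   int main(void)
> >   {
> >           struct sched_attr attr;
> >
> >           memset(&attr, 0, sizeof(attr));
> >           attr.size = sizeof(attr);
> >           attr.sched_flags = SCHED_FLAG_LATENCY_NICE;
> >           attr.sched_latency_nice = -20; /* most latency sensitive */
> >
> >           /* pid 0 == calling task; no glibc wrapper, so raw syscall */
> >           if (syscall(SYS_sched_setattr, 0, &attr, 0))
> >                   perror("sched_setattr");
> >           return 0;
> >   }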
> >
> > This aims to be a generic interface, and the following patches are one
> > use of it to improve the scheduling latency of cfs tasks.
> >
> > Patch [5] uses the latency nice priority to define a latency offset
> > that is then used to decide whether a cfs task can or should preempt the
> > currently running task. The patch gives some test results with
> > cyclictest and hackbench to highlight the benefit of latency priority
> > for short interactive tasks and long CPU-intensive tasks.
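> >
> > As a minimal illustration of the idea (hypothetical names, constants
> > and sign convention, not the kernel implementation): the per-task
> > offset shifts the vruntime-based wakeup preemption threshold, so a
> > waking task with a negative latency nice preempts more easily:
> >
> >   #include <stdio.h>
> >
> >   struct task {
> >           long long vruntime;       /* virtual runtime, in ns */
> >           long long latency_offset; /* < 0 for latency-sensitive tasks */
> >   };
> >
> >   /* minimum vruntime lead normally required to preempt (arbitrary) */
> >   static const long long wakeup_gran = 1000000; /* 1 ms */
> >
> >   static int should_preempt(const struct task *curr,
> >                             const struct task *waking)
> >   {
> >           long long vdiff = curr->vruntime - waking->vruntime;
> >
> >           /* a negative offset on the waking task (or a positive one on
> >            * the current task) lowers the bar for preemption */
> >           vdiff += curr->latency_offset - waking->latency_offset;
> >
> >           return vdiff > wakeup_gran;
> >   }
> >
> >   int main(void)
> >   {
> >           struct task curr   = { 2000000, 0 };
> >           struct task waking = { 1500000, -600000 };
> >
> >           /* prints 1; without the offset, the 0.5 ms lead would be
> >            * below the granularity and the wakeup would not preempt */
> >           printf("preempt: %d\n", should_preempt(&curr, &waking));
> >           return 0;
> >   }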
> >
> > Patch [6] adds support for the latency nice priority in task groups by
> > adding a cpu.latency.nice field. The range is [-20:19], as for setting a
> > task's latency priority.
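> >
> > (Presumably this is consumed through cgroupfs like the other cpu
> > controller files, e.g. with cgroup v2 mounted at the usual place:
> > echo -20 > /sys/fs/cgroup/<group>/cpu.latency.nice)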
> >
> > Patch [7] makes sched_core take the latency offset into account.
> >
> > Patch [8] adds an rb tree to cover some corner cases where a
> > latency-sensitive task (priority < 0) is preempted by a high-priority
> > task (RT/DL) or fails to preempt one. This patch ensures that tasks
> > will have at least a slice of sched_min_granularity in priority at
> > wakeup.
> >
> > Patch [9] removes useless check after adding a latency rb tree.
> >
> > I have also backported the patchset on a dragonboard RB3 with an android
> > mainline kernel based on v5.18 for a quick test. I have used the
> > TouchLatency app, which is part of AOSP and described as a very good
> > test to highlight jitter and jank frame sources of a system [1].
> > In addition to the app, I have added some short-running tasks waking up
> > regularly (using the 8 cpus for 4 ms every 37777 us) to stress the
> > system without overloading it (and disabling EAS). The first results
> > show that the patchset helps to reduce the missed-deadline frames from
> > 5% to less than 0.1% when the cpu.latency.nice of the task groups is
> > set. I haven't rerun the test with the latest version.
> >
> > I have also tested the patchset with the modified version of the alsa
> > latency test that was shared by Tim. The test quickly xruns with the
> > default latency nice priority of 0 but is able to run without underruns
> > with a latency nice of -20 while hackbench runs simultaneously.
> >
> > While preparing version 8, I evaluated the benefit of using an
> > augmented rbtree instead of adding a separate rbtree for
> > latency-sensitive entities, which was a relevant suggestion made by
> > PeterZ. Although the augmented rbtree makes it possible to keep
> > additional information sorted in the tree with limited overhead, it has
> > more impact on legacy use cases (latency_nice >= 0) because the
> > augmented callbacks are always called to maintain this additional
> > information, even when there are no latency-sensitive tasks. In that
> > case, the dedicated rbtree remains empty and its overhead is reduced to
> > loading a cached null node pointer. Nevertheless, we might want to
> > reconsider the augmented rbtree once negative latency_nice is more
> > widely deployed. So far, the tests that I have done have not shown
> > improvements with the augmented rbtree.
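> >
> > A minimal sketch of that empty-tree fast path (simplified, hypothetical
> > types; the kernel side would use rb_root_cached for this):
> >
> >   struct node;                      /* tree node, details omitted */
> >
> >   struct latency_tree {
> >           struct node *rb_leftmost; /* cached leftmost, NULL when empty */
> >   };
> >
> >   /* On legacy use cases (latency_nice >= 0) the dedicated tree stays
> >    * empty, so the check at pick time is a single cached-pointer load
> >    * plus a branch. */
> >   static struct node *latency_first(struct latency_tree *t)
> >   {
> >           return t->rb_leftmost;
> >   }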
> >
> > Below are some hackbench results:
> >
> >                       2 rbtrees   augmented rbtree     augmented rbtree
> >                                   sorted by vruntime   sorted by wakeup_vruntime
> > sched pipe
> >   avg                 26311.000   25976.667            25839.556
> >   stdev               0.15 %      0.28 %               0.24 %
> >   vs tip              0.50 %      -0.78 %              -1.31 %
> > hackbench 1 group
> >   avg                 1.315       1.344                1.359
> >   stdev               0.88 %      1.55 %               1.82 %
> >   vs tip              -0.47 %     -2.68 %              -3.87 %
> > hackbench 4 groups
> >   avg                 1.339       1.365                1.367
> >   stdev               2.39 %      2.26 %               3.58 %
> >   vs tip              -0.08 %     -2.01 %              -2.22 %
> > hackbench 8 groups
> >   avg                 1.233       1.286                1.301
> >   stdev               0.74 %      1.09 %               1.52 %
> >   vs tip              0.29 %      -4.05 %              -5.27 %
> > hackbench 16 groups
> >   avg                 1.268       1.313                1.319
> >   stdev               0.85 %      1.60 %               0.68 %
> >   vs tip              -0.02 %     -3.56 %              -4.01 %
>
> Following are the results from running standard benchmarks on a
> dual socket Zen3 (2 x 64C/128T) machine configured in different
> NPS modes.
>
> NPS modes are used to logically divide a single socket into
> multiple NUMA regions.
> Following is the NUMA configuration for each NPS mode on the system:
>
> NPS1: Each socket is a NUMA node.
> Total 2 NUMA nodes in the dual socket machine.
>
> Node 0: 0-63, 128-191
> Node 1: 64-127, 192-255
>
> NPS2: Each socket is further logically divided into 2 NUMA regions.
> Total 4 NUMA nodes exist over the 2 sockets.
>
> Node 0: 0-31, 128-159
> Node 1: 32-63, 160-191
> Node 2: 64-95, 192-223
> Node 3: 96-127, 224-255
>
> NPS4: Each socket is logically divided into 4 NUMA regions.
> Total 8 NUMA nodes exist over the 2 sockets.
>
> Node 0: 0-15, 128-143
> Node 1: 16-31, 144-159
> Node 2: 32-47, 160-175
> Node 3: 48-63, 176-191
> Node 4: 64-79, 192-207
> Node 5: 80-95, 208-223
> Node 6: 96-111, 224-239
> Node 7: 112-127, 240-255
>
> Benchmark Results:
>
> Kernel versions:
> - tip: 6.1.0 tip sched/core
> - latency_nice: 6.1.0 tip sched/core + this series
>
> When we started testing, the tip was at:
> commit d6962c4fe8f9 "sched: Clear ttwu_pending after enqueue_task()"
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ hackbench - DEFAULT_LATENCY_NICE ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> NPS1
>
> Test: tip latency_nice
> 1-groups: 4.25 (0.00 pct) 4.14 (2.58 pct)
> 2-groups: 4.95 (0.00 pct) 4.92 (0.60 pct)
> 4-groups: 5.19 (0.00 pct) 5.18 (0.19 pct)
> 8-groups: 5.45 (0.00 pct) 5.44 (0.18 pct)
> 16-groups: 7.33 (0.00 pct) 7.32 (0.13 pct)
>
> NPS2
>
> Test: tip latency_nice
> 1-groups: 4.09 (0.00 pct) 4.08 (0.24 pct)
> 2-groups: 4.68 (0.00 pct) 4.72 (-0.85 pct)
> 4-groups: 5.05 (0.00 pct) 4.97 (1.58 pct)
> 8-groups: 5.37 (0.00 pct) 5.34 (0.55 pct)
> 16-groups: 6.69 (0.00 pct) 6.74 (-0.74 pct)
>
> NPS4
>
> Test: tip latency_nice
> 1-groups: 4.28 (0.00 pct) 4.35 (-1.63 pct)
> 2-groups: 4.78 (0.00 pct) 4.76 (0.41 pct)
> 4-groups: 5.11 (0.00 pct) 5.06 (0.97 pct)
> 8-groups: 5.48 (0.00 pct) 5.40 (1.45 pct)
> 16-groups: 7.07 (0.00 pct) 6.70 (5.23 pct)
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ schbench - DEFAULT_LATENCY_NICE ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> NPS1
>
> #workers: tip latency_nice
> 1: 31.00 (0.00 pct) 32.00 (-3.22 pct)
> 2: 33.00 (0.00 pct) 34.00 (-3.03 pct)
> 4: 39.00 (0.00 pct) 38.00 (2.56 pct)
> 8: 45.00 (0.00 pct) 46.00 (-2.22 pct)
> 16: 61.00 (0.00 pct) 66.00 (-8.19 pct)
> 32: 108.00 (0.00 pct) 110.00 (-1.85 pct)
> 64: 212.00 (0.00 pct) 216.00 (-1.88 pct)
> 128: 475.00 (0.00 pct) 701.00 (-47.57 pct) *
> 128: 429.00 (0.00 pct) 441.00 (-2.79 pct) [Verification Run]
> 256: 44736.00 (0.00 pct) 45632.00 (-2.00 pct)
> 512: 77184.00 (0.00 pct) 78720.00 (-1.99 pct)
>
> NPS2
>
> #workers: tip latency_nice
> 1: 28.00 (0.00 pct) 33.00 (-17.85 pct)
> 2: 34.00 (0.00 pct) 31.00 (8.82 pct)
> 4: 36.00 (0.00 pct) 36.00 (0.00 pct)
> 8: 51.00 (0.00 pct) 49.00 (3.92 pct)
> 16: 68.00 (0.00 pct) 64.00 (5.88 pct)
> 32: 113.00 (0.00 pct) 115.00 (-1.76 pct)
> 64: 221.00 (0.00 pct) 219.00 (0.90 pct)
> 128: 553.00 (0.00 pct) 531.00 (3.97 pct)
> 256: 43840.00 (0.00 pct) 48192.00 (-9.92 pct) *
> 256: 50427.00 (0.00 pct) 48351.00 (4.11 pct) [Verification Run]
> 512: 76672.00 (0.00 pct) 81024.00 (-5.67 pct)
>
> NPS4
>
> #workers: tip latency_nice
> 1: 33.00 (0.00 pct) 28.00 (15.15 pct)
> 2: 29.00 (0.00 pct) 34.00 (-17.24 pct)
> 4: 39.00 (0.00 pct) 36.00 (7.69 pct)
> 8: 58.00 (0.00 pct) 55.00 (5.17 pct)
> 16: 66.00 (0.00 pct) 67.00 (-1.51 pct)
> 32: 112.00 (0.00 pct) 116.00 (-3.57 pct)
> 64: 215.00 (0.00 pct) 213.00 (0.93 pct)
> 128: 689.00 (0.00 pct) 571.00 (17.12 pct)
> 256: 45120.00 (0.00 pct) 46400.00 (-2.83 pct)
> 512: 77440.00 (0.00 pct) 76160.00 (1.65 pct)
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ tbench - DEFAULT_LATENCY_NICE ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> NPS1
>
> Clients: tip latency_nice
> 1 581.75 (0.00 pct) 586.52 (0.81 pct)
> 2 1145.75 (0.00 pct) 1160.69 (1.30 pct)
> 4 2127.94 (0.00 pct) 2141.49 (0.63 pct)
> 8 3838.27 (0.00 pct) 3721.10 (-3.05 pct)
> 16 6272.71 (0.00 pct) 6539.82 (4.25 pct)
> 32 11400.12 (0.00 pct) 12079.49 (5.95 pct)
> 64 21605.96 (0.00 pct) 22908.83 (6.03 pct)
> 128 30715.43 (0.00 pct) 31736.95 (3.32 pct)
> 256 55580.78 (0.00 pct) 54786.29 (-1.42 pct)
> 512 56528.79 (0.00 pct) 56453.54 (-0.13 pct)
> 1024 56520.40 (0.00 pct) 56369.93 (-0.26 pct)
>
> NPS2
>
> Clients: tip latency_nice
> 1 584.13 (0.00 pct) 582.53 (-0.27 pct)
> 2 1153.63 (0.00 pct) 1140.27 (-1.15 pct)
> 4 2212.89 (0.00 pct) 2159.49 (-2.41 pct)
> 8 3871.35 (0.00 pct) 3840.77 (-0.78 pct)
> 16 6216.72 (0.00 pct) 6437.98 (3.55 pct)
> 32 11766.98 (0.00 pct) 11663.53 (-0.87 pct)
> 64 22000.93 (0.00 pct) 21882.88 (-0.53 pct)
> 128 31520.53 (0.00 pct) 31147.05 (-1.18 pct)
> 256 51420.11 (0.00 pct) 55216.39 (7.38 pct)
> 512 53935.90 (0.00 pct) 55407.60 (2.72 pct)
> 1024 55239.73 (0.00 pct) 55997.25 (1.37 pct)
>
> NPS4
>
> Clients: tip latency_nice
> 1 585.83 (0.00 pct) 578.17 (-1.30 pct)
> 2 1141.59 (0.00 pct) 1131.14 (-0.91 pct)
> 4 2174.79 (0.00 pct) 2086.52 (-4.05 pct)
> 8 3887.56 (0.00 pct) 3778.47 (-2.80 pct)
> 16 6441.59 (0.00 pct) 6364.30 (-1.19 pct)
> 32 12133.60 (0.00 pct) 11465.26 (-5.50 pct) *
> 32 11677.16 (0.00 pct) 12662.09 (8.43 pct) [Verification Run]
> 64 21769.15 (0.00 pct) 19488.45 (-10.47 pct) *
> 64 20305.64 (0.00 pct) 21002.90 (3.43 pct) [Verification Run]
> 128 31396.31 (0.00 pct) 31177.37 (-0.69 pct)
> 256 52792.39 (0.00 pct) 52890.41 (0.18 pct)
> 512 55315.44 (0.00 pct) 53572.65 (-3.15 pct)
> 1024 52150.27 (0.00 pct) 54079.48 (3.69 pct)
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ stream - DEFAULT_LATENCY_NICE ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> NPS1
>
> 10 Runs:
>
> Test: tip latency_nice
> Copy: 307827.79 (0.00 pct) 330524.48 (7.37 pct)
> Scale: 208872.28 (0.00 pct) 215002.06 (2.93 pct)
> Add: 239404.64 (0.00 pct) 230334.74 (-3.78 pct)
> Triad: 247258.30 (0.00 pct) 238505.06 (-3.54 pct)
>
> 100 Runs:
>
> Test: tip latency_nice
> Copy: 317217.55 (0.00 pct) 314467.62 (-0.86 pct)
> Scale: 208740.82 (0.00 pct) 210452.00 (0.81 pct)
> Add: 240550.63 (0.00 pct) 232376.03 (-3.39 pct)
> Triad: 249594.21 (0.00 pct) 242460.83 (-2.85 pct)
>
> NPS2
>
> 10 Runs:
>
> Test: tip latency_nice
> Copy: 340877.18 (0.00 pct) 339441.26 (-0.42 pct)
> Scale: 217318.16 (0.00 pct) 216905.49 (-0.18 pct)
> Add: 259078.93 (0.00 pct) 261686.67 (1.00 pct)
> Triad: 274500.78 (0.00 pct) 271699.83 (-1.02 pct)
>
> 100 Runs:
>
> Test: tip latency_nice
> Copy: 341860.73 (0.00 pct) 335826.36 (-1.76 pct)
> Scale: 218043.00 (0.00 pct) 216451.84 (-0.72 pct)
> Add: 253698.22 (0.00 pct) 257317.72 (1.42 pct)
> Triad: 265011.84 (0.00 pct) 267769.93 (1.04 pct)
>
> NPS4
>
> 10 Runs:
>
> Test: tip latency_nice
> Copy: 340877.18 (0.00 pct) 365921.51 (7.34 pct)
> Scale: 217318.16 (0.00 pct) 239408.65 (10.16 pct)
> Add: 259078.93 (0.00 pct) 264859.31 (2.23 pct)
> Triad: 274500.78 (0.00 pct) 281543.65 (2.56 pct)
>
> 100 Runs:
>
> Test: tip latency_nice
> Copy: 341860.73 (0.00 pct) 359255.16 (5.08 pct)
> Scale: 218043.00 (0.00 pct) 238154.15 (9.22 pct)
> Add: 253698.22 (0.00 pct) 269223.49 (6.11 pct)
> Triad: 265011.84 (0.00 pct) 278473.85 (5.07 pct)
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ ycsb-mongodb - DEFAULT_LATENCY_NICE ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> o NPS1
>
> tip: 131244.00 (var: 2.67%)
> latency_nice: 132118.00 (var: 3.62%) (+0.66%)
>
> o NPS2
>
> tip: 127663.33 (var: 2.08%)
> latency_nice: 129148.00 (var: 4.29%) (+1.16%)
>
> o NPS4
>
> tip: 133295.00 (var: 1.58%)
> latency_nice: 129975.33 (var: 1.10%) (-2.49%)
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ Unixbench - DEFAULT_LATENCY_NICE ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> o NPS1
>
> Test Metric Parallelism tip latency_nice
> unixbench-dhry2reg Hmean unixbench-dhry2reg-1 48929419.48 ( 0.00%) 49137039.06 ( 0.42%)
> unixbench-dhry2reg Hmean unixbench-dhry2reg-512 6275526953.25 ( 0.00%) 6265580479.15 ( -0.16%)
> unixbench-syscall Amean unixbench-syscall-1 2994319.73 ( 0.00%) 3008596.83 * -0.48%*
> unixbench-syscall Amean unixbench-syscall-512 7349715.87 ( 0.00%) 7420994.50 * -0.97%*
> unixbench-pipe Hmean unixbench-pipe-1 2830206.03 ( 0.00%) 2854405.99 * 0.86%*
> unixbench-pipe Hmean unixbench-pipe-512 326207828.01 ( 0.00%) 328997804.52 * 0.86%*
> unixbench-spawn Hmean unixbench-spawn-1 6394.21 ( 0.00%) 6367.75 ( -0.41%)
> unixbench-spawn Hmean unixbench-spawn-512 72700.64 ( 0.00%) 71454.19 * -1.71%*
> unixbench-execl Hmean unixbench-execl-1 4723.61 ( 0.00%) 4750.59 ( 0.57%)
> unixbench-execl Hmean unixbench-execl-512 11212.05 ( 0.00%) 11262.13 ( 0.45%)
>
> o NPS2
>
> Test Metric Parallelism tip latency_nice
> unixbench-dhry2reg Hmean unixbench-dhry2reg-1 49271512.85 ( 0.00%) 49245260.43 ( -0.05%)
> unixbench-dhry2reg Hmean unixbench-dhry2reg-512 6267992483.03 ( 0.00%) 6264951100.67 ( -0.05%)
> unixbench-syscall Amean unixbench-syscall-1 2995885.93 ( 0.00%) 3005975.10 * -0.34%*
> unixbench-syscall Amean unixbench-syscall-512 7388865.77 ( 0.00%) 7276275.63 * 1.52%*
> unixbench-pipe Hmean unixbench-pipe-1 2828971.95 ( 0.00%) 2856578.72 * 0.98%*
> unixbench-pipe Hmean unixbench-pipe-512 326225385.37 ( 0.00%) 328941270.81 * 0.83%*
> unixbench-spawn Hmean unixbench-spawn-1 6958.71 ( 0.00%) 6954.21 ( -0.06%)
> unixbench-spawn Hmean unixbench-spawn-512 85443.56 ( 0.00%) 70536.42 * -17.45%* (coeff. of var.: 0.67% vs 0.93%)

I don't expect any performance improvement or regression when the
latency nice is not changed.

> unixbench-execl Hmean unixbench-execl-1 4767.99 ( 0.00%) 4752.63 * -0.32%*
> unixbench-execl Hmean unixbench-execl-512 11250.72 ( 0.00%) 11320.97 ( 0.62%)
>
> o NPS4
>
> Test Metric Parallelism tip latency_nice
> unixbench-dhry2reg Hmean unixbench-dhry2reg-1 49041932.68 ( 0.00%) 49156671.05 ( 0.23%)
> unixbench-dhry2reg Hmean unixbench-dhry2reg-512 6286981589.85 ( 0.00%) 6285248711.40 ( -0.03%)
> unixbench-syscall Amean unixbench-syscall-1 2992405.60 ( 0.00%) 3008933.03 * -0.55%*
> unixbench-syscall Amean unixbench-syscall-512 7971789.70 ( 0.00%) 7814622.23 * 1.97%*
> unixbench-pipe Hmean unixbench-pipe-1 2822892.54 ( 0.00%) 2852615.11 * 1.05%*
> unixbench-pipe Hmean unixbench-pipe-512 326408309.83 ( 0.00%) 329617202.56 * 0.98%*
> unixbench-spawn Hmean unixbench-spawn-1 7685.31 ( 0.00%) 7243.54 ( -5.75%)
> unixbench-spawn Hmean unixbench-spawn-512 72245.56 ( 0.00%) 77000.81 * 6.58%*
> unixbench-execl Hmean unixbench-execl-1 4761.42 ( 0.00%) 4733.12 * -0.59%*
> unixbench-execl Hmean unixbench-execl-512 11533.53 ( 0.00%) 11660.17 ( 1.10%)
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ Hackbench - Various Latency Nice Values ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> o 100000 loops
>
> - pipe (process)
>
> Test: LN: 0 LN: 19 LN: -20
> 1-groups: 3.91 (0.00 pct) 3.91 (0.00 pct) 3.81 (2.55 pct)
> 2-groups: 4.48 (0.00 pct) 4.52 (-0.89 pct) 4.53 (-1.11 pct)
> 4-groups: 4.83 (0.00 pct) 4.83 (0.00 pct) 4.87 (-0.82 pct)
> 8-groups: 5.09 (0.00 pct) 5.00 (1.76 pct) 5.07 (0.39 pct)
> 16-groups: 6.92 (0.00 pct) 6.79 (1.87 pct) 6.96 (-0.57 pct)
>
> - pipe (thread)
>
> Test: LN: 0 LN: 19 LN: -20
> 1-groups: 4.13 (0.00 pct) 4.08 (1.21 pct) 4.11 (0.48 pct)
> 2-groups: 4.78 (0.00 pct) 4.90 (-2.51 pct) 4.79 (-0.20 pct)
> 4-groups: 5.12 (0.00 pct) 5.08 (0.78 pct) 5.16 (-0.78 pct)
> 8-groups: 5.31 (0.00 pct) 5.28 (0.56 pct) 5.33 (-0.37 pct)
> 16-groups: 7.34 (0.00 pct) 7.27 (0.95 pct) 7.33 (0.13 pct)
>
> - socket (process)
>
> Test: LN: 0 LN: 19 LN: -20
> 1-groups: 6.61 (0.00 pct) 6.38 (3.47 pct) 6.54 (1.05 pct)
> 2-groups: 6.59 (0.00 pct) 6.67 (-1.21 pct) 6.11 (7.28 pct)
> 4-groups: 6.77 (0.00 pct) 6.78 (-0.14 pct) 6.79 (-0.29 pct)
> 8-groups: 8.29 (0.00 pct) 8.39 (-1.20 pct) 8.36 (-0.84 pct)
> 16-groups: 12.21 (0.00 pct) 12.03 (1.47 pct) 12.35 (-1.14 pct)
>
> - socket (thread)
>
> Test: LN: 0 LN: 19 LN: -20
> 1-groups: 6.50 (0.00 pct) 5.99 (7.84 pct) 6.02 (7.38 pct) ^
> 2-groups: 6.07 (0.00 pct) 6.20 (-2.14 pct) 6.23 (-2.63 pct)
> 4-groups: 6.61 (0.00 pct) 6.64 (-0.45 pct) 6.63 (-0.30 pct)
> 8-groups: 8.87 (0.00 pct) 8.67 (2.25 pct) 8.78 (1.01 pct)
> 16-groups: 12.63 (0.00 pct) 12.54 (0.71 pct) 12.59 (0.31 pct)
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ Hackbench + Cyclictest - Various Latency Nice Values ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> - Hackbench: 32 Groups
>
> perf bench sched messaging -p -l 100000 -g 32&
> cyclictest --policy other -D 5 -q -n -h 2000
>
> o NPS1
>
> ----------------------------------------------------------------------------------------------------------
> | Hackbench | Cyclictest LN = 19 | Cyclictest LN = 0 | Cyclictest LN = -20 |
> | LN |------------------------------|-------------------------------|---------------------------|
> | | Min | Avg | Max | Min | Avg | Max | Min | Avg | Max |
> |-------------|--------|---------|-----------|--------|---------|------------|--------|-------|----------|
> | 19 | 52.00 | 71.00 | 5191.00 | 29.00 | 68.00 | 4477.00 | 53.00 | 60.00 | 753.00 |
> | 0 | 53.00 | 150.00 | 7300.00 | 53.00 | 105.00 | 7730.00 | 53.00 | 64.00 | 2067.00 |
> | -20 | 33.00 | 159.00 | 98492.00 | 53.00 | 149.00 | 9608.00 | 53.00 | 91.00 | 5349.00 |
> ----------------------------------------------------------------------------------------------------------
>
> o NPS4
>
> ----------------------------------------------------------------------------------------------------------
> | Hackbench | Cyclictest LN = 19 | Cyclictest LN = 0 | Cyclictest LN = -20 |
> | LN |------------------------------|-------------------------------|---------------------------|
> | | Min | Avg | Max | Min | Avg | Max | Min | Avg | Max |
> |-------------|--------|---------|-----------|--------|---------|------------|--------|-------|----------|
> | 19 | 53.00 | 84.00 | 4790.00 | 53.00 | 72.00 | 3456.00 | 53.00 | 58.00 | 1271.00 |
> | 0 | 53.00 | 99.00 | 5494.00 | 52.00 | 74.00 | 5813.00 | 53.00 | 59.00 | 1004.00 |
> | -20 | 45.00 | 84.00 | 3592.00 | 53.00 | 91.00 | 15222.00 | 53.00 | 74.00 | 5232.00 | ^
> ----------------------------------------------------------------------------------------------------------
>
> - Hackbench: 128 Groups
>
> perf bench sched messaging -p -l 500000 -g 128&
> cyclictest --policy other -D 5 -q -n -h 2000
>
> o NPS1
>
> ----------------------------------------------------------------------------------------------------------
> | Hackbench | Cyclictest LN = 19 | Cyclictest LN = 0 | Cyclictest LN = -20 |
> | LN |------------------------------|-------------------------------|---------------------------|
> | | Min | Avg | Max | Min | Avg | Max | Min | Avg | Max |
> |-------------|--------|---------|-----------|--------|---------|------------|--------|-------|----------|
> | 19 | 53.00 | 274.00 | 11294.00 | 33.00 | 130.00 | 20071.00 | 53.00 | 56.00 | 244.00 | ^
> | 0 | 53.00 | 125.00 | 10014.00 | 53.00 | 113.00 | 15857.00 | 53.00 | 57.00 | 250.00 |
> | -20 | 53.00 | 187.00 | 49565.00 | 53.00 | 230.00 | 73353.00 | 53.00 | 118.00| 8816.00 |
> ----------------------------------------------------------------------------------------------------------
>
> o NPS4
>
> ----------------------------------------------------------------------------------------------------------
> | Hackbench | Cyclictest LN = 19 | Cyclictest LN = 0 | Cyclictest LN = -20 |
> | LN |------------------------------|-------------------------------|---------------------------|
> | | Min | Avg | Max | Min | Avg | Max | Min | Avg | Max |
> |-------------|--------|---------|-----------|--------|---------|------------|--------|-------|----------|
> | 19 | 53.00 | 271.00 | 11411.00 | 53.00 | 82.00 | 5486.00 | 25.00 | 57.00 | 1256.00 |
> | 0 | 53.00 | 148.00 | 8374.00 | 52.00 | 109.00 | 11074.00 | 52.00 | 59.00 | 1068.00 |
> | -20 | 53.00 | 202.00 | 52537.00 | 53.00 | 205.00 | 22265.00 | 52.00 | 87.00 | 14151.00 |
> ----------------------------------------------------------------------------------------------------------
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ Hackbench + schbench - Various Latency Nice Values ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> perf bench sched messaging -p -l 400000 -g 128
> schbench -m 2 -t 1 -s 30
>
> o NPS1
>
> -------------------------------------------------------------------------------------------------
> | Hackbench | schbench LN = 19 | schbench LN = 0 | schbench LN = -20 |
> | LN |----------------------------|---------------------------|--------------------------|
> | | 90th | 95th | 99th | 90th | 95th | 99th | 90th | 95th | 99th |
> |-----------|--------|--------|----------|--------|--------|---------|--------|--------|--------|
> | 19 | 38 | 131 | 1458 | 46 | 151 | 2636 | 11 | 19 | 410 | ^
> | 0 | 45 | 98 | 1758 | 25 | 50 | 1670 | 16 | 30 | 1042 |
> | -20 | 47 | 348 | 29280 | 40 | 109 | 16144 | 35 | 63 | 9104 |
> -------------------------------------------------------------------------------------------------
>
> o NPS4
>
> -------------------------------------------------------------------------------------------------
> | Hackbench | schbench LN = 19 | schbench LN = 0 | schbench LN = -20 |
> | LN |----------------------------|---------------------------|--------------------------|
> | | 90th | 95th | 99th | 90th | 95th | 99th | 90th | 95th | 99th |
> |-----------|--------|--------|----------|--------|--------|---------|--------|--------|--------|
> | 19 | 19 | 60 | 1886 | 17 | 29 | 621 | 10 | 18 | 227 |
> | 0 | 51 | 141 | 8120 | 37 | 78 | 8880 | 33 | 55 | 474 | ^
> | -20 | 48 | 1494 | 27296 | 51 | 469 | 40384 | 31 | 64 | 4092 | ^
> -------------------------------------------------------------------------------------------------
>
> ^ Note: There are cases where the Max / 99th-percentile latency is
> non-monotonic, but I've also seen a good amount of run-to-run variation
> there, with a single bad sample polluting the results. In such cases,
> the averages are more representative.
>
> >
> > [1] https://source.android.com/docs/core/debug/eval_perf#touchlatency
> >
> > [..snip..]
> >
>
> Apart from a couple of anomalies, latency nice reduces wait time,
> especially when the system is heavily loaded. If there is any data or any
> specific workload you would like me to run on the test system, please do
> let me know. Meanwhile, I'll try to get some numbers for larger workloads
> like SpecJBB that did see improvements with latency nice on v5.

Thanks for your tests

Vincent

> --
> Thanks and Regards,
> Prateek