Re: [PATCH v2] srcu: Reduce blocking agressiveness of expedited grace periods further

From: Neeraj Upadhyay
Date: Thu Jun 30 2022 - 05:29:01 EST




On 6/30/2022 2:56 PM, Marc Zyngier wrote:
On Thu, 30 Jun 2022 05:12:01 +0100,
Neeraj Upadhyay <quic_neeraju@xxxxxxxxxxx> wrote:

Commit 640a7d37c3f4 ("srcu: Block less aggressively for expedited
grace periods") highlights a problem where aggressively blocking
SRCU expedited grace periods, as introduced in commit
282d8998e997 ("srcu: Prevent expedited GPs and blocking readers
from consuming CPU"), adds a ~2 minute delay to the overall
~3.5 minute boot time when starting VMs with the "-bios QEMU_EFI.fd"
command-line option on qemu. That option results in a very high rate
of memslot add/remove operations, which causes more than ~6000
synchronize_srcu() calls for the kvm->srcu SRCU instance.

The table below captures the experiments done by Zhangfei Gao and
Shameer to measure the boot time impact of various values of the
non-sleeping per-phase count, with HZ_250 and preemption enabled:

+────────────────────────+───────────────+
| SRCU_MAX_NODELAY_PHASE | Boot time (s) |
+────────────────────────+───────────────+
| 100                    | 30.053        |
| 150                    | 25.151        |
| 200                    | 20.704        |
| 250                    | 15.748        |
| 500                    | 11.401        |
| 1000                   | 11.443        |
| 10000                  | 11.258        |
| 1000000                | 11.154        |
+────────────────────────+───────────────+

Analysis of the experiment results showed improved boot time
with non-blocking delays close to one jiffy in duration. This
was also seen when the number of per-phase iterations was scaled
to one jiffy.

So, this change scales the number of non-sleeping polls per
grace-period phase such that non-sleeping polls are done for one
jiffy. In addition, the srcu_get_delay() call in srcu_gp_end(), which
is used to calculate the delay used for scheduling callbacks, is
replaced with a check for an expedited grace period. This is done to
schedule callbacks for completed expedited grace periods immediately,
which results in the improved boot time seen in experiments.
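
As a rough illustration of the two ideas above, here is a minimal
sketch in plain userspace C. It is not the code from the patch: the
helper names polls_per_jiffy() and callback_delay() are hypothetical,
and the 5 us retry delay used in the note below is an assumed value
that this thread does not state.

```c
#include <stdbool.h>

#define USEC_PER_SEC 1000000UL

/*
 * Hypothetical helper: how many non-sleeping polls fit into one jiffy,
 * assuming each poll is followed by a busy-wait of retry_delay_us
 * microseconds. Returns 0 for nonsensical (zero) inputs.
 */
unsigned long polls_per_jiffy(unsigned long hz, unsigned long retry_delay_us)
{
	unsigned long jiffy_us;

	if (hz == 0 || retry_delay_us == 0)
		return 0;

	jiffy_us = USEC_PER_SEC / hz;	/* length of one jiffy in us */
	return jiffy_us / retry_delay_us;
}

/*
 * Hypothetical helper: delay (in jiffies) before scheduling callbacks
 * at the end of a grace period. Expedited grace periods get no delay;
 * all others keep the caller-supplied delay.
 */
unsigned long callback_delay(bool expedited, unsigned long normal_delay)
{
	return expedited ? 0 : normal_delay;
}
```

With HZ=250 (one jiffy is 4000 us) and the assumed 5 us retry delay,
polls_per_jiffy(250, 5) evaluates to 800, which falls in the range
where the boot times in the table above have already flattened out.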

In addition to the changes to the default per-phase delays, this change
adds 3 new kernel parameters: srcutree.srcu_max_nodelay,
srcutree.srcu_max_nodelay_phase and srcutree.srcu_retry_check_delay.
These allow users to configure the SRCU grace-period scanning delays
according to their system configuration requirements.

Signed-off-by: Neeraj Upadhyay <quic_neeraju@xxxxxxxxxxx>
Tested-by: Marc Zyngier <maz@xxxxxxxxxx>
---

Change in v2:

- Change srcu_max_nodelay default value to consider phase delay
iterations
- Apply Paul's feedback
- Add Marc's Tested-by

I gave this a go on the same platform as v1, and the result is
actually much better as I didn't have to add any extra command-line
option to get to a reasonable result (41s). I think we have a winner.


Thank you for testing it!


Thanks
Neeraj

Thanks again,

M.