Re: [patch v11 00/13] extensible prctl task isolation interface and vmstat sync

From: Marcelo Tosatti
Date: Wed Feb 23 2022 - 12:48:09 EST


Hi Oscar,

On Sat, Feb 19, 2022 at 04:02:10PM +0800, Oscar Shiang wrote:
> Hi Marcelo,
>
> I tried to apply your patches to kernel v5.15.18-rt28 and measured
> the latencies through oslat [1].
>
> It turns out that the peak latency (around 100us) can drop to about 90us.
> The result is impressive since I only changed the guest's kernel
> instead of installing the patched kernel to both host and guest.
>
> However, I am still curious about:
> 1) Why did I catch a bigger maximum latency in almost each of the
> results of applying task isolation patches? Or does it come from
> other reasons?

There are a number of things that need to be done in order to have a
"well enough" isolated CPU so you can measure latency reliably:

* Boot a kernel with isolated CPUs (or better, use the realtime-virtual-host profile from
https://github.com/redhat-performance/tuned.git, which does a bunch of
other things to avoid interruptions to isolated CPUs).
* Apply the userspace patches at https://people.redhat.com/~mtosatti/task-isol-v6-userspace-patches/
to util-linux and rt-tests.
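
A minimal sketch for checking that the CPU oslat runs on really is isolated,
assuming the isolation was set up with the isolcpus= boot parameter (that is
what /sys/devices/system/cpu/isolated reports). The helper name cpu_in_list
and the CPU number 3 are made up for illustration:

```shell
#!/bin/sh
# Return success if CPU number $1 appears in the cpulist $2 (e.g. "2-5,8").
cpu_in_list() {
    cpu=$1
    old_ifs=$IFS
    IFS=,
    set -- $2            # split the list on commas into positional parameters
    IFS=$old_ifs
    for range in "$@"; do
        lo=${range%-*}   # "2-5" -> 2, "8" -> 8
        hi=${range#*-}   # "2-5" -> 5, "8" -> 8
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            return 0
        fi
    done
    return 1
}

# Usage (CPU 3 is only an example):
#   cpu_in_list 3 "$(cat /sys/devices/system/cpu/isolated)" && echo "CPU 3 is isolated"
```

The same helper can be pointed at /sys/devices/system/cpu/nohz_full if the
kernel was booted with nohz_full=.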

Run oslat with chisol:

chisol -q vmstat_sync -I conf oslat -c ...

Where chisol is from patched util-linux and oslat from patched rt-tests.

If you passed "-f 1" (FIFO priority) to oslat, then the vmstat work would hang.

Are you doing those things?

> 2) Why did we only get a 10us improvement on quiescing vmstat?

If you did not have FIFO priority on oslat, then other daemons
could be interrupting it, so better make sure the 10us improvement
you see is due to vmstat_flush workqueue work not executing anymore.

The test case I use is:

Stock kernel:

terminal 1:
# oslat -f 1 -c X ...

terminal 2:
# echo 1 > /proc/sys/vm/stat_refresh
(hang)

Patched kernel:

terminal 1:
# chisol -q vmstat_sync -I conf oslat -f 1 -c X ...

terminal 2:
# echo 1 > /proc/sys/vm/stat_refresh
#
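
The "hangs vs. returns" check in terminal 2 can be scripted. This is only a
sketch: the helper name completes_within and the 5 second limit are
arbitrary, and the write is run in the background because a writer stuck
waiting on the flush may not react to signals, so it is polled rather than
killed:

```shell
#!/bin/sh
# Run a command in the background and report whether it finished within
# $1 seconds. Returns 0 if it completed, 1 if it is still running.
completes_within() {
    limit=$1
    shift
    flag=$(mktemp) || return 2
    # The subshell writes to the flag file only after the command returns.
    ( "$@"; echo done > "$flag" ) &
    waited=0
    while [ "$waited" -lt "$limit" ]; do
        if [ -s "$flag" ]; then
            rm -f "$flag"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    if [ -s "$flag" ]; then
        rm -f "$flag"
        return 0
    fi
    return 1
}

# Usage (as root, while oslat is running in terminal 1):
#   completes_within 5 sh -c 'echo 1 > /proc/sys/vm/stat_refresh' \
#       && echo "stat_refresh returned" || echo "stat_refresh appears hung"
```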

> [1]: The result and the test scripts I used can be found at
> https://gist.github.com/OscarShiang/8b530a00f472fd1c39f5979ee601516d#testing-task-isolation-via-oslat

OK, you seem to be doing everything necessary for chisol
to work. Does /proc/pid/task_isolation of the oslat worker thread
(note it's not the same pid as the main oslat thread) show "vmstat"
configured and activated for quiesce?
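
A rough sketch for dumping that file for every oslat thread, so the worker
thread is not missed. It only prints whatever the file contains and makes no
assumption about its format. The function name show_task_isolation and the
PROC_ROOT override are made up here; PROC_ROOT exists only so the loop can be
tried against a fake directory tree:

```shell
#!/bin/sh
# Print /proc/<tid>/task_isolation for each thread of process $1.
show_task_isolation() {
    pid=$1
    proc=${PROC_ROOT:-/proc}
    for dir in "$proc/$pid"/task/*; do
        [ -d "$dir" ] || continue
        tid=${dir##*/}
        name=$(cat "$dir/comm" 2>/dev/null)
        echo "== tid $tid ($name)"
        # The file exists only on kernels carrying the task isolation patches.
        cat "$proc/$tid/task_isolation" 2>/dev/null || echo "(no task_isolation file)"
    done
}

# Usage (assumes a single oslat process is running):
#   show_task_isolation "$(pidof oslat)"
```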

However, 100us is really high. You should be able to get < 10us with
realtime-virtual-host (I see 4us on an idle system).

The answer might be: because 10us is what it takes to execute
vmstat_worker on the isolated CPU (you can verify with tracepoints).

That time depends on the number of per-CPU vmstat variables that need flushing,
I suppose.
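
One way to do the tracepoint check, as a sketch only: enable the workqueue
execute_start/execute_end events and pair them up by work struct pointer.
The function name vmstat_work_durations is made up, the parser assumes the
default ftrace text layout ("<timestamp>: <event>: work struct <ptr>:
function <name>"), and it matches any work function whose name contains
"vmstat" because the exact symbol may differ between kernels:

```shell
#!/bin/sh
# Read ftrace text on stdin and print one duration (in microseconds) for
# each vmstat-related work item that has both a start and an end event.
vmstat_work_durations() {
    awk '
    {
        ev = 0
        for (i = 2; i <= NF; i++) {
            if ($i == "workqueue_execute_start:" || $i == "workqueue_execute_end:") {
                ev = i
                break
            }
        }
        if (ev == 0) next
        ts = $(ev - 1); sub(/:$/, "", ts)   # timestamp precedes the event name
        ws = $(ev + 3); sub(/:$/, "", ws)   # pointer after "work struct"
        if ($ev == "workqueue_execute_start:") {
            if ($(ev + 5) ~ /vmstat/) start[ws] = ts
        } else if (ws in start) {
            printf "%.0f\n", (ts - start[ws]) * 1000000
            delete start[ws]
        }
    }'
}

# Usage (as root; assumes tracefs is mounted at /sys/kernel/tracing):
#   echo 1 > /sys/kernel/tracing/events/workqueue/workqueue_execute_start/enable
#   echo 1 > /sys/kernel/tracing/events/workqueue/workqueue_execute_end/enable
#   ... run oslat for a while ...
#   vmstat_work_durations < /sys/kernel/tracing/trace
```

If the durations printed for the isolated CPU are close to 10us, that would
account for the improvement seen when the work is quiesced.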