Re: [PATCH v2 2/3] vmstat: skip periodic vmstat update for nohz full CPUs

From: Michal Hocko
Date: Mon Jun 05 2023 - 11:55:58 EST


On Mon 05-06-23 11:53:56, Marcelo Tosatti wrote:
> On Mon, Jun 05, 2023 at 09:55:57AM +0200, Michal Hocko wrote:
> > On Fri 02-06-23 15:57:59, Marcelo Tosatti wrote:
> > > The interruption caused by vmstat_update is undesirable
> > > for certain applications:
> > >
> > > oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
> > > oslat 1094.456971: workqueue_queue_work: ... function=vmstat_update ...
> > > oslat 1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
> > > kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
> > >
> > > The example above shows an additional 7us for the
> > >
> > > oslat -> kworker -> oslat
> > >
> > > switches. In the case of a virtualized CPU, and the vmstat_update
> > > interruption in the host (of a qemu-kvm vcpu), the latency penalty
> > > observed in the guest is higher than 50us, violating the acceptable
> > > latency threshold.
> >
> > I personally find the above problem description insufficient. I have
> > asked several times and only got piece by piece information each time.
> > Maybe there is a reason to be secretive but it would be great to get at
> > least some basic expectations described and what they are based on.
>
> There is no reason to be secretive.
>
> >
> > E.g. workloads are running on isolated cpus with nohz full mode to
> > shield off any kernel interruption. Yet there are operations (like
> > mlock, but not mlock alone) that update per cpu counters, which will
> > eventually get flushed, and that will cause some interference.
> > Now the host/guest transition and interference: how does that happen
> > when the guest is running on an isolated and dedicated cpu?
>
> The updated changelog follows. Does it contain the requested
> information?
>
> ----
>
> Performance details for the kworker interruption:
>
> Consider workloads that run on isolated CPUs in nohz_full mode to
> shield them from any kernel interruption. One example is a VM running
> a time-sensitive application with a 50us maximum acceptable
> interruption (use case: soft PLC).
>
> oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
> oslat 1094.456971: workqueue_queue_work: ... function=vmstat_update ...
> oslat 1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
> kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
>
> The example above shows an additional 7us for the
>
> oslat -> kworker -> oslat
>
> switches. In the case of a virtualized CPU, and the vmstat_update
> interruption in the host (of a qemu-kvm vcpu), the latency penalty
> observed in the guest is higher than 50us, violating the acceptable
> latency threshold.
>
> The isolated vCPU can perform operations that modify per-CPU page counters,
> for example to complete I/O operations:
>
> CPU 11/KVM-9540 [001] dNh1. 2314.248584: mod_zone_page_state <-__folio_end_writeback
> CPU 11/KVM-9540 [001] dNh1. 2314.248585: <stack trace>
> => 0xffffffffc042b083
> => mod_zone_page_state
> => __folio_end_writeback
> => folio_end_writeback
> => iomap_finish_ioend
> => blk_mq_end_request_batch
> => nvme_irq
> => __handle_irq_event_percpu
> => handle_irq_event
> => handle_edge_irq
> => __common_interrupt
> => common_interrupt
> => asm_common_interrupt
> => vmx_do_interrupt_nmi_irqoff
> => vmx_handle_exit_irqoff
> => vcpu_enter_guest
> => vcpu_run
> => kvm_arch_vcpu_ioctl_run
> => kvm_vcpu_ioctl
> => __x64_sys_ioctl
> => do_syscall_64
> => entry_SYSCALL_64_after_hwframe

OK, this is really useful. It is just not really clear whether the IO
triggered here comes from the guest or is a host activity.
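
As a side note, the 7us figure in the quoted changelog can be re-derived
from the trace timestamps. A minimal Python sketch, assuming the
timestamps are seconds with microsecond resolution (the helper and
variable names are illustrative, not part of the patch):

```python
from decimal import Decimal

def to_us(ts: str) -> int:
    """Convert a trace timestamp in seconds to whole microseconds.

    Decimal avoids the rounding noise of binary floating point.
    """
    return int(Decimal(ts) * 1_000_000)

# Timestamps copied from the quoted oslat/kworker trace.
queued = to_us("1094.456971")        # workqueue_queue_work (vmstat_update)
to_kworker = to_us("1094.456974")    # sched_switch: oslat -> kworker/5:1
back_to_oslat = to_us("1094.456978") # sched_switch: kworker/5:1 -> oslat

wait_us = to_kworker - queued            # delay until the kworker runs
kworker_us = back_to_oslat - to_kworker  # time spent away from oslat
total_us = back_to_oslat - queued        # whole oslat -> kworker -> oslat window

print(f"queue->kworker: {wait_us}us, in kworker: {kworker_us}us, total: {total_us}us")
```

This only covers the bare-metal trace; the >50us penalty reported for the
guest cannot be derived from these four events.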

Overall this is much better!
--
Michal Hocko
SUSE Labs