Re: [PATCH v2 3/9] [RFC] virtio_ring: Embed a wrap counter in opaque poll index value

From: Cristian Marussi
Date: Mon Feb 07 2022 - 13:55:47 EST


On Thu, Feb 03, 2022 at 06:32:29AM -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 03, 2022 at 10:51:19AM +0000, Cristian Marussi wrote:
> > On Tue, Feb 01, 2022 at 01:27:38PM -0500, Michael S. Tsirkin wrote:
> > > Looks correct, thanks. Some minor comments below:
> > >
> >
> > Hi Michael,
> >
> > thanks for the feedback.
> >

Hi Michael,


[snip]

> > > > @@ -806,6 +821,8 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> > > > ret = vq->split.desc_state[i].data;
> > > > detach_buf_split(vq, i, ctx);
> > > > vq->last_used_idx++;
> > > > + if (unlikely(!vq->last_used_idx))
> > > > + vq->wraps++;
> > > > /* If we expect an interrupt for the next entry, tell host
> > > > * by writing event index and flush out the write before
> > > > * the read in the next get_buf call. */
> > >
> > > So most drivers don't call virtqueue_poll.
> > > Concerned about the overhead here: another option is
> > > with a flag that will have to be set whenever a driver
> > > wants to use virtqueue_poll.
> >
> > Do you mean a compile time flag/Kconfig to just remove the possible
> > overhead instructions as a whole when not needed by the driver ?
> >
> > Or do you mean at runtime since checking the flag evry time should be
> > less costly than checking the wrpas each time AND counting when it
> > happens ?
>
> The later.
>
> > > Could you pls do a quick perf test e.g. using tools/virtio/
> > > to see what's faster?
> >
> > Yes I'll do, thanks for the hint, I have some compilation issues in
> > tools/virtio due to my additions (missing mirrored hehaders) or to some
> > recently added stuff (missing drv_to_virtio & friends for
> > suppressed_used_validation thing)...anyway I fixed those now and I'll
> > post related tools/virtio patches with next iteration.
> >
> > Anyway, do you mean perf data about vringh_test and virtio_test/vhost
> > right ? (ringtest/ excluded 'cause does not use any API is just
> > prototyping)
>
> can be either or both, virtio_test/vhost is a bit easier to use.
>

After a number of round tests with tools/virtio/virtio_test, below you can find
the most reliable results I had.

Using the flag as you suggested as in:

if (unlikely(vq->use_wrap_counter))
vq->wraps += !vq->last_used_idx;


seems definitely better as per the result in virtio_test_flag_V1.

The last run with virtio_test_flag_V1 --wrap-counters is the case in which
the test code request to set the use_wrap_counter flag to true at start.

Since such flag is nothig spec related, I added a new EXPORT API

virtqueue_use_wrap_counter()

to allow a driver willing to use polling to ask for wrap counting at
probe time.

Since all of this required a few build-fixes in tool/virtio both before
and after my additions, I'm going to post the proposed change in a new series
independent from this SCMI virtio series (and add later the call to
virtqueue_use_wrap_counter() to the SCMI virtio driver.

How does this sound ?

Thanks,
Cristian

---
+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_unpatched
Performance counter stats for 'nice -n -20 /root/virtio_test_unpatched' (25 runs):

6100.81 msec task-clock # 1.002 CPUs utilized ( +- 0.16% )
19 context-switches # 3.126 /sec ( +- 2.08% )
0 cpu-migrations # 0.000 /sec
134 page-faults # 22.049 /sec ( +- 0.03% )
18249525657 cycles # 3.003 GHz ( +- 0.07% )
45583397473 instructions # 2.52 insn per cycle ( +- 0.09% )
14009712668 branches # 2.305 G/sec ( +- 0.09% )
10075872 branch-misses # 0.07% of all branches ( +- 0.83% )

6.0908 +- 0.0107 seconds time elapsed ( +- 0.18% )


+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_wraps_noflag
Performance counter stats for 'nice -n -20 /root/virtio_test_wraps_noflag' (25 runs):

7982.99 msec task-clock # 0.996 CPUs utilized ( +- 0.14% )
16 context-switches # 1.999 /sec ( +- 2.56% )
0 cpu-migrations # 0.000 /sec
134 page-faults # 16.744 /sec ( +- 0.03% )
23691074946 cycles # 2.960 GHz ( +- 0.06% )
68176350359 instructions # 2.88 insn per cycle ( +- 0.09% )
21037768642 branches # 2.629 G/sec ( +- 0.09% )
9083084 branch-misses # 0.04% of all branches ( +- 0.74% )

8.0125 +- 0.0114 seconds time elapsed ( +- 0.14% )


+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_flag_V1
Performance counter stats for 'nice -n -20 /root/virtio_test_flag_V1' (25 runs):

6182.21 msec task-clock # 1.007 CPUs utilized ( +- 0.25% )
19 context-switches # 3.104 /sec ( +- 1.68% )
0 cpu-migrations # 0.000 /sec
134 page-faults # 21.889 /sec ( +- 0.03% )
18142274957 cycles # 2.963 GHz ( +- 0.13% )
48973010013 instructions # 2.71 insn per cycle ( +- 0.18% )
15064825126 branches # 2.461 G/sec ( +- 0.18% )
8697800 branch-misses # 0.06% of all branches ( +- 0.89% )

6.1382 +- 0.0172 seconds time elapsed ( +- 0.28% )

+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_flag_V1 --wrap-counters
Performance counter stats for 'nice -n -20 /root/virtio_test_flag_V1 --wrap-counters' (25 runs):

6051.58 msec task-clock # 0.984 CPUs utilized ( +- 0.22% )
21 context-switches # 3.424 /sec ( +- 1.25% )
0 cpu-migrations # 0.000 /sec
134 page-faults # 21.846 /sec ( +- 0.03% )
17928356478 cycles # 2.923 GHz ( +- 0.11% )
48147192304 instructions # 2.67 insn per cycle ( +- 0.14% )
14808798588 branches # 2.414 G/sec ( +- 0.15% )
9108899 branch-misses # 0.06% of all branches ( +- 1.22% )

6.1525 +- 0.0155 seconds time elapsed ( +- 0.25% )