Re: [GIT PULL] virtio: features, fixes

From: Andres Freund
Date: Mon Aug 15 2022 - 03:02:13 EST


Hi,

On 2022-08-14 12:40:31 -0700, Andres Freund wrote:
> On 2022-08-14 04:59:48 -0400, Michael S. Tsirkin wrote:
> > On Sat, Aug 13, 2022 at 09:39:06PM -0700, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2022-08-13 20:52:39 -0700, Andres Freund wrote:
> > > > Is there specific information you'd like from the VM? I just recreated the
> > > > problem and can extract.
> > >
> > > Actually, after reproducing I seem to now hit a likely different issue. I
> > > guess I should have checked exactly the revision I had a problem with earlier,
> > > rather than doing a git pull (up to aea23e7c464b)
> >
> > Looks like there's a generic memory corruption so it crashes
> > in random places.
>
> Either a generic memory corruption, or something wrong with IO.
>
> > Would bisect be possible for you?
>
> I'll give it a go.

Bisect points to

commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7 (refs/bisect/bad)
Author: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
Date: Mon Aug 1 14:38:57 2022 +0800

virtio_net: set the default max ring size by find_vqs()

Use virtio_find_vqs_ctx_size() to specify the maximum ring size of tx,
rx at the same time.

                         | rx/tx ring size
-------------------------------------------
speed == UNKNOWN or < 10G| 1024
speed < 40G              | 4096
speed >= 40G             | 8192

Call virtnet_update_settings() once before calling init_vqs() to update
speed.

Signed-off-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
Acked-by: Jason Wang <jasowang@xxxxxxxxxx>
Message-Id: <20220801063902.129329-38-xuanzhuo@xxxxxxxxxxxxxxxxx>
Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
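
For reference, the table in the commit message above amounts to roughly the
following mapping. This is only a standalone sketch to make the thresholds
concrete, not the code from the patch: the function name, the
SKETCH_SPEED_UNKNOWN constant, and the assumption that speed is expressed in
Mb/s are all made up for illustration.

```c
/*
 * Sketch of the speed -> max ring size table from the commit message.
 * All names here are hypothetical; this is not the virtio_net code.
 */

/* Hypothetical marker for "speed not known"; not the kernel's definition. */
#define SKETCH_SPEED_UNKNOWN (-1L)

/* Assumes speed is given in Mb/s, so 10G == 10000 and 40G == 40000. */
unsigned int sketch_max_ring_size(long speed_mbps)
{
	/* speed == UNKNOWN or < 10G */
	if (speed_mbps == SKETCH_SPEED_UNKNOWN || speed_mbps < 10000)
		return 1024;

	/* speed < 40G */
	if (speed_mbps < 40000)
		return 4096;

	/* speed >= 40G */
	return 8192;
}
```

Under these assumptions a 25G link would get 4096 and anything with an unknown
speed falls back to 1024.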


I'm not 100% confident yet, because the likelihood of encountering problems
was not uniform across the versions, with one of them showing the problem only
in 1/3 boots, whereas some of the others showed it 100% of the time. But I've
rebooted enough times to be fairly confident.

With 762faee5a267 I reliably see the network not connecting; with its parent,
762faee5a267^ = fe3dc04e31aa, I haven't seen a problem yet.


I did see some other types of crashes in commits nearby, so this might not be
the only problematic bit. See also the discussion around
https://lore.kernel.org/all/CAHk-=wikzU4402P-FpJRK_QwfVOS+t-3p1Wx5awGHTvr-s_0Ew@xxxxxxxxxxxxxx/

Greetings,

Andres Freund