RE: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in 3.14+ on Hyper-V 2012 R2

From: Haiyang Zhang
Date: Fri Jul 18 2014 - 17:47:24 EST

> -----Original Message-----
> From: Sitsofe Wheeler [mailto:sitsofe@xxxxxxxxx]
> Sent: Tuesday, July 15, 2014 1:09 AM
> To: Haiyang Zhang
> Cc: KY Srinivasan; David S. Miller; devel@xxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx
> Subject: Re: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in 3.14+ on Hyper-V 2012 R2
>
> On Mon, Jul 14, 2014 at 10:39:48PM +0000, Haiyang Zhang wrote:
> > > -----Original Message-----
> > > From: Sitsofe Wheeler [mailto:sitsofe@xxxxxxxxx]
> > > Sent: Monday, July 14, 2014 5:31 PM
> > > To: Haiyang Zhang
> > > Cc: KY Srinivasan; David S. Miller; devel@xxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx
> > > Subject: Re: [BISECTED][REGRESSION] Loading Hyper-V network drivers is racy in 3.14+ on Hyper-V 2012 R2
> >
> > Thanks for the tests! I will make a patch that can automatically retry
> > smaller memory allocs when memory is insufficient.
>
> This concerns me a bit - why would there be insufficient memory on a 64-bit
> VM with 4 GBytes of RAM just after startup (presumably the host's
> memory isn't the issue)? Additionally, while things might fail just when
> things are starting up, doing ifup eth0 at some point later succeeds so
> whatever issue it had seems temporary.
>
> Perhaps it would be wise to add some debugging output to see if the
> allocation really failed and why...

Actually, a debug message is logged to dmesg if the memory allocation fails,
but none shows up in your dmesg. And since the interface recovers with a
later "ifup eth0", the NIC must have been loaded properly (the buffer
allocation succeeded but took longer). I suspect that allocating the larger
receive buffer (16MB) takes longer because vzalloc() may sleep. That would
also explain why we don't see the bug with a smaller buffer size: the
allocation is quick.

Could you try putting "LINKDELAY=60" into this file:
/etc/sysconfig/network-scripts/ifcfg-eth0
and see whether the problem goes away?

Thanks,
- Haiyang
