Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on serial numbers

From: Greg KH
Date: Fri Dec 09 2016 - 02:31:19 EST


On Fri, Dec 09, 2016 at 12:05:53AM +0000, KY Srinivasan wrote:
>
>
> > -----Original Message-----
> > From: Greg KH [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
> > Sent: Thursday, December 8, 2016 7:56 AM
> > To: KY Srinivasan <kys@xxxxxxxxxxxxx>
> > Cc: linux-kernel@xxxxxxxxxxxxxxx; devel@xxxxxxxxxxxxxxxxxxxxxx;
> > olaf@xxxxxxxxx; apw@xxxxxxxxxxxxx; vkuznets@xxxxxxxxxx;
> > jasowang@xxxxxxxxxx; leann.ogasawara@xxxxxxxxxxxxx;
> > bjorn.helgaas@xxxxxxxxx; Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on serial
> > numbers
> >
> > On Thu, Dec 08, 2016 at 12:33:43AM -0800, kys@xxxxxxxxxxxxxxxxxxxxxx
> > wrote:
> > > From: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > >
> > > We currently use MAC address to match VF and synthetic NICs. Hyper-V
> > > provides a serial number to both devices for this purpose. This patch
> > > implements the matching based on VF serial numbers. This is the way
> > > specified by the protocol and more reliable.
> > >
> > > Signed-off-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > > Signed-off-by: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
> > > ---
> > > drivers/net/hyperv/netvsc_drv.c | 55
> > ++++++++++++++++++++++++++++++++++++---
> > > 1 files changed, 51 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/hyperv/netvsc_drv.c
> > b/drivers/net/hyperv/netvsc_drv.c
> > > index 9522763..c5778cf 100644
> > > --- a/drivers/net/hyperv/netvsc_drv.c
> > > +++ b/drivers/net/hyperv/netvsc_drv.c
> > > @@ -1165,9 +1165,10 @@ static void netvsc_free_netdev(struct
> > net_device *netdev)
> > > free_netdev(netdev);
> > > }
> > >
> > > -static struct net_device *get_netvsc_bymac(const u8 *mac)
> > > +static struct net_device *get_netvsc_byvfser(u32 vfser)
> > > {
> > > struct net_device *dev;
> > > + struct net_device_context *ndev_ctx;
> > >
> > > ASSERT_RTNL();
> > >
> > > @@ -1175,7 +1176,8 @@ static void netvsc_free_netdev(struct net_device
> > *netdev)
> > > if (dev->netdev_ops != &device_ops)
> > > continue; /* not a netvsc device */
> > >
> > > - if (ether_addr_equal(mac, dev->perm_addr))
> > > + ndev_ctx = netdev_priv(dev);
> > > + if (ndev_ctx->vf_serial == vfser)
> > > return dev;
> > > }
> > >
> > > @@ -1205,21 +1207,66 @@ static void netvsc_free_netdev(struct
> > net_device *netdev)
> > > return NULL;
> > > }
> > >
> > > +static u32 netvsc_get_vfser(struct net_device *vf_netdev)
> > > +{
> > > + struct device *dev;
> > > + struct hv_device *hdev;
> > > + struct hv_pcibus_device *hbus = NULL;
> > > + struct list_head *iter;
> > > + struct hv_pci_dev *hpdev;
> > > + unsigned long flags;
> > > + u32 vfser = 0;
> > > + u32 count = 0;
> > > +
> > > + for (dev = &vf_netdev->dev; dev; dev = dev->parent) {
> >
> > You are going to walk the whole device tree backwards? That's crazy.
> > And foolish. And racy and broken (what happens if the tree changes
> > while you do this?) Where is the lock being grabbed while this happens?
> > What about reference counts? Do you see other drivers ever doing this
> > (if you do, point them out and I'll go yell at them too...)
>
> Greg,
>
> We are registering for netdev events. Coming into this function, the caller
> guarantees that the list of netdevs does not change - we assert this on entry:
> ASSERT_RTNL(). We are only walking up the device tree for the netdevs whose
> state change is being notified to us - the device tree being walked here is limited to
> netdevs in question.

But a netdev is a child of some type of "real" device, and you are now
walking the tree of all devices up to the "root" parent device, which
means you will hit PCI bridges, USB controllers, and all sorts of fun
things if you are a child of those types of devices.
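
To make the concern concrete, here is a rough userspace sketch of such a
parent walk. It is not kernel code: struct model_device, enum bus_kind and
find_vmbus_ancestor() are made-up stand-ins for struct device, the bus type
and the loop in the patch, and it ignores locking and reference counting
entirely. It only shows that every ancestor up to the root gets touched
when no vmbus device is found.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's bus types and struct device. */
enum bus_kind { BUS_NONE, BUS_PCI, BUS_USB, BUS_VMBUS };

struct model_device {
	const char *name;
	enum bus_kind bus;
	struct model_device *parent;	/* NULL at the root */
};

/*
 * Walk from dev toward the root, as the loop in the patch does, and
 * return the first node that sits on the vmbus. *visited counts every
 * node touched, including unrelated bridges and controllers.
 */
static struct model_device *find_vmbus_ancestor(struct model_device *dev,
						size_t *visited)
{
	*visited = 0;
	for (; dev; dev = dev->parent) {
		(*visited)++;
		if (dev->bus == BUS_VMBUS)
			return dev;
	}
	return NULL;	/* reached the root without finding one */
}
```

For a netdev that hangs off an emulated PCI NIC behind a bridge, the walk
visits the netdev, the NIC, the bridge and the root before giving up.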

And can't you tell whether the netdev for this event really is "your"
netdev? Or are you getting called for "all" netdevs? Sorry, I don't
know this api; any pointers to it would be appreciated.

> We have a reference to the device and we know the device is not going away. Is it not
> safe to dereference the parent pointer - after all the child has taken a reference on
> the parent as part of device_add() call.

It might be, and it might not be. That uncertainty is the reason you
don't see this pattern anywhere in the kernel...

> > > + if (!dev_is_vmbus(dev))
> > > + continue;
> >
> > Ick.
> >
> > Why isn't your parent pointer a vmbus device all the time? How could
> > you get buried down in the device hierarchy when you are the driver for
> > a specific bus type in the first place? How could this function ever be
> > called for a device that is NOT of this type?
>
> We get notified when state changes on any of the netdev devices in the system.
> Not all netdevs in the system belong to vmbus. Consider for instance the
> emulated NIC that can be configured. This is an emulated PCI NIC. We are only
> interested in netdevs that correspond to the VF instance that we are interested in.

Can you "know" this is your netdev by some other way than having to walk
the device tree? Name? local device type? Something else? This seems
like an odd api in that everyone would have to do gyrations like this in
order to determine if the netdev is "theirs" or not...
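
As one illustration of a local identity check, the quoted patch already
recognises netvsc devices by comparing dev->netdev_ops against &device_ops.
A rough userspace sketch of that idea follows. All names here
(struct model_ops, struct model_netdev, netvsc_model_ops,
is_netvsc_model()) are made up for the example, and this only identifies
a driver's own devices; it does not by itself answer how to recognise the
matching VF netdev.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device_ops. */
struct model_ops {
	int unused;
};

/* Hypothetical stand-in for struct net_device. */
struct model_netdev {
	const char *name;
	const struct model_ops *ops;
};

/* The one ops table that "our" driver installs on its devices. */
static const struct model_ops netvsc_model_ops = { 0 };

/*
 * A device is "ours" if it uses our ops table. The comparison is by
 * address, so no device-tree walk is involved.
 */
static bool is_netvsc_model(const struct model_netdev *dev)
{
	return dev && dev->ops == &netvsc_model_ops;
}
```

A notifier callback that is invoked for every netdev in the system could
run a check like this first and return early for devices it does not own.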

thanks,

greg k-h