Re: [PATCH v2] net/mlx5e: Use refcount_t for refcount

From: Saeed Mahameed
Date: Tue Aug 06 2019 - 14:54:18 EST


On Tue, 2019-08-06 at 15:38 -0300, Jason Gunthorpe wrote:
> On Tue, Aug 06, 2019 at 09:59:58AM +0300, Leon Romanovsky wrote:
> > On Mon, Aug 05, 2019 at 08:06:36PM +0000, Saeed Mahameed wrote:
> > > On Mon, 2019-08-05 at 14:55 +0800, Chuhong Yuan wrote:
> > > > On Mon, Aug 5, 2019 at 2:13 PM Leon Romanovsky <leon@xxxxxxxxxx
> > > > >
> > > > wrote:
> > > > > On Sun, Aug 04, 2019 at 10:44:47PM +0800, Chuhong Yuan wrote:
> > > > > > On Sun, Aug 4, 2019 at 8:59 PM Leon Romanovsky <
> > > > > > leon@xxxxxxxxxx>
> > > > > > wrote:
> > > > > > > On Sat, Aug 03, 2019 at 12:48:28AM +0800, Chuhong Yuan
> > > > > > > wrote:
> > > > > > > > refcount_t is better for reference counters since its
> > > > > > > > implementation can prevent overflows.
> > > > > > > > So convert atomic_t ref counters to refcount_t.
> > > > > > >
> > > > > > > I'm not thrilled to see those automatic conversion
> > > > > > > patches,
> > > > > > > especially
> > > > > > > for flows which can't overflow. There is nothing wrong in
> > > > > > > using
> > > > > > > atomic_t
> > > > > > > type of variable, do you have in mind flow which will
> > > > > > > cause to
> > > > > > > overflow?
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > I have to say that these patches are not done
> > > > > > automatically...
> > > > > > Only the detection of problems is done by a script.
> > > > > > All conversions are done manually.
> > > > >
> > > > > Even worse, you need to audit usage of atomic_t and replace
> > > > > there
> > > > > it can overflow.
> > > > >
> > > > > > I am not sure whether the flow can cause an overflow.
> > > > >
> > > > > It can't.
> > > > >
> > > > > > But I think it is hard to ensure that a data path is
> > > > > > impossible
> > > > > > to have problems in any cases including being attacked.
> > > > >
> > > > > It is not data path, and I doubt that such conversion will be
> > > > > allowed
> > > > > in data paths without proving that no performance regression
> > > > > is
> > > > > introduced.
> > > > > > So I think it is better to do this minor revision to
> > > > > > prevent
> > > > > > potential risk, just like we have done in mlx5/core/cq.c.
> > > > >
> > > > > mlx5/core/cq.c is a different beast, refcount there means
> > > > > actual
> > > > > users
> > > > > of CQ which are limited in SW, so in theory, they have
> > > > > potential
> > > > > to be overflown.
> > > > >
> > > > > It is not the case here, there your are adding new port.
> > > > > There is nothing wrong with atomic_t.
> > > > >
> > > >
> > > > Thanks for your explanation!
> > > > I will pay attention to this point in similar cases.
> > > > But it seems that the semantic of refcount is not always as
> > > > clear as
> > > > here...
> > > >
> > >
> > > Semantically speaking, there is nothing wrong with moving to
> > > refcount_t
> > > in the case of vxlan ports.. it also seems more accurate and will
> > > provide the type protection, even if it is not necessary. Please
> > > let me
> > > know what is the verdict here, i can apply this patch to net-
> > > next-mlx5.
> >
> > There is no verdict here, it is up to you., if you like code churn,
> > go
> > for it.
>
> IMHO CONFIG_REFCOUNT_FULL is a valuable enough reason to not open
> code
> with an atomic even if overflow is not possible.
>
> Races resulting in incrs on 0 refcounts is a common enough mistake.
>
> Jason

Indeed, thanks Jason, I will take this patch to net-next-mlx5.