Re: [PATCH v2] net/mlx5e: Use refcount_t for refcount

From: Jason Gunthorpe
Date: Tue Aug 06 2019 - 14:38:28 EST


On Tue, Aug 06, 2019 at 09:59:58AM +0300, Leon Romanovsky wrote:
> On Mon, Aug 05, 2019 at 08:06:36PM +0000, Saeed Mahameed wrote:
> > On Mon, 2019-08-05 at 14:55 +0800, Chuhong Yuan wrote:
> > > On Mon, Aug 5, 2019 at 2:13 PM Leon Romanovsky <leon@xxxxxxxxxx>
> > > wrote:
> > > > On Sun, Aug 04, 2019 at 10:44:47PM +0800, Chuhong Yuan wrote:
> > > > > On Sun, Aug 4, 2019 at 8:59 PM Leon Romanovsky <leon@xxxxxxxxxx>
> > > > > wrote:
> > > > > > On Sat, Aug 03, 2019 at 12:48:28AM +0800, Chuhong Yuan wrote:
> > > > > > > refcount_t is better for reference counters since its
> > > > > > > implementation can prevent overflows.
> > > > > > > So convert atomic_t ref counters to refcount_t.
> > > > > >
> > > > > > I'm not thrilled to see those automatic conversion patches,
> > > > > > especially
> > > > > > for flows which can't overflow. There is nothing wrong in using
> > > > > > atomic_t
> > > > > > type of variable, do you have in mind flow which will cause to
> > > > > > overflow?
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > I have to say that these patches are not done automatically...
> > > > > Only the detection of problems is done by a script.
> > > > > All conversions are done manually.
> > > >
> > > > Even worse, you need to audit usage of atomic_t and replace there
> > > > it can overflow.
> > > >
> > > > > I am not sure whether the flow can cause an overflow.
> > > >
> > > > It can't.
> > > >
> > > > > But I think it is hard to ensure that a data path is impossible
> > > > > to have problems in any cases including being attacked.
> > > >
> > > > It is not data path, and I doubt that such conversion will be
> > > > allowed
> > > > in data paths without proving that no performance regression is
> > > > introduced.
> > > > > So I think it is better to do this minor revision to prevent
> > > > > potential risk, just like we have done in mlx5/core/cq.c.
> > > >
> > > > mlx5/core/cq.c is a different beast, refcount there means actual
> > > > users
> > > > of CQ which are limited in SW, so in theory, they have potential
> > > > to be overflown.
> > > >
> > > > It is not the case here, there your are adding new port.
> > > > There is nothing wrong with atomic_t.
> > > >
> > >
> > > Thanks for your explanation!
> > > I will pay attention to this point in similar cases.
> > > But it seems that the semantic of refcount is not always as clear as
> > > here...
> > >
> >
> > Semantically speaking, there is nothing wrong with moving to refcount_t
> > in the case of vxlan ports.. it also seems more accurate and will
> > provide the type protection, even if it is not necessary. Please let me
> > know what is the verdict here, i can apply this patch to net-next-mlx5.
>
> There is no verdict here, it is up to you., if you like code churn, go
> for it.

IMHO CONFIG_REFCOUNT_FULL is a valuable enough reason to not open code
with an atomic even if overflow is not possible.

Races resulting in incrs on 0 refcounts is a common enough mistake.

Jason