Re: [PATCH v1 net 2/2] net: mscc: ocelot: fix possible memory conflict for vcap_props

From: Colin Foster
Date: Sat Apr 30 2022 - 13:24:14 EST


Hi Vladimir,

On Sat, Apr 30, 2022 at 02:24:57PM +0000, Vladimir Oltean wrote:
> Hi Colin,
>
> On Fri, Apr 29, 2022 at 04:30:49PM -0700, Colin Foster wrote:
> > Each instance of an ocelot struct has the ocelot_vcap_props structure being
> > referenced. During initialization (ocelot_init), these vcap_props are
> > detected and the structure contents are modified.
> >
> > In the case of the standard ocelot driver, there will probably only be one
> > instance of struct ocelot, since it is part of the chip.
> >
> > For the Felix driver, there could be multiple instances of struct ocelot.
> > In that scenario, the second time ocelot_init would get called, it would
> > corrupt what had been done in the first call because they both reference
> > *ocelot->vcap. Both of these instances were assigned the same memory
> > location.
> >
> > Move this vcap_props memory to within struct ocelot, so that each instance
> > can modify the structure to their heart's content without corrupting other
> > instances.
> >
> > Fixes: 2096805497e2b ("net: mscc: ocelot: automatically detect VCAP
> > constants")
> >
> > Signed-off-by: Colin Foster <colin.foster@xxxxxxxxxxxxxxxx>
> > ---
>
> To prove an issue, you must come with an example of two switches which
> share the same struct vcap_props, but contain different VCAP constants
> in the hardware registers. Otherwise, what you call "corruption" is just
> "overwriting with the same values".
>
> I would say that by definition, if two such switches have different VCAP
> constants, they have different vcap_props structures, and if they have
> the same vcap_props structure, they have the same VCAP constants.
>
> Therefore, even in a multi-switch environment, a second call to
> ocelot_vcap_detect_constants() would overwrite the vcap->entry_width,
> vcap->tg_width, vcap->sw_count, vcap->entry_count, vcap->action_count,
> vcap->action_width, vcap->counter_words, vcap->counter_width with the
> exact same values.
>
> I do not see the point in duplicating struct vcap_props per ocelot
> instance.
>
> I assume you are noticing some problems with VSC7512? What are they?

I'm not seeing issues, no. I was looking to implement the shared
ocelot_vcap struct between the 7514 and (in-development 7512. In doing
so I came across this realization that these per-file structures could
be referenced multiple times, which was the point of this patch. If the
structure were simply a const configuration there would be no issue, but
since it is half const and half runtime populated it got more complicated.

(that is likely why I didn't make it shared initially... which feels
like ages ago at this point)

Whether or not hardware exists that could be affected by this corner
case I don't know.

> Note that since VSC7512 isn't currently supported by the kernel, even a
> theoretical corruption issue doesn't qualify as a bug, since there is no
> way to reproduce it. All the Microchip switches supported by the kernel
> are internal to an SoC, are single switches, and they have different
> vcap_props structures.

I see. So I do have a misunderstanding in the process.

I shouldn't have submitted this to net, because it isn't an actual "bug"
I observed. Instead it was a potential issue with existing code, and
could have affected certain hardware configurations. How should I have
sent this out? (RFC? net-next? separate conversation discussing the
validity?)

Back to this patch in particular:

You're saying there's no need to duplicate the vcap_props structure
array per ocelot instance. Understood. Would it be an improvement to
split up vcap into a const configuration section (one per hardware
layout) and a detected set? Or would you have any other suggestion?

And, of course, I can drag this along with my 7512 patch set for now, or
try to get this in now. This one feels like it is worth keeping
separate...

And thanks as always for your feedback!