Re: [PATCH v3 08/10] PCI/bwctrl: Add "controller" part into PCIe bwctrl

From: Ilpo Järvinen
Date: Mon Jan 01 2024 - 13:13:04 EST


On Sat, 30 Dec 2023, Lukas Wunner wrote:

> On Fri, Sep 29, 2023 at 02:57:21PM +0300, Ilpo Järvinen wrote:
>
>
> > --- a/drivers/pci/pcie/bwctrl.c

> > +static u16 speed2lnkctl2(enum pci_bus_speed speed)
> > +{
> > + static const u16 speed_conv[] = {
> > + [PCIE_SPEED_2_5GT] = PCI_EXP_LNKCTL2_TLS_2_5GT,
> > + [PCIE_SPEED_5_0GT] = PCI_EXP_LNKCTL2_TLS_5_0GT,
> > + [PCIE_SPEED_8_0GT] = PCI_EXP_LNKCTL2_TLS_8_0GT,
> > + [PCIE_SPEED_16_0GT] = PCI_EXP_LNKCTL2_TLS_16_0GT,
> > + [PCIE_SPEED_32_0GT] = PCI_EXP_LNKCTL2_TLS_32_0GT,
> > + [PCIE_SPEED_64_0GT] = PCI_EXP_LNKCTL2_TLS_64_0GT,
> > + };
>
> Looks like this could be a u8 array to save a little bit of memory.
>
> Also, this seems to duplicate pcie_link_speed[] defined in probe.c.

It's not the same. pcie_link_speed[] is indexed by a different thing
unless you also suggest I should do index minus a number? There are
plenty of arithmetics possible when converting between the types but
the existing converions functions don't seem to take advantage of those
properties so I've been a bit hesitant to add such assumptions myself.

I suppose I used u16 because the reg is u16 but you're of course correct
that the values don't take up more than u8.

> > +static int bwctrl_select_speed(struct pcie_device *srv, enum pci_bus_speed *speed)
> > +{
> > + struct pci_bus *bus = srv->port->subordinate;
> > + u8 speeds, dev_speeds;
> > + int i;
> > +
> > + if (*speed > PCIE_LNKCAP2_SLS2SPEED(bus->pcie_bus_speeds))
> > + return -EINVAL;
> > +
> > + dev_speeds = READ_ONCE(bus->pcie_dev_speeds);
>
> Hm, why is the compiler barrier needed?

It's probably an overkill but there could be a checker which finds this
read is not protected by anything while the value could get updated
concurrectly (there's probably already such checker as I've seen patches
to data races found with some tool). I suppose the alternative would be to
mark the data race being harmless (because if endpoint removal clears
pcie_dev_speeds, bwctrl will be pretty moot). I just chose to use
READ_ONCE() that prevents rereading the same value later in this function
making the function behave consistently regardless of what occurs parallel
to it with the endpoints.

If the value selected cannot be set because of endpoint no longer being
there, the other parts of the code will detect that.

So if I just add a comment to this line why there's the data race and keep
it as is?

> > + /* Only the lowest speed can be set when there are no devices */
> > + if (!dev_speeds)
> > + dev_speeds = PCI_EXP_LNKCAP2_SLS_2_5GB;
>
> Maybe move this to patch [06/10], i.e. set dev->bus->pcie_dev_speeds to
> PCI_EXP_LNKCAP2_SLS_2_5GB on removal (instead of 0).

Okay, I'll set it to 2_5GB on init and removal.

> > + speeds = bus->pcie_bus_speeds & dev_speeds;
> > + i = BIT(fls(speeds));
> > + while (i >= PCI_EXP_LNKCAP2_SLS_2_5GB) {
> > + enum pci_bus_speed candidate;
> > +
> > + if (speeds & i) {
> > + candidate = PCIE_LNKCAP2_SLS2SPEED(i);
> > + if (candidate <= *speed) {
> > + *speed = candidate;
> > + return 0;
> > + }
> > + }
> > + i >>= 1;
> > + }
>
> Can't we just do something like
>
> supported_speeds = bus->pcie_bus_speeds & dev_speeds;
> desired_speeds = GEN_MASK(pcie_link_speed[*speed], 1);
> *speed = BIT(fls(supported_speeds & desired_speeds));
>
> and thus avoid the loop altogether?

Yes, I can change to loopless version.

I'll try to create functions for the speed format conversions though
rather than open coding them into the logic.

> > @@ -91,16 +242,32 @@ static int pcie_bandwidth_notification_probe(struct pcie_device *srv)
> > if (ret)
> > return ret;
> >
> > + data = kzalloc(sizeof(*data), GFP_KERNEL);
> > + if (!data) {
> > + ret = -ENOMEM;
> > + goto free_irq;
> > + }
> > + mutex_init(&data->set_speed_mutex);
> > + set_service_data(srv, data);
> > +
>
> I think you should move that further up so that you allocate and populate
> the data struct before requesting the IRQ. Otherwise if BIOS has already
> enabled link bandwith notification for some reason, the IRQ handler might
> be invoked without the data struct being allocated.

Sure, I don't know why I missed that possibility.


--
i.