Re: [PATCH v4] PCI/DOE: Expose the DOE protocols via sysfs

From: Alistair Francis
Date: Fri Aug 11 2023 - 14:41:17 EST


On Thu, Aug 10, 2023 at 9:04 PM Damien Le Moal <dlemoal@xxxxxxxxxx> wrote:
>
> On 8/11/23 01:33, Alistair Francis wrote:
> > The PCIe 6 specification added support for the Data Object Exchange (DOE).
> > When DOE is supported the Discovery Data Object Protocol must be
> > implemented. The protocol allows a requester to obtain information about
> > the other DOE protocols supported by the device.
> >
> > The kernel is already querying the DOE protocols supported and caching
> > the values. This patch exposes the values via sysfs. This will allow
> > userspace to determine which DOE protocols are supported by the PCIe
> > device.
> >
> > By exposing the information to userspace, tools like lspci can relay the
> > information to users. By listing all of the supported protocols we can
> > allow userspace to parse and support the list, which might include
> > vendor specific protocols as well as yet to be supported protocols.
> >
> > Each DOE feature is exposed as a single file. The files are empty and
> > the information is contained in the file name.
>
> s/feature/protocol ?

Fixed

>
> Personally, I would still have each file content repeat the same information as
> the file name specifies. That is, file value == file name. That will avoid
> people getting confused as empty sysfs files are rather uncommon.

I don't see an obvious way to implement that with the .show()
function, since I don't see a clear way to know which file the user
accessed.

Plus I don't see a need to. The files exist and their names already
provide the information. Do we really need to duplicate it?
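
To illustrate that the file name alone is enough, here is a rough userspace
sketch. It assumes the "0x%04lX:%02lX" name format and the
(vendor ID << 8 | type) packing used in this version of the patch; the helper
names doe_format_name() and doe_parse_name() are hypothetical and are not part
of the patch or of any existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a file name from a packed protocol value,
 * mirroring the patch (vendor ID in bits 8..23, data object type in bits 0..7).
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int doe_format_name(unsigned long value, char *buf, size_t len)
{
	unsigned long vid = value >> 8;
	unsigned long type = value & 0xFF;
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "0x%04lX:%02lX", vid, type);
	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/*
 * Hypothetical helper: split a name such as "0x0001:01" back into the
 * vendor ID and data object type. Returns 0 on success, -1 on a malformed name.
 */
static int doe_parse_name(const char *name, unsigned int *vid, unsigned int *type)
{
	char trailing;

	assert(name != NULL && vid != NULL && type != NULL);
	/* %x accepts an optional 0x prefix; %c catches trailing garbage. */
	if (sscanf(name, "%x:%x%c", vid, type, &trailing) != 2)
		return -1;
	if (*vid > 0xFFFF || *type > 0xFF)
		return -1;
	return 0;
}
```

A tool like lspci could call something like doe_parse_name() on each directory
entry it reads from doe_protos, without ever opening the files.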

>
> >
> > This uses pci_sysfs_init() instead of the ->is_visible() function as
> > is_visible only applies to the attributes under the group. Which
> > means that every PCIe device will see a `doe_protos` directory, no
> > matter if DOE is supported at all on the device.
> >
> > On top of that ->is_visible() is only called
> > (fs/sysfs/group.c:create_files()) if there are sub attrs, which we
> > don't necessarily have. There are no static attrs, instead they are
> > all generated dynamically.
>
> You said that the kernel caches the protocols supported. So it should not be
> hard to allocate one attribute for each of the supported protocols when these
> are discovered, no ?

I couldn't figure out a way to get this to work. You end up with a
race between the sysfs group being created and the attributes being
created. The DOE features are probed before the sysfs init creates the
group.

>
> >
> > Signed-off-by: Alistair Francis <alistair.francis@xxxxxxx>
> > ---
> > v4:
> > - Fixup typos in the documentation
> > - Make it clear that the file names contain the information
> > - Small code cleanups
> > - Remove most #ifdefs
> > - Remove extra NULL assignment
> > v3:
> > - Expose each DOE feature as a separate file
> > v2:
> > - Add documentation
> > - Code cleanups
> >
> > We did talk about exposing DOE types under DOE vendor IDs, but I couldn't
> > figure out a simple way to do that
> >
> > Documentation/ABI/testing/sysfs-bus-pci | 10 +++
> > drivers/pci/doe.c | 104 ++++++++++++++++++++++++
> > drivers/pci/pci-sysfs.c | 7 ++
> > include/linux/pci-doe.h | 1 +
> > 4 files changed, 122 insertions(+)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> > index ecf47559f495..e09c51449284 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-pci
> > +++ b/Documentation/ABI/testing/sysfs-bus-pci
> > @@ -500,3 +500,13 @@ Description:
> > console drivers from the device. Raw users of pci-sysfs
> > resourceN attributes must be terminated prior to resizing.
> > Success of the resizing operation is not guaranteed.
> > +
> > +What: /sys/bus/pci/devices/.../doe_protos
> > +Date: August 2023
> > +Contact: Linux PCI developers <linux-pci@xxxxxxxxxxxxxxx>
> > +Description:
> > + This directory contains a list of the supported Data Object Exchange (DOE)
> > + features. The feature values are in the file name; the files have no contents.
> > + The value comes from the device and specifies the vendor and
> > + data object type supported. The lower byte is the data object type and the next
> > + two bytes are the vendor ID.
> > diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
> > index 1b97a5ab71a9..918872152fb6 100644
> > --- a/drivers/pci/doe.c
> > +++ b/drivers/pci/doe.c
> > @@ -56,6 +56,8 @@ struct pci_doe_mb {
> > wait_queue_head_t wq;
> > struct workqueue_struct *work_queue;
> > unsigned long flags;
> > +
> > + struct device_attribute *sysfs_attrs;
> > };
> >
> > struct pci_doe_protocol {
> > @@ -92,6 +94,108 @@ struct pci_doe_task {
> > struct pci_doe_mb *doe_mb;
> > };
> >
> > +#ifdef CONFIG_SYSFS
> > +static struct attribute *pci_dev_doe_proto_attrs[] = {
> > + NULL,
> > +};
> > +
> > +static const struct attribute_group pci_dev_doe_proto_group = {
> > + .name = "doe_protos",
>
> Why is this a static variable instead of being a member of the pci doe_mb struct?
> Devices without DOE support would always have that as NULL and only the

I don't follow. Do you mean define the name as part of the struct
pci_doe_mb *doe_mb?

> devices that support it would get the group and array of attributes that you
> allocate in pci_doe_sysfs_proto_supports(). That would also remove the need for
> the attrs array being a static variable as well.
>
> And let's spell things out to be clear and avoid confusion: s/protos/protocols

I can change the name

>
> > + .attrs = pci_dev_doe_proto_attrs,
> > +};
> > +
> > +static void pci_doe_sysfs_remove_desc(struct pci_doe_mb *doe_mb)
> > +{
> > + struct device_attribute *attrs = doe_mb->sysfs_attrs;
> > + unsigned long i;
> > + void *entry;
> > +
> > + if (!doe_mb->sysfs_attrs)
> > + return;
> > +
> > + doe_mb->sysfs_attrs = NULL;
> > + xa_for_each(&doe_mb->prots, i, entry)
> > + kfree(attrs[i].attr.name);
> > +
> > + kfree(attrs);
> > +}
> > +
> > +static int pci_doe_sysfs_proto_supports(struct pci_dev *pdev, struct pci_doe_mb *doe_mb)
> > +{
> > + struct device *dev = &pdev->dev;
> > + struct device_attribute *attrs;
> > + unsigned long num_protos = 0;
> > + unsigned long vid, type;
> > + unsigned long i;
> > + void *entry;
> > + int ret;
> > +
> > + xa_for_each(&doe_mb->prots, i, entry)
> > + num_protos++;
> > +
> > + attrs = kcalloc(num_protos, sizeof(*attrs), GFP_KERNEL);
> > + if (!attrs)
> > + return -ENOMEM;
> > +
> > + doe_mb->sysfs_attrs = attrs;
> > + xa_for_each(&doe_mb->prots, i, entry) {
> > + sysfs_attr_init(&attrs[i].attr);
> > + vid = xa_to_value(entry) >> 8;
> > + type = xa_to_value(entry) & 0xFF;
> > + attrs[i].attr.name = kasprintf(GFP_KERNEL, "0x%04lX:%02lX", vid, type);
> > + if (!attrs[i].attr.name) {
> > + ret = -ENOMEM;
> > + goto fail;
> > + }
> > +
> > + attrs[i].attr.mode = 0444;
> > +
> > + ret = sysfs_add_file_to_group(&dev->kobj, &attrs[i].attr,
> > + pci_dev_doe_proto_group.name);
> > + if (ret)
> > + goto fail;
> > + }
> > +
> > + return 0;
> > +
> > +fail:
> > + pci_doe_sysfs_remove_desc(doe_mb);
> > + return ret;
> > +}
> > +
> > +int doe_sysfs_init(struct pci_dev *pdev)
> > +{
> > + unsigned long total_protos = 0;
> > + struct pci_doe_mb *doe_mb;
> > + unsigned long index, j;
> > + void *entry;
> > + int ret;
> > +
> > + xa_for_each(&pdev->doe_mbs, index, doe_mb) {
> > + xa_for_each(&doe_mb->prots, j, entry)
> > + total_protos++;
> > + }
> > +
> > + if (total_protos == 0)
> > + return 0;
> > +
> > + ret = devm_device_add_group(&pdev->dev, &pci_dev_doe_proto_group);
> > + if (ret) {
> > + pci_err(pdev, "can't create DOE group: %d\n", ret);
> > + return ret;
> > + }
> > +
> > + xa_for_each(&pdev->doe_mbs, index, doe_mb) {
> > + ret = pci_doe_sysfs_proto_supports(pdev, doe_mb);
> > +
>
> Remove this blank line.
>
> > + if (ret)
> > + return ret;
> > + }
> > +
> > + return 0;
> > +}
> > +#endif
> > +
> > static int pci_doe_wait(struct pci_doe_mb *doe_mb, unsigned long timeout)
> > {
> > if (wait_event_timeout(doe_mb->wq,
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index ab32a91f287b..ad621850a3e2 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -16,6 +16,7 @@
> > #include <linux/kernel.h>
> > #include <linux/sched.h>
> > #include <linux/pci.h>
> > +#include <linux/pci-doe.h>
> > #include <linux/stat.h>
> > #include <linux/export.h>
> > #include <linux/topology.h>
> > @@ -1226,6 +1227,12 @@ static int pci_create_resource_files(struct pci_dev *pdev)
> > int i;
> > int retval;
> >
> > + if (IS_ENABLED(CONFIG_PCI_DOE)) {
> > + retval = doe_sysfs_init(pdev);
> > + if (retval)
> > + return retval;
> > + }
> > +
> > /* Expose the PCI resources from this device as files */
> > for (i = 0; i < PCI_STD_NUM_BARS; i++) {
> >
> > diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
> > index 1f14aed4354b..4cc13d9ccb50 100644
> > --- a/include/linux/pci-doe.h
> > +++ b/include/linux/pci-doe.h
> > @@ -22,4 +22,5 @@ int pci_doe(struct pci_doe_mb *doe_mb, u16 vendor, u8 type,
> > const void *request, size_t request_sz,
> > void *response, size_t response_sz);
> >
> > +int doe_sysfs_init(struct pci_dev *pci_dev);
> > #endif
>
> --
> Damien Le Moal
> Western Digital Research
>