Re: [PATCH v9 2/3] PCI/DOE: Expose the DOE features via sysfs

From: Bjorn Helgaas
Date: Thu Oct 19 2023 - 12:58:45 EST


On Fri, Oct 13, 2023 at 01:41:57PM +1000, Alistair Francis wrote:
> The PCIe 6 specification added support for the Data Object Exchange (DOE).
> When DOE is supported the DOE Discovery Feature must be implemented per
> PCIe r6.1 sec 6.30.1.1. The protocol allows a requester to obtain
> information about the other DOE features supported by the device.
> ...

> +static umode_t pci_doe_sysfs_attr_is_visible(struct kobject *kobj,
> + struct attribute *a, int n)
> +{
> + struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
> + struct pci_doe_mb *doe_mb;
> + unsigned long index, j;
> + void *entry;
> +
> + xa_for_each(&pdev->doe_mbs, index, doe_mb) {
> + xa_for_each(&doe_mb->feats, j, entry)
> + return a->mode;
> + }
> +
> + return 0;

The nested loops that don't test anything look a little weird and
maybe I'm missing something, but this looks like it returns a->mode if
any mailbox with a feature exists, and 0 otherwise.

Is that the same as this:

if (pdev->doe_mbs)
return a->mode;

return 0;

since it sounds like a mailbox must support at least one feature?

> +}
> +
> +static struct attribute *pci_dev_doe_feature_attrs[] = {
> + NULL,
> +};
> +
> +const struct attribute_group pci_dev_doe_feature_group = {
> + .name = "doe_features",
> + .attrs = pci_dev_doe_feature_attrs,
> + .is_visible = pci_doe_sysfs_attr_is_visible,
> +};
> +
> +static ssize_t pci_doe_sysfs_feature_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + return sysfs_emit(buf, "%s\n", attr->attr.name);
> +}
> +
> +static void pci_doe_sysfs_feature_remove(struct pci_dev *pdev,
> + struct pci_doe_mb *doe_mb)
> +{
> + struct device *dev = &pdev->dev;
> + struct device_attribute *attrs = doe_mb->sysfs_attrs;
> + unsigned long i;
> + void *entry;
> +
> + if (!attrs)
> + return;
> +
> + doe_mb->sysfs_attrs = NULL;
> + xa_for_each(&doe_mb->feats, i, entry) {
> + if (attrs[i].show)
> + sysfs_remove_file_from_group(&dev->kobj, &attrs[i].attr,
> + pci_dev_doe_feature_group.name);
> + kfree(attrs[i].attr.name);
> + }
> + kfree(attrs);
> +}
> +
> +static int pci_doe_sysfs_feature_populate(struct pci_dev *pdev,
> + struct pci_doe_mb *doe_mb)
> +{
> + struct device *dev = &pdev->dev;
> + struct device_attribute *attrs;
> + unsigned long num_features = 0;
> + unsigned long vid, type;
> + unsigned long i;
> + void *entry;
> + int ret;
> +
> + xa_for_each(&doe_mb->feats, i, entry)
> + num_features++;
> +
> + attrs = kcalloc(num_features, sizeof(*attrs), GFP_KERNEL);
> + if (!attrs)
> + return -ENOMEM;
> +
> + doe_mb->sysfs_attrs = attrs;
> + xa_for_each(&doe_mb->feats, i, entry) {
> + sysfs_attr_init(&attrs[i].attr);
> + vid = xa_to_value(entry) >> 8;
> + type = xa_to_value(entry) & 0xFF;
> + attrs[i].attr.name = kasprintf(GFP_KERNEL,
> + "0x%04lX:%02lX", vid, type);

What's the rationale for using "0x" on the vendor ID but not on the
type? "0x1234:10" hints that the "10" might be decimal since it lacks
"0x".

Suggest lower-case "%04lx:%02lx" either way.

FWIW, there's no "0x" prefix on the hex vendor IDs in "lspci -n"
output and dmesg messages like this:

pci 0000:01:00.0: [10de:13b6] type 00

> + if (!attrs[i].attr.name) {
> + ret = -ENOMEM;
> + goto fail;
> + }
> +
> + attrs[i].attr.mode = 0444;
> + attrs[i].show = pci_doe_sysfs_feature_show;
> +
> + ret = sysfs_add_file_to_group(&dev->kobj, &attrs[i].attr,
> + pci_dev_doe_feature_group.name);
> + if (ret) {
> + attrs[i].show = NULL;
> + goto fail;
> + }
> + }
> +
> + return 0;
> +
> +fail:
> + pci_doe_sysfs_feature_remove(pdev, doe_mb);
> + return ret;

Not sure we should treat this failure this seriously. Looks like it
will prevent creation of the BAR resource files, and possibly even
abort pci_sysfs_init() early. I think the pci_dev will still be
usable (lacking DOE sysfs) even if this fails.

> +}
> +
> +void pci_doe_sysfs_teardown(struct pci_dev *pdev)
> +{
> + struct pci_doe_mb *doe_mb;
> + unsigned long index;
> +
> + xa_for_each(&pdev->doe_mbs, index, doe_mb) {
> + pci_doe_sysfs_feature_remove(pdev, doe_mb);
> + }
> +}
> +
> +int pci_doe_sysfs_init(struct pci_dev *pdev)
> +{
> + struct pci_doe_mb *doe_mb;
> + unsigned long index;
> + int ret;
> +
> + xa_for_each(&pdev->doe_mbs, index, doe_mb) {
> + ret = pci_doe_sysfs_feature_populate(pdev, doe_mb);
> + if (ret)
> + return ret;
> + }

I agree with Lukas that pci_create_resource_files() is not the right
place to call this.

I try hard to avoid calling *anything* from the
pci_create_sysfs_dev_files() path because it has the nasty
"sysfs_initialized" check and the associated pci_sysfs_init()
initcall.

It's so much cleaner when we can set up static attributes that are
automatically added in the device_add() path. I don't know whether
that's possible. I see lots of discussion with Greg KH that might be
related, but I'm not sure.

I do know that we enumerate the mailboxes and features during
pci_init_capabilities(), which happens before device_add(), so the
information about which attributes should be present is at least
*available* early enough:

pci_host_probe
pci_scan_root_bus_bridge
...
pci_scan_single_device
pci_device_add
pci_init_capabilities
pci_doe_init
while (pci_find_next_ext_capability(PCI_EXT_CAP_ID_DOE))
pci_doe_create_mb
pci_doe_cache_features
pci_doe_discovery
xa_insert(&doe_mb->feats) <--
device_add
device_add_attrs
device_add_groups
pci_bus_add_devices
pci_bus_add_device
pci_create_sysfs_dev_files
...
pci_doe_sysfs_init <--
xa_for_each(&pdev->doe_mbs)
pci_doe_sysfs_feature_populate
xa_for_each(&doe_mb->feats)
sysfs_add_file_to_group(pci_dev_doe_feature_group.name)

Is it feasible to build an attribute group in pci_doe_init() and add
it to dev->groups so device_add() will automatically add them?

It looks like __power_supply_register() does something like that:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/power/supply/power_supply_core.c?id=v6.5#n1375

> --- a/include/linux/pci-doe.h
> +++ b/include/linux/pci-doe.h
> @@ -22,4 +22,6 @@ int pci_doe(struct pci_doe_mb *doe_mb, u16 vendor, u8 type,
> const void *request, size_t request_sz,
> void *response, size_t response_sz);
>
> +int pci_doe_sysfs_init(struct pci_dev *pci_dev);
> +void pci_doe_sysfs_teardown(struct pci_dev *pdev);

These declarations look like they should be in drivers/pci/pci.h as
well. I don't think anything outside the PCI core should need these.

Bjorn