Re: [PATCH 3/3] cxl/core: Add sysfs attribute get_poison for list retrieval

From: Dan Williams
Date: Fri Jun 17 2022 - 21:09:03 EST


Alison Schofield wrote:
> On Fri, Jun 17, 2022 at 11:42:11AM -0700, Dan Williams wrote:
> > alison.schofield@ wrote:
> > > From: Alison Schofield <alison.schofield@xxxxxxxxx>
> > >
> > > The sysfs attribute, get_poison, allows user space to request the
> > > retrieval of a CXL device's poison list for its persistent memory.
> >
> > If the device supports get poison list for volatile memory, just grab
> > that too. With the "to be released soon" region patches userspace can
> > trivially translate DPA addresses to media type.
> >
>
> Dan,
>
> The only way I know to discover if the device supports poison list for
> volatile is to do the get_poison_list on the volatile range and see
> what happens. Am I missing a capability setting somewhere?

If someone executes "echo 1 > trace_poison_list" I expect that the
driver does:

	get_poison_list(volatile_range);
	get_poison_list(pmem_range);

...and if scanning the volatile partition ends in error then that just
means no error records appear. When the error is "Invalid Physical
Address" the driver can just remember that's a permanent error and never
try again. So it's more like:

	if (volatile_range_valid) {
		if (get_poison_list(volatile_range) == INVALID_PHYS_ADDR)
			volatile_range_valid = false;
	}
	get_poison_list(pmem_range);

...but that's probably overkill since get_poison_list() is cheap. Just
treat it like the zero error records case.
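
For reference, the "remember the permanent error" variant can be modeled
in plain userspace C. This is only a sketch under stated assumptions:
struct mock_dev, enum poison_rc, the volatile_supported flag and the
calls counter are all hypothetical stand-ins, not the driver's real
types or the mailbox return codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes; the real mailbox codes are not modeled. */
enum poison_rc { POISON_OK = 0, POISON_INVALID_PHYS_ADDR = -1 };

struct dpa_range {
	unsigned long long start;
	unsigned long long size;
};

/* Hypothetical mock of a device plus the driver-side cached flag. */
struct mock_dev {
	struct dpa_range volatile_range;
	struct dpa_range pmem_range;
	bool volatile_supported;	/* mock device behavior */
	bool volatile_range_valid;	/* driver's memory of the last result */
	int calls;			/* number of get_poison_list() calls */
};

/* Mock: fails only for the volatile range on an unsupporting device. */
static enum poison_rc get_poison_list(struct mock_dev *dev,
				      const struct dpa_range *range)
{
	assert(dev && range);
	dev->calls++;
	if (range == &dev->volatile_range && !dev->volatile_supported)
		return POISON_INVALID_PHYS_ADDR;
	return POISON_OK;
}

/* Skip the volatile range once it has reported a permanent error. */
static void trace_poison_list(struct mock_dev *dev)
{
	if (dev->volatile_range_valid) {
		if (get_poison_list(dev, &dev->volatile_range) ==
		    POISON_INVALID_PHYS_ADDR)
			dev->volatile_range_valid = false;
	}
	(void)get_poison_list(dev, &dev->pmem_range);
}
```

With a mock device that rejects the volatile range, the first
trace_poison_list() issues two requests and every later one issues only
the pmem request.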

In the to-be-released region provisioning patches there is a DPA
resource tree partitioned by DPA mode type, so the poison list code
probably wants to do something like:

	down_read(&cxl_dpa_rwsem);
	for (p = cxlds->dpa_res.child; p; p = p->sibling)
		get_poison_list(p->start, resource_size(p));
	up_read(&cxl_dpa_rwsem);
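
The same walk can be tried out in userspace with a cut-down model of the
resource tree. This is a rough sketch, not the driver code: struct
resource here carries only the fields the loop touches, the poison_fn
callback and sum_len() helper are hypothetical, and locking is left out
(the real code would hold cxl_dpa_rwsem for read around the loop).

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down model of the kernel's struct resource (inclusive end). */
struct resource {
	unsigned long long start;
	unsigned long long end;
	struct resource *child;
	struct resource *sibling;
};

/* Same arithmetic as the kernel helper: the range is inclusive. */
static unsigned long long resource_size(const struct resource *res)
{
	return res->end - res->start + 1;
}

/* Hypothetical callback standing in for get_poison_list(start, len). */
typedef int (*poison_fn)(unsigned long long start, unsigned long long len,
			 void *ctx);

/*
 * Visit each first-level partition under the DPA root. Errors from fn
 * are ignored, matching the "treat it like zero records" approach.
 * Returns the number of partitions visited.
 */
static int for_each_dpa_partition(const struct resource *dpa_res,
				  poison_fn fn, void *ctx)
{
	const struct resource *p;
	int visited = 0;

	assert(dpa_res && fn);
	for (p = dpa_res->child; p; p = p->sibling) {
		(void)fn(p->start, resource_size(p), ctx);
		visited++;
	}
	return visited;
}

/* Example callback: accumulate the total length that was scanned. */
static int sum_len(unsigned long long start, unsigned long long len,
		   void *ctx)
{
	(void)start;
	*(unsigned long long *)ctx += len;
	return 0;
}
```

Because the loop only follows ->child and ->sibling from the root, it
issues one request per partition regardless of how many DPA modes the
device exposes.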