Re: [PATCH 02/15] cxl/core: Check physical address before mapping it in devm_cxl_iomap_block()

From: Dan Williams
Date: Fri Sep 30 2022 - 15:07:39 EST


Robert Richter wrote:
> On 16.09.22 11:04:01, Dan Williams wrote:
> > Robert Richter wrote:
> > > On 07.09.22 22:48:57, Dan Williams wrote:
> > > > Robert Richter wrote:
> > > > > The physical base address of a CXL range can be invalid and is then
> > > > > set to CXL_RESOURCE_NONE. Early check this case before mapping a
> > > > > memory block in devm_cxl_iomap_block().
> > > > >
> > > > > Signed-off-by: Robert Richter <rrichter@xxxxxxx>
> > > > > ---
> > > > > drivers/cxl/core/regs.c | 3 +++
> > > > > 1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> > > > > index 39a129c57d40..f216c017a474 100644
> > > > > --- a/drivers/cxl/core/regs.c
> > > > > +++ b/drivers/cxl/core/regs.c
> > > > > @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> > > > > void __iomem *ret_val;
> > > > > struct resource *res;
> > > > >
> > > > > + if (addr == CXL_RESOURCE_NONE)
> > > > > + return NULL;
> > > > > +
> > > > > res = devm_request_mem_region(dev, addr, length, dev_name(dev));
> > > > > if (!res) {
> > > > > resource_size_t end = addr + length - 1;
> > > > > --
> > > > > 2.30.2
> > > > >
> > > >
> > > > devm_request_mem_region() succeeds for you when this happens? More
> > > > details about the failure scenario please.
> > >
> > > No, CXL_RESOURCE_NONE (all FFs) is used as address. A broken range is
> > > calculated that even overflows. I only vaguely remember the exact
> > > error message.
> > >
> > > This may happen e.g. if the Component Register Block is missing in the
> > > DVSEC. cxl_find_regblock() may fail then and returns
> > > CXL_RESOURCE_NONE. There are a couple of code paths there
> > > component_reg_phys is set to CXL_RESOURCE_NONE without exiting
> > > immediately.
> > >
> > > I saw it during code development, when I pre-inititalized a port with
> > > component_reg_phys set to CXL_RESOURCE_NONE. Since that case can
> > > generally happen, I think it must be checked.
> >
> > I think Jonathan had it right when we posited that the code should
> > probably have failed before getting to this point. For example, the
> > scenarios where the driver looks for a component register block via the
> > register locator DVSEC are not valid for RCDs in the first instance.
>
> I am ok having the code reviewed to prevent such situations. But the
> handling of component_reg_phys is by far not trivial as there are many
> locations it is initialized and others where it is used. Call chain is
> across multiple functions using various data sources for
> component_reg_phys, so it is hard to proof this may never happen.
> E.g. in add_port_attach_ep() I found this:
>
> component_reg_phys = find_component_registers(uport_dev);
> port = devm_cxl_add_port(&parent_port->dev, uport_dev,
> component_reg_phys, parent_dport);
>
> find_component_registers() and subsequent functions (e.g.
> cxl_regmap_to_base()) may return CXL_RESOURCE_NONE. But it is written
> to port without any further check in cxl_port_alloc():
>
> port->component_reg_phys = component_reg_phys;
>
> It is then later directly used in devm_cxl_setup_hdm() to map io
> ranges with devm_cxl_iomap_block().
>
> Just an example, I did not go through other code paths.
>
> IMO, the check is cheap and prevents weird follow up errors and log
> messages. But I would be also ok, to just drop the patch. There are no
> dependencies to this change. What do you think?

Good points above...

Yeah, the expectation is that only "broken hardware" fails to provide
component registers, but the driver is currently deficient in that it
assumes everything is CXL 2.0.

Now you have me thinking it should go a bit further and do a WARN_ONCE()
when CXL_RESOURCE_NONE is passed, as those locations need better error
handling than just handling it like an ioremap() failure.