Re: [PATCH v2 1/2] cxl/cdat: Handle cdat table build errors

From: Jonathan Cameron
Date: Mon Jan 08 2024 - 13:13:56 EST


On Mon, 8 Jan 2024 18:00:42 +0000
Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:

> On Mon, 8 Jan 2024 08:06:32 -0800
> Ira Weiny <ira.weiny@xxxxxxxxx> wrote:
>
> > Jonathan Cameron wrote:
> > > On Wed, 20 Dec 2023 11:55:33 -0800
> > > Ira Weiny <ira.weiny@xxxxxxxxx> wrote:
> > >
> > > > fan wrote:
> > > > > On Wed, Nov 29, 2023 at 05:33:03PM -0800, Ira Weiny wrote:
> > > > > > The callback for building CDAT tables may return negative error codes.
> > > > > > This was previously unhandled and will result in potentially huge
> > > > > > allocations later on in ct3_build_cdat()
> > > > > >
> > > > > > Detect the negative error code and defer cdat building.
> > > > > >
> > > > > > Fixes: f5ee7413d592 ("hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange")
> > > > > > Cc: Huai-Cheng Kuo <hchkuo@xxxxxxxxxxxxxxxxxxx>
> > > > > > Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> > > > > > Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> > > > > > ---
> > > > > > hw/cxl/cxl-cdat.c | 2 +-
> > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
> > > > > > index 639a2db3e17b..24829cf2428d 100644
> > > > > > --- a/hw/cxl/cxl-cdat.c
> > > > > > +++ b/hw/cxl/cxl-cdat.c
> > > > > > @@ -63,7 +63,7 @@ static void ct3_build_cdat(CDATObject *cdat, Error **errp)
> > > > > > cdat->built_buf_len = cdat->build_cdat_table(&cdat->built_buf,
> > > > > > cdat->private);
> > > > > >
> > > > > > - if (!cdat->built_buf_len) {
> > > > > > + if (cdat->built_buf_len <= 0) {
> > > > > > /* Build later as not all data available yet */
> > > > > > cdat->to_update = true;
> > > > > > return;
> > > > > >
> > > > >
> > > > > The fix looks good to me. Just curious how to really build cdat table
> > > > > again when an error occurs, for example, the memory allocation fails.
> > > >
> > > > I did not go that far as I am unsure as well.
> > > Memory allocations in qemu don't fail (well if they do it crashes)
> > > Side effect of using glib which makes for simpler cases.
> > > https://docs.gtk.org/glib/func.malloc.html
> > >
> > > There shouldn't even be any checks :( I'll fix that up at some point
> > > across all the CXL emulation. Sometimes reviewers noticed and
> > > we dropped it at earlier stages, but clearly didn't catch them all.
> > >
> > > Which come to think of it is why this error condition is in practice
> > > not actually buggy as the code won't ever manage to return -ENOMEM and
> > > I don't think there are other error codes.
> >
> > Ah. Ok but in that case I would say that build_cdat_table() should never
> > return < 0 to be clear at this level what can happen.
> >
> > Would you like a patch for that? (/me assumes you dropped this patch)
>
> Probably needs to first rip out all the -ENOMEM returns that got into
> the CXL code in general, then tidy up the return type to be unsigned.
>
> If you want to do that it would be welcome!
Actually, build_cdat_table() can return errors, just not for this reason.

host_memory_backend_get_memory() can fail, for example. So the original patch
is good as is. The discussion of memory allocation failure threw me off, and
those -ENOMEM paths should be cleaned up separately.
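
For anyone following along, here is a rough standalone sketch of why the
signed check matters. It is not the QEMU code: DemoCdat, demo_check_len() and
demo_alloc_size() are made-up names, and it assumes the length ends up
multiplied into an allocation size, which is how I read the "huge allocations"
in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two CDATObject fields the patch touches. */
typedef struct {
    int built_buf_len;  /* entry count on success, negative errno on failure */
    bool to_update;     /* set when the table must be built later */
} DemoCdat;

/*
 * Assumed shape of the later sizing step: a negative int converted to
 * size_t wraps around to a value close to SIZE_MAX.
 */
static size_t demo_alloc_size(int len, size_t entry_size)
{
    assert(entry_size > 0);
    return (size_t)len * entry_size;
}

/*
 * Mirrors the patched condition. Returns true when the build is deferred,
 * which now covers both "no data yet" (0) and error codes (< 0).
 */
static bool demo_check_len(DemoCdat *cdat, int ret)
{
    cdat->built_buf_len = ret;
    if (cdat->built_buf_len <= 0) {
        cdat->to_update = true;
        return true;
    }
    return false;
}
```

With the old `!len` test, a return of -ENOMEM (or any other negative errno)
would fall through and reach the sizing step; with `<= 0` it takes the same
deferred path as an empty table.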

Jonathan

>
> Jonathan
>
>
> >
> > Ira
> >
>