Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout error

From: Frank Li
Date: Fri Oct 20 2023 - 11:17:32 EST


On Fri, Oct 20, 2023 at 10:47:52AM -0400, Frank Li wrote:
> On Fri, Oct 20, 2023 at 04:35:25PM +0200, Miquel Raynal wrote:
> > Hi Frank,
> >
> > Frank.li@xxxxxxx wrote on Fri, 20 Oct 2023 10:18:55 -0400:
> >
> > > On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:
> > > > Hi Frank,
> > > >
> > > > Frank.li@xxxxxxx wrote on Thu, 19 Oct 2023 11:39:42 -0400:
> > > >
> > > > > On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > > > > > Hi Frank,
> > > > > >
> > > > > > Frank.Li@xxxxxxx wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> > > > > >
> > > > > > > master side report:
> > > > > > > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > > > > >
> > > > > > > BIT 20: TIMEOUT error
> > > > > > > The module has stalled too long in a frame. This happens when:
> > > > > > > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > > > > > middle of a message,
> > > > > > > - No STOP was issued and between messages,
> > > > > > > - IBI manual is used and no decision was made.
> > > > > >
> > > > > > I am still not convinced this should be ignored in all cases.
> > > > > >
> > > > > > Case 1 is a problem because the hardware failed somehow.
> > > > >
> > > > > > But so far, the current code takes no action to handle this case.
> > > >
> > > > Yes, but if you detect an issue and ignore it, it's not better than
> > > > reporting it without handling it. Instead of totally ignoring this I
> > > > would at least write a debug message (identical to what's below) before
> > > > returning false, even though I am not convinced unconditionally
> > > > returning false here is wise. If you fail a hardware sequence because
> > > > you added a printk, it's a problem. Maybe you consider this line as
> > > > noise, but I believe it's still an error condition. Maybe, however,
> > > > this bit gets set after the whole sequence, and this is just a "bus
> > > > is idle" condition. If that's the case, then you need some
> > > > additional heuristics to properly ignore the bit?
> > > >
> > >
> > > 	dev_err(master->dev,
> > > 		"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > 		mstatus, merrwarn);
> > > +
> > > +	/* ignore timeout error */
> > > +	if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> > > +		return false;
> > > +
> > >
> > > Is it okay to move the SVC_I3C_MERRWARN_TIMEOUT check after dev_err?
> >
> > I think you mentioned earlier that the problem was not the printk but
> > the return value. So perhaps there is a way to know if the timeout
> > happened after a transaction and was legitimate or not?
>
> The error message just annoys the user; it doesn't affect functionality.
> But returning false lets the IBI thread keep running, which avoids a deadlock.

I forgot to mention one thing. Any error message printed here will keep SDA
low for longer, because SDA stays low until the STOP is emitted.

I have not checked the I3C spec yet for how long SDA is allowed to stay low.
It will be worse if the message goes through the UART port: the bus will be
locked for even longer.
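
To put a rough number on the UART concern: assuming a synchronous serial
console at 115200 baud with 8N1 framing (both are assumptions, nothing in this
thread states the console settings), transmit time grows linearly with the
message length. The helper below is a hypothetical illustration, not driver
code:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 8N1 framing, i.e. 1 start + 8 data + 1 stop bit per byte. */
#define SKETCH_UART_BITS_PER_BYTE 10u

/*
 * Hypothetical helper: time in microseconds needed to push nbytes out of
 * a UART running at the given baud rate (result is truncated, not rounded).
 */
uint64_t sketch_uart_tx_time_us(size_t nbytes, uint32_t baud)
{
	return ((uint64_t)nbytes * SKETCH_UART_BITS_PER_BYTE * 1000000u) / baud;
}
```

Under these assumptions, a line of roughly 100 bytes, like the error report
quoted earlier, would take on the order of 8 to 9 ms to transmit.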

It's better to print the error message after emit_stop to reduce the SDA low
time.
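
A minimal sketch of that ordering, assuming the status values are captured
first and the log is deferred until the bus has been released. Everything here
(sketch_bus, sketch_emit_stop, sketch_log, sketch_handle_error) is a
hypothetical userspace stand-in, not the svc driver's API; the return value
follows the proposal in this thread of not treating TIMEOUT as a failure:

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MERRWARN_TIMEOUT (1u << 20) /* BIT 20: TIMEOUT error */
#define SKETCH_TRACE_LEN 8

enum sketch_step { SKETCH_STEP_STOP = 1, SKETCH_STEP_LOG };

/* Hypothetical bus model: it only records the order of operations. */
struct sketch_bus {
	enum sketch_step trace[SKETCH_TRACE_LEN];
	int n;
};

static void sketch_record(struct sketch_bus *bus, enum sketch_step step)
{
	if (bus->n < SKETCH_TRACE_LEN)
		bus->trace[bus->n++] = step;
}

/* Stand-in for emitting STOP, after which SDA is no longer held low. */
void sketch_emit_stop(struct sketch_bus *bus)
{
	sketch_record(bus, SKETCH_STEP_STOP);
}

/* Stand-in for the slow part: formatting and printing the message. */
void sketch_log(struct sketch_bus *bus, uint32_t mstatus, uint32_t merrwarn)
{
	fprintf(stderr,
		"Error condition: MSTATUS 0x%08" PRIx32 ", MERRWARN 0x%08" PRIx32 "\n",
		mstatus, merrwarn);
	sketch_record(bus, SKETCH_STEP_LOG);
}

/* Returns true only if an error bit other than TIMEOUT is set. */
bool sketch_handle_error(struct sketch_bus *bus, uint32_t mstatus,
			 uint32_t merrwarn)
{
	if (!merrwarn)
		return false;

	sketch_emit_stop(bus);			/* release the bus first */
	sketch_log(bus, mstatus, merrwarn);	/* then do the slow logging */

	return (merrwarn & ~SKETCH_MERRWARN_TIMEOUT) != 0;
}
```

The trace array exists only so the ordering (STOP before log) can be checked;
a real change would need to fit the driver's existing error and stop paths.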

Frank

>
> >
> > In any case we should probably lower the log level for this error.
>
> Only SVC_I3C_MERRWARN_TIMEOUT is a warning.
>
> Maybe the logic below is better:
>
> 	if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT) {
> 		dev_dbg(master->dev,
> 			"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> 			mstatus, merrwarn);
> 		return false;
> 	}
>
> 	dev_err(master->dev,
> 		"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> 		mstatus, merrwarn);
> ....
>
> Frank
>
> >
> > Thanks,
> > Miquèl