Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout error

From: Miquel Raynal
Date: Fri Oct 20 2023 - 10:06:51 EST


Hi Frank,

Frank.li@xxxxxxx wrote on Thu, 19 Oct 2023 11:39:42 -0400:

> On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > Hi Frank,
> >
> > Frank.Li@xxxxxxx wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> >
> > > master side report:
> > > silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > >
> > > BIT 20: TIMEOUT error
> > > The module has stalled too long in a frame. This happens when:
> > > - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > middle of a message,
> > > - No STOP was issued and between messages,
> > > - IBI manual is used and no decision was made.
> >
> > I am still not convinced this should be ignored in all cases.
> >
> > Case 1 is a problem because the hardware failed somehow.
>
> But so far, the current code takes no action to handle this case.

Yes, but if you detect an issue and ignore it, it's not better than
reporting it without handling it. Instead of totally ignoring this I
would at least write a debug message (identical to what's below) before
returning false, even though I am not convinced unconditionally
returning false here is wise. If you fail a hardware sequence because
you added a printk, it's a problem. Maybe you consider this line as
noise, but I believe it's still an error condition. Maybe, however,
this bit gets set after the whole sequence, and this is just a "bus
is idle" condition. If that's the case, then you need some
additional heuristics to properly ignore the bit?
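
To illustrate "write a debug message before returning false", something
along these lines. This is a standalone userspace sketch, not driver code:
the helper name, the macro name and the fprintf() calls are made up for
illustration (the driver would use its own register macros and
dev_dbg()/dev_err()), and ignoring the flag only when TIMEOUT is the sole
bit set is an assumption about what a reasonable heuristic could be.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* BIT 20 of MERRWARN is the TIMEOUT flag, as in the report quoted below. */
#define MERRWARN_TIMEOUT (UINT32_C(1) << 20)

/*
 * Hypothetical helper, not the driver's svc_i3c_master_error().
 * Returns true when the caller should treat the condition as an error.
 */
static bool merrwarn_is_fatal(uint32_t mstatus, uint32_t merrwarn)
{
	/* Nothing flagged at all. */
	if (!merrwarn)
		return false;

	/*
	 * Assumption: TIMEOUT alone is only a warning (IRQ thread latency
	 * can exceed 100us), but it is still logged rather than dropped.
	 */
	if (merrwarn == MERRWARN_TIMEOUT) {
		fprintf(stderr,
			"Warning condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
			(unsigned int)mstatus, (unsigned int)merrwarn);
		return false;
	}

	/* Any other bit, with or without TIMEOUT, stays an error. */
	fprintf(stderr,
		"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
		(unsigned int)mstatus, (unsigned int)merrwarn);
	return true;
}
```

With the values from the report (MSTATUS 0x020090c7, MERRWARN 0x00100000)
this returns false but still leaves a trace, while any other bit set
alongside TIMEOUT is still reported as an error.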

> svc_i3c_master_xfer() does not check this flag, and the ERRWARN IRQ is
> not enabled either.
>
> If we hit this case, we can add new functions/arguments to handle it.
> Then we can really debug the code and recover the bus.
>
> Without this patch, simply adding a debug message before issuing
> SVC_I3C_MCTRL_REQUEST_AUTO_IBI is enough for TIMEOUT to be set.

Yes, and sometimes it won't be an issue, but sometimes it may. Maybe we
can find more advanced heuristics there.

> And svc_i3c_master_error() is only called by svc_i3c_master_ibi_work().
>
> So I think only case 3 can happen in svc_i3c_master_ibi_work().

Case 3 cannot be handled by Linux (because of the natural latency of
the OS).

>
> Frank
>
> > Case 2 is fine I guess.
> > Case 3 is not possible in Linux, this will not be supported.
> >
> > > The maximum stall period is 10 KHz or 100 μs.
> >
> > s/10 KHz//
> >
> > >
> > > This is a just warning. System irq thread schedule latency is possible
> > > bigger than 100us. Just omit this waring.
> >
> > This can be considered as being just a warning as the system IRQ
> > latency can easily be greater than 100us.

This was skipped in your v3.

> > > Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
> > > Cc: stable@xxxxxxxxxxxxxxx
> > > Signed-off-by: Frank Li <Frank.Li@xxxxxxx>
> > > ---

Thanks,
Miquèl