Re: [PATCH] usb: typec: ucsi: Clear pending after acking connector change

From: Benjamin Berg
Date: Wed May 19 2021 - 07:50:40 EST


On Wed, 2021-05-19 at 14:33 +0300, Heikki Krogerus wrote:
> On Tue, May 18, 2021 at 08:04:14PM +0200, Benjamin Berg wrote:
> > On Tue, 2021-05-18 at 16:29 +0300, Heikki Krogerus wrote:
> > > On Mon, May 17, 2021 at 02:57:28PM +0200, Benjamin Berg wrote:
> > > >
> > > > [SNIP]
> > > > Unfortunately, I don't feel it'll work. The problem that I was
> > > > seeing
> > > > looked like a race condition in the PPM itself, where the
> > > > window is
> > > > the
> > > > time between the UCSI_GET_CONNECTOR_STATUS command and the
> > > > subsequent
> > > > ACK.
> > > > For such a firmware level bug in the PPM, we need a way to
> > > > detect
> > > > the
> > > > race condition when it happens (or get a fix for the firmware).
> > >
> > > OK. Let me know does the patch bring the issue back for you.
> >
> > So, I just tried the patch, and I can occasionally reproduce the
> > issue
> > where "online" for the ucsi power adapter is stuck at "1" after
> > unplugging with the patch applied.
>
> Thanks for testing it.
>
> I'm still not sure that the PPM is the culprit here. I have a feeling
> that the problem you are seeing is caused by the workaround (bad
> workaround) that we have for the issue where the EC firmware does not
> return with the BUSY bit set in the CCI when it should in many cases.
> The UCSI ACPI driver has one minute timeout value for command
> completion because of that, which is way too long. So if the EC
> firmware decides to take its time before acknowledging command, the
> driver is stuck, and we start loosing the events...

Hmm, interesting, yes. If the PPM is sending a second change event
before or after we ACK'ed the first one, and we loose this event, then
that would absolutely explain the issue I am seeing.

In my case, I was able to considerably increase the probability of the
bug by inserting a sleep just before acknowledging the connector
change. Not sure whether this is meaningful, but maybe that tells us
something about who is at fault here.

Benjamin

> Well, I guess
> technically the PPM would be the culprit in the end in any case, but
> I'm just not sure that there is any race like you suspected.
>
> But this is off topic. I'll send you an RFC proposal what I think we
> could do about that.
>
>
> thanks,
>

Attachment: signature.asc
Description: This is a digitally signed message part