Re: [RFC] Fix stuck UCSI controller on DELL

From: Christian A. Ehrhardt
Date: Wed Jan 17 2024 - 01:35:39 EST



Hi Mario,

On Tue, Jan 16, 2024 at 09:00:03PM -0600, Mario Limonciello wrote:
> On 1/15/2024 12:55, Christian A. Ehrhardt wrote:
> >
> > Hi Heikki,
> >
> > sorry to bother you again with this but I'm afraid there's
> > a misunderstanding wrt. the nature of the quirk. See below:
> >
> > On Thu, Jan 04, 2024 at 01:59:02PM +0200, Heikki Krogerus wrote:
> > > Hi Christian,
> > >
> > > On Wed, Jan 03, 2024 at 11:06:35AM +0100, Christian A. Ehrhardt wrote:
> > > > I have a DELL Latitude 5431 where typec only works somewhat.
> > > > After the first plug/unplug event the PPM seems to be stuck and
> > > > commands end with a timeout (GET_CONNECTOR_STATUS failed (-110)).
> > > >
> > > > This patch fixes it for me but according to my reading it is in
> > > > violation of the UCSI spec. On the other hand searching through
> > > > the net it appears that many DELL models seem to have timeout problems
> > > > with UCSI.
> > > >
> > > > Do we want some kind of quirk here? There does not seem to be a quirk
> > > > framework for this part of the code, yet. Or is it ok to just send the
> > > > additional ACK in all cases and hope that the PPM will do the right
> > > > thing?
> > >
> > > We can use DMI quirks. Something like the attached diff (not tested).
> > >
> > > thanks,
> > >
> > > --
> > > heikki
> >
> > > diff --git a/drivers/usb/typec/ucsi/ucsi_acpi.c b/drivers/usb/typec/ucsi/ucsi_acpi.c
> > > index 6bbf490ac401..7e8b1fcfa024 100644
> > > --- a/drivers/usb/typec/ucsi/ucsi_acpi.c
> > > +++ b/drivers/usb/typec/ucsi/ucsi_acpi.c
> > > @@ -113,18 +113,44 @@ ucsi_zenbook_read(struct ucsi *ucsi, unsigned int offset, void *val, size_t val_
> > > >  	return 0;
> > > >  }
> > > > -static const struct ucsi_operations ucsi_zenbook_ops = {
> > > > -	.read = ucsi_zenbook_read,
> > > > -	.sync_write = ucsi_acpi_sync_write,
> > > > -	.async_write = ucsi_acpi_async_write
> > > > -};
> > > > +static int ucsi_dell_sync_write(struct ucsi *ucsi, unsigned int offset,
> > > > +				const void *val, size_t val_len)
> > > > +{
> > > > +	u64 ctrl = *(u64 *)val;
> > > > +	int ret;
> > > > +
> > > > +	ret = ucsi_acpi_sync_write(ucsi, offset, val, val_len);
> > > > +	if (ret && (ctrl & (UCSI_ACK_CC_CI | UCSI_ACK_CONNECTOR_CHANGE))) {
> > > > +		ctrl = UCSI_ACK_CC_CI | UCSI_ACK_COMMAND_COMPLETE;
> > > > +
> > > > +		dev_dbg(ucsi->dev->parent, "%s: ACK failed\n", __func__);
> > > > +		ret = ucsi_acpi_sync_write(ucsi, UCSI_CONTROL, &ctrl, sizeof(ctrl));
> > > > +	}
> >
> > Unfortunately, this has the logic reversed. The quirk (i.e. the
> > additional UCSI_ACK_COMMAND_COMPLETE) is required after a _successful_
> > UCSI_ACK_CONNECTOR_CHANGE. Otherwise, _subsequent_ commands will time out
> > (usually the next GET_CONNECTOR_STATUS).
> >
> > This means the quirk must be applied _before_ we detect any failure.
> > Consequently, the quirk has the potential to break working systems.
> >
> > Sorry, if that wasn't clear from my original mail. Please let me know
> > if this changes how you want the quirks handled.
> >
> > Thanks Christian
> >
>
> For the problematic scenario have you tried to play with it a bit to see if
> it's too short of a timeout (raise timeout) or to output the response bits
> to see if anything else surprising is sent?

It is not a problem with the timeout: waiting forever in this case
doesn't help. IMHO this is actually a bug in the PPM, i.e. in Dell's
BIOS.

Sending an ack after the timeout fixes things, though.

> Does it always fail on the same command, or does it happen to a bunch of
> them?

It always fails on the first command after UCSI_ACK_CC_CI for a
connector change. However, there might be no such command if the
next event is a notification.
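
For illustration only (this is not the patch itself): a minimal
user-space sketch of the behaviour described above. The mock PPM and
the helper names mock_sync_write() and quirk_sync_write() are
hypothetical, and the constant values are assumed to match
drivers/usb/typec/ucsi/ucsi.h. The mock gets stuck after a successful
connector-change ACK until it sees an ACK with
UCSI_ACK_COMMAND_COMPLETE; the quirk sends that extra ACK right after
the successful one, i.e. before any failure is visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the definitions in ucsi.h. */
#define UCSI_ACK_CC_CI            0x04u
#define UCSI_GET_CONNECTOR_STATUS 0x12u
#define UCSI_ACK_CONNECTOR_CHANGE (1u << 16)
#define UCSI_ACK_COMMAND_COMPLETE (1u << 17)

/* Hypothetical model of the buggy PPM: one bit of state. */
struct mock_ppm {
	bool stuck;
};

/* Models a synchronous write of a command to the CONTROL register. */
static int mock_sync_write(struct mock_ppm *ppm, uint64_t ctrl)
{
	bool is_ack = (ctrl & 0xff) == UCSI_ACK_CC_CI;

	assert(ppm);

	/* An ACK with "command complete" un-sticks the PPM. */
	if (is_ack && (ctrl & UCSI_ACK_COMMAND_COMPLETE)) {
		ppm->stuck = false;
		return 0;
	}

	/* While stuck, every other command times out (-110 on Linux). */
	if (ppm->stuck)
		return -ETIMEDOUT;

	/* The connector-change ACK succeeds but leaves the PPM stuck. */
	if (is_ack && (ctrl & UCSI_ACK_CONNECTOR_CHANGE))
		ppm->stuck = true;

	return 0;
}

/* Quirk: follow a *successful* connector-change ACK with an extra ACK. */
static int quirk_sync_write(struct mock_ppm *ppm, uint64_t ctrl)
{
	bool conn_ack = (ctrl & 0xff) == UCSI_ACK_CC_CI &&
			(ctrl & UCSI_ACK_CONNECTOR_CHANGE);
	int ret = mock_sync_write(ppm, ctrl);

	if (ret == 0 && conn_ack) {
		uint64_t extra = UCSI_ACK_CC_CI | UCSI_ACK_COMMAND_COMPLETE;

		ret = mock_sync_write(ppm, extra);
	}

	return ret;
}
```

In this model the plain write path fails on the first command after
the connector-change ACK, while the quirk path does not. The sketch
says nothing about how a spec-conforming PPM reacts to the extra ACK,
which is exactly the risk for working systems.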

I did play around with it a bit more and came up with a way to
probe for the issue:

https://lore.kernel.org/all/20240116224041.220740-1-lk@xxxxxxx/

regards Christian