Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music

From: Guenter Roeck
Date: Thu Feb 02 2023 - 13:41:15 EST


On 2/2/23 09:04, Mark Brown wrote:
On Thu, Feb 02, 2023 at 07:51:01AM -0800, Guenter Roeck wrote:
On Tue, Jan 31, 2023 at 12:08:53PM +0000, Mark Brown wrote:
On Mon, Jan 30, 2023 at 10:16:06PM -0800, Guenter Roeck wrote:

I'll see if I can implement a downstream fix.

If you implement something I don't see a reason not to post it upstream.

I had a look into the code, and concluded that it is too complex for anyone
who doesn't know it to find a proper fix. For example, for an outsider it

It's definitely unclear, there's a datasheet at [1] which does appear to
explicitly call for a 512ms delay though (see figure 20 on page 50). It
does look like it should only be applied in the case where an inserted
jack is detected (ie, when identifying an accessory or button press) and
not when removal is detected though.


The datasheet doesn't really suggest that a delay shall be applied using
msleep (ie in the code). The chip presumably debounces internally (see
jackdet_debounce and jackdet_rem_deb), and there is also jack_detect_rate
to configure the detection rate. The table seems to suggest (to me) that
there is an e_jack_insertion event, which would then be followed 64-512 ms
later with an e_jack_detect_complete event.

Whatever is done in software is on top of that, or at least that is my
understanding, and not explained by anything in the datasheet.

Given that the chip itself supports debouncing internally, it is not clear
to me what the delay is actually supposed to accomplish. Soft debounce
on top of chip debounce ? I don't see that explained anywhere, though of
course I might be missing it.

is not conceivable (or explained) why the ground switch is enabled only
to be disabled immediately afterwards if a jack was removed.

It smells like there's a power benefit to leaving it disabled when
unplugged (which seems plausible), and possibly like the detection is
more stable with the ground switch enabled. The ground switch is not
documented AFAICT (it's in register 0xfb which isn't named and doesn't
appear to appear in the datsheet from a quick search). The code is
leaving the switch enabled so long as an accessory is plugged.


I understand. What I don't understand is that it is always enabled
in the interrupt handler, no matter if a jack was inserted or not,
only to be disabled immediately if the jack was disabled or after
insertion detection work is complete.

Overall it is not clear what the impact of enabling ground switch
actually is. What is really odd is that the original code only enabled
ground switch once during initialization and disabled it either
after a disconnect or after insertion detection was complete,
but never re-enabled it. Now it is briefly enabled in the interrupt
handler, but only after sleeping.

This is now the top crash reason on affected Chromebooks (so far I
identified Asus C424, HP SeaStar, and HP StingRay) with this patch
applied. I am inclined to revert it from all ChromeOS kernel branches.
At least for us the cure for the problem is much worse than the problem
itself.

Are you saying this is actually crashing, or just that you're getting
warnings about threads being blocked for too long (that was what was
posted earlier in the thread)? The only things I can see that look like

ChromeOS is configured to crash after stalled threads are detected (ie
after 120 seconds), so this is actually causing crashes.

they have the potential to actually lock up are the cancel_work_sync()
calls but they were unchanged and the backtrace you showed was showing
the thread in the msleep(). My guess would be that you've got systems
where there are very frequent jack detection events (potentiallly with
broken accessories, or possibly due to the ground switch putting things
into the wrong priority) and that the interrupt is firing again as soon
as the thread unmasks the primary interrupt which means it never
actually stops running.


That is what I strongly suspect is happening. I don't know why exactly
the interrupt is firing continuously, but the hang is always in msleep().
One possibility might be that the event is actually a disconnect event,
and that enabling and immediately disabling the ground switch causes
another interrupt, which is then handled immediately, causing the hang.

It's possible that reordering things so that the delay is only applied
if DA7219_JACK_INSERTION_STS_MASK is set would help, that'd need some
motion of the interrupt acking as well. That's probably a good idea in
general, it's what the datasheet seems to call for and would lead to
prompter removal detection. However if the issue is systems with broken
accessories constantly firing spurious button events they'd still be
seeing the delay.

My other guess would be that moving the delay that's been added to a
delayed work would avoid the warnings, though you might want to manually
keep the physical interrupt disabled while that's running which is fun.
Possibly also tuning down the delay given that as you say 500ms is
rather a long potential delay even in the context of jack debounces,
though if it is bad accessories then there's probably a bit of luck
involved in the original code not triggering issues and any debounce is
likely to cause fun, and like I say the datasheet does seem to say that
this is the appropriate delay.

You'd end up with something along the lines of

disable_irq();
schedule_delayed_work(delay, current_irq_code);


I am not sure if that would fix anything. The current code sleeps, then
enables the ground switch and does the rest of the detection. I'd somewhat
understand the code if it would enable the ground switch after an "insertion
detected" interrupt, then wait for some amount of time and handle the rest
of the detection after waiting (even though that should really be handled by
the "detection complete" interrupt). But that isn't what it does.
If we were to implement the above, I suspect the result would be that the
interrupt still happens all the time, and the only difference would be that
it would be "silenced" while the delayed work is waiting to be scheduled.
That doesn't really fix the problem, it only works around it. But, sure,
it would be much better than the current situation.

My "wild shot" fix would be to enable the ground switch after an insertion
event and to drop the software sleep entirely.

However, it is really impossible to know what the delay is for in the
first place. Looking into the code further, the sleep time actually matches
the configured jack detection rate. I have no idea why it would make sense
to wait for a detection cycle after an event, then enable the ground switch
and actually handle the event (which by then probably reports that jack
detection is complete after an insertion). I really don't understand
the logic behind that.

Guenter

in the IRQ handler then call enable_irq() on the way out of the new
delayed_work. That would keep the same flow but not look like the task
is running which should avoid setting off the hung task alarm.

[1] https://www.renesas.com/us/en/document/dst/da7219-datasheet?r=1563341