Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

From: Matthias Kaehlcke
Date: Thu Mar 07 2019 - 13:20:14 EST


Hi Balakrishna,

On Thu, Mar 07, 2019 at 10:35:08AM +0530, Balakrishna Godavarthi wrote:
> hi Matthias,
>
> On 2019-03-07 06:10, Matthias Kaehlcke wrote:
> > Firmware download to the WCN3990 often fails with a 'TLV response size
> > mismatch' error:
> >
> > [ 133.064659] Bluetooth: hci0: setting up wcn3990
> > [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
> > [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
> > [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
> > [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
> >
> > This is caused by a vendor event that corresponds to an earlier command
> > to change the baudrate. The event is not processed in the context of the
> > baudrate change and later interpreted as response to the firmware
> > download command (which is also a vendor command), but the driver
> > detects
> > that the event doesn't have the expected amount of associated data.
> >
> > More details:
> >
> > For the WCN3990 the vendor command for a baudrate change isn't sent as
> > synchronous HCI command, because the controller sends the corresponding
> > vendor event with the new baudrate. The event is received and decoded
> > after the baudrate change of the host port.
> >
> > Identify the 'unused' event when it is received and don't add it to
> > the queue of RX frames.
> >
> > Signed-off-by: Matthias Kaehlcke <mka@xxxxxxxxxxxx>
> > ---
>
> ...
>
> Can you test by reverting this change "94d6671473924".

The issue is still reproducible.

> We need at least 15ms minimum delay for the soc to change its baud rate and
> respond to the with command complete event.

The baudrate change has clearly been successful when the problem is
observed, since the host receives the vendor event with the new
baudrate.

We could increase the delay to 15ms just in case, though in my stress
tests with 1000s of iterations I didn't observe any problem with the
baudrate change itself. However if this patch gets landed the delay
could be removed completely, since we are waiting until the vendor
event is received with the new baudrate.

> In my previous stress test we have not observed any issues.

Which patches did you use for the stress test?

Did your stack include '78e8fa2972e55 Bluetooth: hci_qca: Deassert RTS
while baudrate change command' (or equivalent)? It is mainly this
patch which *exposes* the problem, before it (and with the long delay)
the controller response was received with the old baudrate and could
not be decoded ("frame reassembly errors").

Also make sure to test without 'Bluetooth: btqca: inject command
complete event during fw download'
(https://lore.kernel.org/patchwork/patch/1027955/), since it affects
timing and injects extra events. It might seem that it fixes the
problem, but it just papers over it.

I have my suspicions that the mega-timeout of 300ms for ROME was
introduced to 'fix' a related issue ...

Cheers

Matthias