Re: [BlueZ PATCH] Bluetooth: hci_core: Skip hci_cmd_work if hci_request is pending

From: Luiz Augusto von Dentz
Date: Fri Feb 16 2024 - 10:58:31 EST


Hi Hsin-chen,

On Thu, Feb 15, 2024 at 11:21 PM Hsin-chen Chuang <chharry@xxxxxxxxxx> wrote:
>
> +Some Googlers who would be interested in
>
> Hi Luiz,
>
> How about moving the hci_req-related data out from sent_cmd? This allows sending HCI commands while hci_req data would not be overwritten.

I have something like the following in the works:

https://gist.github.com/Vudentz/251275bb688fac32585f90ac0076c407

It is not stable yet, but I think we can get away with it since it
just means we can keep the pending request stored in the req_skb, that
said we might need to overhaul this design since it is not very clean
in my opinion.

> On Fri, Feb 16, 2024 at 5:37 AM Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> wrote:
>>
>> Hi Hsin-chen,
>>
>> On Tue, Feb 13, 2024 at 12:24 PM Hsin-chen Chuang <chharry@xxxxxxxxxxxx> wrote:
>> >
>> > hci_cmd_work overwrites the hdev->sent_cmd which contains the required
>> > info for a hci_request to work. In the real world, it's observed that
>> > a request from hci_le_ext_create_conn_sync could be interrupted by
>> > the authentication (hci_conn_auth) caused by rfcomm_sock_connect. When
>> > it happends, hci_le_ext_create_conn_sync hangs until timeout; If the
>> > LE connection is triggered by MGMT, it freezes the whole MGMT interface.
>> >
>> > Signed-off-by: Hsin-chen Chuang <chharry@xxxxxxxxxxxx>
>> > ---
>> >
>> > net/bluetooth/hci_core.c | 7 +++++--
>> > 1 file changed, 5 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> > index 34c8dca2069f..e3706889976d 100644
>> > --- a/net/bluetooth/hci_core.c
>> > +++ b/net/bluetooth/hci_core.c
>> > @@ -4213,8 +4213,11 @@ static void hci_cmd_work(struct work_struct *work)
>> > BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name,
>> > atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q));
>> >
>> > - /* Send queued commands */
>> > - if (atomic_read(&hdev->cmd_cnt)) {
>> > + /* Send queued commands. Don't send the command when there is a pending
>> > + * hci_request because the request callbacks would be overwritten.
>> > + */
>> > + if (atomic_read(&hdev->cmd_cnt) &&
>> > + !hci_dev_test_flag(hdev, HCI_CMD_PENDING)) {
>> > skb = skb_dequeue(&hdev->cmd_q);
>> > if (!skb)
>> > return;
>> > --
>> > 2.43.0.687.g38aa6559b0-goog
>>
>>
>> This seems to be causing some mgmt-tester failures:
>>
>> Pair Device - Sec Mode 3 Success 1 Timed out 22.753 seconds
>> Pair Device - Sec Mode 3 Reject 1 Timed out 22.533 seconds
>> Pair Device - Sec Mode 3 Reject 2 Timed out 22.526 seconds
>>
>> I think this is because we need to respond to an event with a command like:
>>
>> < HCI Command: Create Conn.. (0x01|0x0005) plen 13 #241 [hci0] 16:25:38699066
>> Address: 00:AA:01:01:00:00 (Intel Corporation)
>> Packet type: 0x0018
>> DM1 may be used
>> DH1 may be used
>> Page scan repetition mode: R2 (0x02)
>> Page scan mode: Mandatory (0x00)
>> Clock offset: 0x0000
>> Role switch: Allow peripheral (0x01)
>> > HCI Event: Command Status (0x0f) plen 4 #242 [hci0] 16:25:38701881
>> Create Connection (0x01|0x0005) ncmd 1
>> Status: Success (0x00)
>> > HCI Event: Link Key Request (0x17) plen 6 #243 [hci0] 16:25:38702375
>> Address: 00:AA:01:01:00:00 (Intel Corporation)
>>
>> But because Create Connection is pending we cannot respond to Link Key
>> Request, so it is actually a design problem if we cannot send commands
>> because something is pending so perhaps we need to redesign how we
>> store cmd_sent so we can have multiple outstanding commands rather
>> than just one.
>>
>> --
>> Luiz Augusto von Dentz
>
>
>
> --
> Best Regards,
> Hsin-chen



--
Luiz Augusto von Dentz