Re: [PATCH v2 1/2] usb: dwc3: Not set DWC3_EP_END_TRANSFER_PENDING in ep cmd fails

From: Jung Daehwan
Date: Thu Feb 17 2022 - 23:51:32 EST


On Tue, Feb 15, 2022 at 05:14:50PM +0000, Thinh Nguyen wrote:
> Jung Daehwan wrote:
> > Hi Thinh,
> >
> > On Mon, Feb 14, 2022 at 06:44:33PM +0000, Thinh Nguyen wrote:
> >> Hi,
> >>
> >> Daehwan Jung wrote:
> >>> It always sets DWC3_EP_END_TRANSFER_PENDING in dwc3_stop_active_transfer
> >>> even if dwc3_send_gadget_ep_cmd fails. It can cause some problems like
> >>
> >> How does it fail? Timed out?
> >
> > Yes, timed out.
> >>
> >>> skipping clear stall commmand or giveback from dequeue. We fix to set it
> >>> only when ep cmd success. Additionally, We clear DWC3_EP_TRANSFER_STARTED
> >>> for next trb to start transfer not update transfer.
> >>
> >> We shouldn't do this. Things will be out of sync. It may work for 1
> >> scenario, but it won't work for others.
> >>
> >> Please help me understand a few things:
> >>
> >> 1) What is the scenario that triggers this? Is it random?
> >>
> > ep cmd timeout occurs on dequeue request from user side. End Transfer command
> > would be sent in dwc3_stop_active transfer.
>
> At the high level, what's triggering the request dequeue? Is it from a
> disconnect, change of interface, or simply function driver protocol that
> changes it.
>
> What application was used to trigger this?
>
Removing function driver requests dequeue.
> >
> >> 2) Are there other traffics pending while issuing the End Transfer
> >> command? If so, what transfer type(s)?
> >>
> > I haven't checked it yet.
>
> Can you check?
>
It seems there's no pending traffic.
> >
> >> 3) Have you tried increasing the timeout?
> >>
> > No, I haven't.
>
> Can you try up to 10 seconds (just for experiment)
>
Yes, it's reproduced on 10 secs.
> >> BR,
> >> Thinh
> >>
> >
> > This issue occurs very rarely on customer. I only have restricted
> > information. That's why I've been trying to reproduce it.
>
> How did you test your fix if you can't reproduce it?
>

I sent patches to my client and he said this issue would not occur after
applying them. In fact, They are not totally equal with those we are
discussing. I've refactor-ed and pushed them here.

> >
> > Wesley may have run into same issue and you can see this issue in detail.
> > https://protect2.fireeye.com/v1/url?k=39bd3e50-66260725-39bcb51f-0cc47a31ce4e-bf000d5a697ead66&q=1&e=b04500ad-43d0-4ee7-b5f0-a7283d8fc0da&u=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fprotect2.fireeye.com%2Fv1%2Furl%3Fk%3D9d423b69-fc3fd32e-9d43b026-74fe485fff30-77a099b52659410d%26q%3D1%26e%3D20b4d9f5-2599-4f57-8b6a-7c4ec167d228%26u%3Dhttps%2A3A%2A2F%2A2Flore.kernel.org%2A2Flinux-usb%2A2F20220203080017.27339-1-quic_wcheng%2A40quicinc.com%2A2F__%3BJSUlJSUlJQ%21%21A4F2R9G_pg%21JWPzNLoO3BFX_IZCVzmHPtxq6frr_VFbSNNaxSQylunt1Y4EauTOefth2LCIcVEuTx8E%24
> >
>
> I can take a look, but please provide the tracepoints of the failure if
> you can reproduce it.
>

Thanks to client's help, I could reproduce it in my envrionment.

1. Trace

MtpServer-2484 [003] d..2 114.376829: dwc3_ep_queue: ep1out: req 0000000055561867 length 0/512 zsI ==> -115
MtpServer-2484 [003] d..2 114.376838: dwc3_prepare_trb: ep1out: trb 0000000074078a95 (E23:D22) buf 0000000954845600 size 512 ctrl 0000001d (HlCS:sc:normal)
MtpServer-2484 [003] d..2 114.376844: dwc3_prepare_trb: ep1out: trb 0000000095f503e7 (E24:D22) buf 00000000e1303000 size 512 ctrl 00000819 (HlcS:sC:normal)
MtpServer-2484 [003] d..2 114.376867: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
ksoftirqd/2-23 [002] d.s2 190.428752: dwc3_ep_dequeue: ep2out: req 00000000732c0f69 length 0/16384 zsI ==> -115
ksoftirqd/2-23 [002] dns3 190.447625: dwc3_gadget_ep_cmd: ep2out: cmd 'End Transfer' [40d08] params 00000000 00000000 00000000 --> status: Timed Out
ksoftirqd/2-23 [002] dns2 190.450596: dwc3_ep_dequeue: ep2out: req 00000000be3434d4 length 0/16384 zsI ==> -115
ksoftirqd/2-23 [002] dns2 190.450618: dwc3_ep_dequeue: ep2out: req 0000000033655f2e length 0/16384 zsI ==> -115
ksoftirqd/2-23 [002] dns2 190.450625: dwc3_ep_dequeue: ep2out: req 00000000dc0ff635 length 0/16384 zsI ==> -115
ksoftirqd/2-23 [002] dns2 190.450633: dwc3_ep_dequeue: ep2out: req 0000000023ffb8e8 length 0/16384 zsI ==> -115
ksoftirqd/2-23 [002] dns2 190.450641: dwc3_ep_dequeue: ep2out: req 000000005fab61aa length 0/16384 zsI ==> -115
ksoftirqd/2-23 [002] dns2 190.450648: dwc3_ep_dequeue: ep2out: req 000000001f618cd8 length 0/16384 zsI ==> -115
ksoftirqd/2-23 [002] dns2 190.450654: dwc3_ep_dequeue: ep2out: req 00000000da0ea1b1 length 0/16384 zsI ==> -115
##### CPU 6 buffer started ####
HwBinder:636_1-658 [006] d..1 190.735324: dwc3_gadget_ep_disable: ep0out: mps 512/512 streams 0 burst 1 ring 0/0 flags e:swbp:>
HwBinder:636_1-658 [006] d..1 190.735327: dwc3_gadget_ep_disable: ep0in: mps 512/512 streams 0 burst 1 ring 0/0 flags e:swbp:<
HwBinder:636_1-658 [003] dn.1 214.206652: dwc3_gadget_ep_disable: ep0out: mps 512/512 streams 0 burst 1 ring 0/0 flags e:swbp:>
HwBinder:636_1-658 [003] dn.1 214.206677: dwc3_gadget_ep_disable: ep0in: mps 512/512 streams 0 burst 1 ring 0/0 flags e:swbp:<

2. Kernel log

[ 190.447711]I[2: ksoftirqd/2: 23] ------------[ cut here ]------------
[ 190.447777]I[2: ksoftirqd/2: 23] WARNING: CPU: 2 PID: 23 at drivers/usb/dwc3/gadget.c:3632 dwc3_gadget_ep_dequeue+0x228/0x234
[ 190.449587]I[2: ksoftirqd/2: 23] pc : dwc3_gadget_ep_dequeue+0x228/0x234
[ 190.449610]I[2: ksoftirqd/2: 23] lr : dwc3_gadget_ep_dequeue+0x124/0x234
[ 190.449629]I[2: ksoftirqd/2: 23] sp : ffffffc01294bad0
[ 190.449647]I[2: ksoftirqd/2: 23] x29: ffffffc01294baf0 x28: 0000000000000000
[ 190.449679]I[2: ksoftirqd/2: 23] x27: ffffffc0121e4000 x26: ffffff88208c9dc0
[ 190.449711]I[2: ksoftirqd/2: 23] x25: ffffff893dd5c570 x24: ffffff893dd22080
[ 190.449742]I[2: ksoftirqd/2: 23] x23: ffffff88a4f8a168 x22: ffffff88a4f8a100
[ 190.449773]I[2: ksoftirqd/2: 23] x21: ffffff893dd5c500 x20: 0000000000000080
[ 190.449803]I[2: ksoftirqd/2: 23] x19: ffffff893dd22238 x18: ffffffc01291d068
[ 190.449834]I[2: ksoftirqd/2: 23] x17: 0000000000004000 x16: ffffff88bb0cc000
[ 190.449864]I[2: ksoftirqd/2: 23] x15: 0000000000000200 x14: ffffff88208d4d00
[ 190.449895]I[2: ksoftirqd/2: 23] x13: ffffff88208d4d00 x12: 00000000000001b5
[ 190.449926]I[2: ksoftirqd/2: 23] x11: ffffff88208c9dc0 x10: 0000000100000103
[ 190.449958]I[2: ksoftirqd/2: 23] x9 : ffffff88208c9dc0 x8 : 0000000000000008
[ 190.449989]I[2: ksoftirqd/2: 23] x7 : fefefefefefefefe x6 : 0000000000008080
[ 190.450020]I[2: ksoftirqd/2: 23] x5 : 0000000000000000 x4 : 0000000000000103
[ 190.450050]I[2: ksoftirqd/2: 23] x3 : 0000000000000080 x2 : ffffff88208ac900
[ 190.450081]I[2: ksoftirqd/2: 23] x1 : ffffff88208ac900 x0 : 00000000ffffff92
[ 190.450120]I[2: ksoftirqd/2: 23] Call trace:
[ 190.450153]I[2: ksoftirqd/2: 23] dwc3_gadget_ep_dequeue+0x228/0x234
[ 190.450187]I[2: ksoftirqd/2: 23] usb_ep_dequeue+0x4c/0xa0
[ 190.450219]I[2: ksoftirqd/2: 23] ffs_aio_cancel+0x4c/0x84
[ 190.450247]I[2: ksoftirqd/2: 23] free_ioctx_users+0x80/0x110
[ 190.450281]I[2: ksoftirqd/2: 23] percpu_ref_put_many+0xf0/0x164
[ 190.450310]I[2: ksoftirqd/2: 23] percpu_ref_switch_to_atomic_rcu+0x13c/0x1dc
[ 190.450339]I[2: ksoftirqd/2: 23] rcu_do_batch+0x208/0x5b4
[ 190.450360]I[2: ksoftirqd/2: 23] rcu_core+0x22c/0x43c
[ 190.450384]I[2: ksoftirqd/2: 23] rcu_core_si+0x14/0x24
[ 190.450409]I[2: ksoftirqd/2: 23] efi_header_end+0x134/0x350
[ 190.450437]I[2: ksoftirqd/2: 23] run_ksoftirqd+0x34/0x74

The point is there would be some requests pending without giveback-ed.
They're from removing function driver(ffs_aio_cancel). It could cause
the function driver would wait for dequeue requests completed.

Best Regards,
Jung Daehwan

> Thanks,
> Thinh