Re: [PATCH v2] usb: chipidea: local_irq_save/restore added for hw_ep_prime

From: Jeaho Hwang
Date: Fri Aug 20 2021 - 01:16:15 EST


2021년 8월 19일 (목) 오후 5:48, Sebastian Andrzej Siewior
<bigeasy@xxxxxxxxxxxxx>님이 작성:
>
> On 2021-08-19 08:50:27 [+0900], Jeaho Hwang wrote:
> > Without RT, udc_irq runs as a forced threaded irq handler, so it runs
> > without any interruption or preemption. NO similar case is found on
> > non-RT.
>
> I see only a devm_request_irq() so no force-threading here. Booting with
> threadirqs would not lead to the problem since commit
> 81e2073c175b8 ("genirq: Disable interrupts for force threaded handlers")
>

I was wrong. udc threaded irq handler allows twd interrupt even on
non-RT and with threaded irq.
I believed Chen's comment "The function hw_ep_prime is only called at
udc_irq which is registered as top-half irq handlers. Why the timer
interrupt is occurred when hw_ep_prime is executing?".
We have additional experiments and got the results like below. RNDIS
host was Windows.

RT, 1ms delay between first ENDPTSETUPSTAT read and priming : error
case occurred
RT, 1ms delay + irq_save : no error case occurred.
non-RT, threaded irq, 1ms delay : no error case occurred even twd
fires inside the function execution.

It doesn't seem to be a timing issue. But irq definitely affects
priming on the RT kernel. Do you RT experts have any idea about the
causes?
If isr_tr_complete_handler fails ep priming it calls _ep_set_halt and
goes an infinite loop in hw_ep_set_halt. It was an actual problem we
experienced.
So we protect irqs inside hw_ep_priming not to make error cases and
also add a timeout inside the hw_ep_set_halt loop for a walkaround.
The timeout patch is submitted to linux-usb.
( https://marc.info/?l=linux-usb&m=162918269024007&w=2 )

We withdrew this patch since we don't know if disabling irq is the
best solution to solve the problem and udc would work fine with
hw_ep_set_halt walkaround even though hw_ep_prime fails.
But we are still trying to find out the cause of this symptom so We'd
so appreciate it if RT or USB experts share some ideas or ways to
report somewhere. Xilinx doesn't provide any support without their
official kernel :(

Thanks for the discussion Sebastian.

Jeaho Hwang.

>
> > > If this function here is sensitive to timing (say the cpu_relax() loop
> > > gets interrupt for 1ms) then it has to be documented as such.
> >
> > The controller sets ENDPTSETUPSTAT register if the host sent a setup packet.
> > yes it is a timing problem. I will document that and resubmit again if
> > you agree that local_irq_save could help from the timing problem.
> >
> > Thanks for the advice.
>
> If it is really a timing issue in the function as you describe below
> then disabling interrupts would help and it is indeed an RT only issue.
>
> So you read OP_ENDPTSETUPSTAT, it is 0, all good.
> You write OP_ENDPTPRIME, wait for it to be cleared.
> Then you read OP_ENDPTSETUPSTAT again and if it is 0, all good.
>
> And the TWD interrupt could delay say the second read would read 1 and
> it is invalidated. Which looks odd.
> However, it is "okay" if the TWD interrupt happens after the second
> read? Even if the host sends a setup packet, nothing breaks?
> Do you have numbers on how long irq-off section is here? It seems to
> depend on how long the HW needs to clear the OP_ENDPTPRIME bits.
>
> Sebastian



--
황재호, Jay Hwang, linux team manager of RTst
010-7242-1593