Re: [RESEND] [PATCH 1/1] can: m_can: Control tx and rx flow to avoid communication stall

From: Marc Kleine-Budde
Date: Wed Jul 05 2023 - 08:53:56 EST


On 23.06.2023 14:29:20, Kumari Pallavi wrote:
> In bi-directional CAN transfer using M_CAN IP, with
> the frame gap being set to '0', it leads to Protocol
> error in Arbitration phase resulting in communication
> stall.

Is there a (public) erratum describing the problem?

> Discussed with Bosch M_CAN IP team and the stall issue
> can only be overcome by controlling the tx and rx
> packets flow as done by the patch.

Please elaborate the suggested workaround.

> Rx packets would also be serviced when there is a tx
> interrupt. The solution has been tested extensively for
> more than 10 days, and no issues has been observed.

Can you describe how your patch implements the workaround?

| Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
| instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
| to do frotz", as if you are giving orders to the codebase to change
| its behaviour.

See: https://github.com/torvalds/linux/blob/master/Documentation/process/submitting-patches.rst#describe-your-changes

> Setup that is used to reproduce the issue:
>
> +---------------------+ +----------------------+
> |Intel ElkhartLake | |Intel ElkhartLake |
> | +--------+ | | +--------+ |
> | |m_can 0 | |<=======>| |m_can 0 | |
> | +--------+ | | +--------+ |
> +---------------------+ +----------------------+
>
> Steps to be run on the two Elkhartlake HW:
>
> 1. ip link set can0 type can bitrate 1000000
> 2. ip link set can0 txqueuelen 2048
> 3. ip link set can0 up
> 4. cangen -g 0 can0
> 5. candump can0
>
> cangen -g 0 can0 & candump can0 commands are used for transmit and
> receive on both the m_can HW simultaneously where -g is the frame gap
> between two frames.
>
> Signed-off-by: Kumari Pallavi <kumari.pallavi@xxxxxxxxx>
> Signed-off-by: Srikanth Thokala <srikanth.thokala@xxxxxxxxx>
> ---
> drivers/net/can/m_can/m_can.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> index a5003435802b..94aa0ba89202 100644
> --- a/drivers/net/can/m_can/m_can.c
> +++ b/drivers/net/can/m_can/m_can.c
> @@ -1118,7 +1118,7 @@ static irqreturn_t m_can_isr(int irq, void *dev_id)
> /* New TX FIFO Element arrived */
> if (m_can_echo_tx_event(dev) != 0)
> goto out_fail;
> -

nitpick: please keep that empty line.

> + m_can_write(cdev, M_CAN_IE, IR_ALL_INT & ~(IR_TEFN));

- What's the purpose of "()" around IR_TEFN?
- You enable a lot of interrupts that have not been enabled before. Have
a look at m_can_chip_config() how the original register value for
M_CAN_IE is calculated.

> if (netif_queue_stopped(dev) &&
> !m_can_tx_fifo_full(cdev))
> netif_wake_queue(dev);
> @@ -1787,6 +1787,7 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
> }
> } else {
> cdev->tx_skb = skb;
> + m_can_write(cdev, M_CAN_IE, IR_ALL_INT & (IR_TEFN));

- What's the purpose of "()" around IR_TEFN?
- "IR_ALL_INT & (IR_TEFN)" is equal to IR_TEFN, isn't it?
- This basically disables all other interrupts, is this what you want to
do?
- What happens if the bus is busy with high prio CAN frames and you want
to send low prio ones? You will not get any RX-IRQ, this doesn't look
correct to me.

> return m_can_tx_handler(cdev);
> }
>
> --
> 2.17.1
>
>

Marc

--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung Nürnberg | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-9 |

Attachment: signature.asc
Description: PGP signature