Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

From: Brian Starkey
Date: Fri Nov 25 2016 - 08:14:22 EST


Hi,

On Wed, Nov 23, 2016 at 12:03:28PM -0800, Eric Dumazet wrote:
> On Wed, Nov 23, 2016 at 10:21 AM, Brian Starkey <brian.starkey@xxxxxxx> wrote:
>>
>> This patch didn't help.
>>
>> I did get some new traces though - I've attached the diff for the
>> trace_printks I added.
>>
>> Before 4cd13c21b207:
>> https://drive.google.com/open?id=0B8siaK6ZjvEwcEtOeFQzTmY0Nnc
>> After 4cd13c21b207:
>> https://drive.google.com/open?id=0B8siaK6ZjvEwZnQ4MVg1d3d1Tm8
>>
>> It looks like the difference is that after 4cd13c21b207 the RX softirq
>> isn't running, and RX interrupts don't call softirq_raise anymore -
>> presumably because there's one pending, but I didn't have time to
>> track that down to a code-path.
>>
>> Cheers,
>> -Brian
>>
>
> Hi Brian
>
> Looks like netif_rx() drops the incoming packets then ?
>
> Maybe netif_running() is not happy :(
>
> Could you trace netif_rx() return value (NET_RX_SUCCESS or NET_RX_DROP)

Some packets are dropped, but not very many:

$ grep NET_RX_SUCCESS trace_netif_rx.txt | wc -l
14399
$ grep NET_RX_DROP trace_netif_rx.txt | wc -l
22

Without the ksoftirqd change there were zero NET_RX_DROPs.
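
For anyone repeating the measurement, both counts and the resulting drop
rate can be gathered in one pass. This is only a minimal sketch: the
helper name netif_rx_summary is hypothetical, and it assumes each
relevant line of the trace file contains exactly one of the two
return-value names.

```shell
# Hypothetical helper (not part of any kernel tooling): summarise the
# netif_rx() return values recorded in a trace file.
# Assumption: each relevant line contains exactly one of the two tokens.
netif_rx_summary() {
    awk '
        /NET_RX_SUCCESS/ { ok++ }
        /NET_RX_DROP/    { drop++ }
        END {
            total = ok + drop
            # Guard against an empty trace to avoid dividing by zero.
            pct = total ? 100 * drop / total : 0
            printf "success=%d drop=%d drop_pct=%.2f\n", ok, drop, pct
        }
    ' "$1"
}

# Only run against the trace file if it is actually present.
if [ -f trace_netif_rx.txt ]; then
    netif_rx_summary trace_netif_rx.txt
fi
```

With the counts quoted above (14399 and 22) the drop rate works out to
about 0.15% of netif_rx() calls.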

-Brian


> Thanks !