RE: [PATCH] lost softirq, 2.6.24-rc7

From: Rowand, Frank
Date: Tue Jan 15 2008 - 21:48:30 EST



Steve,

You are totally correct. I used the wrong words when I said
"ksoftirqd thread runs". My apologies for the very misleading wording.

I have updated the wording in-line below, in the original message, to
indicate that it is the softirq threads, executing in the ksoftirqd()
function, that run and sleep, not the ksoftirqd thread.

-Frank

-----Original Message-----
From: Steven Rostedt [mailto:rostedt@xxxxxxxxxxx]
Sent: Tue 1/15/2008 4:39 PM
To: Rowand, Frank
Cc: linux-kernel@xxxxxxxxxxxxxxx; mingo@xxxxxxxxxx
Subject: Re: [PATCH] lost softirq, 2.6.24-rc7

On Tue, Jan 15, 2008 at 02:15:26PM -0800, Frank Rowand wrote:
> From: Frank Rowand <frank.rowand@xxxxxxxxxxx>
>
> (Ingo, there is a question for you after the description, just before the
> patch.)
>
> When running an interrupt and network intensive stress test with PREEMPT_RT
> enabled, the target system stopped processing received network packets.
> skbs from received packets were being queued by netif_rx(), but the
> NET_RX_SOFTIRQ softirq was never running to remove the skbs from the queue.
> Since the target system root file system is NFS mounted, the system is now
> effectively hung.
>
> A pseudocode description of how this state was reached follows.
> Each level of indentation represents a function call from the previous line.
>
>
> ethernet driver irq handler receives packet
>    netif_rx()
>       queues skb (qlen == 1), raises NET_RX_SOFTIRQ
>
> on return from irq
>    ___do_softirq() [ 1 ]
>       Reset the pending bitmask

Frank,

This path should not be hit when running with PREEMPT_RT. The softirqs
are now all separate, and are not run in batch in ksoftirqd. In fact,
ksoftirqd should not be running at all with PREEMPT_RT.

-- Steve

>       net_rx_action()
>          dequeues skb (qlen == 0)
>          jiffies incremented, so
>             break out of processing
>             and raise NET_RX_SOFTIRQ
>             (but don't deassert NAPI_STATE_SCHED)
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ksoftirqd thread runs

^^^^^^^^^^^^^^^^^^^^^^ should have been:

the TIMER_SOFTIRQ and RCU_SOFTIRQ softirq threads, which are
both executing ksoftirqd(), run

>    process TIMER_SOFTIRQ
>    process RCU_SOFTIRQ

> << ksoftirqd sleeps >>

^^^^^^^^^^^^^^^^^^^^^^^ should have been:
the softirq threads, executing in ksoftirqd(), sleep

> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> ___do_softirq() [ 2 ]
>    Reset the pending bitmask
>    finds NET_RX_SOFTIRQ raised but already running
> << ___do_softirq() [ 2 ] completes >>
>
> << ___do_softirq() [ 1 ] resumes >>
>    the pending bitmask is empty, so NET_RX_SOFTIRQ is lost
>
>
>
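For illustration only, the interleaving described above can be replayed
as a small single-threaded user-space C sketch. This is not kernel code:
the struct, the bit value and the function names (softirq_state,
NET_RX_BIT, second_pass, replay) are hypothetical, and the
requeue_skipped option is just one conceivable remedy used to show the
difference. It is not the patch from this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define NET_RX_BIT (1u << 0)  /* illustrative bit, not the kernel's value */

/* Hypothetical stand-in for the per-CPU softirq bookkeeping. */
struct softirq_state {
    unsigned int pending;  /* raised, waiting to be handled */
    unsigned int running;  /* handler currently executing */
};

/* Pass [2]: claim the pending mask, skip anything already running. */
void second_pass(struct softirq_state *s, bool requeue_skipped)
{
    unsigned int claimed = s->pending;
    unsigned int skipped;

    s->pending = 0;                   /* "Reset the pending bitmask" */
    skipped = claimed & s->running;   /* raised but already running */
    if (requeue_skipped)
        s->pending |= skipped;        /* assumed remedy: put the bit back */
}

/* Replay the trace; return true if NET_RX is still pending at the end. */
bool replay(bool requeue_skipped)
{
    /* netif_rx() queued an skb and raised NET_RX. */
    struct softirq_state s = { .pending = NET_RX_BIT, .running = 0 };
    unsigned int claimed;

    /* Pass [1]: reset the pending bitmask and start the handler. */
    claimed = s.pending;
    s.pending = 0;
    s.running |= claimed & NET_RX_BIT;

    /* The handler breaks out early and raises NET_RX again. */
    s.pending |= NET_RX_BIT;

    /* Pass [1] is preempted here; pass [2] runs to completion. */
    second_pass(&s, requeue_skipped);

    /* Pass [1] resumes and finishes the handler. */
    s.running &= ~NET_RX_BIT;
    return (s.pending & NET_RX_BIT) != 0;
}

/* Self-check that can be called from a main() of your own. */
void self_check(void)
{
    assert(!replay(false));  /* bit dropped: the softirq is lost */
    assert(replay(true));    /* bit kept: the softirq would run again */
}
```

With requeue_skipped false, replay() ends with an empty pending mask
even though the handler asked to run again, which is the lost
NET_RX_SOFTIRQ in the trace. With requeue_skipped true, the bit
survives pass [2].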


