Re: 2.4.7 softirq incorrectness.

From: Maksim Krasnyanskiy (maxk@qualcomm.com)
Date: Thu Jul 26 2001 - 19:47:50 EST


>If there are lots of users of netif_rx outside bh or irq context I guess
>this is the simpler way is:
I'm not sure about a lot but there are certainly some. TUN/TAP driver for example.
It calls netif_rx from user context (syscall).

> > > after netif_rx.
> >
> > But why not to do just local_bh_disable(); netif_rx(); local_bh_enable()?
> > Is this not right?
>
>That is certainly right. However it is slower than just doing if (pending) do_softirq() after netif_rx().
Should we then create generic function (something like netif_rx_from_user) than will call do_softirq
after calling netif_rx ?

btw I think I just found another problem with softirqs. Actually with tasklets. May be you guys can help me with that.
The problem is that my tasklet is getting run on 2 cpu simultaneously and according to description in linux/interrupt.h
it shouldn't.
I have a simple tx_task. All it does is sending queued stuff to the device. Whenever I need to send something
I queue it and do tasklet_schedule(tx_task). Everything works just fine but on SMP machine I noticed that sometimes
data is sent in the wrong order. And the only reason why reordering could happen is if several tx_tasks are runing at the
same time. So I added a simple check in tasklet function to complain if it's already running.
Here is the simplified code example:

tasklet_init(&hdev->tx_task, hci_tx_task, (unsigned long) hdev);

static void hci_tx_task(unsigned long arg)
{
         struct hci_dev *hdev = (struct hci_dev *) arg;
         struct sk_buff *skb;

         static __u32 r = 0;

         if (test_and_set_bit(1, &r)) {
                 ERR("already running cpu %d", smp_processor_id());
                 return;
         }

         /* Send next queued packet */
         while ((skb = skb_dequeue(&hdev->raw_q)))
                 hci_send_frame(skb);

         clear_bit(1, &r);
}

tasklet_schedule(&hdev->tx_task);

Now everything is in order and I'm getting bunch of "already running" messages with different cpu_id.
(hci_tx_task is never called directly).

So, I looked at the tasklet_schedule and tasklet_action code. And it seems to me that tasklet can be scheduled on several
cpus at the same time. Here is scenario:
         tasklet X is running on cpu 1. tasklet_action clears STATE_SCHED bit and calls tasklet function.
         tasklet_schedule(taskletX) is called on cpu 2. Since STATE_SCHED is cleared tasklet X is scheduled.
         tasklet_action is called on cpu 2 (tasklet X functions is still running on cpu 1)
In that case tasklet_action on cpu 2 should hit a:
         if (!tasklet_trylock(t))
                 BUG();
because tasklet is still locked by cpu 1. For some reason it doesn't though (I don't see any "kernel bug" messages).
But my tx_task is run second time.

Comments ?

Max

Maksim Krasnyanskiy
Senior Kernel Engineer
Qualcomm Incorporated

maxk@qualcomm.com
http://bluez.sf.net
http://vtun.sf.net

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Jul 31 2001 - 21:00:30 EST