Re: Andrea's pending 2.2.x patches

Andrea Arcangeli (andrea@suse.de)
Mon, 11 Oct 1999 18:18:45 +0200 (CEST)


On Mon, 11 Oct 1999, David S. Miller wrote:

> Date: Mon, 11 Oct 1999 15:35:36 +0100 (BST)
> From: Alan Cox <alan@lxorguk.ukuu.org.uk>
>
> > o clear-backlog-2 -> fix a SMP race in the common
> > device backlog (where the irq
> > queue packets that will be
> > processed from net_bh()).
>
> Network fixes go via DaveM.
>
>And I'm not accepting this one until Andrea shows me why ANK's clever
>trick there, which is _specifically_ to avoid grabbing the lock in the
>top-level list traversal code, does not work correctly.

I explained this in detail some email ago. Maybe you missed my reply so I
quote my reply here:

On Mon, 4 Oct 1999, Andrea Arcangeli wrote:
>
>Date: Mon, 4 Oct 1999 00:34:15 +0200 (CEST)
>From: Andrea Arcangeli <andrea@suse.de>
>To: Andi Kleen <ak@muc.de>
>Cc: linux-kernel@vger.rutgers.edu
>Subject: Re: Network-related Oopses on 2.2.13pre14
>
>On 4 Oct 1999, Andi Kleen wrote:
>
>>I don't expect it to fix any problems. Interrupts never remove packets
>>from the backlog, only add them. dev_clear_backlog is protected against other
>
>The fact that it's only adding elements is just too much to be safe.
>
>We are walking a list that can change under us.
>
>If you want to make the algorithm SMP safe then you should insert the
>proper mb()/wmb()/rmb() calls both in dev_clear_backlog and in
>skb_queue_tail.
>
>Suppose you have this queue:
>
> --> head -> elem1 -> elem2 --
> | |
> -----------------------------
>
>Suppose you want to add the newsk to the end of queue as netif_rx does.
>
>Suppose dev_clear_backlog is walking the queue on the other CPU while it
>see the memory this way:
>
> head -> elem -> elem -> newsk -> NULL
>
>As you are not doing ordered read in dev_clear_backlog and ordered writes
>in skb_queue_tail you have no garantee you won't find the list corrupted.
>Actually skb_queue_tail don't even enforce ordering at the assembler
>level.
>
>The effect of skb_queue_tail() could be reordered this way due memory
>speculative operations in SMP.
>
>This is the core of the stock skb_queue_tail.
>
>{
> struct sk_buff *prev, *next;
>
> newsk->list = list;
> list->qlen++;
> next = (struct sk_buff *)list;
> prev = next->prev;
> newsk->next = next;
> newsk->prev = prev;
> next->prev = newsk;
> prev->next = newsk;
>}
>
>It may be easily reodered or at the assembler level or at the CPU level
>this way:
>
>extern __inline__ void __skb_queue_tail(struct sk_buff_head *list, struct
>sk_buf
>{
> struct sk_buff *prev, *next;
>
> newsk->list = list;
> list->qlen++;
> next = (struct sk_buff *)list;
> prev = next->prev;
> next->prev = newsk;
> prev->next = newsk;
>
> /* HERE dev_clear_backlog is reading the last entry of the list
> that is newsk and then he is reading newsk->next before we have
> a chance to update newsk->next to fix the queue
> and so it will Oops (if newsk was not weird, if it was weird
> you'll have some more fun). */
>
> newsk->next = next;
> newsk->prev = prev;
>}
>
>Andrea
>
>

Andrea

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/