Scaling problem with a lot of AF_PACKET sockets on different interfaces

From: Vitaly V. Bursov
Date: Fri Jun 07 2013 - 08:06:46 EST


Hello,

I have a Linux router with a lot of interfaces (hundreds or
thousands of VLANs) and an application that creates an AF_PACKET
socket per interface and bind()s each socket to its interface.

Each socket also has a BPF filter attached.

The problem is observed on linux-3.8.13, but as far as I can see
from the source, the latest version behaves the same way.

I noticed that the box has strange performance problems, with
most of the CPU time spent in __netif_receive_skb:
86.15% [k] __netif_receive_skb
1.41% [k] _raw_spin_lock
1.09% [k] fib_table_lookup
0.99% [k] local_bh_enable_ip

and this is the assembly with the "hot spot":
      │      shr    $0x8,%r15w
      │      and    $0xf,%r15d
 0.00 │      shl    $0x4,%r15
      │      add    $0xffffffff8165ec80,%r15
      │      mov    (%r15),%rax
 0.09 │      mov    %rax,0x28(%rsp)
      │      mov    0x28(%rsp),%rbp
 0.01 │      sub    $0x28,%rbp
      │      jmp    5c7
 1.72 │5b0:  mov    0x28(%rbp),%rax
 0.05 │      mov    0x18(%rsp),%rbx
 0.00 │      mov    %rax,0x28(%rsp)
 0.03 │      mov    0x28(%rsp),%rbp
 5.67 │      sub    $0x28,%rbp
 1.71 │5c7:  lea    0x28(%rbp),%rax
 1.73 │      cmp    %r15,%rax
      │      je     640
 1.74 │      cmp    %r14w,0x0(%rbp)
      │      jne    5b0
81.36 │      mov    0x8(%rbp),%rax
 2.74 │      cmp    %rax,%r8
      │      je     5eb
 1.37 │      cmp    0x20(%rbx),%rax
      │      je     5eb
 1.39 │      cmp    %r13,%rax
      │      jne    5b0
 0.04 │5eb:  test   %r12,%r12
 0.04 │      je     6f4
      │      mov    0xc0(%rbx),%eax
      │      mov    0xc8(%rbx),%rdx
      │      testb  $0x8,0x1(%rdx,%rax,1)
      │      jne    6d5

This corresponds to:

net/core/dev.c:
	type = skb->protocol;
	list_for_each_entry_rcu(ptype,
			&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
		if (ptype->type == type &&
		    (ptype->dev == null_or_dev || ptype->dev == skb->dev ||
		     ptype->dev == orig_dev)) {
			if (pt_prev)
				ret = deliver_skb(skb, pt_prev, orig_dev);
			pt_prev = ptype;
		}
	}

This works perfectly fine until there are a lot of AF_PACKET sockets,
since each such socket adds an entry to the ptype list:

# cat /proc/net/ptype
Type Device Function
0800 eth2.1989 packet_rcv+0x0/0x400
0800 eth2.1987 packet_rcv+0x0/0x400
0800 eth2.1986 packet_rcv+0x0/0x400
0800 eth2.1990 packet_rcv+0x0/0x400
0800 eth2.1995 packet_rcv+0x0/0x400
0800 eth2.1997 packet_rcv+0x0/0x400
.......
0800 eth2.1004 packet_rcv+0x0/0x400
0800 ip_rcv+0x0/0x310
0011 llc_rcv+0x0/0x3a0
0004 llc_rcv+0x0/0x3a0
0806 arp_rcv+0x0/0x150

And this obviously results in a huge performance penalty.

ptype_all, by the looks of it, behaves the same way.

Probably one way to fix this is to perform interface matching in the
af_packet handler, but there could be other cases and other protocols
affected.

Ideas are welcome :)

--
Thanks
Vitaly
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/