Re: [PATCH] netfilter: use per-cpu spinlock and RCU (v5)

From: Patrick McHardy
Date: Thu Apr 16 2009 - 09:53:38 EST


Eric Dumazet wrote:
Stephen Hemminger a écrit :
This is an alternative version of ip/ip6/arp tables locking using
per-cpu locks. This avoids the overhead of synchronize_net() during
update but still removes the expensive rwlock in earlier versions.

The idea for this came from an earlier version done by Eric Dumazet.
Locking is done per-cpu, the fast path locks on the current cpu
and updates counters. The slow case involves acquiring the locks on
all cpu's. This version uses RCU for the table->base reference
but per-cpu-lock for counters.

This version is a regression over 2.6.2[0-9], because of two points

1) Much more atomic ops :

Because of additional

+ spin_lock(&__get_cpu_var(ip_tables_lock));
ADD_COUNTER(e->counters, ntohs(ip->tot_len), 1);
+ spin_unlock(&__get_cpu_var(ip_tables_lock));

added on each counter updates.

On many setups, each packet coming in or out of the machine has
to update between 2 to 20 rule counters. So to avoid *one* atomic ops
of read_unlock(), this v4 version adds 2 to 20 atomic ops...

I agree, this seems to be a step backwards.

I still not see the problem between the previous version (2.6.2[0-8]) that had a central
rwlock, that hurted performance on SMP because of cache line ping pong, and the solution
having one rwlock per cpu.

We wanted to reduce the cache line ping pong first. This *is* the hurting point,
by an order of magnitude.

Dave doesn't seem to like the rwlock approach. I don't see a way to
do anything asynchronously like call_rcu() to improve this, so to
bring up one of Stephens suggestions again:

> * use on_each_cpu() somehow to do grace periood?

We could use this to replace the counters, presuming it is
indeed faster than waiting for a RCU grace period.

2) Second problem : Potential OOM

About freeing old rules with call_rcu() and/or schedule_work(), this is going
to OOM pretty fast on small appliances with basic firewall setups loading
rules one by one, as done by original topic reporter.

We had reports from guys using linux with 4MB of available ram (French provider free.fr on
their applicance box), and we had to use SLAB_DESTROY_BY_RCU thing on conntrack
to avoid OOM for their setups. We dont want to use call_rcu() and queue 100 or 200 vfree().

Agreed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/