One way of making the lightweight counters race-free on x86_64 and
i386 is to use local_t. That takes care of those two platforms.
The others, however, fall back to atomic operations.
Maybe we could deal with that on a per-platform basis? Some platforms
may want to switch their local_t implementation from atomics to
regular integers when preemption is not configured. Most commercial
Linux distros ship with preemption off, so this would preserve the
speed of the lightweight counters while holding off the worst races.
It would still allow an interrupt to arrive while the counter is being
incremented, though, so it would not be race-free. It also means
local_t could no longer be used for refcounts, but I think those uses
are not that performance critical and we could just switch them to
atomics. Maybe I am just off in fantasyland.
Andi?
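
To make the per-platform idea concrete, here is a rough user-space
sketch, not kernel code. Every name in it (sketch_local_t,
sketch_local_inc, sketch_local_read, sketch_demo, and the
SKETCH_PREEMPT switch standing in for the preemption config option) is
made up for illustration, and C11 atomics stand in for the kernel's
atomic operations.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * SKETCH_PREEMPT is a hypothetical stand-in for the kernel's
 * preemption config option. All names here are illustrative only.
 */
#ifdef SKETCH_PREEMPT

/* Preemption possible: keep the atomic fallback. */
typedef struct { atomic_long v; } sketch_local_t;

static inline void sketch_local_inc(sketch_local_t *l)
{
	atomic_fetch_add_explicit(&l->v, 1, memory_order_relaxed);
}

static inline long sketch_local_read(sketch_local_t *l)
{
	return atomic_load_explicit(&l->v, memory_order_relaxed);
}

#else

/* No preemption: a regular integer, cheap but not interrupt safe. */
typedef struct { long v; } sketch_local_t;

static inline void sketch_local_inc(sketch_local_t *l)
{
	l->v++;	/* load/add/store; an interrupt can land in between */
}

static inline long sketch_local_read(sketch_local_t *l)
{
	return l->v;
}

#endif

/* Usage: three increments on a counter that starts at zero. */
static long sketch_demo(void)
{
	sketch_local_t c = { 0 };

	assert(sketch_local_read(&c) == 0);
	sketch_local_inc(&c);
	sketch_local_inc(&c);
	sketch_local_inc(&c);
	return sketch_local_read(&c);
}
```

The callers look the same either way, which is the point: only the
platform's choice of backing type changes. The plain-integer branch is
exactly the one that would be unsuitable for refcounts.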
Another thing to investigate (at least on ia64) is how significant the
cost of a fetchadd instruction is when its result is not used. Maybe
it is tolerable and we can stay with the existing implementation.
With this patch, on ia64 we would trade an interrupt disable / load /
add / store / interrupt enable sequence for a single fetchadd
instruction. How good or bad a trade is that?
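
For reference, the two sequences being compared look roughly like this
in portable C. This is only a sketch under assumptions: user space
cannot mask interrupts, so sketch_irq_disable/sketch_irq_enable are
made-up placeholders that merely mark where the pair would sit, and
whether a compiler turns the relaxed atomic add into a fetchadd on
ia64 is assumed, not verified. It says nothing about the relative
cost; that still has to be measured on real hardware.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical placeholders: they do not mask anything, they only
 * mark where interrupt disable/enable would go in the kernel.
 */
static int sketch_irq_depth;
static inline void sketch_irq_disable(void) { sketch_irq_depth++; }
static inline void sketch_irq_enable(void)  { sketch_irq_depth--; }

/* Variant A: interrupt disable / load / add / store / interrupt enable. */
static inline void sketch_inc_irqsafe(long *counter)
{
	long tmp;

	sketch_irq_disable();
	tmp = *counter;		/* load */
	tmp += 1;		/* add */
	*counter = tmp;		/* store */
	sketch_irq_enable();
}

/* Variant B: a single atomic add whose result is thrown away. */
static inline void sketch_inc_fetchadd(atomic_long *counter)
{
	(void)atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

/* Usage: run both variants n times and check that they agree. */
static long sketch_compare(int n)
{
	long a = 0;
	atomic_long b = 0;

	for (int i = 0; i < n; i++) {
		sketch_inc_irqsafe(&a);
		sketch_inc_fetchadd(&b);
	}
	assert(a == atomic_load(&b));
	assert(sketch_irq_depth == 0);	/* every disable was paired */
	return a;
}
```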