Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

From: Yajun Deng
Date: Sat Oct 07 2023 - 02:37:21 EST



On 2023/10/7 13:29, Eric Dumazet wrote:
On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng <yajun.deng@xxxxxxxxx> wrote:
Although there is a kfree_skb_reason() helper function that can be used to
find the reason why an skb is dropped, most callers don't increment
one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.

...

+
+void netdev_core_stats_inc(struct net_device *dev, u32 offset)
+{
+	/* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
+	struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
+	unsigned long *field;
+
+	if (unlikely(!p))
+		p = netdev_core_stats_alloc(dev);
+
+	if (p) {
+		field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
+		WRITE_ONCE(*field, READ_ONCE(*field) + 1);
This is broken...

As I explained earlier, dev_core_stats_xxxx(dev) can be called from
many different contexts:

1) process contexts, where preemption and migration are allowed.
2) interrupt contexts.

Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.
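
To make the race concrete, here is a minimal userspace sketch that replays one bad interleaving deterministically. It is only an illustration: READ_ONCE()/WRITE_ONCE() are simplified stand-ins for the kernel macros, and fake_irq() and racy_inc_with_irq() are hypothetical names, not kernel code.

```c
/* Simplified userspace stand-ins for the kernel macros (assumption). */
#define READ_ONCE(x)     (*(const volatile unsigned long *)&(x))
#define WRITE_ONCE(x, v) (*(volatile unsigned long *)&(x) = (v))

static unsigned long rx_dropped;

/* Hypothetical "interrupt" that increments the same counter. */
static void fake_irq(void)
{
	WRITE_ONCE(rx_dropped, READ_ONCE(rx_dropped) + 1);
}

/*
 * Replay the problematic ordering: the interrupt fires after the
 * load but before the store, so the store overwrites its update.
 * Returns the final counter value after two logical increments.
 */
static unsigned long racy_inc_with_irq(void)
{
	unsigned long old;

	rx_dropped = 0;
	old = READ_ONCE(rx_dropped);      /* load: sees 0 */
	fake_irq();                       /* counter becomes 1 */
	WRITE_ONCE(rx_dropped, old + 1);  /* store: writes stale 0 + 1 */

	return rx_dropped;
}
```

Two increments were attempted, yet racy_inc_with_irq() returns 1. The load and the store are each tear-free, but the read-modify-write as a whole is not atomic, which is why the annotations alone do not help here.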

I _think_ I already showed you how to deal with this?


Yes, I replied in v6.

https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a3172870d@xxxxxxxxx/

Please try instead:

+void netdev_core_stats_inc(struct net_device *dev, u32 offset)
+{
+	/* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
+	struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
+	unsigned long __percpu *field;
+
+	if (unlikely(!p)) {
+		p = netdev_core_stats_alloc(dev);
+		if (!p)
+			return;
+	}
+	field = (__force unsigned long __percpu *)((__force void *)p + offset);
+	this_cpu_inc(*field);
+}
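
For readers following the offset arithmetic, a minimal userspace sketch of the same idea may help. It leaves out the per-CPU handling entirely. The struct below only mirrors the four counters named in the commit message, and core_stats_inc() and CORE_STATS_INC() are hypothetical names chosen for illustration; how the real callers derive the offset is an assumption here.

```c
#include <stddef.h>

/* Userspace mirror of the four counters; the layout is an assumption. */
struct core_stats {
	unsigned long rx_dropped;
	unsigned long tx_dropped;
	unsigned long rx_nohandler;
	unsigned long rx_otherhost_dropped;
};

/* Locate one counter by its byte offset and increment it. */
static void core_stats_inc(struct core_stats *s, size_t offset)
{
	unsigned long *field = (unsigned long *)((char *)s + offset);

	(*field)++;
}

/* Hypothetical wrapper: derive the offset from the field name. */
#define CORE_STATS_INC(s, FIELD) \
	core_stats_inc((s), offsetof(struct core_stats, FIELD))
```

With this, CORE_STATS_INC(&stats, tx_dropped) bumps only tx_dropped, so a single out-of-line function can serve all four counters.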


With this version, nothing gets traced even though rx_dropped keeps increasing. Tracing only works if I add an extra operation to the function, such as

pr_info, ++, or trace_xxx. I don't know what's going on.

If this is adopted, I need to send two patches: one to introduce netdev_core_stats_inc(), and another to add a tracepoint, like:


+void netdev_core_stats_inc(struct net_device *dev, u32 offset)
+{
+	/* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
+	struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
+	unsigned long __percpu *field;
+
+	if (unlikely(!p)) {
+		p = netdev_core_stats_alloc(dev);
+		if (!p)
+			return;
+	}
+	trace_netdev_core_stats_inc(dev, offset);
+	field = (__force unsigned long __percpu *)((__force void *)p + offset);
+	this_cpu_inc(*field);
+}


--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h

+TRACE_EVENT(netdev_core_stats_inc,
+
+       TP_PROTO(struct net_device *dev,
+                u32 offset),
+
+       TP_ARGS(dev, offset),
+
+       TP_STRUCT__entry(
+               __string(       name,           dev->name )
+               __string(       driver, netdev_drivername(dev))
+               __field(        u32,            offset          )
+       ),
+
+       TP_fast_assign(
+               __assign_str(name, dev->name);
+               __assign_str(driver, netdev_drivername(dev));
+               __entry->offset = offset;
+       ),
+
+       TP_printk("dev=%s driver=%s offset=%u",
+               __get_str(name), __get_str(driver), __entry->offset)
+);


We can trace netdev_core_stats_inc by tracepoint or kprobe.
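
Since the tracepoint above prints only a raw offset, whoever reads the trace has to map it back to a counter name. A minimal userspace sketch of such a decoder follows. The struct layout is an assumption that mirrors the four counters from the commit message, and core_stats_field_name() is a hypothetical helper; the real offsets depend on the kernel's own struct definition and the architecture.

```c
#include <stddef.h>

/* Assumed mirror of the four counters, in commit-message order. */
struct core_stats_layout {
	unsigned long rx_dropped;
	unsigned long tx_dropped;
	unsigned long rx_nohandler;
	unsigned long rx_otherhost_dropped;
};

/* Hypothetical helper: translate a traced offset into a field name. */
static const char *core_stats_field_name(size_t offset)
{
	switch (offset) {
	case offsetof(struct core_stats_layout, rx_dropped):
		return "rx_dropped";
	case offsetof(struct core_stats_layout, tx_dropped):
		return "tx_dropped";
	case offsetof(struct core_stats_layout, rx_nohandler):
		return "rx_nohandler";
	case offsetof(struct core_stats_layout, rx_otherhost_dropped):
		return "rx_otherhost_dropped";
	default:
		return "unknown";
	}
}
```

A decoder along these lines could post-process the "offset=%u" value from the trace output; any offset that does not land exactly on a counter is reported as "unknown".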