Re: [PATCH net-next v1] net: Add skb user timestamp flag to distinguish between timestamps

From: Willem de Bruijn
Date: Fri Feb 16 2024 - 13:11:24 EST


Abhishek Chauhan wrote:
> Bridge driver today has no support to forward the userspace timestamp
> packets and ends up resetting the timestamp. ETF qdisc checks the
> packet coming from userspace and encounters to be 0 thereby dropping
> time sensitive packets. These changes will allow userspace timestamps
> packets to be forwarded from the bridge to NIC drivers.
>
> Signed-off-by: Abhishek Chauhan <quic_abchauha@xxxxxxxxxxx>
> ---
> Note:- I am a little skeptical of using bool inside the skbuff
> structure as no one today has used bool so far in the struct.
> (Expecting some comments from upstream for sure)
>
> I am also touching the heart of sk buff so i hope this is reviewed
> thoroughly. I have crossed checked multiple times on all the ipv4
> /ipv6 paths where userspace timestamp is populated. I tried as much
> as possible to cover all the references and made sure i put my changes
> in place.
>
> Bug description:- If the physical network interface is bridged the
> etf packets are dropped since bridge driver before forwarding the packet
> is setting the userspace timestamp to 0.
>
> Bridge driver call stack
>
> [ 157.120189] now is set to 1706054553072734733
> [ 157.120194] tx time from SKB is 0 <== SKB when reaches the etf qdisc is 0
> [ 157.120195] q->last time is 0
> [ 157.120197] CPU: 3 PID: 9206 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
> [ 157.120201] Hardware name: Qualcomm SA8775P Ride (DT)
> [ 157.120202] Call trace:
> [ 157.120203] dump_backtrace+0xb0/0x130
> [ 157.120212] show_stack+0x1c/0x30
> [ 157.120215] dump_stack_lvl+0x74/0x8c
> [ 157.120220] dump_stack+0x14/0x24
> [ 157.120223] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
> [ 157.120230] dev_qdisc_enqueue+0x2c/0x110
> [ 157.120234] __dev_xmit_skb+0x114/0x644
> [ 157.120236] __dev_queue_xmit+0x31c/0x774
> [ 157.120238] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
> [ 157.120253] br_forward_finish+0xdc/0xec [bridge] <== This function is culprit as its making the tstamp as 0
> [root@ecbldauto-lvarm04-lnx ~]# [ 157.120263] __br_forward+0xd8/0x210 [bridge]
> [ 157.120272] br_forward+0x12c/0x150 [bridge]
> [ 157.120281] br_dev_xmit+0x288/0x49c [bridge]
> [ 157.120290] dev_hard_start_xmit+0xe4/0x2b4
> [ 157.120292] __dev_queue_xmit+0x6ac/0x774
> [ 157.120294] neigh_resolve_output+0x128/0x1ec
> [ 157.120297] ip_finish_output2+0x184/0x54c
> [ 157.120300] __ip_finish_output+0xa4/0x19c
> [ 157.120302] ip_finish_output+0x38/0xf0
> [ 157.120303] ip_output+0x13c/0x1f4
> [ 157.120305] ip_send_skb+0x54/0x10c
> [ 157.120307] udp_send_skb+0x128/0x394
> [ 157.120310] udp_sendmsg+0x7e8/0xa6c
> [ 157.120311] inet_sendmsg+0x48/0x70
> [ 157.120313] sock_sendmsg+0x54/0x60
> [ 157.120315] ____sys_sendmsg+0x1f8/0x254
> [ 157.120316] ___sys_sendmsg+0x84/0xcc
> [ 157.120318] __sys_sendmsg+0x60/0xb0
> [ 157.120319] __arm64_sys_sendmsg+0x28/0x30
> [ 157.120320] invoke_syscall.constprop.0+0x7c/0xd0
> [ 157.120323] el0_svc_common.constprop.0+0x140/0x150
> [ 157.120325] do_el0_svc+0x38/0xa0
> [ 157.120327] el0_svc+0x38/0x1d0
> [ 157.120329] el0t_64_sync_handler+0xb4/0x130
> [ 157.120330] el0t_64_sync+0x17c/0x180
>
> After my changes:-
> [ 2215.130148] now is set to 1706056610952501031
> [ 2215.130154] tx time from SKB is 1706056610953467393 <== Time is forwarded to etf correctly
> [ 2215.130155] q->last time is 1706056591423364609
> [ 2215.130158] CPU: 1 PID: 108166 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
> [ 2215.130162] Hardware name: Qualcomm SA8775P Ride (DT) [ 2215.130163] Call trace:
> [ 2215.130164] dump_backtrace+0xb0/0x130
> [ 2215.130172] show_stack+0x1c/0x30 [root@ecbldauto-lvarm04-lnx ~]#
> [ 2215.130175] dump_stack_lvl+0x74/0x8c [ 2215.130181] dump_stack+0x14/0x24
> [ 2215.130184] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
> [ 2215.130191] dev_qdisc_enqueue+0x2c/0x110
> [ 2215.130197] __dev_xmit_skb+0x114/0x644
> [ 2215.130200] __dev_queue_xmit+0x31c/0x774
> [ 2215.130202] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
> [ 2215.130217] br_forward_finish+0xe4/0xf0 [bridge]
> [ 2215.130226] __br_forward+0xd8/0x20c [bridge]
> [ 2215.130235] br_forward+0x12c/0x150 [bridge]
> [ 2215.130243] br_dev_xmit+0x288/0x49c [bridge]
> [ 2215.130252] dev_hard_start_xmit+0xe4/0x2b4
> [ 2215.130254] __dev_queue_xmit+0x6ac/0x774
> [ 2215.130257] neigh_hh_output+0xcc/0x140
> [ 2215.130260] ip_finish_output2+0x300/0x54c
> [ 2215.130262] __ip_finish_output+0xa4/0x19c
> [ 2215.130263] ip_finish_output+0x38/0xf0
> [ 2215.130265] ip_output+0x13c/0x1f4
> [ 2215.130267] ip_send_skb+0x54/0x110
> [ 2215.130269] udp_send_skb+0x128/0x394
> [ 2215.130271] udp_sendmsg+0x7e8/0xa6c
> [ 2215.130272] inet_sendmsg+0x48/0x70
> [ 2215.130275] sock_sendmsg+0x54/0x60
> [ 2215.130277] ____sys_sendmsg+0x1f8/0x254
> [ 2215.130278] ___sys_sendmsg+0x84/0xcc
> [ 2215.130279] __sys_sendmsg+0x60/0xb0
> [ 2215.130281] __arm64_sys_sendmsg+0x28/0x30
> [ 2215.130282] invoke_syscall.constprop.0+0x7c/0xd0
> [ 2215.130285] el0_svc_common.constprop.0+0x140/0x150
> [ 2215.130287] do_el0_svc+0x38/0xa0
> [ 2215.130289] el0_svc+0x38/0x1d0
> [ 2215.130291] el0t_64_sync_handler+0xb4/0x130
> [ 2215.130292] el0t_64_sync+0x17c/0x180
>
>
> include/linux/skbuff.h | 13 +++++++++++++
> include/net/inet_sock.h | 1 +
> include/net/sock.h | 1 +
> net/core/sock.c | 1 +
> net/ipv4/ip_output.c | 3 +++
> net/ipv4/raw.c | 1 +
> net/ipv6/ip6_output.c | 2 ++
> net/ipv6/raw.c | 1 +
> net/packet/af_packet.c | 3 +++
> 9 files changed, 26 insertions(+)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2dde34c29203..b098b7d30b56 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -744,6 +744,7 @@ typedef unsigned char *sk_buff_data_t;
> * @tstamp: Time we arrived/left
> * @skb_mstamp_ns: (aka @tstamp) earliest departure time; start point
> * for retransmit timer
> + * @user_delivery_time: states that timestamp was populated from userspace
> * @rbnode: RB tree node, alternative to next/prev for netem/tcp
> * @list: queue head
> * @ll_node: anchor in an llist (eg socket defer_list)
> @@ -879,6 +880,8 @@ struct sk_buff {
> ktime_t tstamp;
> u64 skb_mstamp_ns; /* earliest departure time */
> };
> + /* States that time is from userspace */
> + bool user_delivery_time;
> /*
> * This is the control buffer. It is free to use for every
> * layer. Please put your private variables there. If you
> @@ -4208,6 +4211,16 @@ static inline void skb_clear_tstamp(struct sk_buff *skb)
> if (skb->mono_delivery_time)
> return;
>
> + /* When userspace timestamp packets are forwarded via bridge
> + * the br_forward_finish clears the tstamp and the tstamp
> + * from the userspace is lost. Hence the check for user
> + * delivery time. With the below check now tc-etf qdisc will
> + * not end up dropping the packets if the packet is forwarded via
> + * bridge interface.
> + */
> + if (skb->user_delivery_time)
> + return;
> +
> skb->tstamp = 0;
> }
>
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index d94c242eb3ed..e7523545a493 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -175,6 +175,7 @@ struct inet_cork {
> __u16 gso_size;
> u64 transmit_time;
> u32 mark;
> + bool user_delivery_time;
> };

There's no need for a cork member, as by definition the fields in this
struct are coming from userspace.

There is a very high bar to adding new fields to sk_buff, because it
is used by many paths and would be enormous if stuck with fields for
every intersection between a pair of features.

The goal here is for the bridge to disambiguate earliest delivery time
timestamps from which? From those looped through ip forwarding? Why
does the bridge zero the tstamp field at all? That might help finding
a reasonable implementation.

We have run in the issue of labeling the value of skb->tstamp before.
With redirect and looping it is definitely subtle.