Re: [PATCH net-next v1] net: Add skb user timestamp flag to distinguish between timestamps

From: Willem de Bruijn
Date: Mon Feb 19 2024 - 11:26:57 EST


Abhishek Chauhan (ABC) wrote:
>
>
> On 2/16/2024 10:11 AM, Willem de Bruijn wrote:
> > Abhishek Chauhan wrote:
> >> Bridge driver today has no support to forward the userspace timestamp
> >> packets and ends up resetting the timestamp. ETF qdisc checks the
> >> packet coming from userspace and encounters to be 0 thereby dropping
> >> time sensitive packets. These changes will allow userspace timestamps
> >> packets to be forwarded from the bridge to NIC drivers.
> >>
> >> Signed-off-by: Abhishek Chauhan <quic_abchauha@xxxxxxxxxxx>
> >> ---
> >> Note:- I am a little skeptical of using bool inside the skbuff
> >> structure as no one today has used bool so far in the struct.
> >> (Expecting some comments from upstream for sure)
> >>
> >> I am also touching the heart of sk buff so i hope this is reviewed
> >> thoroughly. I have crossed checked multiple times on all the ipv4
> >> /ipv6 paths where userspace timestamp is populated. I tried as much
> >> as possible to cover all the references and made sure i put my changes
> >> in place.
> >>
> >> Bug description:- If the physical network interface is bridged the
> >> etf packets are dropped since bridge driver before forwarding the packet
> >> is setting the userspace timestamp to 0.
> >>
> >> Bridge driver call stack
> >>
> >> [ 157.120189] now is set to 1706054553072734733
> >> [ 157.120194] tx time from SKB is 0 <== SKB when reaches the etf qdisc is 0
> >> [ 157.120195] q->last time is 0
> >> [ 157.120197] CPU: 3 PID: 9206 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
> >> [ 157.120201] Hardware name: Qualcomm SA8775P Ride (DT)
> >> [ 157.120202] Call trace:
> >> [ 157.120203] dump_backtrace+0xb0/0x130
> >> [ 157.120212] show_stack+0x1c/0x30
> >> [ 157.120215] dump_stack_lvl+0x74/0x8c
> >> [ 157.120220] dump_stack+0x14/0x24
> >> [ 157.120223] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
> >> [ 157.120230] dev_qdisc_enqueue+0x2c/0x110
> >> [ 157.120234] __dev_xmit_skb+0x114/0x644
> >> [ 157.120236] __dev_queue_xmit+0x31c/0x774
> >> [ 157.120238] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
> >> [ 157.120253] br_forward_finish+0xdc/0xec [bridge] <== This function is culprit as its making the tstamp as 0
> >> [root@ecbldauto-lvarm04-lnx ~]# [ 157.120263] __br_forward+0xd8/0x210 [bridge]
> >> [ 157.120272] br_forward+0x12c/0x150 [bridge]
> >> [ 157.120281] br_dev_xmit+0x288/0x49c [bridge]
> >> [ 157.120290] dev_hard_start_xmit+0xe4/0x2b4
> >> [ 157.120292] __dev_queue_xmit+0x6ac/0x774
> >> [ 157.120294] neigh_resolve_output+0x128/0x1ec
> >> [ 157.120297] ip_finish_output2+0x184/0x54c
> >> [ 157.120300] __ip_finish_output+0xa4/0x19c
> >> [ 157.120302] ip_finish_output+0x38/0xf0
> >> [ 157.120303] ip_output+0x13c/0x1f4
> >> [ 157.120305] ip_send_skb+0x54/0x10c
> >> [ 157.120307] udp_send_skb+0x128/0x394
> >> [ 157.120310] udp_sendmsg+0x7e8/0xa6c
> >> [ 157.120311] inet_sendmsg+0x48/0x70
> >> [ 157.120313] sock_sendmsg+0x54/0x60
> >> [ 157.120315] ____sys_sendmsg+0x1f8/0x254
> >> [ 157.120316] ___sys_sendmsg+0x84/0xcc
> >> [ 157.120318] __sys_sendmsg+0x60/0xb0
> >> [ 157.120319] __arm64_sys_sendmsg+0x28/0x30
> >> [ 157.120320] invoke_syscall.constprop.0+0x7c/0xd0
> >> [ 157.120323] el0_svc_common.constprop.0+0x140/0x150
> >> [ 157.120325] do_el0_svc+0x38/0xa0
> >> [ 157.120327] el0_svc+0x38/0x1d0
> >> [ 157.120329] el0t_64_sync_handler+0xb4/0x130
> >> [ 157.120330] el0t_64_sync+0x17c/0x180
> >>
> >> After my changes:-
> >> [ 2215.130148] now is set to 1706056610952501031
> >> [ 2215.130154] tx time from SKB is 1706056610953467393 <== Time is forwarded to etf correctly
> >> [ 2215.130155] q->last time is 1706056591423364609
> >> [ 2215.130158] CPU: 1 PID: 108166 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
> >> [ 2215.130162] Hardware name: Qualcomm SA8775P Ride (DT) [ 2215.130163] Call trace:
> >> [ 2215.130164] dump_backtrace+0xb0/0x130
> >> [ 2215.130172] show_stack+0x1c/0x30 [root@ecbldauto-lvarm04-lnx ~]#
> >> [ 2215.130175] dump_stack_lvl+0x74/0x8c [ 2215.130181] dump_stack+0x14/0x24
> >> [ 2215.130184] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
> >> [ 2215.130191] dev_qdisc_enqueue+0x2c/0x110
> >> [ 2215.130197] __dev_xmit_skb+0x114/0x644
> >> [ 2215.130200] __dev_queue_xmit+0x31c/0x774
> >> [ 2215.130202] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
> >> [ 2215.130217] br_forward_finish+0xe4/0xf0 [bridge]
> >> [ 2215.130226] __br_forward+0xd8/0x20c [bridge]
> >> [ 2215.130235] br_forward+0x12c/0x150 [bridge]
> >> [ 2215.130243] br_dev_xmit+0x288/0x49c [bridge]
> >> [ 2215.130252] dev_hard_start_xmit+0xe4/0x2b4
> >> [ 2215.130254] __dev_queue_xmit+0x6ac/0x774
> >> [ 2215.130257] neigh_hh_output+0xcc/0x140
> >> [ 2215.130260] ip_finish_output2+0x300/0x54c
> >> [ 2215.130262] __ip_finish_output+0xa4/0x19c
> >> [ 2215.130263] ip_finish_output+0x38/0xf0
> >> [ 2215.130265] ip_output+0x13c/0x1f4
> >> [ 2215.130267] ip_send_skb+0x54/0x110
> >> [ 2215.130269] udp_send_skb+0x128/0x394
> >> [ 2215.130271] udp_sendmsg+0x7e8/0xa6c
> >> [ 2215.130272] inet_sendmsg+0x48/0x70
> >> [ 2215.130275] sock_sendmsg+0x54/0x60
> >> [ 2215.130277] ____sys_sendmsg+0x1f8/0x254
> >> [ 2215.130278] ___sys_sendmsg+0x84/0xcc
> >> [ 2215.130279] __sys_sendmsg+0x60/0xb0
> >> [ 2215.130281] __arm64_sys_sendmsg+0x28/0x30
> >> [ 2215.130282] invoke_syscall.constprop.0+0x7c/0xd0
> >> [ 2215.130285] el0_svc_common.constprop.0+0x140/0x150
> >> [ 2215.130287] do_el0_svc+0x38/0xa0
> >> [ 2215.130289] el0_svc+0x38/0x1d0
> >> [ 2215.130291] el0t_64_sync_handler+0xb4/0x130
> >> [ 2215.130292] el0t_64_sync+0x17c/0x180
> >>
> >>
> >> include/linux/skbuff.h | 13 +++++++++++++
> >> include/net/inet_sock.h | 1 +
> >> include/net/sock.h | 1 +
> >> net/core/sock.c | 1 +
> >> net/ipv4/ip_output.c | 3 +++
> >> net/ipv4/raw.c | 1 +
> >> net/ipv6/ip6_output.c | 2 ++
> >> net/ipv6/raw.c | 1 +
> >> net/packet/af_packet.c | 3 +++
> >> 9 files changed, 26 insertions(+)
> >>
> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >> index 2dde34c29203..b098b7d30b56 100644
> >> --- a/include/linux/skbuff.h
> >> +++ b/include/linux/skbuff.h
> >> @@ -744,6 +744,7 @@ typedef unsigned char *sk_buff_data_t;
> >> * @tstamp: Time we arrived/left
> >> * @skb_mstamp_ns: (aka @tstamp) earliest departure time; start point
> >> * for retransmit timer
> >> + * @user_delivery_time: states that timestamp was populated from userspace
> >> * @rbnode: RB tree node, alternative to next/prev for netem/tcp
> >> * @list: queue head
> >> * @ll_node: anchor in an llist (eg socket defer_list)
> >> @@ -879,6 +880,8 @@ struct sk_buff {
> >> ktime_t tstamp;
> >> u64 skb_mstamp_ns; /* earliest departure time */
> >> };
> >> + /* States that time is from userspace */
> >> + bool user_delivery_time;
> >> /*
> >> * This is the control buffer. It is free to use for every
> >> * layer. Please put your private variables there. If you
> >> @@ -4208,6 +4211,16 @@ static inline void skb_clear_tstamp(struct sk_buff *skb)
> >> if (skb->mono_delivery_time)
> >> return;
> >>
> >> + /* When userspace timestamp packets are forwarded via bridge
> >> + * the br_forward_finish clears the tstamp and the tstamp
> >> + * from the userspace is lost. Hence the check for user
> >> + * delivery time. With the below check now tc-etf qdisc will
> >> + * not end up dropping the packets if the packet is forwarded via
> >> + * bridge interface.
> >> + */
> >> + if (skb->user_delivery_time)
> >> + return;
> >> +
> >> skb->tstamp = 0;
> >> }
> >>
> >> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> >> index d94c242eb3ed..e7523545a493 100644
> >> --- a/include/net/inet_sock.h
> >> +++ b/include/net/inet_sock.h
> >> @@ -175,6 +175,7 @@ struct inet_cork {
> >> __u16 gso_size;
> >> u64 transmit_time;
> >> u32 mark;
> >> + bool user_delivery_time;
> >> };
> >
> > There's no need for a cork member, as by definition the fields in this
> > struct are coming from userspace.
> >
> > There is a very high bar to adding new fields to sk_buff, because it
> > is used by many paths and would be enormous if stuck with fields for
> > every intersection between a pair of features.
> >
> > The goal here is for the bridge to disambiguate earliest delivery time
> > timestamps from which? From those looped through ip forwarding? Why
> > does the bridge zero the tstamp field at all? That might help finding
> > a reasonable implementation.
> >
> > We have run in the issue of labeling the value of skb->tstamp before.
> > With redirect and looping it is definitely subtle.
>
> Thanks for your comments Willem,
>
> There is a clear explanation of why this is needed as part of the below link
> https://patchwork.kernel.org/project/netdevbpf/patch/20220302195525.3480280-1-kafai@xxxxxx/
>
> From the above link Martin KaFai Lau has mentioned and i quote.
>
> "In the future, another bit (e.g. skb->user_delivery_time) can be added
> for the SCM_TXTIME where the clock base is tracked by sk->sk_clockid."

Let's CC: Martin on this thread extending his work.

In some cases SO_TXTIME sets timestamps for in monotonic, for FQ.

The issue here is that it may set in against other clocks, such as
CLOCK_TAI for the ETF qdisc.

Instead of adding yet another rarely used bit to sk_buff, could we
modify the existing bit to be clockid_delivery_time, and default
sk_clockid to CLOCK_MONOTONIC for TCP?

There is no conflict between SCM_TXTIME and TCP, because the explicit
cmsg makes no sense for TCP and indeed only RAW and DGRAM sockets
interpret it.