Re: [PATCH net-next v7 15/16] net: ethtool: ts: Let the active time stamping layer be selectable

From: Vladimir Oltean
Date: Mon Nov 20 2023 - 16:18:11 EST


On Mon, Nov 20, 2023 at 11:58:39AM -0800, Jakub Kicinski wrote:
> Rx time stamping is configured by filters. Is there a problem with user
> specifying that they want "true" timestamps for PTP/NTP packets, and
> "dma" timestamps for all the rest?

There is, because enum hwtstamp_rx_filters is NIC-wide, and there is
only one of those - corresponding to the single hwtstamp (ts[2]) provider.
There were never talks in this patch set, AFAIR, about multiple hwtstamp
providers active simultaneously (for different traffic streams) and
configuring them independently, with separate RX filters.

> Maybe we can extend struct scm_timestamping to carry an indication
> which stamp ended up in ts[2] but that's less important to me than
> the ability to configure the thing. Right now, as I said, mlx5 uses
> an ethtool priv flag :(

No, you misunderstood me. I didn't suggest (at least not here) to add an
indication to struct scm_timestamping of "what's the source of ts[2]
(the hwtstamp)".

I was just _asking_ (collecting data) whether it's ultimately desirable
for DMA timestamps to be visible in ts[2] (indistinguishable from a
better hwtstamp, as they currently are, I guess) rather than their
own thing. Like for example, in a congestion control algorithm, where
does TCP really get them from.

If they'd rather be their own thing in a fully developed API, then the
whole discussion is rather off-topic to Köry, because here, we're
beating the dead horse of "where does ts[2] come from" - _still_ a
single source, just selectable, that is.

> > But maybe I'm wrong and there are NICs which can do that filtering.
> > If such NIC exists, then I guess a SOF_TIMESTAMPING_RX_DMA flag should
> > be added to the socket layer, and the NIC driver provides timestamps
> > according to the skb->sk->sk_tsflags, and that problem is completely out
> > of scope for Köry's patch set - and implicitly compatible with it, since
> > as you say, the device-wide timestamping layer - PHC index - does not
> > really change.
>
> IDK. Maybe the sniffles I picked up at LPC are clouding my judgment
> but to me this patch set is shaped too much by current implementation
> and not enough by what it's modeling. It basically exposes to user
> space the "mux" for choosing NETDEV vs PHYLIB.

The last sentence I agree with.

> There are multiple time stamping points as the packet moves thru
> the pipeline. Expose them so that SIOC[GS]HWTSTAMP can target each
> on individually.

Ok, that is rather vague and complex.

You will forever have to contend with the fact that struct scm_timestamping
can contain a single hwtstamp per packet: ts[2]. So you need complex
control path logic to ensure that the sum of RX filters for all
timestamping points in the packet data path doesn't actually request
more than one ts[2] for any skb.

I understand how that could scrath your itch, but here, it sounds
off-topic?

This is actually starting to get close, in a way, to my feedback to
Richard to allow multiple hwtstamp sources for an skb, and to just give
an indication in the cmsg of what's their source, leaving user space to
figure out the rest.
https://lore.kernel.org/netdev/20220120164832.xdebp5vykib6h6dp@skbuf/

But his response was "There was a fair amount of discussion, and it
seemed to me that everyone wanted a pony."
https://lore.kernel.org/netdev/Y%2F0Idkhy27TObawi@xxxxxxxxxxxxxxxxxx/

I mean, IDK, maybe it's not off-topic, but it's a round-about way of
achieving what they think they can achieve in a more straightforward way.

Rephrased in my own words, you're saying:

Forget the concept of an active hwtstamp provider, just open up the
knobs of _all_ possible hwtstamp providers for a NIC. Simultaneously!
To make one active and all the others inactive, just use
HWTSTAMP_FILTER_NONE/HWTSTAMP_TX_OFF for all except one, and the desired
enum hwtstamp_rx_filters / enum hwtstamp_tx_types for the active one.
Live with this expanded configuration model for a while, just restricted
for a single active timestamping layer, and then, once user space is
ready for an enhanced struct scm_timestamping which supports potentially
multiple cmsgs with distinct hwtstamps, remove the restriction and let
it all rip! Everybody gets their pony!

Additionally, SIOCSHWTSTAMP is kinda rusty, has a fixed binary format,
and is not extensible to target a specific hwtstamp provider. So a
netlink conversion of that, as a first step, would of course be great.

Is it an accurate summary?

If it is, I'll let Köry comment on the feasibility :)

> > If I'm not wrong and the MAC-or-DMA timestamp selection is NIC-wide
> > (which diverges from your problem description),
>
> Nope.
>
> > then neither Köry's work
> > nor my "everything is a phc_index" proposal will bring your use case to
> > fruition without further work. Here I would avoid speculating, because a
> > lot will depend upon the details which you haven't really given.
>
> What are the details you'd like? PTP gets stamped at the PHY/MAC,
> the rest gets stamped at DMA. mlx5 achieves this by splitting the
> PTP traffic to a separate queue pair, and configuring that qp to
> capture PHY/MAC stamps, AFAIU.

That's enough, thanks.

> > One question will be whether, in the case of "NIC-wide DMA timestamps",
> > DMA timestamps should be presented as hardware timestamps - struct
> > scm_timestamping[2] from CMSG_DATA() - or as their own thing, that user
> > space needs explicit support for - by parsing a new cmsg level/type.
> > If DMA timestamps won't look to user space like hardware timestamps,
> > then the use case is again out of scope for Köry's work, as far as I see
> > it.
> >
> > Another simple question is - if NICs do this today - probably by giving
> > the "unrepresentable mix" to user space in an implicit, hardcoded and
> > very fine tuned way such that nobody bats an eye - then what is there
> > more to support? Are you looking at extra UAPI as a way to legitimize
> > hacks, or do you feel there is extra control that applications can gain?
>
> I don't understand what you're asking me.

You've partially answered above. The mix of timestamps coming from the
PHY/MAC and those coming from the DMA is unrepresentable in today's
UAPI, and is just fine-tuned to work for the existing use case of "PTP
gets PHY/MAC, everything else gets DMA".

Still not 100% clear what would the proper UAPI (separate user-controllable
RX filters for PHY, MAC and DMA) gain, in addition to what exists in mlx5.

> DMA timestamping is becoming increasingly important. Ready any
> congestion control paper from the last 5 years and chances are
> it will be using delay as a signal. If we're extending uAPI
> for Hw stamping we should make sure to cater to CC use cases.

I'll stop commenting here for a while and go read some of those papers and
RFCs, in the hope that I find some info about the way in which TCP expects
hwtstamps from a NIC. It's quite evident to me that I don't have enough
information to help this discussion reach a conclusion.