Re: [PATCH net-next 2/2] net: phy: Add ability to debug RGMII connections

From: Florian Fainelli
Date: Thu Oct 17 2019 - 18:22:55 EST




On 10/17/2019 3:06 PM, Vladimir Oltean wrote:
>> +static int phy_rgmii_debug_rcv(struct sk_buff *skb, struct net_device *dev,
>> +			       struct packet_type *pt, struct net_device *unused)
>> +{
>> +	struct phy_rgmii_debug_priv *priv = pt->af_packet_priv;
>> +	u32 fcs;
>> +
>> +	/* If we receive something, the Ethernet header was valid and so was
>> +	 * the Ethernet type, so to re-calculate the FCS we need to undo what
>> +	 * eth_type_trans() just did.
>> +	 */
>> +	if (!__skb_push(skb, ETH_HLEN))
>> +		return 0;
>
> Why would this return NULL?
I don't think it can, good point.

>
>> +
>> +	fcs = phy_rgmii_probe_skb_fcs(skb);
>> +	if (skb->len != priv->skb->len || fcs != priv->fcs) {
>
> I feel like this logic is broken. How do you know that this skb is that
> skb? Everybody else can still enqueue to the netdev, right?

That is true, so I could be defeated by someone sending an Ethernet
frame with a 0xdada EtherType through, e.g., raw sockets. Good point.

>
> Actually if I'm right about the FCS errors resulting in drops below,
> then any news here is good news, no need to even compare the FCS of two
> frames which you don't know whether they're in fact one and the same.

FCS is a bit overstated here, although it actually is what the HW would
generate/verify. The point was really that if you have an RGMII issue,
you may very well end up with two packets instead of one because of the
clock/data misalignment.

>
>> +		print_hex_dump(KERN_INFO, "RX probe skb: ",
>> +			       DUMP_PREFIX_OFFSET, 16, 1, skb->data, 32,
>> +			       false);
>> +		netdev_warn(dev, "Calculated FCS: 0x%08x expected: 0x%08x\n",
>> +			    fcs, priv->fcs);
>> +	} else {
>> +		priv->rcv_ok = 1;
>> +	}
>> +
>> +	complete(&priv->compl);
>> +
>> +	return 0;
>> +}
>> +
>> +static int phy_rgmii_trigger_config(struct phy_device *phydev,
>> +				    phy_interface_t interface)
>> +{
>> +	int ret = 0;
>> +
>> +	/* Configure the interface mode to be tested */
>> +	phydev->interface = interface;
>> +
>> +	/* Forcibly run the fixups and config_init() */
>> +	ret = phy_init_hw(phydev);
>> +	if (ret) {
>> +		phydev_err(phydev, "phy_init_hw failed: %d\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	/* Some PHY drivers configure RGMII delays in their config_aneg()
>> +	 * callback, so make sure we run through those as well.
>> +	 */
>> +	ret = phy_start_aneg(phydev);
>> +	if (ret) {
>> +		phydev_err(phydev, "phy_start_aneg failed: %d\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	/* Put back in loopback mode since phy_init_hw() may have issued
>> +	 * a software reset.
>> +	 */
>> +	ret = phy_loopback(phydev, true);
>> +	if (ret)
>> +		phydev_err(phydev, "phy_loopback failed: %d\n", ret);
>> +
>> +	return ret;
>> +}
>> +
>> +static void phy_rgmii_probe_xmit_work(struct work_struct *work)
>> +{
>> +	struct phy_rgmii_debug_priv *priv;
>> +
>> +	priv = container_of(work, struct phy_rgmii_debug_priv, work);
>> +
>> +	dev_queue_xmit(priv->skb);
>
> Oops, you just lost ownership of priv->skb here. Anything happening
> further is in a race with the netdev driver. You need to hold a
> reference to it with skb_get().

Doh, yes, thanks!

>
>> +}
>> +
>> +static int phy_rgmii_prepare_probe(struct phy_rgmii_debug_priv *priv)
>> +{
>> +	struct phy_device *phydev = priv->phydev;
>> +	struct net_device *ndev = phydev->attached_dev;
>> +	struct sk_buff *skb;
>> +	int ret;
>> +
>> +	skb = netdev_alloc_skb(ndev, ndev->mtu);
>> +	if (!skb)
>> +		return -ENOMEM;
>> +
>> +	priv->skb = skb;
>
> Could you assign priv->skb at the end, not here? This way you won't risk
> leaking a freed pointer into priv->skb if eth_header below fails.

Makes sense.

>
>> +	skb->dev = ndev;
>> +	skb_put(skb, ndev->mtu);
>> +	memset(skb->data, 0xaa, skb->len);
>> +
>
> I think you need to do something like this before skb_put:
>
> +	skb->protocol = htons(ETH_P_EDSA);
> +	skb_reset_network_header(skb);
> +	skb_reset_transport_header(skb);
>
> Otherwise I get a lot of these errors on a bridged net device:
>
> [  142.919783] protocol 0000 is buggy, dev swp2
> [  142.924436] protocol 0000 is buggy, dev eth2
>
>> +	/* Build the header */
>> +	ret = eth_header(skb, ndev, ETH_P_EDSA, ndev->dev_addr,
>> +			 NULL, ndev->mtu);
>
> A switch net device will complain about having SMAC == DMAC and drop the
> frame. Don't you want to send broadcast frames here?

Yes, that makes sense. If your network filter does not accept broadcast,
your network adapter is not of great use.

>
>> +	if (ret != ETH_HLEN) {
>> +		kfree_skb(skb);
>> +		return -EINVAL;
>> +	}
>> +
>> +	priv->fcs = phy_rgmii_probe_skb_fcs(skb);
>> +
>
> I'm far from a checksumming expert, but if the FCS was invalid, wouldn't
> the RX MAC just drop the frame?

Depends on whether the user has requested NETIF_F_RXALL. This was just a
convenient way to produce a strong enough checksum to compare against;
the HW will have to insert it and strip it on the frame's way back to itself.
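
To make the "strong enough checksum" concrete, here is a minimal userspace
sketch. It assumes the probe checksum is the standard reflected CRC-32 that
Ethernet uses for its FCS; frame_crc32() is a hypothetical name for
illustration and is not the patch's phy_rgmii_probe_skb_fcs(), whose body is
not quoted in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): bitwise CRC-32 using the
 * reflected polynomial 0xEDB88320, an initial value of 0xFFFFFFFF and a
 * final inversion, i.e. the CRC that Ethernet uses for its FCS.
 */
static uint32_t frame_crc32(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	/* A NULL buffer is only acceptable for an empty frame. */
	assert(buf || len == 0);

	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++) {
			/* Mask is all-ones when the low bit is set. */
			uint32_t mask = 0u - (crc & 1u);

			crc = (crc >> 1) ^ (0xEDB88320u & mask);
		}
	}

	return ~crc;
}
```

Computing this once over the frame before transmit and again over the
received bytes gives two values that should match if the payload came back
intact, independently of whether the MAC strips the on-wire FCS.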

>
>> +	return 0;
>> +}
>> +
>> +static int phy_rgmii_probe_interface(struct phy_rgmii_debug_priv *priv,
>> +				     phy_interface_t iface)
>> +{
>> +	struct phy_device *phydev = priv->phydev;
>> +	struct net_device *ndev = phydev->attached_dev;
>> +	unsigned long timeout;
>> +	int ret;
>> +
>> +	ret = phy_rgmii_trigger_config(phydev, iface);
>> +	if (ret) {
>> +		netdev_err(ndev, "%s rejected by driver(s)\n",
>> +			   phy_modes(iface));
>> +		return ret;
>> +	}
>> +
>> +	netdev_info(ndev, "Trying \"%s\" PHY interface\n", phy_modes(iface));
>> +
>> +	/* Prepare probe frames now */
>> +	ret = phy_rgmii_prepare_probe(priv);
>> +	if (ret)
>> +		return ret;
>> +
>> +	priv->rcv_ok = 0;
>> +	reinit_completion(&priv->compl);
>> +
>> +	cancel_work_sync(&priv->work);
>> +	schedule_work(&priv->work);
>> +
>> +	timeout = wait_for_completion_timeout(&priv->compl,
>> +					      msecs_to_jiffies(3000));
>> +	if (!timeout) {
>> +		netdev_err(ndev, "transmit timeout!\n");
>> +		ret = -ETIMEDOUT;
>> +		goto out;
>> +	}
>> +
>> +	ret = priv->rcv_ok == 1 ? 0 : -EINVAL;
>> +out:
>> +	phy_loopback(phydev, false);
>> +	dev_consume_skb_any(priv->skb);
>
> Don't consume the skb if the xmit has timed out. The driver will have
> already freed it in that case, leading to:
>
> [  145.994328] sja1105 spi0.1 swp2: transmit timeout!
> [  145.999259] ------------[ cut here ]------------
> [  146.003901] WARNING: CPU: 0 PID: 163 at lib/refcount.c:190 refcount_sub_and_test_checked+0xb8/0xc0
> [  146.013029] refcount_t: underflow; use-after-free.
>
> That means, in practice, moving the kfree_skb call to phy_rgmii_debug_rcv.
>
>> +	return ret;
>> +}
>> +
>> +static struct packet_type phy_rgmii_probes_type __read_mostly = {
>> +	.type	= cpu_to_be16(ETH_P_EDSA),
>> +	.func	= phy_rgmii_debug_rcv,
>> +};
>> +
>> +static int phy_rgmii_can_debug(struct phy_device *phydev)
>> +{
>> +	struct net_device *ndev = phydev->attached_dev;
>> +
>> +	if (!ndev) {
>> +		netdev_err(ndev, "No network device attached\n");
>> +		return -EOPNOTSUPP;
>> +	}
>> +
>> +	if (!phy_interface_is_rgmii(phydev)) {
>> +		netdev_info(ndev, "Not RGMII configured, nothing to do\n");
>> +		return 0;
>> +	}
>> +
>> +	if (!phydev->is_gigabit_capable) {
>> +		netdev_err(ndev, "not relevant in non-Gigabit mode\n");
>> +		return -EOPNOTSUPP;
>> +	}
>> +
>> +	if (phy_driver_is_genphy(phydev) || phy_driver_is_genphy_10g(phydev)) {
>> +		netdev_err(ndev, "only relevant with non-generic drivers\n");
>> +		return -EOPNOTSUPP;
>> +	}
>> +	return 1;
>> +}
>> +
>> +int phy_rgmii_debug_probe(struct phy_device *phydev)
>> +{
>> +	struct net_device *ndev = phydev->attached_dev;
>> +	unsigned char operstate = ndev->operstate;
>> +	phy_interface_t rgmii_modes[4] = {
>> +		PHY_INTERFACE_MODE_RGMII,
>> +		PHY_INTERFACE_MODE_RGMII_ID,
>> +		PHY_INTERFACE_MODE_RGMII_RXID,
>> +		PHY_INTERFACE_MODE_RGMII_TXID
>> +	};
>> +	struct phy_rgmii_debug_priv *priv;
>> +	unsigned int i, count;
>> +	int ret;
>> +
>> +	ret = phy_rgmii_can_debug(phydev);
>> +	if (ret <= 0)
>> +		return ret;
>> +
>> +	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>> +	if (!priv)
>> +		return -ENOMEM;
>> +
>> +	if (phy_rgmii_probes_type.af_packet_priv)
>> +		return -EBUSY;
>> +
>> +	phy_rgmii_probes_type.af_packet_priv = priv;
>> +	priv->phydev = phydev;
>> +	INIT_WORK(&priv->work, phy_rgmii_probe_xmit_work);
>> +	init_completion(&priv->compl);
>> +
>> +	/* We are now testing this network device */
>> +	ndev->operstate = IF_OPER_TESTING;
>> +
>
> Shouldn't you put the netdev in promisc mode somewhere?

If we send with a broadcast MAC DA (which is a good suggestion) and our
own MAC SA, then no.

[snip]

>>
>
> Despite the above, I couldn't actually get this running successfully. At
> the end of the test I always get "-bash: echo: write error: Connection
> timed out".
> It's a fun toy, but I don't really think it's very useful in catching
> any bug.

Looks like it just did, with itself :)

> It's basically a glorified ping test, and brainless ping tests are
> precisely the reason why people get this wrong most of the time. You
> can't have a generic software tool identify for you a configuration
> problem that depends entirely upon a private hardware implementation of
> a specification that is vague.
>
> I mean in theory, the arithmetic is simple enough for a MAC-to-PHY
> connection. These 2 equalities always need to hold true:
>
> MAC TX delay + PCB TX delay + PHY TX delay == 1
> MAC RX delay + PCB RX delay + PHY RX delay == 1
>
> meaning that delays in each direction need to be applied at most once.
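
The two equalities quoted above can be written down as a tiny userspace
check. This is only a sketch: the struct and function names are hypothetical
and exist in no kernel header, and each delay is modeled as a plain on/off
flag per element, which is the simplification the equalities themselves make.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one RGMII direction (TX or RX): which of the
 * three elements on the path adds the clock delay.
 */
struct rgmii_delay_path {
	bool mac;
	bool pcb;
	bool phy;
};

/* Number of elements that add a delay in this direction. */
static int rgmii_delay_count(const struct rgmii_delay_path *p)
{
	assert(p);
	return (int)p->mac + (int)p->pcb + (int)p->phy;
}

/* MAC delay + PCB delay + PHY delay == 1 for this direction. */
static bool rgmii_path_ok(const struct rgmii_delay_path *p)
{
	return rgmii_delay_count(p) == 1;
}
```

A link is then expected to work only when rgmii_path_ok() holds for both the
TX path and the RX path; a count of 0 or 2 in either direction is the
misconfiguration being discussed.
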
>
> For a PHY-to-MAC connection, there is this unwritten Linux rule that the
> PHY should apply the requested delays in both directions. This already
> contradicts common sense, as it is not uncommon, from a hardware point
> of view, for each device to add the delays in its own TX direction (so
> the MAC would add the TX delays and the PHY would add the RX delays).
> That is not possible to specify with Linux. But let's go with the flow.
> So the PHY adds all specified delays, and one can assume that the
> unspecified delays up to rgmii-id were added by the PCB. This small
> kernel thread would basically probe for PCB delays, in this case,
> assuming that the MAC driver and the PHY driver are both compliant.
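
The inference described in the paragraph above can be sketched as follows,
under the same assumptions it states: the MAC adds no delay, the PHY adds
exactly the delays named by the phy-mode, and whatever is missing up to
rgmii-id is attributed to the PCB. The enum and function names are
hypothetical stand-ins for the PHY_INTERFACE_MODE_RGMII* values, chosen so
the example builds in userspace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace mirror of the four RGMII phy-modes. */
enum rgmii_mode {
	MODE_RGMII,		/* PHY adds no delay */
	MODE_RGMII_ID,		/* PHY adds RX and TX delays */
	MODE_RGMII_RXID,	/* PHY adds RX delay only */
	MODE_RGMII_TXID,	/* PHY adds TX delay only */
};

struct rgmii_delays {
	bool rx;
	bool tx;
};

/* Delays the PHY is asked to add for a given phy-mode. */
static struct rgmii_delays rgmii_phy_delays(enum rgmii_mode mode)
{
	struct rgmii_delays d = { false, false };

	assert((unsigned int)mode <= MODE_RGMII_TXID);

	d.rx = (mode == MODE_RGMII_ID || mode == MODE_RGMII_RXID);
	d.tx = (mode == MODE_RGMII_ID || mode == MODE_RGMII_TXID);
	return d;
}

/* If @working_mode is the mode that passes the loopback test, the PCB
 * is assumed to supply every delay the PHY was not asked to add.
 */
static struct rgmii_delays rgmii_implied_pcb_delays(enum rgmii_mode working_mode)
{
	struct rgmii_delays phy = rgmii_phy_delays(working_mode);
	struct rgmii_delays pcb = { !phy.rx, !phy.tx };

	return pcb;
}
```

For instance, if only rgmii-txid passes, this model concludes that the board
supplies the RX delay. The conclusion is only as good as the assumption that
both drivers honor the phy-mode, which is the objection raised in the
paragraphs that follow.
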
>
> Let's say there is more than one phy-mode that works. Andrew said to
> raise a red flag in that case, because the PHY driver is surely not
> doing the right thing with the delays. But:
> - Maybe it is, but the equalities above aren't completely set in stone.
> Maybe the inserted propagation delays aren't high enough that two of
> them would break the link again.
> - Which of the multiple phy-mode configurations that work is the right
> one? A tool that can't tell me that is pointless, IMO. My PHY works due
> to pin strapping, but the driver is buggy. Do I care? No, as long as it
> works, and as long as it will continue to work after somebody fixes the
> driver. How do I know what delay mode is right? Well, of course, if it
> works with the configuration out of pin strapping, then obviously I
> should put the pin strapping settings in the DT. End of story. Can this
> kernel thread tell me that? No....
>
> And then, there's the RGMII fixed-link. The rules are cloudy for that
> one, because now there's potentially 2 phy-modes that operate on the
> same link. To complicate matters even further, your patch does not
> consider the fixed-link (no PHY) case, and there is no generic interface
> to even add selftests for that in the future. You would need to unbind
> the MAC driver, mangle the DT bindings, then bind it back again...
>
> I guess I'm just concerned about the chaos that a tool returning false
> positives would create for people who don't really follow what's going
> on ("look, but the tool said this!").

And maybe I should have marked this RFC. The commit subject is clear
that this is not foolproof; it cannot be, for all the reasons you
outlined. The thing is that I have spent many hours of my life (like
you, like Andrew) helping people troubleshoot why RGMII does not work.
If we have a good litmus test we can submit, that gets us halfway there.

I am completely fine dropping this if you believe this is going to cause
more harm than good.
--
Florian