[Issue report] drivers/ftgmac100: DHCP occasionally fails during boot up or link down/up

From: Heyi Guo
Date: Tue Feb 15 2022 - 01:39:02 EST


Hi,

We are using the Aspeed 2600 and found that DHCP occasionally fails during boot up or link down/up. The DHCP client is systemd 247.6 networkd. Our network device is the 2600 MAC4, connected to an RGMII PHY module.

Current investigation shows that the first DHCP discovery packet sent by systemd-networkd might be corrupted. systemd-networkd then keeps sending DHCP discovery packets with the same XID, but no other packets, since no IP address has been obtained yet. However, the server side does not respond to this series of DHCP requests until it receives some other packet. The situation can be recovered by another link down/up, or by a "ping -I eth0 xxx.xxx.xxx.xxx" command to insert some other TX packets.

Reading through the driver code in ftgmac100.c, I have some questions about the flow from link down to link up. I think the flow is as follows:

1. ftgmac100_open() enables the net interface with ftgmac100_init_all(), and then calls phy_start()

2. When the PHY link comes up, it calls netif_carrier_on() and then the adjust_link callback, which is ftgmac100_adjust_link() for ftgmac100

3. ftgmac100_adjust_link() schedules the reset work (ftgmac100_reset_task)

4. ftgmac100_reset_task() will then reset the MAC

I found that networkd starts sending DHCP requests immediately after netif_carrier_on() is called in step 2, but step 4 then resets the MAC, which may corrupt a packet that is being transmitted at that moment.
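
To make the suspected ordering concrete, here is a minimal toy model in C. It is not the driver code: the toy_* names and the struct are hypothetical and only mirror the steps listed above, and the model simply assumes that a MAC reset discards any frame still in flight, which is the hypothesis being asked about, not something confirmed.

```c
#include <stdbool.h>

/* Hypothetical toy model of the suspected race; NOT the ftgmac100 driver. */
struct toy_mac {
    bool carrier_on;    /* models the state after netif_carrier_on() (step 2) */
    bool reset_pending; /* reset work scheduled (step 3) but not yet run */
    int  tx_in_flight;  /* frames handed to the MAC, not yet on the wire */
    int  tx_sent;       /* frames that made it out */
    int  tx_lost;       /* frames discarded by a reset (assumed behaviour) */
};

/* Steps 2 and 3: carrier goes on and the reset work is scheduled. */
static void toy_link_up(struct toy_mac *m)
{
    m->carrier_on = true;
    m->reset_pending = true;
}

/* The DHCP client transmits as soon as it sees carrier. */
static bool toy_xmit(struct toy_mac *m)
{
    if (!m->carrier_on)
        return false;
    m->tx_in_flight++;
    return true;
}

/* Step 4: the deferred reset runs; assumed to drop in-flight frames. */
static void toy_reset_task(struct toy_mac *m)
{
    if (!m->reset_pending)
        return;
    m->tx_lost += m->tx_in_flight;
    m->tx_in_flight = 0;
    m->reset_pending = false;
}

/* Whatever is still in flight completes normally. */
static void toy_tx_complete(struct toy_mac *m)
{
    m->tx_sent += m->tx_in_flight;
    m->tx_in_flight = 0;
}

/* Run one link-up sequence and return the number of lost frames. */
static int toy_run(bool xmit_before_reset)
{
    struct toy_mac m = {0};

    toy_link_up(&m);
    if (xmit_before_reset) {
        toy_xmit(&m);        /* first DHCP discover goes out ... */
        toy_reset_task(&m);  /* ... and the reset hits it */
    } else {
        toy_reset_task(&m);  /* reset finishes first */
        toy_xmit(&m);
    }
    toy_tx_complete(&m);
    return m.tx_lost;
}
```

In this model, toy_run(true) loses the one frame sent between carrier-on and the reset, while toy_run(false) loses none. If the real hardware behaves like this, only the ordering in which the reset completes before the first transmit would be safe.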

Is there anything wrong in this flow? Or am I missing something?

Thanks,

Heyi