Re: [PATCH net 2/2] net/af_packet: fix tx skb network header on SOCK_RAW sockets over VLAN device

From: Willem de Bruijn
Date: Thu Jan 12 2023 - 14:01:16 EST


On Thu, Jan 12, 2023 at 11:25 AM Hervé Boisse <admin@xxxxxxxxxxx> wrote:
>
> On Thu, Jan 12, 2023 at 04:47:38PM +0100, Paolo Abeni wrote:
> > I understand, thanks. Still is not clear why the user-space application
> > would attach to dummy0.832 instead of dummy0.
> >
> > With your patch the filter will match, but the dhcp packet will reach
> > the wire untagged, so the app will behave exactly as it would do
> > if/when attached to dummy0.
> >
> > To me it looks like the dhcp client has a bad configuration (wrong
> > interface) and these patches address the issue in the wrong place
> > (inside the kernel).
>
> No, the packet will actually reach the wire as a properly tagged 802.1Q frame.
> For devices that do not support VLAN offloading (such as dummy but also the network card I am using), the kernel adds the tag itself in software before transmitting the packet to the real device.
>
> You can verify this with a capture using tcpdump/wireshark on dummy0 versus dummy0.832.
> That's why dhclient has to send its packets over dummy0.832 and not dummy0.
>
> The same will happen on a real device. I checked on real hardware, with two boxes and their network cards connected through a cable.
> If dhclient is started directly on the first box real device (eth0), the frame is received untagged by the second box, as intended.
> But, if dhclient is started on top of the VLAN device (eth0.832), the second box receives a properly tagged frame.

SOCK_DGRAM writing the tag and SOCK_RAW not writing it is inconsistent.

The driver clearly anticipates SOCK_RAW writers that write only
Ethernet, and fixes up the difference in its ndo_start_xmit:

/* Handle non-VLAN frames if they are sent to us, for example by DHCP.

But that workaround comes too late for code that runs between
dev_queue_xmit and ndo_start_xmit, such as tc filters.

Strictly speaking, dhclient is not writing the link-layer header that
this device advertises through dev->hard_header_len and
vlan_dev_hard_header. But being pedantic won't make the application
work (I assume it never has).
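
To make the mismatch concrete, here is a minimal user-space sketch (not
kernel code) of the two link-layer headers in question: the plain 14-byte
Ethernet header that a SOCK_RAW writer like dhclient apparently emits, and
the 18-byte 802.1Q-tagged header that a VLAN device without offload would
be expected to advertise. The sketch_* names and constants are hypothetical
and defined locally for illustration; priority (PCP) and DEI bits are
assumed to be zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local constants mirroring the usual Ethernet/802.1Q values. */
#define SKETCH_ETH_ALEN     6       /* MAC address length */
#define SKETCH_ETH_HLEN     14      /* dst + src + ethertype */
#define SKETCH_VLAN_HLEN    4       /* TPID + TCI */
#define SKETCH_ETH_P_8021Q  0x8100  /* 802.1Q tag protocol identifier */

/* Store a 16-bit value in network (big-endian) byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Plain Ethernet header: what a writer that "writes only Ethernet" produces. */
size_t sketch_build_eth(uint8_t *buf, size_t cap,
                        const uint8_t *dst, const uint8_t *src,
                        uint16_t proto)
{
    assert(cap >= SKETCH_ETH_HLEN);
    memcpy(buf, dst, SKETCH_ETH_ALEN);
    memcpy(buf + SKETCH_ETH_ALEN, src, SKETCH_ETH_ALEN);
    put_be16(buf + 12, proto);
    return SKETCH_ETH_HLEN;
}

/* 802.1Q-tagged header: the tag sits between the source MAC and the
 * encapsulated ethertype, so the network header starts 4 bytes later. */
size_t sketch_build_vlan(uint8_t *buf, size_t cap,
                         const uint8_t *dst, const uint8_t *src,
                         uint16_t vid, uint16_t proto)
{
    assert(cap >= SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN);
    memcpy(buf, dst, SKETCH_ETH_ALEN);
    memcpy(buf + SKETCH_ETH_ALEN, src, SKETCH_ETH_ALEN);
    put_be16(buf + 12, SKETCH_ETH_P_8021Q);
    put_be16(buf + 14, (uint16_t)(vid & 0x0fff)); /* TCI: PCP=0, DEI=0 */
    put_be16(buf + 16, proto);
    return SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN;
}
```

The 4-byte difference in the returned lengths is the offset that anything
parsing the packet before ndo_start_xmit has to get right in order to find
the network header.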

Perhaps the device can have an optional mode where it does present as
a pure Ethernet device, and handles all the VLAN insertion purely in
the driver code in ndo_start_xmit?