Possible Kernel Bug caused by NAT in Open vSwitch

From: Jay Rhine
Date: Mon Nov 27 2023 - 23:13:27 EST


Hello!

I’m hoping that you can help me get to the bottom of what appears to be a kernel bug that I have been seeing recently.
I have been experiencing system crashes on multiple x86_64 systems running Open vSwitch on Ubuntu 22.04 with kernel version 5.15.0-89-generic #99-Ubuntu SMP. This is the latest generic Ubuntu 22.04 kernel at this time. When the crash occurs, the system freezes completely. The only trace is the following error message in journalctl; nothing appears on the console or is logged anywhere else:

Nov 26 06:56:34 system_name kernel: ------------[ cut here ]------------
Nov 26 06:56:34 system_name kernel: kernel BUG at net/core/skbuff.c:1697!
-- Boot f90d566815cb4044bc7cbc8703a7aa9e --
Nov 26 07:22:32 system_name kernel:

I cannot reliably reproduce the issue on a test system. However, when it does occur, it seems to be directly correlated with traffic using NAT through Open vSwitch on these servers. It is not as simple as just putting any traffic through the NAT, so it must be related to something more subtle (a specific type of traffic, header, etc.). This issue has occurred multiple times on at least 6 separate servers, so it is not related to a hardware issue.

Unfortunately, the error message above does not provide much in the way of details (no stack trace, etc.). So I pulled down the latest kernel source code specifically for this version of the Ubuntu kernel, and it appears that this “kernel BUG” message originates from the call to “BUG_ON(skb_shared(skb));” in the “pskb_expand_head” function in the “net/core/skbuff.c” file.

Here is the beginning of the function in context:

int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
		     gfp_t gfp_mask)
{
	int i, osize = skb_end_offset(skb);
	int size = osize + nhead + ntail;
	long off;
	u8 *data;

	BUG_ON(nhead < 0);

	BUG_ON(skb_shared(skb)); // THIS IS LINE 1697

	size = SKB_DATA_ALIGN(size);

	if (skb_pfmemalloc(skb))
		gfp_mask |= __GFP_MEMALLOC;
	data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
			       gfp_mask, NUMA_NO_NODE, NULL);

Looking at “skb_shared”, the BUG_ON will only fire if the reference count in skb->users is anything other than one. Here is that function:

/**
 *	skb_shared - is the buffer shared
 *	@skb: buffer to check
 *
 *	Returns true if more than one person has a reference to this
 *	buffer.
 */
static inline int skb_shared(const struct sk_buff *skb)
{
	return refcount_read(&skb->users) != 1;
}
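
To make the failing condition concrete, here is a minimal userspace sketch of the same check. It is not kernel code and cannot reproduce the crash: “struct fake_skb” and the “fake_*” functions are hypothetical stand-ins that I made up for illustration, and a plain int replaces the kernel's atomic refcount_t. It only shows that one extra reference on the buffer is enough to trip the guard at the top of pskb_expand_head.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff: only the reference count is
 * modeled. The kernel uses an atomic refcount_t; a plain int is enough
 * for this single-threaded sketch. */
struct fake_skb {
	int users;
};

/* Take an extra reference, loosely analogous to skb_get(). */
static void fake_skb_get(struct fake_skb *skb)
{
	assert(skb->users > 0);	/* cannot take a reference on a freed buffer */
	skb->users++;
}

/* Same test as skb_shared(): true when the count is anything other than 1. */
static bool fake_skb_shared(const struct fake_skb *skb)
{
	return skb->users != 1;
}

/* Models the two guards at the top of pskb_expand_head(). Returns 0 if the
 * expansion would be allowed to proceed, or -1 where the real kernel would
 * hit one of the BUG_ON() calls. */
static int fake_expand_head_check(const struct fake_skb *skb, int nhead)
{
	if (nhead < 0 || fake_skb_shared(skb))
		return -1;
	return 0;
}
```

With users == 1 the check returns 0. After a single fake_skb_get() it returns -1, which corresponds to the point where the real kernel stops at line 1697.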

Based on this reading of the kernel code, I believe the kernel is crashing because an sk_buff struct is somehow reaching pskb_expand_head while it still has more than one reference. Since this is highly correlated with NAT being done by OVS, I suspect the issue could be caused by something in the “net/openvswitch/conntrack.c” file. I did confirm that this file will indirectly call pskb_expand_head, but nothing in the file obviously indicated that it was incorrectly incrementing or decrementing the references in the sk_buff.

I would really like to add a few debug statements around this code and recompile the kernel, but since I can’t reliably reproduce the issue on demand, I am unable to trigger it on a test machine to do that.

If anyone has seen this issue or anything like it and can provide any thoughts, suggestions, etc., I would really appreciate it.

Thank you!

Jay Rhine