Re: [BUG] Boot crash on v6.7-rc2

From: Gustavo A. R. Silva
Date: Sat Nov 25 2023 - 13:41:26 EST




On 11/25/23 12:31, Kees Cook wrote:


On November 25, 2023 9:54:28 AM PST, "Gustavo A. R. Silva" <gustavo@xxxxxxxxxxxxxx> wrote:


On 11/24/23 09:28, Gustavo A. R. Silva wrote:


On 11/24/23 04:24, Joey Gouly wrote:
Hi all,

I just hit a boot crash on v6.7-rc2 (arm64, FVP model):

[..]

Checking `struct neighbour`:

    struct neighbour {
        struct neighbour __rcu    *next;
        struct neigh_table    *tbl;
    .. fields ..
        u8            primary_key[0];
    } __randomize_layout;

Due to the `__randomize_layout`, `primary_key` field is being placed before `tbl` (actually it's the same address since it's a 0 length array). That means the memcpy() corrupts the tbl pointer.

I think I just got unlucky with my CONFIG_RANDSTRUCT seed (I can provide it if needed), it doesn't look as if it's a new issue.

It seems the issue is caused by this change that was recently added to -rc2:

commit 1ee60356c2dc ("gcc-plugins: randstruct: Only warn about true flexible arrays")

Previously, one-element and zero-length arrays were treated as true flexible arrays
(however, they are "fake" flex arrays), and __randomize_layout would leave them
untouched at the end of the struct; the same for proper C99 flex-array members. But
after the commit above, that's no longer the case: Only C99 flex-array members will
behave correctly (remaining untouched at end of the struct), and the other two types
of arrays will be randomized.

mmh... it seems that commit 1ee60356c2dc only prevents one-element arrays from being
treated as flex arrays, while the code should still keep zero-length arrays untouched:

if (typesize == NULL_TREE && TYPE_DOMAIN(fieldtype) != NULL_TREE &&
TYPE_MAX_VALUE(TYPE_DOMAIN(fieldtype)) == NULL_TREE)
return true;

- if (typesize != NULL_TREE &&
- (TREE_CONSTANT(typesize) && (!tree_to_uhwi(typesize) ||
- tree_to_uhwi(typesize) == tree_to_uhwi(elemsize))))
- return true;
-

This should be both the 0 and 1 checks. I think the original fix is correct: switch to a true flex array.

This code is new to me and I got a bit confused. Thanks for the clarification. :)

So, it'd be nice to apply this change:

https://lore.kernel.org/linux-hardening/b6c1c3ce-3ba0-4439-b0fb-2bb0c38586e0@xxxxxxxxxxxxxx/



Sorry about the confusion.



I couldn't reproduce directly on v6.6 (the offsets for `tbl` and `primary_key` didn't overlap).
However I tried changing the zero-length-array to a flexible one:

    +    DECLARE_FLEX_ARRAY(u8, primary_key);
    +    u8        primary_key[0];

Was this line supposed to be "-"?


Then the field offsets ended up overlapping, and I also got the same crash on v6.6.

The right approach is to transform the zero-length array into a C99 flex-array member,
like this:

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 07022bb0d44d..0d28172193fa 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -162,7 +162,7 @@ struct neighbour {
        struct rcu_head         rcu;
        struct net_device       *dev;
        netdevice_tracker       dev_tracker;
-       u8                      primary_key[0];
+       u8                      primary_key[];
 } __randomize_layout;

 struct neigh_ops {

In any case, I think we still should convert [0] to [ ].

I would expect the above to fix the problem. If it doesn't I'll need to take a closer look at the plugin...

I think this should fix the issue. Let me go create a proper patch for this.
I'll send it out, shortly.

--
Gustavo