Re: [PATCH] bcachefs: Refactor bkey_i to use a flexible array

From: Brian Foster
Date: Mon Oct 16 2023 - 08:42:25 EST


On Fri, Oct 13, 2023 at 04:44:21PM -0700, Kees Cook wrote:
> On Fri, Oct 13, 2023 at 07:26:11AM -0400, Brian Foster wrote:
> > On Tue, Oct 10, 2023 at 04:56:12PM -0700, Kees Cook wrote:
> > > The memcpy() in bch2_bkey_append_ptr() is operating on an embedded
> > > fake flexible array. Instead, make it explicit, and convert the memcpy
> > > to target the flexible array instead. Fixes the W=1 warning seen for
> > > -Wstringop-overflow:
> > >
> > > In file included from include/linux/string.h:254,
> > > from include/linux/bitmap.h:11,
> > > from include/linux/cpumask.h:12,
> > > from include/linux/smp.h:13,
> > > from include/linux/lockdep.h:14,
> > > from include/linux/radix-tree.h:14,
> > > from include/linux/backing-dev-defs.h:6,
> > > from fs/bcachefs/bcachefs.h:182:
> > > fs/bcachefs/extents.c: In function 'bch2_bkey_append_ptr':
> > > include/linux/fortify-string.h:57:33: warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=]
> > > 57 | #define __underlying_memcpy __builtin_memcpy
> > > | ^
> > > include/linux/fortify-string.h:648:9: note: in expansion of macro '__underlying_memcpy'
> > > 648 | __underlying_##op(p, q, __fortify_size); \
> > > | ^~~~~~~~~~~~~
> > > include/linux/fortify-string.h:693:26: note: in expansion of macro '__fortify_memcpy_chk'
> > > 693 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \
> > > | ^~~~~~~~~~~~~~~~~~~~
> > > fs/bcachefs/extents.c:235:17: note: in expansion of macro 'memcpy'
> > > 235 | memcpy((void *) &k->v + bkey_val_bytes(&k->k),
> > > | ^~~~~~
> > > fs/bcachefs/bcachefs_format.h:287:33: note: destination object 'v' of size 0
> > > 287 | struct bch_val v;
> > > | ^
> > >
> > > Cc: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > > Cc: Brian Foster <bfoster@xxxxxxxxxx>
> > > Cc: linux-bcachefs@xxxxxxxxxxxxxxx
> > > Reported-by: kernel test robot <lkp@xxxxxxxxx>
> > > Closes: https://lore.kernel.org/oe-kbuild-all/202309192314.VBsjiIm5-lkp@xxxxxxxxx/
> > > Signed-off-by: Kees Cook <keescook@xxxxxxxxxxxx>
> > > ---
> > > fs/bcachefs/bcachefs_format.h | 5 ++++-
> > > fs/bcachefs/extents.h | 2 +-
> > > 2 files changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h
> > > index f0d130440baa..f5e8cb43697b 100644
> > > --- a/fs/bcachefs/bcachefs_format.h
> > > +++ b/fs/bcachefs/bcachefs_format.h
> > > @@ -300,7 +300,10 @@ struct bkey_i {
> > > __u64 _data[0];
> > >
> > > struct bkey k;
> > > - struct bch_val v;
> > > + union {
> > > + struct bch_val v;
> > > + DECLARE_FLEX_ARRAY(__u8, bytes);
> > > + };
> > > };
> >
> > Hi Kees,
> >
> > I'm curious if this is something that could be buried in bch_val given
> > it's already kind of a fake structure..? If not, my only nitty comment
>
> I was thinking it would be best to keep the flexible array has "high" in
> the struct as possible, as in the future more refactoring will be needed
> to avoid having flex arrays overlap with other members in composite
> structures. So instead of pushing into bch_val, I left it at the highest
> level possible, bch_i, as that's the struct being used by the memcpy().
>
> Eventually proper unions will be needed instead of overlapping bch_i
> with other types, as in:
>
> struct btree_root {
> struct btree *b;
>
> /* On disk root - see async splits: */
> __BKEY_PADDED(key, BKEY_BTREE_PTR_VAL_U64s_MAX);
> u8 level;
> u8 alive;
> s8 error;
> };
>
> But that's all for the future. Right now I wanted to deal with the more
> pressing matter of a 0-sized array not being zero sized. :)
>

Ok, but I'm not really following how one approach vs. the other relates
to this particular example of an embedded bkey_i. I'm probably just not
familiar enough with the current issues with 0-sized arrays and the
anticipated path forward. Can you elaborate for somebody who is more
focused on trying to manage the design/complexity of these various key
data structures? For example, what's the practical difference here (for
future work) if the flex array lives in bch_val vs. bkey_i?

Note that I don't necessarily have a strong opinion on this atm, but if
there's a "for future reasons" aspect to this approach I'd like to at
least understand it a little better. ;)

> > is that memcpy(k->bytes[], ...) makes it kind of read like we're copying
> > in opaque key data rather than value data, so perhaps a slightly more
> > descriptive field name would be helpful. But regardless I'd wait until
> > Kent has a chance to comment before changing anything..
>
> How about "v_bytes" instead of "bytes"? Or if it really is preferred,
> I can move the flex array into bch_val -- it just seems like the wrong
> layer...
>

Yeah.. v_bytes, value_bytes, etc. etc. Anything that avoids misleading
code when using the field is good with me. Thanks.

Brian

> -Kees
>
> >
> > Brian
> >
> > >
> > > #define KEY(_inode, _offset, _size) \
> > > diff --git a/fs/bcachefs/extents.h b/fs/bcachefs/extents.h
> > > index 7ee8d031bb6c..6248e17bbac5 100644
> > > --- a/fs/bcachefs/extents.h
> > > +++ b/fs/bcachefs/extents.h
> > > @@ -642,7 +642,7 @@ static inline void bch2_bkey_append_ptr(struct bkey_i *k, struct bch_extent_ptr
> > >
> > > ptr.type = 1 << BCH_EXTENT_ENTRY_ptr;
> > >
> > > - memcpy((void *) &k->v + bkey_val_bytes(&k->k),
> > > + memcpy(&k->bytes[bkey_val_bytes(&k->k)],
> > > &ptr,
> > > sizeof(ptr));
> > > k->k.u64s++;
> > > --
> > > 2.34.1
> > >
> >
>
> --
> Kees Cook
>