Re: Stack corruption in bch2_nocow_write

From: Kent Overstreet
Date: Sat Dec 30 2023 - 14:24:26 EST


On Sat, Dec 30, 2023 at 04:34:39PM +0800, Daniel J Blueman wrote:
> On Sat, 30 Dec 2023 at 02:54, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
> >
> > On Fri, Dec 29, 2023 at 07:43:13PM +0800, Daniel J Blueman wrote:
> > > Hi Kent et al,
> > >
> > > On Linux 6.7-rc7 from bcachefs master SHA f3608cbdfd built with UBSAN
> > > [1], with a crafted workload [2] I'm able to trigger stack corruption
> > > in bch2_nocow_write [3].
> > >
> > > Let me know if you can't reproduce it and I'll check reproducibility
> > > on another platform, and let me know for any patch testing.
> >
> > this should be fixed in the testing branch:
> >
> > commit ab35f724070ccdaa31f6376a1890473e7d031ed0
> > Author: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > Date: Fri Dec 29 13:54:00 2023 -0500
> >
> > bcachefs: fix nocow write path when writing to multiple extents
> >
> > Signed-off-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> >
> > diff --git a/fs/bcachefs/io_write.c b/fs/bcachefs/io_write.c
> > index c5961bac19f0..7c5963cd0b85 100644
> > --- a/fs/bcachefs/io_write.c
> > +++ b/fs/bcachefs/io_write.c
> > @@ -1316,6 +1316,7 @@ static void bch2_nocow_write(struct bch_write_op *op)
> >  		closure_get(&op->cl);
> >  		bch2_submit_wbio_replicas(to_wbio(bio), c, BCH_DATA_user,
> >  					  op->insert_keys.top, true);
> > +		nr_buckets = 0;
> >
> >  		bch2_keylist_push(&op->insert_keys);
> >  		if (op->flags & BCH_WRITE_DONE)
>
> Thanks for the quick update, Kent.
>
> With this change and a few runs of the reproducer, I still hit this
> stack corruption with the same backtrace.

Reproduced it; my first fix was bogus. Turns out I didn't consider
cached extents: with those, an extent can exceed the BCH_REPLICAS_MAX
limit. There's also another issue I just spotted: the bucket
invalidate path doesn't respect nocow locking.

And I'm wondering if there's a lock inversion between nocow locks and
btree locks as well; need to add lockdep support for that.
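
To illustrate the cached-extents point, here is a minimal userspace
sketch, not the actual bcachefs code. It assumes the corruption comes
from a fixed-size on-stack array that is filled once per extent pointer
and indexed by a counter like nr_buckets. REPLICAS_MAX, struct ptr,
struct bucket_list and collect_buckets() are hypothetical names made up
for this example, and the value 4 is only a placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BCH_REPLICAS_MAX; value chosen for illustration. */
#define REPLICAS_MAX 4

/* Hypothetical, simplified extent pointer: target bucket plus a cached flag. */
struct ptr {
	unsigned long bucket;
	int cached;
};

/* Fixed-size list, analogous to an on-stack array sized by the replicas limit. */
struct bucket_list {
	unsigned long b[REPLICAS_MAX];
	size_t nr;
};

/*
 * Record the bucket of every pointer in one extent, cached or not.
 *
 * Returns 0 on success, or -1 if the extent carries more pointers than
 * the array can hold. Without the bounds check, the store below would
 * run past the end of b[] and overwrite whatever follows it.
 */
static int collect_buckets(struct bucket_list *l,
			   const struct ptr *ptrs, size_t nr_ptrs)
{
	assert(l != NULL);

	l->nr = 0;	/* start fresh for each extent */

	for (size_t i = 0; i < nr_ptrs; i++) {
		if (l->nr == REPLICAS_MAX)
			return -1;	/* too many pointers: refuse, don't overflow */
		l->b[l->nr++] = ptrs[i].bucket;
	}
	return 0;
}
```

Resetting the counter per extent (what the first patch does) only helps
when each extent fits in the array; an extent with extra cached pointers
still needs either a bounds check like the one above or a growable
container.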