Re: [PATCH 08/13] btrfs: Use alloc_ordered_workqueue() to create ordered workqueues

From: David Sterba
Date: Tue May 09 2023 - 19:42:32 EST


On Tue, May 09, 2023 at 05:57:16AM -1000, Tejun Heo wrote:
> Hello, David.
>
> Thanks for taking a look.
>
> On Tue, May 09, 2023 at 04:53:32PM +0200, David Sterba wrote:
> > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > > index 59ea049fe7ee..32d08aed88b6 100644
> > > --- a/fs/btrfs/disk-io.c
> > > +++ b/fs/btrfs/disk-io.c
> > > @@ -2217,7 +2217,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
> > > fs_info->qgroup_rescan_workers =
> > > btrfs_alloc_workqueue(fs_info, "qgroup-rescan", flags, 1, 0);
> > > fs_info->discard_ctl.discard_workers =
> > > - alloc_workqueue("btrfs_discard", WQ_UNBOUND | WQ_FREEZABLE, 1);
> > > + alloc_ordered_workqueue("btrfs_discard", WQ_FREEZABLE);
> > >
> > > if (!(fs_info->workers && fs_info->hipri_workers &&
> > > fs_info->delalloc_workers && fs_info->flush_workers &&
> >
> > I think there are a few more conversions missing. There's a local flags
> > variable in btrfs_init_workqueues
> >
> > 2175 static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
> > 2176 {
> > 2177 u32 max_active = fs_info->thread_pool_size;
> > 2178 unsigned int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
> >
> > And used like
> >
> > 2194 fs_info->fixup_workers =
> > 2195 btrfs_alloc_workqueue(fs_info, "fixup", flags, 1, 0);
> >
> > 2213 fs_info->qgroup_rescan_workers =
> > 2214 btrfs_alloc_workqueue(fs_info, "qgroup-rescan", flags, 1, 0);
>
> Right you are.
>
> > WQ_UNBOUND is not mentioned explicitly like for the "btrfs_discard"
> > workqueue. Patch v2 did the switch in btrfs_alloc_workqueue according
> > to the max_active/limit_active parameter but this would affect all
> > queues and not all of them require to be ordered.
>
> The thresh mechanism which auto adjusts max active means that the workqueues
> allocated by btrfs_alloc_workqueue() can't be ordered, right? When thresh is
> smaller than DFT_THRESHOLD, the mechanism is disabled but that looks like an
> optimization.

Yeah, I think so, but I'm not entirely sure. Ordering should not be
required for any queue that starts with max_active > 1. There,
parallelization and out-of-order processing are expected, and the results
are serialized or decided once the work is done.

> > In btrfs_resize_thread_pool the workqueue_set_max_active is called
> > directly or indirectly so this can set the max_active to a user-defined
> > mount option. Could this be a problem or trigger a warning? This would
> > lead to max_active==1 + WQ_UNBOUND.
>
> That's not a problem. The only thing we need to make sure is that the
> workqueues which actually *must* be ordered use alloc_ordered_workqueue() as
> they won't be implicitly treated as ordered in the future.
>
> * The current patch converts two - fs_info->discard_ctl.discard_workers and
> scrub_workers when @is_dev_replace is set. Do they actually need to be
> ordered?
>
> * As you pointed out, fs_info->fixup_workers and
> fs_info->qgroup_rescan_workers are also currently implicitly ordered. Do
> they actually need to be ordered?

I think all of them somehow implicitly depend on the ordering. The
replace process sequentially goes over a block group and copies blocks.

The fixup process is quite obscure and we should preserve the semantics
as much as possible. It has something to do with pages that get out of
sync with extent state without btrfs knowing. The chance that several such
requests happen at the same time is low, but once it happens it can
lead to corruption.

Quota rescan is by nature also a sequential process, but I think it
does not need to be ordered. It's started from a higher-level context like
enabling quotas or rescan, but there are also calls at remount time, so
this makes it less clear.

In summary, if the ordered queue can be used, then I'd recommend doing
so as the safe option.
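
To make the distinction discussed above concrete, here is a minimal
userspace model, not kernel code. Every MODEL_* flag and model_* function is
a hypothetical stand-in invented for illustration. It only mirrors the idea
from this thread that, in the future, a queue counts as ordered only when it
was allocated through the ordered allocator, and no longer just because it
is unbound with max_active == 1.

```c
/*
 * Userspace model only. The MODEL_* names and model_* functions are
 * hypothetical stand-ins and are not the kernel workqueue API.
 */
#include <assert.h>
#include <stdbool.h>

enum {
	MODEL_WQ_UNBOUND     = 1 << 0,
	MODEL_WQ_FREEZABLE   = 1 << 1,
	MODEL_WQ_MEM_RECLAIM = 1 << 2,
	MODEL_WQ_ORDERED     = 1 << 3,	/* set only by the ordered allocator */
};

struct model_wq {
	const char *name;
	unsigned int flags;
	int max_active;
};

/* Plain allocation: the caller's flags and max_active are taken as given. */
static struct model_wq model_alloc_workqueue(const char *name,
					     unsigned int flags,
					     int max_active)
{
	struct model_wq wq = { name, flags, max_active };

	return wq;
}

/* Ordered allocation: always unbound, max_active fixed at 1, marked ordered. */
static struct model_wq model_alloc_ordered_workqueue(const char *name,
						     unsigned int flags)
{
	return model_alloc_workqueue(name,
				     flags | MODEL_WQ_UNBOUND | MODEL_WQ_ORDERED,
				     1);
}

/*
 * Assumed rule, following the discussion: ordering is guaranteed only when
 * it was requested explicitly, never inferred from max_active == 1.
 */
static bool model_is_ordered(const struct model_wq *wq)
{
	return (wq->flags & MODEL_WQ_ORDERED) != 0;
}

/* Usage example: the same queue allocated in the old and the new way. */
static void model_example(void)
{
	struct model_wq implicit = model_alloc_workqueue("btrfs_discard",
			MODEL_WQ_UNBOUND | MODEL_WQ_FREEZABLE, 1);
	struct model_wq explicit_wq = model_alloc_ordered_workqueue(
			"btrfs_discard", MODEL_WQ_FREEZABLE);

	assert(!model_is_ordered(&implicit));	/* max_active == 1 is not enough */
	assert(model_is_ordered(&explicit_wq));
	assert(explicit_wq.max_active == 1);
}
```

Under that assumption, any of the four queues that really depends on
ordering would have to go through the ordered allocator, which is why
converting them looks like the safe default.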