Re: [PATCH] bcache: make stripe_size configurable and persistent for hardware raid5/6

From: Coly Li
Date: Thu Jan 06 2022 - 11:17:45 EST


On 1/6/22 11:29 AM, Eric Wheeler wrote:
On Tue, 25 Jun 2019, Coly Li wrote:
On 2019/6/25 2:14 AM, Eric Wheeler wrote:
On Mon, 24 Jun 2019, Coly Li wrote:

On 2019/6/23 7:16 AM, Eric Wheeler wrote:
From: Eric Wheeler <git@xxxxxxxxxxxxxxxxxx>

While some drivers set queue_limits.io_opt (e.g., md raid5), there are
currently no SCSI/RAID controller drivers that do. Previously stripe_size
and partial_stripes_expensive were read-only values and could not be
tuned by users (e.g., for hardware RAID5/6).

This patch enables users to set the optimal IO geometry through the
backing device sysfs attributes stripe_size and partial_stripes_expensive,
and saves those values in the bcache superblock.

Superblock changes are backwards-compatible:

* partial_stripes_expensive: One bit is used in the superblock flags field.

* stripe_size: There are eight 64-bit "pad" fields reserved for future use
  in the superblock, which default to 0; 32 bits of those are now used to
  save stripe_size, which is loaded at device registration time.
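A minimal sketch of the two superblock changes described above, written as
userspace C against a simplified stand-in struct. struct sb_model, the flag
bit position, and all helper names are hypothetical; this is not the real
struct cache_sb layout or the patch's code. It assumes the low 32 bits of
pad[0] hold stripe_size in sectors, so a superblock written by an older
kernel simply reads back 0:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant superblock fields;
 * NOT the real struct cache_sb layout. */
struct sb_model {
	uint64_t flags;
	uint64_t pad[8];	/* reserved, zero-filled on existing superblocks */
};

/* Hypothetical flag bit and field mask, chosen for illustration only. */
#define SB_PARTIAL_STRIPES_EXPENSIVE	(1ULL << 10)
#define SB_STRIPE_SIZE_MASK		0xffffffffULL

/* Store stripe_size (in sectors) in the low 32 bits of pad[0],
 * leaving the upper 32 bits untouched. */
static void sb_set_stripe_size(struct sb_model *sb, uint32_t sectors)
{
	sb->pad[0] = (sb->pad[0] & ~SB_STRIPE_SIZE_MASK) | sectors;
}

/* Returns 0 for superblocks that never had a stripe_size saved. */
static uint32_t sb_get_stripe_size(const struct sb_model *sb)
{
	return (uint32_t)(sb->pad[0] & SB_STRIPE_SIZE_MASK);
}

static void sb_set_partial_expensive(struct sb_model *sb, bool on)
{
	if (on)
		sb->flags |= SB_PARTIAL_STRIPES_EXPENSIVE;
	else
		sb->flags &= ~SB_PARTIAL_STRIPES_EXPENSIVE;
}

static bool sb_partial_expensive(const struct sb_model *sb)
{
	return (sb->flags & SB_PARTIAL_STRIPES_EXPENSIVE) != 0;
}
```

Because both fields default to 0, an unmodified superblock decodes as "no
stripe size, partial stripes not expensive", which is what makes the change
safe without a version bump.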

Signed-off-by: Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx>

Hi Eric,

In general I am OK with this patch. Since Peter commented that many SCSI
RAID devices report a stripe width, could you please list the hardware
RAID devices that don't report a stripe size? Then we can decide whether
it is necessary to enable such an option.

Perhaps they do not set stripe_width using io_opt? I did a grep to see if
any of them did, but I didn't find any. How is stripe_width indicated by
RAID controllers?

If io_opt is where they report it, then at least my Areca 1883 does not set
io_opt as of 4.19.x. I also have an LSI MegaRAID 3108 which does not report
io_opt as of 4.1.x, but that is an older kernel, so support may have been
added since then.

Martin,

Where would stripe_width be configured in the SCSI drivers? Is it visible
through sysfs or debugfs so I can check my hardware support without
hacking or debugging the kernel?

Another point: this patch changes struct cache_sb. Changing the on-disk
format is not a problem in itself, but I plan to update the superblock
version soon to store more configuration persistently in the superblock,
and stripe_size can be added to cache_sb along with those other on-disk
changes.

Hi Eric,

Maybe bumping the version makes sense, but even if you don't, this is safe
to use without a version bump because the pad fields are unused and
default to 0.

Yes, I understand; it works as you describe. I need to think about how to
organize all the options in struct cache_sb, and stripe_size will be
placed then. I will also ask you to help review the on-disk format
changes.

Hi Coly,

Just checking in; it's been a while and I didn't see any more discussion on
the topic:

Hi Eric,

Thank you for reminding me. There turned out to be fewer persistent on-disk options than I had expected, so using reserved space in the on-disk superblock is fine.

This would benefit users with older RAID controllers using RAID-5/6 that
don't set io_opt.
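As a sketch of how the registration-time choice might look (all names are
hypothetical and the actual patch may differ): a persisted non-zero
stripe_size overrides io_opt, while a superblock that was never configured
reads 0 and falls back to the queue limits, with io_opt converted from bytes
to 512-byte sectors:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct stripe_cfg {
	uint32_t stripe_sectors;	/* 0 means "no stripe information" */
	bool partial_expensive;
};

/*
 * Hypothetical registration-time merge of the persisted superblock values
 * with the request queue limits. io_opt is in bytes; stripe sizes are in
 * 512-byte sectors.
 */
static struct stripe_cfg choose_stripe_cfg(uint32_t sb_stripe_sectors,
					   bool sb_partial_expensive,
					   uint32_t io_opt_bytes,
					   bool queue_partial_expensive)
{
	struct stripe_cfg cfg;

	/* A non-zero persisted value wins; zero means never configured. */
	cfg.stripe_sectors = sb_stripe_sectors ? sb_stripe_sectors
					       : io_opt_bytes >> 9;

	/* Either source may flag partial stripes as expensive, but the flag
	 * only has meaning once a stripe size is known. */
	cfg.partial_expensive = cfg.stripe_sectors != 0 &&
				(sb_partial_expensive || queue_partial_expensive);
	return cfg;
}
```

Treating the flag as an OR of both sources lets a user mark partial stripes
as expensive even on a controller that does report io_opt.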

Even new RAID controllers that _do_ provide `io_opt` still do _not_
indicate partial_stripes_expensive (which is an mdraid feature; Martin,
please correct me if I'm wrong here). Thus, all hardware RAID-5/6 users
could benefit from manually setting partial_stripes_expensive so that
bcache's writeback bursts fit their stripe width.
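To illustrate why stripe alignment matters for writeback, here is a rough
sketch of per-stripe dirty accounting (hypothetical helper names, not
bcache's code): each write adds its sectors to the stripes it covers, and
stripes whose dirty count equals stripe_size are full and could be written
back without a read-modify-write on the RAID-5/6 array. It assumes each
sector is counted as dirtied only once:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: add a dirty write [offset, offset + len) in sectors,
 * splitting it across stripe boundaries. Assumes no sector is counted twice. */
static void add_dirty(uint32_t *dirty, size_t nr_stripes,
		      uint32_t stripe_sectors, uint64_t offset, uint32_t len)
{
	assert(stripe_sectors > 0);
	while (len > 0) {
		uint64_t stripe = offset / stripe_sectors;
		uint32_t in_stripe = (uint32_t)(offset % stripe_sectors);
		uint32_t n = stripe_sectors - in_stripe;

		if (n > len)
			n = len;
		assert(stripe < nr_stripes);
		dirty[stripe] += n;
		offset += n;
		len -= n;
	}
}

/* Hypothetical: count fully dirty stripes and store up to out_cap indices. */
static size_t find_full_stripes(const uint32_t *dirty, size_t nr_stripes,
				uint32_t stripe_sectors,
				size_t *out, size_t out_cap)
{
	size_t n = 0;

	assert(stripe_sectors > 0);
	for (size_t i = 0; i < nr_stripes; i++) {
		if (dirty[i] == stripe_sectors) {
			if (n < out_cap)
				out[n] = i;
			n++;
		}
	}
	return n;
}
```

With a wrong (or missing) stripe_size, the "full" stripes found this way no
longer line up with the array's real stripes, which is why the value needs
to be tunable when the controller does not report it.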

Yeah, I agree with you.

This patch probably needs to be rebased, and the documentation updated to
cover io_opt, but here is the original patch with documentation for your
reference:
https://lkml.org/lkml/2019/6/22/298

What do you think?

Yes, please rebase the patch onto the latest mainline kernel and let's start the review.

Thank you.

Coly Li