[RFC PATCH 0/1] Introduce a new target: lzbd - LightNVM Zoned Block Device

From: hans
Date: Thu Apr 18 2019 - 08:02:17 EST


From: Hans Holmberg <hans.holmberg@xxxxxxxxxxxx>

Introduce a new target: lzbd - LightNVM Zoned Block Device

The new target makes it possible to expose an
Open-Channel 2.0 SSD as one or more zoned block devices
with BLK_ZONE_TYPE_SEQWRITE_REQ zones.
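
To give an idea of what an instance looks like from user space, here is a
minimal sketch (not part of the patch) that pulls a zone report through the
standard BLKREPORTZONE ioctl and checks that the zones come back as
BLK_ZONE_TYPE_SEQWRITE_REQ. The device path /dev/lzbd0 is just an assumption
for an instance created with that name:

/* Sketch only: dump the first zones of an lzbd instance via
 * BLKREPORTZONE. The device path is an assumption. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/blkzoned.h>

#define NR_ZONES 16

int main(void)
{
	struct blk_zone_report *rep;
	unsigned int i;
	int fd;

	fd = open("/dev/lzbd0", O_RDONLY);	/* assumed instance name */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	rep = calloc(1, sizeof(*rep) + NR_ZONES * sizeof(struct blk_zone));
	if (!rep)
		return 1;

	rep->sector = 0;			/* report from the first zone */
	rep->nr_zones = NR_ZONES;

	if (ioctl(fd, BLKREPORTZONE, rep) < 0) {
		perror("BLKREPORTZONE");
		return 1;
	}

	for (i = 0; i < rep->nr_zones; i++) {
		struct blk_zone *z = &rep->zones[i];

		printf("zone %u: start %llu len %llu wp %llu %s\n", i,
		       (unsigned long long)z->start,
		       (unsigned long long)z->len,
		       (unsigned long long)z->wp,
		       z->type == BLK_ZONE_TYPE_SEQWRITE_REQ ?
				"SEQWRITE_REQ" : "other");
	}

	free(rep);
	close(fd);
	return 0;
}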

I've been playing around with this for the last couple of months, and
now I'd love to get some feedback.

It's been very useful to look at null_blk's zone support when
doing the plumbing work, and Simon and Klaus have also been very helpful
when figuring out the design. Thanks guys!

Naming is sometimes the hardest thing. I named this thing lzbd, as
I found that to be the most descriptive acronym.

NOTE: This is an early prototype and is lacking some vital
features at the moment. It is worth looking at and playing
around with for those interested, but beware of dragons :)

See the lzbd documentation (Documentation/lightnvm/lzbd.txt) for my ideas on
what a full implementation would look like.

What is supported (for now):

* Reads
* Sequential writes
* Unaligned writes (a per-zone ws_opt alignment buffer is used; see the
sketch after this list)
* Zone resets
* Zone reporting
* Wear leveling (sort of; wear indices are not updated on reset yet)
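
As a rough illustration of the alignment buffering mentioned above (none of
this is lzbd code; the sector size, the ws_opt value and the names are made
up), sub-ws_opt writes are staged in a per-zone buffer and only written to
the media in ws_opt-sized units:

/* Sketch of per-zone ws_opt alignment buffering. Values and names are
 * illustrative assumptions, not taken from the patch. */
#include <stdio.h>
#include <string.h>

#define SECTOR_SIZE	4096	/* assumed sector size */
#define WS_OPT		8	/* assumed optimal write size, in sectors */

struct zone_albuf {
	unsigned int	secs;				/* sectors staged */
	char		data[WS_OPT * SECTOR_SIZE];
};

/* Stand-in for issuing an aligned, ws_opt-sized write to the zone's chunk. */
static void write_aligned(const char *data, unsigned int secs)
{
	printf("flush %u sectors to the current chunk\n", secs);
}

/* Stage one sector; flush the buffer once it holds ws_opt sectors. */
static void zone_write_sector(struct zone_albuf *buf, const char *sector)
{
	memcpy(buf->data + buf->secs * SECTOR_SIZE, sector, SECTOR_SIZE);
	if (++buf->secs == WS_OPT) {
		write_aligned(buf->data, buf->secs);
		buf->secs = 0;
	}
}

int main(void)
{
	struct zone_albuf buf = { 0 };
	char sector[SECTOR_SIZE] = { 0 };
	int i;

	/* 20 single-sector writes: two ws_opt-sized flushes, 4 sectors left
	 * staged, waiting for more data or a sync. */
	for (i = 0; i < 20; i++)
		zone_write_sector(&buf, sector);

	printf("%u sectors still staged\n", buf.secs);
	return 0;
}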

I've mainly tested in QEMU (cunits=0, ws_min=4, ws_opt=8).

The zoned block device tests in blktests (tests/zbd) pass, and I've done
a bunch of general smoke testing (aligned/unaligned writes with verification
using dd and fio, ..), so the general plumbing seems to hold up, but
more testing is needed.

Performance is definitely not what it should be yet. Only one chunk per zone
is being written to at a time, effectively rate-limiting writes per zone,
which is an interesting constraint, but probably not what we want.

What is not supported (yet):

* Metadata persistence (when the instance is removed, data is lost)
- The zone-to-chunk mapping needs to be stored

* Sync handling (flushing alignment buffers)
- The zone alignment buffers need to be flushed to disk

* Write error handling
- Write errors will require zone -> chunk remapping
of the failing chunk.

* Chunk reset error handling (chunks going offline)
* Updating wear indices on chunk resets
- This is low-hanging fruit to fix

* Cunits read buffering


Final thoughts, for now:

Since lzbd (and pblk, for that matter) is not entirely unlike a file system,
it would be nice to create a mkfs/fsck/dmzadm-like tool that would:

* Format the drive and persist the instance configuration in a superblock
contained in the instance metadata.
* Repair broken (i.e. power-failed) instances
Per-sector metadata is currently not utilized in lzbd, but would
be helpful in recovery scenarios.
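
Purely to make that idea concrete, here is a hypothetical sketch of what such
a superblock could contain. None of these fields exist in the patch; they are
illustrative assumptions only:

/* Hypothetical on-media superblock for an lzbd instance. Made up for
 * illustration; not part of the patch. */
#include <stdint.h>

#define LZBD_SB_MAGIC	0x4c5a4244	/* "LZBD", made-up magic */

struct lzbd_sb {
	uint32_t	magic;		/* identifies a formatted instance */
	uint32_t	version;	/* on-media format version */
	uint64_t	nr_zones;	/* zones exposed by the instance */
	uint64_t	zone_size;	/* zone size, in sectors */
	uint64_t	map_offset;	/* where the zone-to-chunk map lives */
	uint64_t	map_entries;	/* number of zone-to-chunk map entries */
	uint32_t	crc;		/* checksum over the superblock */
};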


The patch is based on Matias' for-5.2/core branch in the github
openchannel project. It is also available at [1] (branch for-5.2/lzbd).


Thanks,
Hans

[1] CNEX Labs linux github project: https://github.com/CNEX-Labs/linux


Hans Holmberg (1):
lightnvm: add lzbd - a zoned block device target

Documentation/lightnvm/lzbd.txt | 122 +++++++++++
drivers/lightnvm/Kconfig        |  11 +
drivers/lightnvm/Makefile       |   3 +
drivers/lightnvm/lzbd-io.c      | 342 +++++++++++++++++++++++++++++++
drivers/lightnvm/lzbd-target.c  | 392 +++++++++++++++++++++++++++++++++++
drivers/lightnvm/lzbd-user.c    | 310 ++++++++++++++++++++++++++++
drivers/lightnvm/lzbd-zone.c    | 444 ++++++++++++++++++++++++++++++++++++++++
drivers/lightnvm/lzbd.h         | 139 +++++++++++++
8 files changed, 1763 insertions(+)
create mode 100644 Documentation/lightnvm/lzbd.txt
create mode 100644 drivers/lightnvm/lzbd-io.c
create mode 100644 drivers/lightnvm/lzbd-target.c
create mode 100644 drivers/lightnvm/lzbd-user.c
create mode 100644 drivers/lightnvm/lzbd-zone.c
create mode 100644 drivers/lightnvm/lzbd.h

--
2.7.4