[PATCH v13 0/9] Implement copy offload support

From: Nitesh Shetty
Date: Wed Jun 28 2023 - 04:25:42 EST


This patch series covers the points discussed in the past, most recently
at LSFMM'23[0].
It covers the initially agreed requirements as well as additional
features suggested by the community.

This is the next iteration of our previous patchset, v12[1].
We have changed from the token-based approach to a request-based approach:
instead of storing the copy information in a token, we now try to merge
the copy bios in the request layer and send them to the driver.
As a result, this design works only for request-based storage drivers.
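
As a rough user-space illustration of the merge step (not code from this
series), the sketch below pairs a destination bio with a source bio into
one request. All toy_* names are hypothetical stand-ins for the kernel
structures, and the rules that the destination bio arrives first and that
both lengths must match are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the kernel's struct bio/request. */
enum toy_op { TOY_COPY_DST, TOY_COPY_SRC };

struct toy_bio {
	enum toy_op op;
	uint64_t sector;	/* start sector on the device */
	uint32_t nr_sectors;	/* length of the range */
};

struct toy_request {
	const struct toy_bio *dst;	/* destination half, queued first (assumed) */
	const struct toy_bio *src;	/* source half, merged in later */
};

/* Start a request from the destination bio; it is not yet dispatchable. */
static bool toy_request_init(struct toy_request *rq, const struct toy_bio *bio)
{
	assert(rq && bio);
	if (bio->op != TOY_COPY_DST)
		return false;
	rq->dst = bio;
	rq->src = NULL;
	return true;
}

/* Merge the source bio into a request that already holds the destination. */
static bool toy_request_merge(struct toy_request *rq, const struct toy_bio *bio)
{
	assert(rq && bio);
	if (!rq->dst || rq->src)	/* need exactly one pending dst */
		return false;
	if (bio->op != TOY_COPY_SRC)
		return false;
	if (bio->nr_sectors != rq->dst->nr_sectors)	/* lengths must agree */
		return false;
	rq->src = bio;
	return true;
}

/* Only a request carrying both halves can be sent to the driver. */
static bool toy_request_ready(const struct toy_request *rq)
{
	return rq->dst != NULL && rq->src != NULL;
}
```

In this model a driver would only ever see requests for which
toy_request_ready() is true, which is why a bio-based driver with no
request layer cannot take part.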

Overall series supports:
========================
1. Driver
- NVMe Copy command (single NS, TP 4065), including support
in nvme-target (for block and file backend).

2. Block layer
- Block-generic copy operation (REQ_OP_COPY_DST/SRC), with an
interface accommodating two block devices
- Merging copy requests in request layer
- Emulation, for in-kernel users when native offload is
absent
- dm-linear support (for cases not requiring split)

3. User-interface
- copy_file_range

Testing
=======
Copy offload can be tested on:
a. QEMU: NVMe Simple Copy (TP 4065), configured by setting the nvme-ns
parameters mssrl, mcl and msrc. For more info, see [2].
b. Null block device
c. NVMe Fabrics loopback.
d. blktests[3] (tests block/035-038, nvme/050-053)

Emulation can be tested on any device.

fio[4] can also be used for testing.
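
To make the QEMU parameters in (a) more concrete, here is a small
user-space sketch (not part of this series) of how a copy could be checked
against them. It assumes the usual NVMe meaning of the limits: MSSRL caps
the length of one source range, MCL caps the total length of a Copy
command, and MSRC is a 0's based cap on the number of source ranges, all
counted in logical blocks. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Limits as configured on the namespace (lengths in logical blocks). */
struct copy_limits {
	uint16_t mssrl;	/* max length of a single source range */
	uint32_t mcl;	/* max total length of one Copy command */
	uint8_t  msrc;	/* max number of source ranges, 0's based */
};

/* Hypothetical source range; nlb is a plain count here, not 0's based. */
struct src_range {
	uint64_t slba;	/* starting LBA of the range */
	uint32_t nlb;	/* number of logical blocks in the range */
};

/* Return true if the ranges can be sent as a single Copy command. */
static bool copy_fits_limits(const struct copy_limits *lim,
			     const struct src_range *ranges, size_t nr)
{
	uint64_t total = 0;

	assert(lim && (ranges || nr == 0));
	/* msrc is 0's based, so msrc + 1 ranges are allowed. */
	if (nr == 0 || nr > (size_t)lim->msrc + 1)
		return false;
	for (size_t i = 0; i < nr; i++) {
		if (ranges[i].nlb == 0 || ranges[i].nlb > lim->mssrl)
			return false;
		total += ranges[i].nlb;
	}
	return total <= lim->mcl;
}
```

A copy that fails this check would have to be split into several commands
or handled by emulation.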

Infra and plumbing:
===================
We populate the copy_file_range callback in def_blk_fops.
For devices that support copy offload, we use __blkdev_copy_offload to
achieve in-device copy.
For devices that do not support offload, we fall back to
generic_copy_file_range.
For in-kernel users (like NVMe fabrics), we use blkdev_issue_copy,
which implements its own emulation, as an fd is not available.
The checks in generic_copy_file_range are modified to support block devices.
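
From user space, the interface is the existing copy_file_range(2) system
call. The helper below is a minimal sketch of a caller, not code from this
series; copy_range is a hypothetical name. It loops because the call may
copy fewer bytes than requested.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy len bytes from fd_in at off_in to fd_out at off_out.
 * Returns 0 on success, -1 with errno set on error or early EOF.
 */
static int copy_range(int fd_in, off_t off_in, int fd_out, off_t off_out,
		      size_t len)
{
	assert(fd_in >= 0 && fd_out >= 0);
	while (len > 0) {
		/* The kernel advances both offsets by the bytes copied. */
		ssize_t n = copy_file_range(fd_in, &off_in, fd_out, &off_out,
					    len, 0);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {	/* source ended before len bytes were copied */
			errno = EIO;
			return -1;
		}
		len -= (size_t)n;
	}
	return 0;
}
```

With this series, the offload path is only attempted when both fds refer
to block devices opened with O_DIRECT; other cases go through
generic_copy_file_range. The caller cannot tell from the return value
which path was taken.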

Blktests[3]
===========
tests/block/035,036: Runs copy offload and emulation on block
device.
tests/block/037,038: Runs copy offload and emulation on null
block device.
tests/nvme/050-053: Creates a loop-backed fabrics device and
runs copy offload and emulation.

Future Work
===========
- loopback device copy offload support
- upstream fio to use copy offload
- upstream blktests to test copy offload

These are to be taken up after this minimal series is agreed upon.

Additional links:
=================
[0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@xxxxxxxxxxxxxx/
https://lore.kernel.org/linux-nvme/f0e19ae4-b37a-e9a3-2be7-a5afb334a5c3@xxxxxxxxxx/
https://lore.kernel.org/linux-nvme/20230113094648.15614-1-nj.shetty@xxxxxxxxxxx/
[1] https://lore.kernel.org/linux-block/20230605121732.28468-1-nj.shetty@xxxxxxxxxxx/T/#m4db1801c86a5490dc736266609f8458fd52b9eb5
[2] https://qemu-project.gitlab.io/qemu/system/devices/nvme.html#simple-copy
[3] https://github.com/nitesh-shetty/blktests/tree/feat/copy_offload/v13
[4] https://github.com/vincentkfu/fio/commits/copyoffload-3.35-v13

Changes since v12:
=================
- block,nvme: Replaced token based approach with request based
single namespace capable approach (Christoph Hellwig)

Changes since v11:
=================
- Documentation: Improved documentation (Damien Le Moal)
- block,nvme: ssize_t return values (Darrick J. Wong)
- block: token is allocated to SECTOR_SIZE (Matthew Wilcox)
- block: mem leak fix (Maurizio Lombardi)

Changes since v10:
=================
- NVMeOF: optimization in NVMe fabrics (Chaitanya Kulkarni)
- NVMeOF: sparse warnings (kernel test robot)

Changes since v9:
=================
- null_blk, improved documentation, minor fixes (Chaitanya Kulkarni)
- fio, expanded testing and minor fixes (Vincent Fu)

Changes since v8:
=================
- null_blk, copy_max_bytes_hw is made config fs parameter
(Damien Le Moal)
- Negative error handling in copy_file_range (Christian Brauner)
- minor fixes, better documentation (Damien Le Moal)
- fio upgraded to 3.34 (Vincent Fu)

Changes since v7:
=================
- null block copy offload support for testing (Damien Le Moal)
- adding direct flag check for copy offload to block device,
as we are using generic_copy_file_range for cached cases.
- Minor fixes

Changes since v6:
=================
- copy_file_range instead of ioctl for direct block device
- Remove support for multi range (vectored) copy
- Remove ioctl interface for copy.
- Remove offload support in dm kcopyd.

Changes since v5:
=================
- Addition of blktests (Chaitanya Kulkarni)
- Minor fix for fabrics file backed path
- Remove buggy zonefs copy file range implementation.

Changes since v4:
=================
- make the offload and emulation design asynchronous (Hannes
Reinecke)
- fabrics loopback support
- sysfs naming improvements (Damien Le Moal)
- use kfree() instead of kvfree() in cio_await_completion
(Damien Le Moal)
- use ranges instead of rlist to represent range_entry (Damien
Le Moal)
- change argument ordering in blk_copy_offload (Damien Le Moal)
- removed multiple copy limit and merged into only one limit
(Damien Le Moal)
- wrap overly long lines (Damien Le Moal)
- other naming improvements and cleanups (Damien Le Moal)
- correctly format the code example in description (Damien Le
Moal)
- mark blk_copy_offload as static (kernel test robot)

Changes since v3:
=================
- added copy_file_range support for zonefs
- added documentation about new sysfs entries
- incorporated review comments on v3
- minor fixes

Changes since v2:
=================
- fixed possible race condition reported by Damien Le Moal
- new sysfs controls as suggested by Damien Le Moal
- fixed possible memory leak reported by Dan Carpenter, lkp
- minor fixes

Changes since v1:
=================
- sysfs documentation (Greg KH)
- 2 bios for copy operation (Bart Van Assche, Mikulas Patocka,
Martin K. Petersen, Douglas Gilbert)
- better payload design (Darrick J. Wong)


Nitesh Shetty (9):
  block: Introduce queue limits for copy-offload support
  block: Add copy offload support infrastructure
  block: add emulation for copy
  fs, block: copy_file_range for def_blk_ops for direct block device
  nvme: add copy offload support
  nvmet: add copy command support for bdev and file ns
  dm: Add support for copy offload
  dm: Enable copy offload for dm-linear target
  null_blk: add support for copy offload

 Documentation/ABI/stable/sysfs-block |  33 +++
 Documentation/block/null_blk.rst     |   5 +
 block/blk-core.c                     |   5 +
 block/blk-lib.c                      | 384 +++++++++++++++++++++++++++
 block/blk-map.c                      |   4 +-
 block/blk-merge.c                    |  21 ++
 block/blk-settings.c                 |  24 ++
 block/blk-sysfs.c                    |  63 +++++
 block/blk.h                          |   9 +
 block/elevator.h                     |   1 +
 block/fops.c                         |  20 ++
 drivers/block/null_blk/main.c        |  85 +++++-
 drivers/block/null_blk/null_blk.h    |   1 +
 drivers/md/dm-linear.c               |   1 +
 drivers/md/dm-table.c                |  41 +++
 drivers/md/dm.c                      |   7 +
 drivers/nvme/host/constants.c        |   1 +
 drivers/nvme/host/core.c             |  79 ++++++
 drivers/nvme/host/trace.c            |  19 ++
 drivers/nvme/target/admin-cmd.c      |   9 +-
 drivers/nvme/target/io-cmd-bdev.c    |  62 +++++
 drivers/nvme/target/io-cmd-file.c    |  52 ++++
 drivers/nvme/target/nvmet.h          |   1 +
 fs/read_write.c                      |   7 +-
 include/linux/bio.h                  |   4 +-
 include/linux/blk_types.h            |  26 ++
 include/linux/blkdev.h               |  23 ++
 include/linux/device-mapper.h        |   5 +
 include/linux/nvme.h                 |  43 ++-
 include/uapi/linux/fs.h              |   3 +
 30 files changed, 1025 insertions(+), 13 deletions(-)


base-commit: 53cdf865f90ba922a854c65ed05b519f9d728424
--
2.35.1.500.gb896f729e2