[PATCH v14 00/11] Implement copy offload support

From: Nitesh Shetty
Date: Fri Aug 11 2023 - 07:20:00 EST


This patch series covers the points discussed in the past, most recently
at LSFMM'23[0].
It implements the initially agreed requirements, along with additional
features suggested by the community.

This is the next iteration of our previous patch set, v13[1].
We achieve copy offload by sending two bios, one carrying the source
info and one the destination info, and merging them to form a single
request, which is then sent to the driver.
As a result, this design works only for request-based storage drivers.
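
As a rough illustration of the two-bio idea, here is a toy userspace
model (not kernel code). All struct and function names below are
hypothetical and exist only for this sketch; the real merge logic lives
in the block layer patches of this series. The model only assumes that a
request needs both a source half and a destination half of equal length
before it can be dispatched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy userspace model only; these are NOT the kernel's structures. */
enum toy_op { TOY_COPY_SRC, TOY_COPY_DST };

struct toy_bio {
	enum toy_op op;      /* which half of the copy this bio describes */
	uint64_t sector;     /* start sector of the range */
	uint32_t nr_sectors; /* length of the range */
};

struct toy_copy_request {
	bool has_src, has_dst;
	uint64_t src_sector, dst_sector;
	uint32_t nr_sectors;
};

/*
 * Merge one bio into the request. Returns false if the bio cannot be
 * merged: its half is already present, or its length differs from the
 * half that was merged earlier.
 */
bool toy_merge_bio(struct toy_copy_request *rq, const struct toy_bio *bio)
{
	bool *has = (bio->op == TOY_COPY_SRC) ? &rq->has_src : &rq->has_dst;

	if (*has)
		return false;
	if ((rq->has_src || rq->has_dst) && rq->nr_sectors != bio->nr_sectors)
		return false;

	if (bio->op == TOY_COPY_SRC)
		rq->src_sector = bio->sector;
	else
		rq->dst_sector = bio->sector;
	rq->nr_sectors = bio->nr_sectors;
	*has = true;
	return true;
}

/* The request may go to the driver only once both halves are present. */
bool toy_request_ready(const struct toy_copy_request *rq)
{
	return rq->has_src && rq->has_dst;
}
```

A caller would start from a zero-initialized toy_copy_request, merge one
bio per half, and dispatch only once toy_request_ready() returns true.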

Overall series supports:
========================
1. Driver
- NVMe Copy command (single NS, TP 4065), including support
in nvme-target (for block and file back end).

2. Block layer
- Block-generic copy operations (REQ_OP_COPY_DST/SRC), with an
interface accommodating two block devices
- Merging copy requests in request layer
- Emulation, for in-kernel user when offload is natively
absent
- dm-linear support (for cases not requiring split)

3. User-interface
- copy_file_range
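
For reference, a minimal userspace sketch of driving the
copy_file_range interface. The helper name copy_range and its
extra_flags parameter are hypothetical. Passing O_DIRECT in extra_flags
is an assumption based on the direct flag check mentioned in the v7
changelog; with O_DIRECT on a block device, offsets and length
presumably need to be aligned to the logical block size.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy up to len bytes from src_path at src_off to dst_path at dst_off.
 * extra_flags is OR-ed into both open() calls (for example O_DIRECT).
 * Returns the number of bytes copied, or a negative errno value.
 */
ssize_t copy_range(const char *src_path, off64_t src_off,
		   const char *dst_path, off64_t dst_off,
		   size_t len, int extra_flags)
{
	ssize_t total = 0;
	int src_fd, dst_fd, err = 0;

	src_fd = open(src_path, O_RDONLY | extra_flags);
	if (src_fd < 0)
		return -errno;

	dst_fd = open(dst_path, O_WRONLY | extra_flags);
	if (dst_fd < 0) {
		err = errno;
		close(src_fd);
		return -err;
	}

	/* copy_file_range() may copy less than requested, so loop. */
	while (len > 0) {
		ssize_t n = copy_file_range(src_fd, &src_off,
					    dst_fd, &dst_off, len, 0);
		if (n < 0) {
			err = errno;
			break;
		}
		if (n == 0)	/* reached the end of the source */
			break;
		total += n;
		len -= (size_t)n;
	}

	close(src_fd);
	close(dst_fd);
	return err ? -err : total;
}
```

For example, copy_range("/dev/nvme0n1", 0, "/dev/nvme0n1", 1 << 20,
1 << 20, O_DIRECT) would request a 1 MiB copy within one namespace; the
device path here is only a placeholder.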

Testing
=======
Copy offload can be tested on:
a. QEMU: NVMe Simple Copy (TP 4065), by setting the nvme-ns
parameters mssrl, mcl and msrc. See [2] for more info.
b. Null block device
c. NVMe Fabrics loopback.
d. blktests[3]

Emulation can be tested on any device.

A fio branch with copy offload support is available at [4].

Infra and plumbing:
===================
We populate the copy_file_range callback in def_blk_fops.
For devices that support copy offload, blkdev_copy_offload is used to
achieve an in-device copy.
For devices that don't support offload, we fall back to
generic_copy_file_range.
In-kernel users (such as NVMe fabrics) use blkdev_copy_offload if the
device is copy-offload capable, and otherwise fall back to emulation
using blkdev_copy_emulation.
The checks in generic_copy_file_range are modified to support block
devices.
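
The path selection described above can be summarized with a small
userspace model. This is a sketch only: the toy_* names are
hypothetical, a single device stands in for both ends of the copy, and
it assumes that a non-zero copy_max_sectors marks an offload-capable
device (as in the v13 changelog) and that offload from
copy_file_range applies only to direct I/O (as in the v7 changelog).

```c
#include <stdbool.h>

/* Toy model only; not the kernel's types or functions. */
enum toy_copy_path {
	TOY_PATH_OFFLOAD,	/* stands in for blkdev_copy_offload */
	TOY_PATH_GENERIC,	/* stands in for generic_copy_file_range */
	TOY_PATH_EMULATION,	/* stands in for blkdev_copy_emulation */
};

struct toy_bdev {
	unsigned int copy_max_sectors;	/* 0: no copy offload (assumed) */
};

/* Path taken for a copy_file_range call from userspace. */
enum toy_copy_path toy_pick_user_path(const struct toy_bdev *bdev,
				      bool direct_io)
{
	if (direct_io && bdev->copy_max_sectors > 0)
		return TOY_PATH_OFFLOAD;
	return TOY_PATH_GENERIC;
}

/* Path taken by an in-kernel user such as NVMe fabrics. */
enum toy_copy_path toy_pick_kernel_path(const struct toy_bdev *bdev)
{
	if (bdev->copy_max_sectors > 0)
		return TOY_PATH_OFFLOAD;
	return TOY_PATH_EMULATION;
}
```

The point of the split is that the caller, not the helper, chooses the
fallback: userspace requests degrade to the generic path, while
in-kernel users degrade to emulation.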

Blktests[3]
===========
tests/block/035-040: Run copy offload and emulation on a null
block device.
tests/block/050,055: Run copy offload and emulation on a test
NVMe block device.
tests/nvme/056-067: Create a loop-backed fabrics device and
run copy offload and emulation on it.

Future Work
===========
- loopback device copy offload support
- upstream fio to use copy offload
- upstream the blktests changes that test copy offload
- update man pages for copy_file_range
- expand in-kernel users of copy offload

These are to be taken up after this minimal series is agreed upon.

Additional links:
=================
[0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@xxxxxxxxxxxxxx/
https://lore.kernel.org/linux-nvme/f0e19ae4-b37a-e9a3-2be7-a5afb334a5c3@xxxxxxxxxx/
https://lore.kernel.org/linux-nvme/20230113094648.15614-1-nj.shetty@xxxxxxxxxxx/
[1] https://lore.kernel.org/linux-nvme/20230627183629.26571-1-nj.shetty@xxxxxxxxxxx/
[2] https://qemu-project.gitlab.io/qemu/system/devices/nvme.html#simple-copy
[3] https://github.com/nitesh-shetty/blktests/tree/feat/copy_offload/v14
[4] https://github.com/OpenMPDK/fio/tree/copyoffload-3.35-v14

Changes since v13:
=================
- block:
1. Simplified the copy offload and emulation helpers; the
caller now decides whether to fall back from offload to emulation
2. src,dst bio order change (Christoph Hellwig)
3. refcount changes similar to dio (Christoph Hellwig)
4. Single outstanding IO for copy emulation (Christoph Hellwig)
5. use copy_max_sectors to identify copy offload
capability and other reviews (Damien, Christoph)
6. Return status in endio handler (Christoph Hellwig)
- nvme-fabrics: fallback to emulation in case of partial
offload completion
- in-kernel user addition (Ming Lei)
- indentation, documentation, minor fixes, misc changes (Damien,
Christoph)
- blktests changes to test kernel changes

Changes since v12:
=================
- block,nvme: Replaced token based approach with request based
single namespace capable approach (Christoph Hellwig)

Changes since v11:
=================
- Documentation: Improved documentation (Damien Le Moal)
- block,nvme: ssize_t return values (Darrick J. Wong)
- block: token is allocated to SECTOR_SIZE (Matthew Wilcox)
- block: mem leak fix (Maurizio Lombardi)

Changes since v10:
=================
- NVMeOF: optimization in NVMe fabrics (Chaitanya Kulkarni)
- NVMeOF: sparse warnings (kernel test robot)

Changes since v9:
=================
- null_blk, improved documentation, minor fixes (Chaitanya Kulkarni)
- fio, expanded testing and minor fixes (Vincent Fu)

Changes since v8:
=================
- null_blk, copy_max_bytes_hw is made a configfs parameter
(Damien Le Moal)
- Negative error handling in copy_file_range (Christian Brauner)
- minor fixes, better documentation (Damien Le Moal)
- fio upgraded to 3.34 (Vincent Fu)

Changes since v7:
=================
- null block copy offload support for testing (Damien Le Moal)
- adding direct flag check for copy offload to block device,
as we are using generic_copy_file_range for cached cases.
- Minor fixes

Changes since v6:
=================
- copy_file_range instead of ioctl for direct block device
- Remove support for multi range (vectored) copy
- Remove ioctl interface for copy.
- Remove offload support in dm kcopyd.

Changes since v5:
=================
- Addition of blktests (Chaitanya Kulkarni)
- Minor fix for fabrics file backed path
- Remove buggy zonefs copy file range implementation.

Changes since v4:
=================
- make the offload and emulation design asynchronous (Hannes
Reinecke)
- fabrics loopback support
- sysfs naming improvements (Damien Le Moal)
- use kfree() instead of kvfree() in cio_await_completion
(Damien Le Moal)
- use ranges instead of rlist to represent range_entry (Damien
Le Moal)
- change argument ordering in blk_copy_offload, as suggested (Damien
Le Moal)
- removed multiple copy limit and merged into only one limit
(Damien Le Moal)
- wrap overly long lines (Damien Le Moal)
- other naming improvements and cleanups (Damien Le Moal)
- correctly format the code example in description (Damien Le
Moal)
- mark blk_copy_offload as static (kernel test robot)

Changes since v3:
=================
- added copy_file_range support for zonefs
- added documentation about new sysfs entries
- incorporated review comments on v3
- minor fixes

Changes since v2:
=================
- fixed possible race condition reported by Damien Le Moal
- new sysfs controls as suggested by Damien Le Moal
- fixed possible memory leak reported by Dan Carpenter, lkp
- minor fixes

Changes since v1:
=================
- sysfs documentation (Greg KH)
- 2 bios for copy operation (Bart Van Assche, Mikulas Patocka,
Martin K. Petersen, Douglas Gilbert)
- better payload design (Darrick J. Wong)

Anuj Gupta (1):
fs/read_write: Enable copy_file_range for block device.

Nitesh Shetty (10):
block: Introduce queue limits and sysfs for copy-offload support
Add infrastructure for copy offload in block and request layer.
block: add copy offload support
block: add emulation for copy
fs, block: copy_file_range for def_blk_ops for direct block device
nvme: add copy offload support
nvmet: add copy command support for bdev and file ns
dm: Add support for copy offload
dm: Enable copy offload for dm-linear target
null_blk: add support for copy offload

Documentation/ABI/stable/sysfs-block | 23 ++
Documentation/block/null_blk.rst | 5 +
block/blk-core.c | 7 +
block/blk-lib.c | 419 +++++++++++++++++++++++++++
block/blk-merge.c | 41 +++
block/blk-settings.c | 24 ++
block/blk-sysfs.c | 36 +++
block/blk.h | 16 +
block/elevator.h | 1 +
block/fops.c | 25 ++
drivers/block/null_blk/main.c | 99 ++++++-
drivers/block/null_blk/null_blk.h | 1 +
drivers/block/null_blk/trace.h | 23 ++
drivers/md/dm-linear.c | 1 +
drivers/md/dm-table.c | 37 +++
drivers/md/dm.c | 7 +
drivers/nvme/host/constants.c | 1 +
drivers/nvme/host/core.c | 79 +++++
drivers/nvme/host/trace.c | 19 ++
drivers/nvme/target/admin-cmd.c | 9 +-
drivers/nvme/target/io-cmd-bdev.c | 97 +++++++
drivers/nvme/target/io-cmd-file.c | 50 ++++
drivers/nvme/target/nvmet.h | 4 +
drivers/nvme/target/trace.c | 19 ++
fs/read_write.c | 8 +-
include/linux/bio.h | 6 +-
include/linux/blk_types.h | 10 +
include/linux/blkdev.h | 22 ++
include/linux/device-mapper.h | 3 +
include/linux/nvme.h | 43 ++-
30 files changed, 1119 insertions(+), 16 deletions(-)


base-commit: f7dc24b3413851109c4047b22997bd0d95ed52a2
--
2.35.1.500.gb896f729e2