Re: [RFC] extending splice for copy offloading

From: Ric Wheeler
Date: Mon Sep 30 2013 - 11:42:32 EST


On 09/30/2013 10:38 AM, Miklos Szeredi wrote:
On Mon, Sep 30, 2013 at 4:28 PM, Ric Wheeler <rwheeler@xxxxxxxxxx> wrote:
On 09/30/2013 10:24 AM, Miklos Szeredi wrote:
On Mon, Sep 30, 2013 at 4:52 PM, Ric Wheeler <rwheeler@xxxxxxxxxx> wrote:
On 09/30/2013 10:51 AM, Miklos Szeredi wrote:
On Mon, Sep 30, 2013 at 4:34 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx>
wrote:
My other worry is about interruptibility/restartability. Ideas?

What happens on splice(from, to, 4G) and it's a non-reflink copy?
Can the page cache copy be made restartable? Or should splice() be
allowed to return a short count? What happens on (non-reflink) remote
copies and huge request sizes?
If I were writing an application that required copies to be
restartable,
I'd probably use the largest possible range in the reflink case but
break the copy into smaller chunks in the splice case.

The app really doesn't want to care about that. And it doesn't want
to care about restartability, etc.. It's something the *kernel* has
to care about. You just can't have uninterruptible syscalls that
sleep for a "long" time, otherwise first you'll just have annoyed
users pressing ^C in vain; then, if the sleep is even longer, warnings
about task sleeping too long.

One idea is letting splice() return a short count, and so the app can
safely issue SIZE_MAX requests and the kernel can decide if it can
copy the whole file in one go or if it wants to do it in smaller
chunks.

You cannot rely on a short count. That implies that an offloaded copy
starts
at byte 0 and the short count first bytes are all valid.
Huh?

- app calls splice(from, 0, to, 0, SIZE_MAX)
1) VFS calls ->direct_splice(from, 0, to, 0, SIZE_MAX)
1.a) fs reflinks the whole file in a jiffy and returns the size of
the file
1 b) fs does copy offload of, say, 64MB and returns 64M
2) VFS does page copy of, say, 1MB and returns 1MB
- app calls splice(from, X, to, X, SIZE_MAX) where X is the new offset
...

The point is: the app is always doing the same (incrementing offset
with the return value from splice) and the kernel can decide what is
the best size it can service within a single uninterruptible syscall.

Wouldn't that work?

No.

Keep in mind that the offload operation in (1) might fail partially. The
target file (the copy) is allocated, the question is what ranges have valid
data.
You are talking about case 1.a, right? So if the offload copy 0-64MB
fails partially, we return failure from splice, yet some of the copy
did succeed. Is that the problem? Why?

Thanks,
Miklos

The way the array based offload (and some software side reflink works) is not a byte by byte copy. We cannot assume that a valid count can be returned or that such a count would be an indication of a sequential segment of good data. The whole thing would normally have to be reissued.

To make that a true assumption, you would have to mandate that in each of the specifications (and sw targets)...

ric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/