Re: [patch v3] splice: fix race with page invalidation

From: Linus Torvalds
Date: Thu Jul 31 2008 - 14:59:11 EST




On Thu, 31 Jul 2008, Jamie Lokier wrote:
>
> But did you miss the bit where you DON'T COPY ANYTHING EVER*? COW is
> able to provide _correctness_ for the rare corner cases which you're not
> optimising for. You don't actually copy more than 0.0% (*approx).

The thing is, just even _marking_ things COW is the expensive part. If we
have to walk page tables - we're screwed.

> The cost of COW is TLB flushes*. But for splice, there ARE NO TLB
> FLUSHES because such files are not mapped writable!

For splice, there are also no flags to set, no extra tracking costs, etc
etc.

But yes, we could make splice (from a file) do something like

 - just fall back to copy if the page is already mapped (page->mapcount
   gives us that)

 - set a bit ("splicemapped") when we splice it in, and increment
   page->mapcount for each splice copy.

 - if a "splicemapped" page is ever mmap'ed or written to (either through
   write or truncate), we COW it then (and actually move the page cache
   page - it would be a "woc": a reverse cow, not a normal one).

 - do all of this with page lock held, to make sure that there are no
   writers or new mappers happening.
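
The steps above can be sketched as a toy user-space model. This is only an
illustration of the bookkeeping, not kernel code: "struct toy_page", the
"splicemapped" field, the lock helpers and both functions are hypothetical
names made up for the example. It also assumes that a page which is already
"splicemapped" may be spliced again by reference, which the list above
implies but does not state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_page {
    int  mapcount;      /* stand-in for page->mapcount */
    bool splicemapped;  /* the proposed "splicemapped" bit */
    bool locked;        /* stand-in for the page lock */
};

enum toy_splice_result { TOY_COPIED, TOY_SHARED };

static void toy_lock(struct toy_page *p)
{
    assert(!p->locked);
    p->locked = true;
}

static void toy_unlock(struct toy_page *p)
{
    assert(p->locked);
    p->locked = false;
}

/*
 * Splice a page cache page into a pipe.
 * A page that is mapped by anything other than earlier splices is copied;
 * otherwise it is shared, the bit is set and mapcount is bumped.
 */
static enum toy_splice_result toy_splice_in(struct toy_page *p)
{
    enum toy_splice_result res;

    toy_lock(p);
    if (p->mapcount > 0 && !p->splicemapped) {
        res = TOY_COPIED;           /* already mmap'ed: fall back to copy */
    } else {
        p->splicemapped = true;     /* mark the page as splice-shared */
        p->mapcount++;              /* one reference per splice copy */
        res = TOY_SHARED;
    }
    toy_unlock(p);
    return res;
}

/*
 * Called before mmap, write or truncate touches the page.
 * Returns true when the page cache must get a fresh page and the old
 * one must be left to its splice holders (the "woc" case).
 */
static bool toy_needs_woc(struct toy_page *p)
{
    bool woc;

    toy_lock(p);
    woc = p->splicemapped;
    toy_unlock(p);
    return woc;
}
```

In this model an unmapped page can be spliced any number of times by
reference, and the first mmap, write or truncate after that sees
toy_needs_woc() return true. A page that was mmap'ed first is always copied
and never needs the reverse cow.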

So it's probably doable.

(We could have a separate "splicecount", and actually allow non-writable
mappings, but I suspect we cannot afford the space in the "struct page"
for a whole new count).

> You're missing the real point of network splice().
>
> It's not just for speed.
>
> It's for sharing data. Your TCP buffers can share data, when the same
> big lump is in flight to lots of clients. Think static file / web /
> FTP server, the kind with 80% of hits to 0.01% of the files, roughly
> the same size as your RAM.

Maybe. Does it really show up as a big thing?

Linus