Re: [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of userspace data

From: Vladislav Bolkhovitin
Date: Tue Dec 23 2008 - 14:15:17 EST


Rusty Russell, on 12/22/2008 03:43 AM wrote:
On Sunday 21 December 2008 06:09:18 Jeremy Fitzhardinge wrote:
Evgeniy Polyakov wrote:
Things should work fine, since pskb_expand_head() copies whole shared
info structure (and thus will copy destructor), get all pages and then
copy all pointers into the new skb, and then release old skb's data.

So destructor for the pages should not rely on which skb it is called on
and check if pages are about to be really freed (i.e. check theirs
reference counter).
OK.

__pskb_pull_tail() is tricky, it just puts some pages it does not want
to be present in the skb, but it could be possible to add there
destructor callback from the original skb with partial flag (or just
having destructor with two parameters: skb and page, and if page is not
NULL, then actually only given page is freed, otherwise the whole skb).
Yes, that doesn't sound too bad.

That would be one approach. Actually, my patch solved this by keeping a
parent ref in various cases if the parent had a destructor: we only destroy
the parent when all the clones are gone.

Here's the patch for reference:

net: add destructor for skb data.

If we want to notify something when an skb is truly finished (such as
for tun vringfd support), we need a destructor on the data.

This turns out to be slightly non-trivial as fragments from one skb
get copied to another skb: if the first skb has a destructor (or its
parent does) we need to keep a reference to it and destroy it only
when (all the) children are destroyed. We add an 'orig' pointer to
the skb_shared_info to do this.

But there's currently no way to get from the shinfo to the head (to
kfree it), so we add a 'len' field. A better alternative to this
might be to move the skb_shared_info to before the head of the skb data.

Note that the destructor is responsible for calling kfree: for the tun
device, this is critical since the destructor can be called from any
context and it has to do a copy_to_user, so it queues the skb.

Rusty,

Can you describe how one should use your patch, please? Maybe, there is some code you use to test it?

Thanks,
Vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/