Re: [PATCH 5/5] tun: vringfd xmit support.

From: Rusty Russell
Date: Fri Apr 18 2008 - 11:15:37 EST


On Friday 18 April 2008 21:31:20 Andrew Morton wrote:
> On Fri, 18 Apr 2008 14:43:24 +1000 Rusty Russell <rusty@xxxxxxxxxxxxxxx> wrote:
> > +	/* How many pages will this take? */
> > +	npages = 1 + (base + len - 1)/PAGE_SIZE - base/PAGE_SIZE;
>
> Brain hurts. I hope you got that right.

I tested it when I wrote it, but just wrote a tester again:

base        len   npages
0           1     1
0xfff       1     1
0x1000      1     1
0           4096  1
0x1         4096  2
0xfff       4096  2
0x1000      4096  1
0xfffff000  4096  1
0xfffff000  4097  4293918722
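
For readers who want to reproduce the table, here is a minimal userspace sketch of such a tester. It is not the tester used above: it assumes 4096-byte pages and uses uint32_t to stand in for unsigned long on a 32-bit kernel, and the names npages_for and print_npages_table are made up for illustration. Under those assumptions the last row comes from base + len - 1 wrapping past 2^32.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u	/* assumption: 4K pages */

/*
 * Same expression as the patch. uint32_t models a 32-bit unsigned long,
 * so base + len - 1 can wrap around exactly as in the last table row.
 * len must be non-zero, otherwise base + len - 1 points before base.
 */
uint32_t npages_for(uint32_t base, uint32_t len)
{
	assert(len > 0);
	return 1 + (base + len - 1) / PAGE_SIZE - base / PAGE_SIZE;
}

/* Print one row per (base, len) pair from the table in this mail. */
void print_npages_table(void)
{
	static const uint32_t cases[][2] = {
		{ 0, 1 }, { 0xfff, 1 }, { 0x1000, 1 },
		{ 0, 4096 }, { 0x1, 4096 }, { 0xfff, 4096 }, { 0x1000, 4096 },
		{ 0xfffff000u, 4096 }, { 0xfffff000u, 4097 },
	};
	size_t i;

	printf("%-12s%-6s%s\n", "base", "len", "npages");
	for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
		printf("%#-12" PRIx32 "%-6" PRIu32 "%" PRIu32 "\n",
		       cases[i][0], cases[i][1],
		       npages_for(cases[i][0], cases[i][1]));
}
```

Calling print_npages_table() from a main() prints the rows; the wrapped value in the final row is far above MAX_SKB_FRAGS, so the bounds check that follows in the patch would reject it.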

> > +	if (unlikely(num_pg + npages > MAX_SKB_FRAGS)) {
> > +		err = -ENOSPC;
> > +		goto fail;
> > +	}
> > +	n = get_user_pages(current, current->mm, base, npages,
> > +			   0, 0, pages, NULL);
>
> What is the maximum number of pages which an unprivileged user can
> concurrently pin with this code?

Since only root can open the tun device, it's currently OK. The old code
kmalloced and copied: is there some mm-fu reason why pinning userspace memory
is worse?

But I actually think it's OK even for non-root, since these become skbs, which
means they either go into an outgoing device queue or a socket queue which is
accounted for exactly for this reason.

> > +	if (unlikely(n < 0)) {
> > +		err = n;
> > +		goto fail;
> > +	}
> > +
> > +	/* Transfer pages to the frag array */
> > +	for (j = 0; j < n; j++) {
> > +		f[num_pg].page = pages[j];
> > +		if (j == 0) {
> > +			f[num_pg].page_offset = offset_in_page(base);
> > +			f[num_pg].size = min(len, PAGE_SIZE -
> > +					     f[num_pg].page_offset);
> > +		} else {
> > +			f[num_pg].page_offset = 0;
> > +			f[num_pg].size = min(len, PAGE_SIZE);
> > +		}
> > +		len -= f[num_pg].size;
> > +		base += f[num_pg].size;
> > +		num_pg++;
> > +	}
>
> This loop is a fancy way of doing
>
> num_pg = n;

Damn, you had me reworking this until I realized why. It's not: we're
inside a loop, doing one iovec array element at a time.
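
To make that concrete, here is a rough userspace model of the two nested loops. It is only a sketch, not the driver code: struct buf and struct frag are hypothetical stand-ins for struct iovec and skb_frag_t, MAX_FRAGS is an assumed cap in place of MAX_SKB_FRAGS, pages are 4096 bytes, and a page index replaces the struct page pointer that get_user_pages() would return. The point is that num_pg keeps counting across iovec elements, so it only equals n for the first element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u	/* assumption: 4K pages */
#define MAX_FRAGS 18u	/* hypothetical cap, standing in for MAX_SKB_FRAGS */

/* Hypothetical stand-ins for struct iovec and skb_frag_t. */
struct buf { uint32_t base; uint32_t len; };
struct frag { uint32_t page_index; uint32_t page_offset; uint32_t size; };

/*
 * Split every buffer into per-page fragments and append them to f[],
 * which must have room for MAX_FRAGS entries. Returns the total number
 * of fragments, or -1 if they would not fit.
 */
int fill_frags(const struct buf *bufs, size_t nbufs, struct frag *f)
{
	uint32_t num_pg = 0;	/* running total across all buffers */
	size_t i;

	for (i = 0; i < nbufs; i++) {
		uint32_t base = bufs[i].base, len = bufs[i].len;
		uint32_t npages, j;

		assert(len > 0);
		npages = 1 + (base + len - 1) / PAGE_SIZE - base / PAGE_SIZE;
		if (npages > MAX_FRAGS - num_pg)
			return -1;

		/* Inner loop: one fragment per page of this buffer. */
		for (j = 0; j < npages; j++) {
			uint32_t off = (j == 0) ? base % PAGE_SIZE : 0;
			uint32_t room = PAGE_SIZE - off;

			f[num_pg].page_index = base / PAGE_SIZE;
			f[num_pg].page_offset = off;
			f[num_pg].size = len < room ? len : room;
			len -= f[num_pg].size;
			base += f[num_pg].size;
			num_pg++;
		}
	}
	return (int)num_pg;
}
```

With two buffers, { 0xfff, 2 } and { 0x3000, 4096 }, the first contributes two fragments and the second one, so the second pass through the outer loop has npages == 1 while num_pg ends at 3.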

> > +	if (unlikely(n != npages)) {
> > +		err = -EFAULT;
> > +		goto fail;
> > +	}
>
> why not do this immediately after running get_user_pages()?

To simplify the failure path. Hmm, I would use release_pages here...

> > +fail:
> > +	for (i = 0; i < num_pg; i++)
> > +		put_page(f[i].page);
>
> release_pages() could be a tad more efficient, but it's only error-path.

... but I didn't know that existed. Had to include pagemap.h, and it's not
exported. It seems to be a useful interface; see patch.

Cheers,
Rusty.

Subject: Export release_pages; nice undo for get_user_pages.

Andrew Morton suggests tun/tap use release_pages, but it's not
exported. It's not clear to me why this is in swap.c, but it exists
even without CONFIG_SWAP, so that's OK.

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>

diff -r abd2ad431e5c mm/swap.c
--- a/mm/swap.c Sat Apr 19 00:34:54 2008 +1000
+++ b/mm/swap.c Sat Apr 19 01:11:40 2008 +1000
@@ -346,6 +346,7 @@ void release_pages(struct page **pages,

 	pagevec_free(&pages_to_free);
 }
+EXPORT_SYMBOL(release_pages);
 
 /*
  * The pages which we're about to release may be in the deferred lru-addition