Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.

From: Michael S. Tsirkin
Date: Mon Sep 06 2010 - 07:17:46 EST


So - does this driver help reduce service demand signifiantly?
Some comments from looking at the code:

On Fri, Aug 06, 2010 at 05:23:41PM +0800, xiaohui.xin@xxxxxxxxx wrote:
> +static struct page_info *alloc_page_info(struct page_ctor *ctor,
> + struct kiocb *iocb, struct iovec *iov,
> + int count, struct frag *frags,
> + int npages, int total)
> +{
> + int rc;
> + int i, j, n = 0;
> + int len;
> + unsigned long base, lock_limit;
> + struct page_info *info = NULL;
> +
> + lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
> + lock_limit >>= PAGE_SHIFT;

Playing with rlimit on data path, transparently to the application in this way
looks strange to me, I suspect this has unexpected security implications.
Further, applications may have other uses for locked memory
besides mpassthru - you should not just take it because it's there.

Can we have an ioctl that lets userspace configure how much
memory to lock? This ioctl will decrement the rlimit and store
the data in the device structure so we can do accounting
internally. Put it back on close or on another ioctl.
Need to be careful for when this operation gets called
again with 0 or another small value while we have locked memory -
maybe just fail with EBUSY? or wait until it gets unlocked?
Maybe 0 can be special-cased and deactivate zero-copy?.


> +
> + if (ctor->lock_pages + count > lock_limit && npages) {
> + printk(KERN_INFO "exceed the locked memory rlimit.");
> + return NULL;
> + }
> +
> + info = kmem_cache_zalloc(ext_page_info_cache, GFP_KERNEL);

You seem to fill in all memory, why zalloc? this is data path ...

> +
> + if (!info)
> + return NULL;
> +
> + for (i = j = 0; i < count; i++) {
> + base = (unsigned long)iov[i].iov_base;
> + len = iov[i].iov_len;
> +
> + if (!len)
> + continue;
> + n = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
> +
> + rc = get_user_pages_fast(base, n, npages ? 1 : 0,

npages controls whether this is a write? Why?

> + &info->pages[j]);
> + if (rc != n)
> + goto failed;
> +
> + while (n--) {
> + frags[j].offset = base & ~PAGE_MASK;
> + frags[j].size = min_t(int, len,
> + PAGE_SIZE - frags[j].offset);
> + len -= frags[j].size;
> + base += frags[j].size;
> + j++;
> + }
> + }
> +
> +#ifdef CONFIG_HIGHMEM
> + if (npages && !(dev->features & NETIF_F_HIGHDMA)) {
> + for (i = 0; i < j; i++) {
> + if (PageHighMem(info->pages[i]))
> + goto failed;
> + }
> + }
> +#endif

Are non-highdma devices worth bothering with? If yes -
are there other limitations devices might have that we need to handle?
E.g. what about non-s/g devices or no checksum offloading?

> + skb_push(skb, ETH_HLEN);
> +
> + if (skb_is_gso(skb)) {
> + hdr.hdr.hdr_len = skb_headlen(skb);
> + hdr.hdr.gso_size = shinfo->gso_size;
> + if (shinfo->gso_type & SKB_GSO_TCPV4)
> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> + else if (shinfo->gso_type & SKB_GSO_TCPV6)
> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> + else if (shinfo->gso_type & SKB_GSO_UDP)
> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
> + else
> + BUG();
> + if (shinfo->gso_type & SKB_GSO_TCP_ECN)
> + hdr.hdr.gso_type |= VIRTIO_NET_HDR_GSO_ECN;
> +
> + } else
> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE;
> +
> + if (skb->ip_summed == CHECKSUM_PARTIAL) {
> + hdr.hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> + hdr.hdr.csum_start =
> + skb->csum_start - skb_headroom(skb);
> + hdr.hdr.csum_offset = skb->csum_offset;
> + }

We have this code in tun, macvtap and packet socket already.
Could this be a good time to move these into networking core?
I'm not asking you to do this right now, but could this generic
virtio-net to skb stuff be encapsulated in functions?

--
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/