Correct method of pinning pages for zero-copy

From: Steffen Persvold (sp@scali.com)
Date: Wed Jun 04 2003 - 16:34:31 EST


Dear all,

I've been struggling for some time with a small driver doing zero-copy
network I/O. I can't seem to find the correct method of pinning down the
pages and releasing them again. I've tried get_user_pages() directly and
also kiobufs. What I always seem to run into is:

  1. The pinning doesn't survive a fork() (COW). I worked around this
     by setting the VM_RESERVED bit in the respective VMAs.

  2. When the skb stack releases the data after transmit and _if_ the
     application has already dropped its page references (exited or
     munmapped), the page somehow gets the LRU bit set, and __free_pages_ok
     barfs at line 95 (2.4.20):

        if (PageLRU(page)) {
                if (unlikely(in_interrupt()))
->> BUG();
                lru_cache_del(page);
        }

     A typical Oops looks like this (this is a RH kernel so the line
     number in page_alloc.c is different from plain 2.4.20) :

kernel BUG at page_alloc.c:97!
invalid operand: 0000
scadet nfs lockd sunrpc sg esm autofs tg3 iptable_filter ip_tables mousedev keybdev hid input usb-ohci usbcore ext3 jbd aic7xxx sd_mod scsi_mod
CPU: 0
EIP: 0010:[<c01439f4>] Tainted: PF
EFLAGS: 00010202

EIP is at __free_pages_ok [kernel] 0x364 (2.4.20-18.8smp)
eax: 00000001 ebx: c1d63518 ecx: 00000000 edx: 00000000
esi: f0c9f580 edi: 00000000 ebp: 00000000 esp: f7fa7f24
ds: 0018 es: 0018 ss: 0018
Process ksoftirqd_CPU0 (pid: 3, stackpage=f7fa7000)
Stack: 0000003e f3847c00 00000286 f10b9b80 f3847c00 c36b0018 f3840018 ffffff1c
       c013dd49 00000010 00000002 f0c9f580 00000000 f10b9b80 c020421e f10b9b80
       f0c9f580 f0c9f580 c0204257 f0c9f580 00000001 f0c9f580 f0c9f580 c02043c6
Call Trace: [<c013dd49>] kfree [kernel] 0x59 (0xf7fa7f44))
[<c020421e>] skb_release_data [kernel] 0x6e (0xf7fa7f5c))
[<c0204257>] kfree_skbmem [kernel] 0x17 (0xf7fa7f6c))
[<c02043c6>] __kfree_skb [kernel] 0x106 (0xf7fa7f80))
[<c0208fd7>] net_tx_action [kernel] 0x57 (0xf7fa7f94))
[<c0126129>] do_softirq [kernel] 0xd9 (0xf7fa7fb0))
[<c01266a5>] ksoftirqd [kernel] 0xe5 (0xf7fa7fcc))
[<c0105000>] stext [kernel] 0x0 (0xf7fa7fe8))
[<c010758e>] arch_kernel_thread [kernel] 0x2e (0xf7fa7ff0))
[<c01265c0>] ksoftirqd [kernel] 0x0 (0xf7fa7ff8))

I've also been looking at the implementation of mlock(), but I'm not
sure whether it handles the issues above.

I'd appreciate any hints you may have.

Regards,
Steffen Persvold



