Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation

From: Roland Dreier
Date: Tue Apr 26 2005 - 16:24:35 EST


Andrew> Well I was vaguely proposing that the userspace library
Andrew> keep track of the byteranges and the underlying page
Andrew> states. So in the above scenario userspace would leave
Andrew> the page at 0x1000 registered until all registrations
Andrew> against that page have been undone.

OK, I already have code in userspace that keeps reference counts for
overlapping regions, etc. However I'm not sure how to tie this in
with reliable accounting of pinned memory -- we don't want malicious
userspace code to be able to fool the accounting, right?

So I'm still trying to puzzle out what to do. I don't want to keep a
complicated data structure in the kernel keeping track of what memory
has been registered. Right now, I just keep a list of structs, one
for each region, and when a process dies, I just go through region by
region and do a put_page() to balance off the get_user_pages().
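
As a rough illustration of the per-region bookkeeping described above,
here is a userspace C simulation, not kernel code. The names
(sim_context, sim_register, sim_release_all) are made up for this
sketch, and the sim_page_ref array only stands in for each page's
reference count; the increment and decrement model get_user_pages()
and put_page() respectively.

```c
#include <assert.h>
#include <stdlib.h>

#define SIM_NPAGES 16   /* size of the simulated address space, in pages */

/* Stand-in for each page's reference count (hypothetical). */
static int sim_page_ref[SIM_NPAGES];

/* One entry per registered region, kept on a simple linked list. */
struct sim_region {
    unsigned long first_page;
    unsigned long npages;
    struct sim_region *next;
};

/* Per-process state: just the list of regions. */
struct sim_context {
    struct sim_region *regions;
};

/* Pin [first_page, first_page + npages) and remember the region. */
static int sim_register(struct sim_context *ctx,
                        unsigned long first_page, unsigned long npages)
{
    struct sim_region *r;
    unsigned long i;

    if (npages == 0 || first_page >= SIM_NPAGES ||
        npages > SIM_NPAGES - first_page)
        return -1;

    r = malloc(sizeof(*r));
    if (!r)
        return -1;

    for (i = 0; i < npages; i++)
        sim_page_ref[first_page + i]++;   /* models get_user_pages() */

    r->first_page = first_page;
    r->npages = npages;
    r->next = ctx->regions;
    ctx->regions = r;
    return 0;
}

/* Process teardown: walk the list and drop one reference per page. */
static void sim_release_all(struct sim_context *ctx)
{
    while (ctx->regions) {
        struct sim_region *r = ctx->regions;
        unsigned long i;

        ctx->regions = r->next;
        for (i = 0; i < r->npages; i++) {
            assert(sim_page_ref[r->first_page + i] > 0);
            sim_page_ref[r->first_page + i]--;   /* models put_page() */
        }
        free(r);
    }
}
```

Because every region takes its own reference on every page it covers,
overlapping regions need no special handling at teardown: each list
entry undoes exactly what it did.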

However, I don't see how to make it work if I put the reference
counting for overlapping regions in userspace but still want
mlock()-style accounting in the kernel. If a buggy/malicious app does:

a) register from 0x0000 to 0x2fff
b) register from 0x1000 to 0x1fff
c) unregister from 0x0000 to 0x2fff

then after (c) the page at 0x1000 still has to stay pinned for (b), so
it seems the kernel is screwed unless it counts how many times a vma
has been pinned. And adding a pin_count member to vm_area_struct
seems like a pretty damn major step.
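
To make the a/b/c sequence concrete, here is a small userspace C
sketch that counts pins per page rather than per vma. It is only one
possible shape for such counting, not something the kernel does today;
pc_pin_count, pc_locked_pages, pc_register and pc_unregister are all
hypothetical names, and ranges are given as inclusive byte addresses
to match the example.

```c
#define PC_PAGE_SHIFT 12   /* 4 KB pages, as in the 0x1000 example */
#define PC_NPAGES     8    /* size of the simulated address space */

static unsigned int pc_pin_count[PC_NPAGES];   /* pins per page */
static unsigned long pc_locked_pages;          /* pages charged to the limit */

/* Convert an inclusive byte range to page indices; 0 if out of bounds. */
static int pc_range_ok(unsigned long start, unsigned long end,
                       unsigned long *first, unsigned long *last)
{
    if (end < start)
        return 0;
    *first = start >> PC_PAGE_SHIFT;
    *last = end >> PC_PAGE_SHIFT;
    return *last < PC_NPAGES;
}

/* Pin every page in the range; charge a page only on its first pin. */
static int pc_register(unsigned long start, unsigned long end)
{
    unsigned long first, last, p;

    if (!pc_range_ok(start, end, &first, &last))
        return -1;
    for (p = first; p <= last; p++)
        if (pc_pin_count[p]++ == 0)
            pc_locked_pages++;
    return 0;
}

/* Unpin every page in the range; credit a page only on its last unpin. */
static int pc_unregister(unsigned long start, unsigned long end)
{
    unsigned long first, last, p;

    if (!pc_range_ok(start, end, &first, &last))
        return -1;
    /* Refuse the whole request if any page is not pinned at all. */
    for (p = first; p <= last; p++)
        if (pc_pin_count[p] == 0)
            return -1;
    for (p = first; p <= last; p++)
        if (--pc_pin_count[p] == 0)
            pc_locked_pages--;
    return 0;
}
```

After steps (a), (b) and (c), the count for the page at 0x1000 is
still 1 and one page remains charged, which is the behaviour we want.
Note that a count alone is not sufficient: unregistering by raw
address range would still let an app drop a pin that belongs to a
different registration, so a real implementation would presumably key
the unregister on a handle for the region the kernel recorded.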

We definitely have to make sure that userspace is never able to
unpin a page that is still registered with RDMA hardware, because that
can lead to DMA into memory that someone else owns. On the other
hand, we don't want userspace to be able to defeat resource accounting
by tricking the kernel into keeping page_count elevated after it
credits the memory back to a process's limit on locked pages.

The limit on the number of locked pages seems like a natural thing to
check against, but perhaps we need a different limit for the number of
pages pinned for use by RDMA hardware. Sort of the same way that
there's a separate limit on the number of in-flight aios.
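
A separate limit of that kind could be as simple as a per-process
counter checked at registration time. The C sketch below is only an
illustration under that assumption; rp_account, rp_charge and
rp_uncharge are hypothetical names, not an existing kernel interface.

```c
#include <assert.h>

/* Hypothetical per-process accounting of pages pinned for RDMA. */
struct rp_account {
    unsigned long pinned;   /* pages currently charged */
    unsigned long limit;    /* cap on pinned pages */
};

/* Charge npages against the limit; fail without side effects if over. */
static int rp_charge(struct rp_account *acct, unsigned long npages)
{
    if (acct->pinned > acct->limit ||
        npages > acct->limit - acct->pinned)
        return -1;
    acct->pinned += npages;
    return 0;
}

/* Credit pages back; call only after the pages are really unpinned. */
static void rp_uncharge(struct rp_account *acct, unsigned long npages)
{
    assert(npages <= acct->pinned);
    acct->pinned -= npages;
}
```

The ordering matters more than the arithmetic: the credit has to
happen only once the page references have actually been dropped,
otherwise the counter says the memory is free while it is still
pinned.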

- R.
