RE: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.

From: Xin, Xiaohui
Date: Thu Sep 16 2010 - 23:17:02 EST


>From: Michael S. Tsirkin [mailto:mst@xxxxxxxxxx]
>Sent: Wednesday, September 15, 2010 7:28 PM
>To: Xin, Xiaohui
>Cc: netdev@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>mingo@xxxxxxx; davem@xxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxxx;
>jdike@xxxxxxxxxxxxxxx
>Subject: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
>
>On Wed, Sep 15, 2010 at 11:13:44AM +0800, Xin, Xiaohui wrote:
>> >From: Michael S. Tsirkin [mailto:mst@xxxxxxxxxx]
>> >Sent: Sunday, September 12, 2010 9:37 PM
>> >To: Xin, Xiaohui
>> >Cc: netdev@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>> >mingo@xxxxxxx; davem@xxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxxx;
>> >jdike@xxxxxxxxxxxxxxx
>> >Subject: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
>> >
>> >On Sat, Sep 11, 2010 at 03:41:14PM +0800, Xin, Xiaohui wrote:
>> >> >>Playing with rlimit on data path, transparently to the application in this way
>> >> >>looks strange to me, I suspect this has unexpected security implications.
>> >> >>Further, applications may have other uses for locked memory
>> >> >>besides mpassthru - you should not just take it because it's there.
>> >> >>
>> >> >>Can we have an ioctl that lets userspace configure how much
>> >> >>memory to lock? This ioctl will decrement the rlimit and store
>> >> >>the data in the device structure so we can do accounting
>> >> >>internally. Put it back on close or on another ioctl.
>> >> >Yes, we can decrement the rlimit once, in the ioctl, to keep it
>> >> >off the data path.
>> >> >
>> >> >>Need to be careful for when this operation gets called
>> >> >>again with 0 or another small value while we have locked memory -
>> >> >>maybe just fail with EBUSY? or wait until it gets unlocked?
>> >> >>Maybe 0 can be special-cased and deactivate zero-copy?
>> >> >>
>> >>
>> >> How about we don't use a new ioctl, but just check the rlimit
>> >> in the MPASSTHRU_BINDDEV ioctl? If we find the mp device would
>> >> exceed the rlimit, we fail the bind ioctl, and thus zero copy
>> >> cannot be used.
>> >
>> >Yes, and not just check, but decrement as well.
>> >I think we should give userspace control over
>> >how much memory we can lock and subtract from the rlimit.
>> >It's OK to add this as a parameter to MPASSTHRU_BINDDEV.
>> >Then increment the rlimit back on unbind and on close?
>> >
>> >This opens up an interesting condition: process 1
>> >calls bind, process 2 calls unbind or close.
>> >This will increment rlimit for process 2.
>> >Not sure how to fix this properly.
>> >
>> I can't either. Can we do any synchronized operations on the rlimit?
>> I rather doubt it.
>>
>> >--
>> >MST
>
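
A toy user-space model may make the cross-process problem above concrete. It models the "decrement the rlimit on bind, give it back on unbind or close" idea, with each process's RLIMIT_MEMLOCK reduced to a plain page counter. struct toy_proc, struct toy_mp_dev, toy_bind() and toy_unbind() are hypothetical names; this is a sketch of the idea, not the mp driver's code.

```c
#include <errno.h>

/* Hypothetical model of one process: its RLIMIT_MEMLOCK, in pages. */
struct toy_proc {
	unsigned long memlock_pages;
};

/* Hypothetical model of an mp device fd, which may be shared between processes. */
struct toy_mp_dev {
	unsigned long reserved_pages; /* pages taken out of some process's rlimit */
};

/* Bind: take 'pages' out of the caller's rlimit and remember the amount. */
static int toy_bind(struct toy_proc *caller, struct toy_mp_dev *dev,
		    unsigned long pages)
{
	if (dev->reserved_pages)
		return -EBUSY;          /* already bound */
	if (pages > caller->memlock_pages)
		return -ENOMEM;         /* would exceed the caller's rlimit */
	caller->memlock_pages -= pages;
	dev->reserved_pages = pages;
	return 0;
}

/* Unbind/close: the reservation is credited to whoever makes the call. */
static void toy_unbind(struct toy_proc *caller, struct toy_mp_dev *dev)
{
	caller->memlock_pages += dev->reserved_pages;
	dev->reserved_pages = 0;
}
```

If process 1 binds 40 of its 100 pages and process 2 (also at 100) then unbinds, process 1 is left at 60 pages and process 2 goes up to 140: the reservation travels with the fd, but the credit goes to the caller of unbind.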
>Here's what infiniband does: simply pass the amount of memory userspace
>wants you to lock on an ioctl, and verify that either you have
>CAP_IPC_LOCK or this number does not exceed the current rlimit. (must
>be on ioctl, not on open, as we likely want the fd passed around between
>processes), but do not decrement rlimit. Use this on following
>operations. Be careful if this can be changed while operations are in
>progress.
>
>This does mean that the effective amount of memory that userspace can
>lock is doubled, but at least it is not unlimited, and we sidestep all
>other issues such as userspace running out of lockable memory simply by
>virtue of using the driver.
>
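
A minimal user-space sketch of the scheme described above, assuming a hypothetical per-device limit set through an ioctl: the requested amount is accepted if the caller has CAP_IPC_LOCK or the amount fits within its current RLIMIT_MEMLOCK, the rlimit itself is never decremented, and later pin operations are checked against the stored amount. The names (struct toy_lock_ctx, toy_set_lock_limit(), toy_pin(), toy_unpin()) and the error codes are illustrative assumptions; -EBUSY for shrinking below what is already pinned follows the earlier suggestion in this thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state; all amounts are in pages. */
struct toy_lock_ctx {
	unsigned long limit;   /* amount userspace asked us to lock */
	unsigned long pinned;  /* pages currently pinned by the device */
};

/*
 * Model of a hypothetical "set lock limit" ioctl. The rlimit is only
 * checked, never decremented. has_cap_ipc_lock stands in for
 * capable(CAP_IPC_LOCK); rlimit_pages for RLIMIT_MEMLOCK in pages.
 */
static int toy_set_lock_limit(struct toy_lock_ctx *ctx, unsigned long req,
			      bool has_cap_ipc_lock, unsigned long rlimit_pages)
{
	if (!has_cap_ipc_lock && req > rlimit_pages)
		return -EPERM;         /* error code is an assumption */
	if (req < ctx->pinned)
		return -EBUSY;         /* don't shrink below what is in use */
	ctx->limit = req;
	return 0;
}

/* Later operations are accounted against the stored limit. */
static int toy_pin(struct toy_lock_ctx *ctx, unsigned long npages)
{
	/* pinned <= limit always holds, so this subtraction cannot wrap */
	if (npages > ctx->limit - ctx->pinned)
		return -ENOMEM;
	ctx->pinned += npages;
	return 0;
}

static void toy_unpin(struct toy_lock_ctx *ctx, unsigned long npages)
{
	ctx->pinned -= npages;
}
```

Because the rlimit is only checked and never charged, the same process can still lock up to its rlimit elsewhere, which is the doubling mentioned above. The model is single-threaded; a driver would need a lock around limit and pinned, since the ioctl can race with pinning in progress.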

What I have done in the mp device is almost the same. The differences are
that I do not check the capability, and I use my own field, ctor->pages,
instead of mm->locked_vm.

So the current plan is to: 1) add the capability check, 2) use
mm->locked_vm, and 3) add an ioctl for userspace to configure how much
memory can be locked.
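
For point 2), a user-space sketch of what charging mm->locked_vm (instead of a private field such as ctor->pages) might look like, in the style of the InfiniBand umem accounting mentioned above: the new total is compared with RLIMIT_MEMLOCK unless the caller has CAP_IPC_LOCK. struct toy_mm and the toy_* functions are hypothetical stand-ins, and the mm locking the kernel would need around locked_vm is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mm_struct and rlimit values involved. */
struct toy_mm {
	unsigned long locked_vm;     /* like mm->locked_vm, in pages */
	unsigned long memlock_pages; /* RLIMIT_MEMLOCK converted to pages */
};

/* Charge npages to the owning mm before pinning them. */
static int toy_charge_locked_vm(struct toy_mm *mm, unsigned long npages,
				bool has_cap_ipc_lock)
{
	unsigned long locked = mm->locked_vm + npages;

	if (locked > mm->memlock_pages && !has_cap_ipc_lock)
		return -ENOMEM;
	mm->locked_vm = locked;
	return 0;
}

/* Uncharge when the pages are released (unbind, close, buffer freed). */
static void toy_uncharge_locked_vm(struct toy_mm *mm, unsigned long npages)
{
	assert(mm->locked_vm >= npages); /* never give back more than was taken */
	mm->locked_vm -= npages;
}
```

The model uncharges the same toy_mm it charged; a driver would likewise have to remember which mm was charged at pin time rather than uncharging current->mm, or it runs into the cross-process problem discussed earlier in the thread.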

>--
>MST