Re: [PATCH 0/5] Volatile Ranges (v12) & LSF-MM discussion fodder

From: John Stultz
Date: Wed Apr 02 2014 - 00:04:15 EST


On 04/01/2014 02:21 PM, Johannes Weiner wrote:
> [ I tried to bring this up during LSFMM but it got drowned out.
> Trying again :) ]
>
> On Fri, Mar 21, 2014 at 02:17:30PM -0700, John Stultz wrote:
>> Optimistic method:
>> 1) Userland marks a large range of data as volatile
>> 2) Userland continues to access the data as it needs.
>> 3) If userland accesses a page that has been purged, the kernel will
>> send a SIGBUS
>> 4) Userspace can trap the SIGBUS, mark the affected pages as
>> non-volatile, and refill the data as needed before continuing on
> As far as I understand, if a pointer to volatile memory makes it into
> a syscall and the fault is trapped in kernel space, there won't be a
> SIGBUS, the syscall will just return -EFAULT.
>
> Handling this would mean annotating every syscall invocation to check
> for -EFAULT, refill the data, and then restart the syscall. This is
> complicated even before taking external libraries into account, which
> may not propagate syscall returns properly or may not be reentrant at
> the necessary granularity.
>
> Another option is to never pass volatile memory pointers into the
> kernel, but that too means that knowledge of volatility has to travel
> alongside the pointers, which will either result in more complexity
> throughout the application or severely limited scope of volatile
> memory usage.
>
> Either way, optimistic volatile pointers are nowhere near as
> transparent to the application as the above description suggests,
> which makes this usecase not very interesting, IMO. If we can support
> it at little cost, why not, but I don't think we should complicate the
> common usecases to support this one.

So yea, thanks again for all the feedback at LSF-MM! I'm trying to get
things integrated for a v13 here shortly (although with visitors in town
this week it may not happen until next week).


So, maybe its best to ignore the fact that folks want to do semi-crazy
user-space faulting via SIGBUS. At least to start with. Lets look at the
semantic for the "normal" mark volatile, never touch the pages until you
mark non-volatile - basically where accessing volatile pages is similar
to a use-after-free bug.

So, for the most part, I'd say the proposed SIGBUS semantics don't
complicate things for this basic use-case, at least when compared with
things like zero-fill. If an applications accidentally accessed a
purged volatile page, I think SIGBUS is the right thing to do. They most
likely immediately crash, but its better then them moving along with
silent corruption because they're mucking with zero-filled pages.

So between zero-fill and SIGBUS, I think SIGBUS makes the most sense. If
you have a third option you're thinking of, I'd of course be interested
in hearing it.

Now... once you've chosen SIGBUS semantics, there will be folks who will
try to exploit the fact that we get SIGBUS on purged page access (at
least on the user-space side) and will try to access pages that are
volatile until they are purged and try to then handle the SIGBUS to fix
things up. Those folks exploiting that will have to be particularly
careful not to pass volatile data to the kernel, and if they do they'll
have to be smart enough to handle the EFAULT, etc. That's really all
their problem, because they're being clever. :)

I've maybe made a mistake in talking at length about those use cases,
because I wanted to make sure folks didn't have suggestions on how to
better address those cases (so far I've not heard any), and it sort of
helps wrap folks heads around at least some of the potential variations
on the desired purging semantics (lru based cold page purging, or entire
object based purging).

Now, one other potential variant, which Keith brought up at LSF-MM, and
others have mentioned before, is to have *any* volatile page access
(purged or not) return a SIGBUS. This seems "safe" in that it protects
developers from themselves, and makes application behavior more
deterministic (rather then depending on memory pressure). However it
also has the overhead of setting up the pte swp entries for each page in
order to trip the SIGBUS. Since folks have explicitly asked for it,
allowing non-purged volatile page access seems more flexible. And its
cheaper. So that's what I've been leaning towards.

thanks again!
-john


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/