Re: [PATCH v2 2/2] rust: arc: remove `ArcBorrow` in favour of `WithRef`

From: Alice Ryhl
Date: Tue Sep 26 2023 - 19:10:13 EST


On 9/26/23 19:43, Boqun Feng wrote:
On Tue, Sep 26, 2023 at 05:15:52PM +0000, Benno Lossin wrote:
On 26.09.23 18:35, Boqun Feng wrote:
On Tue, Sep 26, 2023 at 05:41:17PM +0200, Alice Ryhl wrote:
On Tue, Sep 26, 2023 at 5:24 PM Boqun Feng <boqun.feng@xxxxxxxxx> wrote:

On Tue, Sep 26, 2023 at 04:26:59PM +0800, Gary Guo wrote:
On Mon, 25 Sep 2023 22:26:56 +0000
Benno Lossin <benno.lossin@xxxxxxxxx> wrote:

[...]

The pointer was originally derived by a call to `into_raw`:
```
pub fn into_raw(self) -> *const T {
let ptr = self.ptr.as_ptr();
core::mem::forget(self);
// SAFETY: The pointer is valid.
unsafe { core::ptr::addr_of!((*ptr).data) }
}
```
So in this function the origin (also the origin of the provenance)
of the pointer is `ptr` which is of type `NonNull<WithRef<T>>`.
Raw pointers do not lose this provenance information when you cast
it and when using `addr_of`/`addr_of_mut`. So provenance is something
that is not really represented in the type system for raw pointers.

When doing a round trip through a reference though, the provenance is
newly assigned and thus would only be valid for a `T`:
```
let raw = arc.into_raw();
let reference = unsafe { &*raw };
let raw: *const T = reference;
let arc = unsafe { Arc::from_raw(raw) };
```
Miri would complain about the above code.


One thing we can do is to opt from strict provenance, so:


A few questions about strict provenance:

```
let raw = arc.into_raw();
let _ = raw as usize; // expose the provenance of raw

Should this be a expose_addr()?

Pointer to integer cast is equivalent to expose_addr.

let reference = unsafe { &*raw };
let raw = reference as *const T as usize as *const T;

and this is a from_exposed_addr{_mut}(), right?

Integer to pointer cast is equivalent to from_exposed_addr.


Got it, thanks!

let arc = unsafe { Arc::from_raw(raw) };
```


One step back, If we were to use strict provenance API (i.e.
expose_addr()/from_exposed_addr()), we could use it to "fix" the
original problem? By:

* expose_addr() in as_with_ref()
* from_exposed_addr() in `impl From<&WithRef<T>> for Arc`

right?

More steps back, is the original issue only a real issue under strict
provenance rules? Don't make me wrong, I like the ideas behind strict
provenance, I just want to check, if we don't enable strict provenance
(as a matter of fact, we don't do it today),

Outside of miri, strict provenance is not really something you enable.
It's a set of rules that are stricter than the real rules, that are
designed such that when you follow them, your code will be correct
under any conceivable memory model we might end up with. They will
never be the rules that the compiler actually uses.

I think by "opt out from strict provenance", Gary just meant "use
int2ptr and ptr2int casts to reset the provenance".

will the original issue found by Alice be a UB?

Yes, it's UB under any ruleset that exists out there. There's no flag
to turn it off.

Or is there a way to disable Miri's check on
strict provenance? IIUC, the cause of the original issue is that "you
cannot reborrow a pointer derived from a `&` to get a `&mut`, even when
there is no other alias to the same object". Maybe I'm still missing
something, but without strict provenance, is this a problem? Or is there
a provenance model of Rust without strict provenance?

It's a problem under all of the memory models. The rule being violated
is exactly the same rule as the one behind this paragraph:

Transmuting an & to &mut is Undefined Behavior. While certain usages may appear safe, note that the Rust optimizer is free to assume that a shared reference won't change through its lifetime and thus such transmutation will run afoul of those assumptions. So:

Transmuting an & to &mut is always Undefined Behavior.
No you can't do it.
No you're not special.
From: https://doc.rust-lang.org/nomicon/transmutes.html

But here the difference it that we only derive a `*mut` from a `&`,
rather than transmute to a `&mut`, right? We only use `&` to get a
pointer value (a usize), so I don't think that rule applies here? Or in
other words, does the following implemenation look good to you?

impl<T: ?Sized> Arc<T> {
pub fn as_with_ref(&self) -> &WithRef<T> {
// expose
let _ = self.ptr.as_ptr() as usize;
unsafe { self.ptr.as_ref() }
}
}

impl<T: ?Sized> From<&WithRef<T>> for Arc<T> {
fn from(b: &WithRef<T>) -> Self {
// from exposed
let ptr = unsafe { NonNull::new_unchecked(b as *const _ as usize as *mut _) };
// SAFETY: The existence of `b` guarantees that the refcount is non-zero. `ManuallyDrop`
// guarantees that `drop` isn't called, so it's ok that the temporary `Arc` doesn't own the
// increment.
ManuallyDrop::new(unsafe { Arc::from_inner(ptr) })
.deref()
.clone()
}
}


An equivalent code snippet is as below (in case anyone wants to try it
in miri):
```rust
let raw = Box::into_raw(arc);

// as_with_ref()
let _ = raw as usize;
let reference = unsafe { &*raw };

// from()
let raw: *mut T = reference as *const _ as usize as *mut _ ;

// drop()
let arc = unsafe { Box::from_raw(raw) };
```

I don't understand why we are trying to use ptr2int to fix this.
Simply wrapping the `T` field inside `WithRef` with `UnsafeCell`
should be enough.


To me (and maybe the same for Wedson), it's actually Ok that we don't do
the change (i.e. the ArcBorrow -> &WithRef) at all. It's more a
code/concept simplification.

Not making the change is fine with me.

Fixing with an `UnsafeCell` seems more like a workaround to me, because
there is no interior mutable requirement here, so I just don't want to
patch something unnecessary here just to make things work.

I don't think it's a just a workaround. We're dealing with shared references to something, but they can be used to modify the value under some circumstances. That seems like rather standard interior mutability.

Put it in another way, if `UnsafeCell` can fix this and no interior
mutability is needed, we probably can fix this with another way or there
is an API like `UnsafeCell` is missing here.

Sorry for being stubborn here ;-) But I really want to find a better
solution for the similar problems.

What's the shortcoming of ptr2int?

To me, that's way more of a hack than using UnsafeCell is.

Regards,
Boqun

--
Cheers,
Benno