Re: [PATCH v3 1/9] rust: file: add Rust abstraction for `struct file`

From: Benno Lossin
Date: Fri Jan 26 2024 - 10:05:06 EST


On 18.01.24 15:36, Alice Ryhl wrote:
> +/// Wraps the kernel's `struct file`.
> +///
> +/// # Refcounting
> +///
> +/// Instances of this type are reference-counted. The reference count is incremented by the
> +/// `fget`/`get_file` functions and decremented by `fput`. The Rust type `ARef<File>` represents a
> +/// pointer that owns a reference count on the file.
> +///
> +/// Whenever a process opens a file descriptor (fd), it stores a pointer to the file in its `struct
> +/// files_struct`. This pointer owns a reference count to the file, ensuring the file isn't
> +/// prematurely deleted while the file descriptor is open. In Rust terminology, the pointers in
> +/// `struct files_struct` are `ARef<File>` pointers.
> +///
> +/// ## Light refcounts
> +///
> +/// Whenever a process has an fd to a file, it may use something called a "light refcount" as a
> +/// performance optimization. Light refcounts are acquired by calling `fdget` and released with
> +/// `fdput`. The idea behind light refcounts is that if the fd is not closed between the calls to
> +/// `fdget` and `fdput`, then the refcount cannot hit zero during that time, as the `struct
> +/// files_struct` holds a reference until the fd is closed. This means that it's safe to access the
> +/// file even if `fdget` does not increment the refcount.
> +///
> +/// The requirement that the fd is not closed during a light refcount applies globally across all
> +/// threads - not just on the thread using the light refcount. For this reason, light refcounts are
> +/// only used when the `struct files_struct` is not shared with other threads, since this ensures
> +/// that other unrelated threads cannot suddenly start using the fd and close it. Therefore,
> +/// calling `fdget` on a shared `struct files_struct` creates a normal refcount instead of a light
> +/// refcount.
> +///
> +/// Light reference counts must be released with `fdput` before the system call returns to
> +/// userspace. This means that if you wait until the current system call returns to userspace, then
> +/// all light refcounts that existed at the time have gone away.
> +///
> +/// ## Rust references
> +///
> +/// The reference type `&File` is similar to light refcounts:
> +///
> +/// * `&File` references don't own a reference count. They can only exist as long as the reference
> +/// count stays positive, and can only be created when there is some mechanism in place to ensure
> +/// this.
> +///
> +/// * The Rust borrow-checker normally ensures this by enforcing that the `ARef<File>` from which
> +/// a `&File` is created outlives the `&File`.
> +///
> +/// * Using the unsafe [`File::from_ptr`] means that it is up to the caller to ensure that the
> +/// `&File` only exists while the reference count is positive.
> +///
> +/// * You can think of `fdget` as using an fd to look up an `ARef<File>` in the `struct
> +/// files_struct` and create an `&File` from it. The "fd cannot be closed" rule is like the Rust
> +/// rule "the `ARef<File>` must outlive the `&File`".
> +///
> +/// # Invariants
> +///
> +/// * Instances of this type are refcounted using the `f_count` field.
> +/// * If an fd with active light refcounts is closed, then it must be the case that the file
> +/// refcount is positive until there are no more light refcounts created from the fd that got

I think this wording can be easily misinterpreted: "until there
are no more light refcounts created" could mean that you are allowed
to drop the refcount to zero after the last light refcount has been
created. But in reality you want all light refcounts to be released
first.
I would suggest "until all light refcounts of the fd have been dropped"
or similar.

> +/// closed.
> +/// * A light refcount must be dropped before returning to userspace.
> +#[repr(transparent)]
> +pub struct File(Opaque<bindings::file>);
> +
> +// SAFETY: By design, the only way to access a `File` is via an immutable reference or an `ARef`.
> +// This means that the only situation in which a `File` can be accessed mutably is when the
> +// refcount drops to zero and the destructor runs. It is safe for that to happen on any thread, so
> +// it is ok for this type to be `Send`.

Technically, `drop` is never called for `File`, since it is only used
via `ARef<File>` which calls `dec_ref` instead. Also since it only contains
an `Opaque`, dropping it is a noop.
But what does `Send` mean for this type? Since it is used together with
`ARef`, being `Send` means that `File::dec_ref` can be called from any
thread. I think we are missing this as a safety requirement on
`AlwaysRefCounted`, do you agree?
I think the safety justification here could be (with the requirement added
to `AlwaysRefCounted`):

SAFETY:
- `File::drop` can be called from any thread.
- `File::dec_ref` can be called from any thread.

--
Cheers,
Benno

> +unsafe impl Send for File {}
> +
> +// SAFETY: All methods defined on `File` that take `&self` are safe to call even if other threads
> +// are concurrently accessing the same `struct file`, because those methods either access immutable
> +// properties or have proper synchronization to ensure that such accesses are safe.
> +unsafe impl Sync for File {}