Re: [PATCH v3 1/9] rust: file: add Rust abstraction for `struct file`

From: Alice Ryhl
Date: Mon Jan 29 2024 - 11:40:18 EST


On Fri, Jan 26, 2024 at 4:04 PM Benno Lossin <benno.lossin@xxxxxxxxx> wrote:
>
> On 18.01.24 15:36, Alice Ryhl wrote:
> > +/// Wraps the kernel's `struct file`.
> > +///
> > +/// # Refcounting
> > +///
> > +/// Instances of this type are reference-counted. The reference count is incremented by the
> > +/// `fget`/`get_file` functions and decremented by `fput`. The Rust type `ARef<File>` represents a
> > +/// pointer that owns a reference count on the file.
> > +///
> > +/// Whenever a process opens a file descriptor (fd), it stores a pointer to the file in its `struct
> > +/// files_struct`. This pointer owns a reference count to the file, ensuring the file isn't
> > +/// prematurely deleted while the file descriptor is open. In Rust terminology, the pointers in
> > +/// `struct files_struct` are `ARef<File>` pointers.
> > +///
> > +/// ## Light refcounts
> > +///
> > +/// Whenever a process has an fd to a file, it may use something called a "light refcount" as a
> > +/// performance optimization. Light refcounts are acquired by calling `fdget` and released with
> > +/// `fdput`. The idea behind light refcounts is that if the fd is not closed between the calls to
> > +/// `fdget` and `fdput`, then the refcount cannot hit zero during that time, as the `struct
> > +/// files_struct` holds a reference until the fd is closed. This means that it's safe to access the
> > +/// file even if `fdget` does not increment the refcount.
> > +///
> > +/// The requirement that the fd is not closed during a light refcount applies globally across all
> > +/// threads - not just on the thread using the light refcount. For this reason, light refcounts are
> > +/// only used when the `struct files_struct` is not shared with other threads, since this ensures
> > +/// that other unrelated threads cannot suddenly start using the fd and close it. Therefore,
> > +/// calling `fdget` on a shared `struct files_struct` creates a normal refcount instead of a light
> > +/// refcount.
> > +///
> > +/// Light reference counts must be released with `fdput` before the system call returns to
> > +/// userspace. This means that if you wait until the current system call returns to userspace, then
> > +/// all light refcounts that existed at the time have gone away.
> > +///
> > +/// ## Rust references
> > +///
> > +/// The reference type `&File` is similar to light refcounts:
> > +///
> > +/// * `&File` references don't own a reference count. They can only exist as long as the reference
> > +/// count stays positive, and can only be created when there is some mechanism in place to ensure
> > +/// this.
> > +///
> > +/// * The Rust borrow-checker normally ensures this by enforcing that the `ARef<File>` from which
> > +/// a `&File` is created outlives the `&File`.
> > +///
> > +/// * Using the unsafe [`File::from_ptr`] means that it is up to the caller to ensure that the
> > +/// `&File` only exists while the reference count is positive.
> > +///
> > +/// * You can think of `fdget` as using an fd to look up an `ARef<File>` in the `struct
> > +/// files_struct` and create an `&File` from it. The "fd cannot be closed" rule is like the Rust
> > +/// rule "the `ARef<File>` must outlive the `&File`".
> > +///
> > +/// # Invariants
> > +///
> > +/// * Instances of this type are refcounted using the `f_count` field.
> > +/// * If an fd with active light refcounts is closed, then it must be the case that the file
> > +/// refcount is positive until there are no more light refcounts created from the fd that got
>
> I think this wording can be easily misinterpreted: "until there
> are no more light refcounts created" could mean that you are allowed
> to drop the refcount to zero after the last light refcount has been
> created. But in reality you want all light refcounts to be released
> first.
> I would suggest "until all light refcounts of the fd have been dropped"
> or similar.

Will do.

> > +/// closed.
> > +/// * A light refcount must be dropped before returning to userspace.
> > +#[repr(transparent)]
> > +pub struct File(Opaque<bindings::file>);
> > +
> > +// SAFETY: By design, the only way to access a `File` is via an immutable reference or an `ARef`.
> > +// This means that the only situation in which a `File` can be accessed mutably is when the
> > +// refcount drops to zero and the destructor runs. It is safe for that to happen on any thread, so
> > +// it is ok for this type to be `Send`.
>
> Technically, `drop` is never called for `File`, since it is only used
> via `ARef<File>` which calls `dec_ref` instead. Also since it only contains
> an `Opaque`, dropping it is a noop.
> But what does `Send` mean for this type? Since it is used together with
> `ARef`, being `Send` means that `File::dec_ref` can be called from any
> thread. I think we are missing this as a safety requirement on
> `AlwaysRefCounted`, do you agree?
> I think the safety justification here could be (with the requirement added
> to `AlwaysRefCounted`):
>
> SAFETY:
> - `File::drop` can be called from any thread.
> - `File::dec_ref` can be called from any thread.

This wording was taken from rust/kernel/task.rs. I think it's out of
scope to reword it.

Besides, it says "destructor runs", not "drop runs". For an `ARef`,
"destructor" can be read as `dec_ref`, which is the right thing.

A more precise safety comment would probably say that `File::dec_ref`
can be called from any thread.
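
To make that concrete, here is a rough userspace sketch of what such a
comment could look like. This is not kernel code: `AlwaysRefCounted`,
`ARef`, and `File` below are simplified stand-ins built on `std`, and
the `released` flag and `drop_last_ref_on_other_thread` helper are
hypothetical names that exist only for illustration.

```rust
use std::marker::PhantomData;
use std::ops::Deref;
use std::ptr::NonNull;
use std::sync::atomic::{fence, AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Simplified stand-in for the kernel trait of the same name.
///
/// # Safety
/// The object must stay alive for as long as its refcount is positive.
pub unsafe trait AlwaysRefCounted {
    fn inc_ref(&self);
    /// # Safety
    /// The caller must own one refcount, which is given up by this call.
    unsafe fn dec_ref(obj: NonNull<Self>);
}

/// Simplified stand-in for `ARef<T>`: a pointer that owns one refcount.
pub struct ARef<T: AlwaysRefCounted> {
    ptr: NonNull<T>,
}

// SAFETY: Sending an `ARef<T>` to another thread lets that thread use `&T`
// and eventually call `T::dec_ref`, hence the `Sync + Send` bounds on `T`.
unsafe impl<T: AlwaysRefCounted + Sync + Send> Send for ARef<T> {}

impl<T: AlwaysRefCounted> Deref for ARef<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: This `ARef` owns a refcount, so the object is alive.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T: AlwaysRefCounted> Clone for ARef<T> {
    fn clone(&self) -> Self {
        self.deref().inc_ref();
        Self { ptr: self.ptr }
    }
}

impl<T: AlwaysRefCounted> Drop for ARef<T> {
    fn drop(&mut self) {
        // SAFETY: This `ARef` owns one refcount and gives it up here.
        unsafe { T::dec_ref(self.ptr) }
    }
}

/// Mock file. The `PhantomData<*mut ()>` makes it `!Send`/`!Sync` by
/// default, so the unsafe impls below are actually needed.
pub struct File {
    f_count: AtomicUsize,
    released: Arc<AtomicBool>, // hypothetical: set once the file is freed
    _not_send: PhantomData<*mut ()>,
}

// SAFETY: `File::dec_ref` can be called from any thread: it only uses
// atomic operations, and freeing the allocation is not tied to a thread.
unsafe impl Send for File {}
// SAFETY: All methods taking `&File` only perform atomic operations.
unsafe impl Sync for File {}

impl File {
    pub fn new(released: Arc<AtomicBool>) -> ARef<File> {
        let file = Box::new(File {
            f_count: AtomicUsize::new(1),
            released,
            _not_send: PhantomData,
        });
        ARef { ptr: NonNull::from(Box::leak(file)) }
    }

    pub fn count(&self) -> usize {
        self.f_count.load(Ordering::Relaxed)
    }
}

unsafe impl AlwaysRefCounted for File {
    fn inc_ref(&self) {
        self.f_count.fetch_add(1, Ordering::Relaxed);
    }

    unsafe fn dec_ref(obj: NonNull<Self>) {
        // SAFETY: The caller owns a refcount, so the object is alive.
        let file = unsafe { obj.as_ref() };
        if file.f_count.fetch_sub(1, Ordering::Release) == 1 {
            fence(Ordering::Acquire);
            let released = file.released.clone();
            // SAFETY: The count hit zero, so nobody else can reach `obj`.
            drop(unsafe { Box::from_raw(obj.as_ptr()) });
            released.store(true, Ordering::SeqCst);
        }
    }
}

/// Drops the last refcount on a thread other than the creating one.
pub fn drop_last_ref_on_other_thread() -> bool {
    let released = Arc::new(AtomicBool::new(false));
    let file = File::new(released.clone());
    let extra = file.clone();
    drop(file);
    thread::spawn(move || drop(extra)).join().unwrap();
    released.load(Ordering::SeqCst)
}
```

The point of the sketch is only that the `Send` justification ends up
talking about `dec_ref` rather than about `drop`, since the final
`ARef` may be dropped on a different thread than the one that created
the file.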

Alice