Re: [RFC 0/4] convert create_page_buffers to create_folio_buffers

From: Dave Chinner
Date: Sun Apr 16 2023 - 18:57:20 EST

Next message: Ivan Orlov: "Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file"
Previous message: Samuel Thibault: "Re: [PATCH] PPPoL2TP: Add more code snippets"
In reply to: Darrick J. Wong: "Re: [RFC 0/4] convert create_page_buffers to create_folio_buffers"
Next in thread: Luis Chamberlain: "Re: [RFC 0/4] convert create_page_buffers to create_folio_buffers"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, Apr 15, 2023 at 10:26:42PM -0700, Luis Chamberlain wrote:
> > > > Except ... we want to probe a dozen different
> > > > filesystems, and half of them keep their superblock at the same offset
> > > > from the start of the block device. So we do want to keep it cached.
> > > > That's arguing for using the page cache, at least to read it.
> > >
> > > Do we currently share anything from the bdev cache with the fs for this?
> > > Let's say that first block device blocksize in memory.
> >
> > sb_bread() is used by most filesystems, and the buffer cache aliases
> > into the page cache.
>
> I see thanks. I checked what xfs does and its xfs_readsb() uses its own
> xfs_buf_read_uncached(). It ends up calling xfs_buf_submit() and
> xfs_buf_ioapply_map() does its own submit_bio(). So I'm curious why
> they did that.

XFS has its own metadata address space for caching - it does not
use the block device page cache at all. This is not new; it never
has.

The xfs_buf buffer cache does not use the page cache, either. It
does its own thing, has its own indexing, locking, shrinkers, etc.
IOWs, it does not use the iomap infrastructure at all - iomap is
used by XFS exclusively for data IO.

As for why we use an uncached buffer for the superblock? That's
largely historic: prior to 2007, every modification that did
allocation/free needed to lock and modify the superblock at
transaction commit. Hence it was always needed in memory and sat on
a critical fast path, so the buffer is kept directly available to
callers that need it, without needing a cache lookup.
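
To make the distinction concrete, here is a rough sketch of a cached
lookup versus a pinned, uncached buffer. It is a toy userspace model in
Python, not XFS code: BufferCache, Mount, sb_buf and get_sb_buffer are
all hypothetical names invented for the illustration.

```python
# Toy model only: contrasts a cached lookup with a pinned, uncached buffer.
# None of these names exist in XFS; they are hypothetical.


class BufferCache:
    """Stand-in for an indexed metadata buffer cache."""

    def __init__(self):
        self._index = {}   # block number -> buffer contents
        self.lookups = 0   # counts how often the index is consulted

    def insert(self, blkno, data):
        self._index[blkno] = data

    def lookup(self, blkno):
        self.lookups += 1
        return self._index[blkno]


class Mount:
    """Stand-in for a mounted filesystem that pins its superblock buffer."""

    def __init__(self, cache, sb_data):
        self.cache = cache
        # Held for the lifetime of the mount, outside the cache index.
        self.sb_buf = bytearray(sb_data)

    def get_sb_buffer(self):
        # No index lookup: the buffer is reached through the mount itself.
        return self.sb_buf
```

In this model, any number of get_sb_buffer() calls leaves
cache.lookups untouched, which is the whole point of holding the
buffer outside the index on a hot path.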

In 2007, lazy superblock counters got rid of the requirement to lock
the superblock buffer in every transaction commit, so the uncached
buffer optimisation hasn't really been needed for the past decade.
But if it ain't broke, don't try to fix it....

> > > > Now, do we want userspace to be able to dd a new superblock into place
> > > > and have the mounted filesystem see it?
> > >
> > > Not sure I follow this. dd a new super block?
> >
> > In userspace, if I run 'dd if=blah of=/dev/sda1 bs=512 count=1 seek=N',
> > I can overwrite the superblock. Do we want filesystems to see that
> > kind of vandalism, or do we want the mounted filesystem to have its
> > own copy of the data and overwrite what userspace wrote the next time it
> > updates the superblock?
>
> Oh, what happens today?

In XFS, it will completely ignore the fact that the superblock got
trashed like this. When the fs goes idle, or the sb is modified for
some other reason, it will relog the in-memory superblock and write
it back to disk, thereby fixing the corruption. i.e. while the
filesystem is mounted, the superblock is _write-only_...
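
The "write-only" behaviour can be sketched with a toy userspace model
(Python, not XFS code). BlockDevice, MountedFs, incore_sb and
write_back_sb are hypothetical names, and the single write_back_sb()
call stands in for the relog-and-writeback step described above.

```python
# Toy model only: a mounted filesystem that never re-reads its superblock.
# All names are hypothetical; this is not how XFS is implemented.

SB_SIZE = 512  # assumed superblock size for the model


class BlockDevice:
    """Stand-in for a block device that userspace can also write to."""

    def __init__(self, nbytes):
        self.data = bytearray(nbytes)

    def pread(self, offset, length):
        return bytes(self.data[offset:offset + length])

    def pwrite(self, offset, buf):
        self.data[offset:offset + len(buf)] = buf


class MountedFs:
    """Stand-in for a mounted filesystem with an in-memory superblock."""

    def __init__(self, bdev):
        self.bdev = bdev
        # Read once at mount time; the disk copy is not consulted again.
        self.incore_sb = bytearray(bdev.pread(0, SB_SIZE))

    def write_back_sb(self):
        # Stands in for relogging and writing back the in-memory superblock.
        self.bdev.pwrite(0, bytes(self.incore_sb))
```

Overwriting offset 0 of the device after MountedFs is constructed
changes what pread() returns but not incore_sb; the next
write_back_sb() puts the in-memory contents back on disk.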

> > (the trick is that this may not be vandalism, it might be the sysadmin
> > updating the uuid or running some fsck-ish program or trying to update
> > the superblock to support fabulous-new-feature on next mount. does this
> > change the answer?)

If you need to change anything in the superblock while the XFS fs is
mounted, then you have to use ioctls to modify the superblock
contents through the running transaction subsystem. Editing the
block device directly breaks the security model of filesystems that
assume they have exclusive access to the block device whilst the
filesystem is mounted....

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
