Re: [GIT PULL] gfs2 fix

From: Andreas Gruenbacher
Date: Wed Apr 27 2022 - 15:43:51 EST


On Wed, Apr 27, 2022 at 7:13 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, Apr 27, 2022 at 5:29 AM Andreas Gruenbacher <agruenba@xxxxxxxxxx> wrote:
> >
> > Regular (buffered) reads and writes are expected to be atomic with
> > respect to each other.
>
> Linux has actually never honored that completely broken POSIX
> requirement, although I think some filesystems (notably XFS) have
> tried.

Okay, I can happily live with that.

I wonder if this could be documented in the read(2) and write(2)
manual pages. Or would that be asking too much?

> It's a completely broken concept. It's not possible to honor atomicity
> with mmap(), and nobody has *ever* cared.
>
> And it causes huge amounts of problems and basically makes any sane
> locking entirely impossible.
>
> The fact that you literally broke regular file writes in ways that are
> incompatible with (much MUCH more important) POSIX file behavior to
> try to get that broken read/write atomicity is only one example among
> many for why that alleged rule just has to be ignored.
>
> We do honor the PIPE_BUF atomicity on pipes, which is a completely
> different kind of atomicity wrt read/write, and doesn't have the
> fundamental issues that arbitrary regular file reads/writes have.
>
> There is absolutely no sane way to do that file atomicity wrt
> arbitrary read/write calls (*), and you shouldn't even try.
>
> That rule needs to be forgotten about, and buried 6ft deep.
>
> So please scrub any mention of that idiotic rule from documentation,
> and from your brain.
>
> And please don't break "partial write means disk full or IO error" due
> to trying to follow this broken rule, which was apparently what you
> did.
>
> Because that "regular file read/write is done in full" is a *MUCH*
> more important rule, and there is a shitton of applications that most
> definitely depend on *that* rule.
>
> Just go to debian code search, and look for
>
> "if (write("
>
> and you'll get thousands of hits, and on the first page of hits 9 out
> of 10 of the hits are literally about that "partial write is an
> error", eg code like this:
>
>     if (write(fd,&triple,sizeof(triple)) != sizeof(triple))
>         reporterr(1,NULL);
>
> from libreoffice.
>
> Linus
>
> (*) Yeah, if you never care about performance(**) of mixed read/write,
> and you don't care about mmap, and you have no other locking issues,
> it's certainly possible. The old rule came about from original UNIX
> literally taking an inode lock around the whole IO access, because
> that was simple, and back in the days you'd never have multiple
> concurrent readers/writers anyway.
>
> (**) It's also instructive how O_DIRECT literally throws that rule
> away, and then some direct-IO people said for years that direct-IO is
> superior and used this as one of their arguments. Probably the same
> people who thought that "oh, don't report partial success", because we
> can't deal with it.
>
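
To make the idiom concrete, here is a minimal sketch of the "partial
write is an error" check from the quoted example. The helper name
write_exact() is hypothetical and only for illustration. It assumes a
regular file, where a short write is taken to mean disk full or an I/O
error, so it does not retry.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from this thread): treat anything other than
 * a complete write as failure, the same check as the quoted example.
 * Returns 0 if all len bytes were written, -1 otherwise.
 */
static int write_exact(int fd, const void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);

	n = write(fd, buf, len);
	if (n < 0)
		return -1;	/* write() failed outright; errno is set */
	if ((size_t)n != len)
		return -1;	/* short write: assumed disk full or I/O error */
	return 0;
}
```

A caller would then write something like
"if (write_exact(fd, &triple, sizeof(triple)) != 0) reporterr(1, NULL);",
which is only correct as long as the filesystem never returns a short
write for any other reason.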

Thanks a lot,
Andreas