Re: [PATCH] afs: Fix waiting for writeback then skipping folio

From: Andrew Morton
Date: Fri Jun 16 2023 - 19:22:11 EST


On Fri, 16 Jun 2023 23:43:02 +0100 David Howells <dhowells@xxxxxxxxxx> wrote:

> Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > > Commit acc8d8588cb7 converted afs_writepages_region() to write back a
> > > folio batch. The function waits for writeback to a folio, but then
> > > proceeds to the rest of the batch without trying to write that folio
> > > again. This patch has it attempt to write the folio again.
> > >
> > > This has only been compile tested.
> >
> > This seems fairly serious?
>
> We will try to write the folio again later, but sync()/fsync() might now have
> skipped it.
>
> > From my reading, we'll fail to write out the dirty data. Presumably
> > not easily observable, as it will get written out again later on.
>
> As it's a network filesystem, interactions with third parties could cause
> apparent corruption. Closing a file will flush it - but if there's a
> simultaneous op of some other kind, a bit of a flush or a sync may get missed
> and the copy visible to another user be temporarily missing that bit.
>
> > But we're also calling afs_write_back_from_locked_folio() with an unlocked
> > folio, which might cause mayhem.
>
> Without this patch, you mean? There's a "continue" statement that should send
> us back to the top of the loop before we get as far as
> afs_write_back_from_locked_folio() - and then the folio_unlock() there would
> go bang.
>

Well, what I'm really asking is the thing I ask seven times a day:

- what are the end-user visible effects of the bug

- should the fix be backported into earlier kernels
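
For reference, the wait-then-skip behaviour described in the quoted text can be sketched as a small user-space toy. This is not the afs code: struct toy_folio, toy_wait_writeback(), toy_write() and toy_writeback_pass() are hypothetical names invented for illustration, and the model assumes a folio can still be dirty once its in-flight writeback completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio: only the two flags that matter here. */
struct toy_folio {
	bool dirty;     /* has data that still needs writing */
	bool writeback; /* an earlier write is still in flight */
};

/* Pretend the in-flight write finished; the dirty flag is left untouched. */
static void toy_wait_writeback(struct toy_folio *f)
{
	f->writeback = false;
}

/* Pretend the folio's data was written out. */
static void toy_write(struct toy_folio *f)
{
	f->dirty = false;
}

/*
 * Walk a batch once and return how many folios are still dirty afterwards.
 * retry_after_wait == false models the reported behaviour (wait, then skip);
 * retry_after_wait == true models the behaviour the patch aims for.
 */
static size_t toy_writeback_pass(struct toy_folio *batch, size_t n,
				 bool retry_after_wait)
{
	size_t still_dirty = 0;

	assert(batch != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct toy_folio *f = &batch[i];
again:
		if (!f->dirty)
			continue;
		if (f->writeback) {
			toy_wait_writeback(f);
			if (retry_after_wait)
				goto again; /* look at the same folio again */
			continue;           /* move on: this folio is skipped */
		}
		toy_write(f);
	}

	for (size_t i = 0; i < n; i++)
		if (batch[i].dirty)
			still_dirty++;
	return still_dirty;
}
```

In this toy, a batch containing one folio that is both dirty and under writeback leaves that folio dirty after a single pass when retry_after_wait is false, which corresponds to the case where sync()/fsync() might have skipped it.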