Re: [PATCH v2 06/17] mm: doc comment for scary spot in write_one_page

From: Jeff Layton
Date: Wed Apr 12 2017 - 11:53:03 EST


On Wed, 2017-04-12 at 07:38 -0700, Matthew Wilcox wrote:
> On Wed, Apr 12, 2017 at 09:01:34AM -0400, Jeff Layton wrote:
> > On Wed, 2017-04-12 at 08:06 -0400, Jeff Layton wrote:
> > > Not sure what to do here just yet.
> > >
> > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > ---
> > > mm/page-writeback.c | 6 ++++++
> > > 1 file changed, 6 insertions(+)
> > >
> > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > index de0dbf12e2c1..3ac8399dc984 100644
> > > --- a/mm/page-writeback.c
> > > +++ b/mm/page-writeback.c
> > > @@ -2388,6 +2388,12 @@ int write_one_page(struct page *page)
> > > ret = mapping->a_ops->writepage(page, &wbc);
> > > if (ret == 0) {
> > > wait_on_page_writeback(page);
> > > + /*
> > > + * FIXME: is this racy? What guarantees that PG_error
> > > + * will still be set once we get around to checking it?
> > > + * What if writeback fails, but then a read is issued
> > > + * before we check this, and that calls ClearPageError?
> > > + */
> > > if (PageError(page))
> > > ret = -EIO;
> > > }
> >
> > Ahh, we are always under the page lock here, and this is generally used
> > for writing out directory pages anyway. I'm fine with dropping this
> > patch unless someone else sees a problem here.
>
> ->writepage drops the page lock. We're still holding a refcount on this
> page, but that's not going to prevent read being called. But maybe the
> filesystem won't call read on a page that's marked as PageError?

Hard to be sure there. I really wonder if that check is needed at all,
the more I look at it. After all, we are calling writepage with
WB_SYNC_ALL so we should get an error there.

Is it also possible these pages could be written back before that point
(due to memory pressure or something) and that fail?

Maybe we should just have a call to filemap_check_errors on exiting
this function?

With the the wb_err_t based stuff, we could change it to sample the
wb_err early, and then use that to see if an error has occurred since
then. Maybe we should even allow callers to pass a wb_err_t in here, so
we can report errors that have occurred since a known point?
--
Jeff Layton <jlayton@xxxxxxxxxx>