Re: [PATCH 17/17] fs: add CONFIG_BUFFER_HEAD

From: Pankaj Raghav
Date: Mon May 01 2023 - 12:01:06 EST


>> No but the only place to add that would be in the block cache. Adding
>> that alone to the block cache doesn't fix the issue. The below patch
>> however does get us by.
>
> That's "working around the error", not fixing it ... probably the same
> root cause as your other errors; at least I'm not diving into them until
> the obvious one is fixed.
>
>> >From my readings it does't seem like readahead_folio() should always
>> return non-NULL, and also I couldn't easily verify the math is right.
>
> readahead_folio() always returns non-NULL. That's guaranteed by how
> page_cache_ra_unbounded() and page_cache_ra_order() work. It allocates
> folios, until it can't (already-present folio, ENOMEM, EOF, max batch
> size) and then calls the filesystem to make those folios uptodate,
> telling it how many folios it put in the page cache, where they start.
>
> Hm. The fact that it's coming from page_cache_ra_unbounded() makes
> me wonder if you updated this line:
>
> folio = filemap_alloc_folio(gfp_mask, 0);
>
> without updating this line:
>
> ractl->_nr_pages++;
>
> This is actually number of pages, not number of folios, so needs to be
> ractl->_nr_pages += 1 << order;
>

I already had a patch which did the following:

ractl->_nr_pages += folio_nr_pages(folio);

but the variable `i` in the loop was not updated properly (assumption of zero order folio). This now
fixes the crash:

@@ -210,7 +210,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
unsigned long index = readahead_index(ractl);
gfp_t gfp_mask = readahead_gfp_mask(mapping);
unsigned long i;
-
+ int order = 0;
/*
* Partway through the readahead operation, we will have added
* locked pages to the page cache, but will not yet have submitted
@@ -223,6 +223,9 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
*/
unsigned int nofs = memalloc_nofs_save();

+ if (mapping->host->i_blkbits > PAGE_SHIFT)
+ order = mapping->host->i_blkbits - PAGE_SHIFT;
+
filemap_invalidate_lock_shared(mapping);
/*
* Preallocate as many pages as we will need.
@@ -245,7 +248,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
continue;
}

- folio = filemap_alloc_folio(gfp_mask, 0);
+ folio = filemap_alloc_folio(gfp_mask, order);
if (!folio)
break;
if (filemap_add_folio(mapping, folio, index + i,
@@ -259,7 +262,8 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
if (i == nr_to_read - lookahead_size)
folio_set_readahead(folio);
ractl->_workingset |= folio_test_workingset(folio);
- ractl->_nr_pages++;
+ ractl->_nr_pages += folio_nr_pages(folio);
+ i += folio_nr_pages(folio) - 1;
}

> various other parts of page_cache_ra_unbounded() need to be examined
> carefully for assumptions of order-0; it's never been used for that
> before. all the large folio work has concentrated on
> page_cache_ra_order()

As you have noted here, this needs to be examined more carefully. Even though the patches fix the
crash, fio with verify option fails (i.e write and read are not giving the same output).

I think it is better to send an RFC patch series on top of Christoph's work with optional
BUFFER_HEAD to iron out some core issues/bugs.