Re: [PATCH] mm/fadvise: discard partial pages iff endbyte is also eof

From: åå(Caspar)
Date: Thu Jan 04 2018 - 03:18:13 EST




On 2018/1/4 08:17, Andrew Morton wrote:
On Wed, 3 Jan 2018 10:48:00 +0000 Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:

On Wed, Jan 03, 2018 at 02:53:43PM +0800, ??????(Caspar) wrote:


?? 2017??12??23????12:16?????? <shidao.ytt@xxxxxxxxxxxxxxx> ??????

From: "shidao.ytt" <shidao.ytt@xxxxxxxxxxxxxxx>

in commit 441c228f817f7 ("mm: fadvise: document the
fadvise(FADV_DONTNEED) behaviour for partial pages") Mel Gorman
explained why partial pages should be preserved instead of discarded
when using fadvise(FADV_DONTNEED), however the actual codes to calcuate
end_index was unexpectedly wrong, the code behavior didn't match to the
statement in comments; Luckily in another commit 18aba41cbf
("mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED")
Oleg Drokin fixed this behavior

Here I come up with a new idea that actually we can still discard the
last parital page iff the page-unaligned endbyte is also the end of
file, since no one else will use the rest of the page and it should be
safe enough to discard.

+akpm...

Hi Mel, Andrew:

Would you please take a look at this patch, to see if this proposal
is reasonable enough, thanks in advance!


I'm backlogged after being out for the Christmas. Superficially the patch
looks ok but I wondered how often it happened in practice as we already
would discard files smaller than a page on DONTNEED. It also requires
that the system call get the exact size of the file correct and would not
discard if the off + len was past the end of the file for whatever reason
(e.g. a stat to read the size, a truncate in parallel and fadvise using
stale data from stat) and that's why the patch looked like it might have
no impact in practice. Is the patch known to help a real workload or is
it motivated by a code inspection?

The current whole-pages-only logic was introduced (accidentally, I
think) by yours truly when fixing a bug in the initial fadvise()
commit in 2003.

https://kernel.opensuse.org/cgit/kernel/commit/?h=v2.6.0-test4&id=7161ee20fea6e25a32feb91503ca2b7c7333c886

Namely:

: invalidate_mapping_pages() takes start/end, but fadvise is currently passing
: it start/len.
:
:
:
: mm/fadvise.c | 8 ++++++--
: 1 files changed, 6 insertions(+), 2 deletions(-)
:
: diff -puN mm/fadvise.c~fadvise-fix mm/fadvise.c
: --- 25/mm/fadvise.c~fadvise-fix 2003-08-14 18:16:12.000000000 -0700
: +++ 25-akpm/mm/fadvise.c 2003-08-14 18:16:12.000000000 -0700
: @@ -26,6 +26,8 @@ long sys_fadvise64(int fd, loff_t offset
: struct inode *inode;
: struct address_space *mapping;
: struct backing_dev_info *bdi;
: + pgoff_t start_index;
: + pgoff_t end_index;
: int ret = 0;
:
: if (!file)
: @@ -65,8 +67,10 @@ long sys_fadvise64(int fd, loff_t offset
: case POSIX_FADV_DONTNEED:
: if (!bdi_write_congested(mapping->backing_dev_info))
: filemap_flush(mapping);
: - invalidate_mapping_pages(mapping, offset >> PAGE_CACHE_SHIFT,
: - (len >> PAGE_CACHE_SHIFT) + 1);
: + start_index = offset >> PAGE_CACHE_SHIFT;
: + end_index = (offset + len + PAGE_CACHE_SIZE - 1) >>
: + PAGE_CACHE_SHIFT;
: + invalidate_mapping_pages(mapping, start_index, end_index);
: break;
: default:
: ret = -EINVAL;
:

So I'm not sure that the whole "don't discard partial pages" thing is
well-founded and I see no reason why we cannot alter it.

So, thinking caps on: why not just discard them? After all, that's
what userspace asked us to do.

Hi Andrew, I doubt if "just discard them" is a proper action to match the userspace's expectation. Maybe we will never meet the userspace's expectation since we are doing pages in kernel while userspace is passing bytes offset/length to the kernel. Note that Mel Gorman has already documented page-unaligned behaviors in posix_fadvise() man page[1] but obviously not all people (including /me) are able to read the _latest_ version, so someone might still uses the syscall with page unaligned offset/length. The userspace might only ask for discarding certain *bytes*, instead of *pages*.

And I think we need to look back first why we thought "preserved is better than discard". If we throw the whole page, the rest part of the page might still be required (consider the offset and length is in the middle of a file) because it's untagged:

...|------------ PAGE --------------|...
...| DONTNEED |------ UNTAGGED -----|...

but the page has gone, page fault occurs and we need to reload it from the disk -- performance degradation happens.

Maybe that's why we would rather preserv the whole page before.

But if we don't throw the partial page at all, and if the tail partial page is _exactly the end of the file_, a page that advised to be NONEED would be left in memory. And we all know that it is safe to throw it.

So we come up with this patch -- to keep the partial page not been throwing away, and add a special case when the partial page is the end of the file, we can throw it safely. I guess it might be a better solution.

One thing I'm worrying about is that, this patch might lead to a new undocumented behavior, so maybe we need to document this special case in posix_fadvise() man page too? hmmm...

Thanks,
Caspar