Major btrfs fiemap slowdown on file with many extents once in cache (RCU stalls?) (Was: [PATCH 1/3] filemap: Correct the conditions for marking a folio as accessed)

From: Dominique MARTINET
Date: Thu Jun 30 2022 - 03:36:01 EST


Hi Willy, linux-btrfs@vger,

Matthew Wilcox (Oracle) wrote on Sun, Jun 19, 2022 at 04:11:41PM +0100:
> We had an off-by-one error which meant that we never marked the first page
> in a read as accessed. This was visible as a slowdown when re-reading
> a file as pages were being evicted from cache too soon. In reviewing
> this code, we noticed a second bug where a multi-page folio would be
> marked as accessed multiple times when doing reads that were less than
> the size of the folio.

when debugging an unrelated issue (short reads on btrfs with io_uring
and O_DIRECT[1]), I noticed that my horrible big file copy speeds fell
down from ~2GB/s (there's compression and lots of zeroes) to ~100MB/s
the second time I was copying it with cp.

I've taken a moment to bisect this and came down to this patch.

[1] https://lore.kernel.org/all/YrrFGO4A1jS0GI0G@xxxxxxxxxxxxxxxxx/T/#u



Dropping caches (echo 3 > /proc/sys/vm/drop_caches) restore the speed,
so there appears to be some bad effect to having the file in cache for
fiemap?
To be fair that file is pretty horrible:
---
# compsize bigfile
Processed 1 file, 194955 regular extents (199583 refs), 0 inline.
Type Perc Disk Usage Uncompressed Referenced
TOTAL 15% 3.7G 23G 23G
none 100% 477M 477M 514M
zstd 14% 3.2G 23G 23G
---

Here's what perf has to say about it on top of this patch when running
`cp bigfile /dev/null` the first time:

98.97% 0.00% cp [kernel.kallsyms] [k]
entry_SYSCALL_64_after_hwframe
entry_SYSCALL_64_after_hwframe
do_syscall_64
- 93.40% ksys_read
- 93.36% vfs_read
- 93.25% new_sync_read
- 93.20% filemap_read
- 83.38% filemap_get_pages
- 82.76% page_cache_ra_unbounded
+ 59.72% folio_alloc
+ 13.43% read_pages
+ 8.75% filemap_add_folio
0.64% xa_load
0.52% filemap_get_read_batch
+ 8.75% copy_page_to_iter
- 4.73% __x64_sys_ioctl
- 4.72% do_vfs_ioctl
- btrfs_fiemap
- 4.70% extent_fiemap
+ 3.95% btrfs_check_shared
+ 0.70% get_extent_skip_holes

and second time:
99.90% 0.00% cp [kernel.kallsyms] [k]
entry_SYSCALL_64_after_hwfram
entry_SYSCALL_64_after_hwframe
do_syscall_64
- 94.62% __x64_sys_ioctl
do_vfs_ioctl
btrfs_fiemap
- extent_fiemap
- 50.01% get_extent_skip_holes
- 50.00% btrfs_get_extent_fiemap
- 49.97% count_range_bits
rb_next
+ 28.72% lock_extent_bits
+ 15.55% __clear_extent_bit
- 5.21% ksys_read
+ 5.21% vfs_read

(if this isn't readable, 95% of the time is spent on fiemap the second
time around)




I've also been observing RCU stalls on my laptop with the same workload
(cp to /dev/null), but unfortunately I could not reproduce in qemu so I
could not take traces to confirm they are caused by the same commit but
given the workload I'd say that is it?
I can rebuild a kernel for my laptop and confirm if you think it should
be something else.


I didn't look at the patch itself (yet) so have no suggestion at this
point - it's plausible the patch fixed something and just exposed slow
code that had been there all along so it might be better to look at the
btrfs side first, I don't know.
If you don't manage to reproduce I'll be happy to test anything thrown
at me at the very least.


Thanks,
--
Dominique Martinet | Asmadeus