Re: [GIT PULL] Btrfs fixes for 6.8-rc2

From: Qu Wenruo
Date: Fri Jan 26 2024 - 16:06:46 EST

On 2024/1/27 05:55, Linus Torvalds wrote:
> On Mon, 22 Jan 2024 at 10:34, David Sterba <dsterba@xxxxxxxx> wrote:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.8-rc1-tag
>
> I have no idea if this is related to the new fixes, but I have never
> seen it before:
>
> BTRFS critical (device dm-0): corrupted node, root=256
> block=8550954455682405139 owner mismatch, have 11858205567642294356
> expect [256, 18446744073709551360]
> BTRFS critical (device dm-0): corrupted node, root=256
> block=8550954455682405139 owner mismatch, have 11858205567642294356
> expect [256, 18446744073709551360]
> BTRFS critical (device dm-0): corrupted node, root=256
> block=8550954455682405139 owner mismatch, have 11858205567642294356
> expect [256, 18446744073709551360]

This was triggered during a btrfs tree search for an XATTR read.

The root=256 means the tree search was issued on behalf of subvolume
256, which is completely sane.

But the other number, 11858205567642294356 (the owner recorded in the
tree block header), is still inside the allowed subvolume range yet
beyond the level-0 qgroups. That makes is_fstree() return false, which
triggered the error.
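The check described above can be modelled roughly as follows. This is a simplified userspace sketch, not the kernel source. The macro names and the helper check_eb_owner() are hypothetical. The constants are assumed from the btrfs on-disk format: fs tree objectid 5, first free objectid 256, last free objectid -256, and a qgroup level shift of 48.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed on-disk constants (names shortened, not the kernel's). */
#define FS_TREE_OBJECTID    5ULL
#define FIRST_FREE_OBJECTID 256ULL
#define LAST_FREE_OBJECTID  ((uint64_t)-256)  /* 18446744073709551360 */
#define QGROUP_LEVEL_SHIFT  48

/* The qgroup level of an id is kept in its top 16 bits. */
static inline uint64_t qgroup_level(uint64_t id)
{
	return id >> QGROUP_LEVEL_SHIFT;
}

/*
 * Simplified is_fstree(): true for the top-level fs tree (5) or for a
 * subvolume id >= 256 whose qgroup level is still 0.
 */
static inline bool is_fstree(uint64_t rootid)
{
	return rootid == FS_TREE_OBJECTID ||
	       (rootid >= FIRST_FREE_OBJECTID && qgroup_level(rootid) == 0);
}

/*
 * Hypothetical helper loosely mirroring the tree-checker owner check:
 * a block reached from a subvolume root must itself be owned by some
 * subvolume, and any other root must own the block exactly.
 */
static inline int check_eb_owner(uint64_t root_owner, uint64_t eb_owner)
{
	if (!is_fstree(root_owner))
		return root_owner == eb_owner ? 0 : -EUCLEAN;
	return is_fstree(eb_owner) ? 0 : -EUCLEAN;
}
```

With these definitions, is_fstree(256) is true, but is_fstree(11858205567642294356ULL) is false because the top 16 bits of that id are non-zero. So check_eb_owner(256, 11858205567642294356ULL) returns -EUCLEAN, which matches the 117 seen in the SELinux messages.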

Normally we should not have this many subvolumes, and since 2015 we
have already had a check against this at subvolume creation time.

But I don't really believe that's the case here, unless there really are
that many snapshots/subvolumes in the fs (more than 1 << 48 of them).
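The arithmetic behind the 1 << 48 figure is small enough to check directly. The sketch below assumes the same 48-bit level shift, and its helper names are hypothetical. It shows how many ids fit in level 0 and which level the logged owner value would encode.

```c
#include <stdint.h>

/* Assumed: the qgroup level occupies the top 16 bits of a 64-bit id. */
#define QGROUP_LEVEL_SHIFT 48

/* Number of ids below the first level-1 qgroup, i.e. 1 << 48. */
static inline uint64_t level0_id_count(void)
{
	return 1ULL << QGROUP_LEVEL_SHIFT;
}

/* Qgroup level encoded in an id, e.g. the "have" owner from the log. */
static inline uint64_t id_level(uint64_t id)
{
	return id >> QGROUP_LEVEL_SHIFT;
}
```

Here level0_id_count() is 281474976710656, and id_level(11858205567642294356ULL) is 42128 (0xA490). The logged owner value therefore lies far outside the level-0 range that real subvolume ids use.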

> SELinux: inode_doinit_use_xattr: getxattr returned 117 for dev=dm-0
> ino=5737268
> SELinux: inode_doinit_use_xattr: getxattr returned 117 for dev=dm-0
> ino=5737267
>
> and it caused an actual warning to be printed for my kernel tree from 'git':
>
> error: failed to stat 'sound/pci/ice1712/se.c': Structure needs cleaning
>
> (and yes, 117 is EUCLEAN, aka "Structure needs cleaning")
>
> The problem seems to have self-corrected, because it didn't happen
> when repeating the command, and that file that failed to stat looks
> perfectly fine.

I guess that since the error self-corrected, there is no tree dump of
block 8550954455682405139 from just after the error.

Personally speaking, the number 11858205567642294356 is far too large;
it looks like runtime garbage.
I don't have any clue yet, but if you hit it again, you may want to
run "btrfs ins dump-tree -b <bytenr> <device>" to dump the block's content.

Meanwhile, I'll make the kernel tree-checker dump the content of the
offending tree block so that it's easier to debug.


> But it is clearly worrisome.
>
> The "owner mismatch" check isn't new - it was added back in 5.19 in
> commit 88c602ab4460 ("btrfs: tree-checker: check extent buffer owner
> against owner rootid"). So something else must have changed to trigger
> it.

Anyway, I'll keep an eye on the situation.

Thanks,
Qu

