Nature of ext4 corruption fixed by recent patch?

From: josh
Date: Mon May 18 2015 - 18:58:39 EST


Hi,

I recently had my server's filesystem implode, and I'm currently in the
process of cleaning it up. It had widespread corruption in files and
directories scattered across the filesystem, though all of the affected
items had been changed relatively recently. Directories appeared
corrupted or truncated, various files showed up as piles of NULs, and
5000+ files and directories ended up in lost+found. I observed this
corruption shortly after a reboot into 4.0.2 (from a previous kernel,
3.16), with ext4 noticing an inconsistency and mounting the filesystem
read-only. The underlying disks had no errors.
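For the cleanup, a rough triage sketch that lists regular files whose
contents are entirely NUL bytes (the symptom described above). It is
only an illustration: `is_all_nul` and `find_nul_files` are hypothetical
helper names, and it assumes the damaged tree (or lost+found) is mounted
and readable. It does not distinguish legitimately sparse or zeroed
files from corrupted ones.

```python
import os
import sys

CHUNK_SIZE = 1 << 16  # read 64 KiB at a time to bound memory use


def is_all_nul(path: str) -> bool:
    """Return True if the file is non-empty and every byte is NUL (0x00)."""
    saw_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            saw_data = True
            # A chunk is all-NUL iff the count of 0x00 bytes equals its length.
            if chunk.count(0) != len(chunk):
                return False
    # Empty files are not reported: they contain no NULs at all.
    return saw_data


def find_nul_files(root: str):
    """Yield paths of regular files under root whose contents are entirely NUL."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                if is_all_nul(path):
                    yield path
            except OSError as e:
                # Unreadable entries are common on a damaged filesystem.
                print(f"skipping {path}: {e}", file=sys.stderr)


if __name__ == "__main__" and len(sys.argv) > 1:
    for p in find_nul_files(sys.argv[1]):
        print(p)
```

Run it as, e.g., `python3 find_nul_files.py /mnt/lost+found`, where the
script filename and mount point are placeholders.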

From reading about the corruption issue fixed by
d2dc317d564a46dfc683978a2e5a4f91434e9711 ("ext4: fix data corruption
caused by unwritten and delayed extents"), that bug sounds like a
plausible cause. Can
that strike both file data and directory data, assuming all of that data
ended up grouped with a delayed extent? Would that bug manifest as
corrupted directories and files filled with NULs? The system is a
72-way server on which I was doing piles of parallel git pulls and
builds, so hitting a race seems plausible.

I'm trying to track down potential causes of this so that I can feel
comfortable trusting that system again.

Thanks,
Josh Triplett