xfs bug, 2.6.17-rc2: unrepairable inode created

From: Bernhard Reiter
Date: Mon Apr 24 2006 - 13:59:25 EST


2.6.17-rc2 on powerpc

Summary of failure:
A bad inode was created that makes xfs stop on change attempts.
xfs_repair cannot repair it.
The creation most likely happened when running 2.6.17-rc2 on ppc.

Benjamin Herrenschmidt recommended reporting this to lkml and the xfs maintainers.
My detailed report is attached.
I have a copy of the 3 GByte partition as a file, so I can run experiments
or send parts of it to someone if I get detailed instructions.

Please copy me on relevant replies, I am not subscribed to the list.

Bernhard
--
Professional Service for Free Software (intevation.net)
The FreeGIS Project (freegis.org)
FSFE (fsfeurope.org)
Details of XFS failure happening on 20060423
to Bernhard Reiter <bernhard@xxxxxxxxxxxxx>

Summary of failure:
A bad inode was created that makes xfs stop on change attempts.
xfs_repair cannot repair it.
The creation most likely happened when running 2.6.17-rc2 on ppc.

How the failure appeared:

Built and ran a 2.6.17-rc2 kernel on a ppc (PowerBook 5,6) for the first time.
gcc version 3.3.5 (Debian 1:3.3.5-13)

Used konqueror.

Later ran 2.6.12-rc6 to use omninet.o (which is broken in linux beyond 2.6.12).
This kernel was used often before in production without problems.

Trying to delete an entry from the affected directory (inode 733) leads to an XFS internal error.
This directory was usually written to by konqueror.


How to reproduce the bad behaviour now:

cd /var/bad/a.out
rm file.in.bad.dir

xfs_da_do_buf: bno 16777216
dir: inode 733
Filesystem "dm-6": XFS internal error xfs_da_do_buf(1) at line 2174 of file fs/xfs/xfs_da_btree.c. Caller 0xc00f2314
Call trace:
[c0007ad0] dump_stack+0x18/0x28
[c010024c] xfs_error_report+0x60/0x64
[c00f20f8] xfs_da_do_buf+0x5b0/0x764
[c00f2314] xfs_da_read_buf+0x2c/0x3c
[c00fa238] xfs_dir2_leafn_remove+0x1bc/0x37c
[c00fb1d8] xfs_dir2_node_removename+0xa0/0x114
[c00f4644] xfs_dir2_removename+0x110/0x120
[c0124d58] xfs_remove+0x21c/0x434
[c0130ec4] linvfs_unlink+0x30/0x68
[c0075924] vfs_unlink+0x13c/0x254
[c0075b44] sys_unlink+0x108/0x1a8
[c0004660] ret_from_syscall+0x0/0x44
xfs_force_shutdown(dm-6,0x8) called from line 1091 of file fs/xfs/xfs_trans.c.  Return address = 0xc013458c
Filesystem "dm-6": Corruption of in-memory data detected. Shutting down filesystem: dm-6
Please umount the filesystem, and rectify the problem(s)


I have copied the partition to a file with dd and
the behaviour is consistent.

xfs_check says:
missing free index for data block 0 in dir ino 733
missing free index for data block 1 in dir ino 733
missing free index for data block 2 in dir ino 733
missing free index for data block 3 in dir ino 733
missing free index for data block 4 in dir ino 733
missing free index for data block 5 in dir ino 733
missing free index for data block 6 in dir ino 733
missing free index for data block 7 in dir ino 733
missing free index for data block 8 in dir ino 733
missing free index for data block 9 in dir ino 733
missing free index for data block 12 in dir ino 733
missing free index for data block 13 in dir ino 733
missing free index for data block 14 in dir ino 733
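
The xfs_check messages above can be summarised mechanically. The following is a
minimal sketch and not part of xfsprogs: the function name flagged_data_blocks
is hypothetical, and the input is simply the output quoted above, pasted
verbatim. It only lists which data blocks were reported and which block numbers
inside that range were not; it makes no claim about why.

```python
import re

# xfs_check output as quoted in this report (pasted verbatim).
XFS_CHECK_OUTPUT = """\
missing free index for data block 0 in dir ino 733
missing free index for data block 1 in dir ino 733
missing free index for data block 2 in dir ino 733
missing free index for data block 3 in dir ino 733
missing free index for data block 4 in dir ino 733
missing free index for data block 5 in dir ino 733
missing free index for data block 6 in dir ino 733
missing free index for data block 7 in dir ino 733
missing free index for data block 8 in dir ino 733
missing free index for data block 9 in dir ino 733
missing free index for data block 12 in dir ino 733
missing free index for data block 13 in dir ino 733
missing free index for data block 14 in dir ino 733
"""


def flagged_data_blocks(text):
    """Map each directory inode to the sorted list of reported data blocks."""
    pattern = re.compile(
        r"missing free index for data block (\d+) in dir ino (\d+)"
    )
    result = {}
    for match in pattern.finditer(text):
        block, ino = int(match.group(1)), int(match.group(2))
        result.setdefault(ino, []).append(block)
    return {ino: sorted(blocks) for ino, blocks in result.items()}


flagged = flagged_data_blocks(XFS_CHECK_OUTPUT)
blocks = flagged[733]
# Block numbers between the first and last reported block that were NOT reported.
unflagged = [b for b in range(blocks[0], blocks[-1] + 1) if b not in blocks]

print("inodes affected:", sorted(flagged))
print("reported data blocks:", blocks)
print("not reported in that range:", unflagged)
```

For the output above this shows that only inode 733 is affected and that data
blocks 10 and 11 are the only ones in the range 0 to 14 that are not reported.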


xfs_repair on the device says:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - clearing existing "lost+found" inode
        - marking entry "lost+found" to be deleted
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
rebuilding directory inode 128

fatal error -- can't read block 16777216 for directory inode 733


Hypothesis: Block 16777216 is not within the valid range.

xfs_info /dev/mapper/vg0-var
meta-data=/var                   isize=256    agcount=8, agsize=98304 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=786432, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=2560, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
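
The hypothesis can be checked with simple arithmetic on the xfs_info figures.
This is a minimal sketch under one loudly stated assumption: that the number
16777216 in the error messages is meant as a plain filesystem block number.
If it is instead an offset inside the directory's own logical address space
(the xfs_da_* functions in the trace may work that way), then comparing it with
the filesystem size says nothing, and the last two lines of output become the
more interesting ones. The helper name in_filesystem_range is hypothetical.

```python
# Figures taken from the xfs_info output and the error messages in this report.
AGCOUNT = 8
AGSIZE = 98304          # blocks per allocation group
DATA_BLOCKS = 786432    # "blocks=" on the data line
BSIZE = 4096            # bytes per block
BAD_BNO = 16777216      # block number named by xfs_da_do_buf and xfs_repair


def in_filesystem_range(bno, total_blocks):
    """True if bno is a valid index into a filesystem of total_blocks blocks.

    ASSUMPTION: bno is treated as a plain filesystem block number.
    """
    return 0 <= bno < total_blocks


print("agcount * agsize == blocks:", AGCOUNT * AGSIZE == DATA_BLOCKS)
print("filesystem size in MiB:", DATA_BLOCKS * BSIZE // 2**20)
print("bad block within filesystem:", in_filesystem_range(BAD_BNO, DATA_BLOCKS))
print("bad block is exactly 2**24:", BAD_BNO == 2**24)
print("byte offset of bad block in GiB:", BAD_BNO * BSIZE // 2**30)
```

Under that assumption the block is far outside the 786432 blocks of this 3 GByte
filesystem. The value is also exactly 2**24, which at a 4096 byte block size
corresponds to a byte offset of 64 GiB, so it looks more like a fixed logical
offset than a random value; I have not confirmed that interpretation.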

Additional information:

I am using xfsprogs 2.6.20-1 from Debian Sarge.
Retested with a backport of xfsprogs 2.7.16-1; xfs_repair still cannot repair the filesystem.
