Re: XFS: inode with st_mode == 0

From: Jakob Oestergaard
Date: Mon Jan 17 2005 - 05:13:49 EST


On Sun, Jan 16, 2005 at 01:51:12PM +0000, Christoph Hellwig wrote:
> On Fri, Jan 14, 2005 at 07:23:09PM +0100, Jakob Oestergaard wrote:
> > So apart from the general well known instability problems that will
> > occur when you actually start *using* the system, there should be no
>
> What known instabilities?

Where should I begin? ;)

Most of the following have already been posted to LKML - primarily by
Anders (as@xxxxxxxxxxxx) - it seems that noone cares, but I'll repost a
summary that Anders sent me below:

-------
Scenario 1: Mailservers:
2.6.10 (~24-40 hours uptime):
Running ext3 on mailqueue:

<SNIP>
Unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
c018a095
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: nfs e1000 iptable_nat ipt_connlimit rtc
CPU: 2
EIP: 0060:[<c018a095>] Not tainted
EFLAGS: 00010286 (2.6.8.1)
EIP is at journal_commit_transaction+0x535/0x10e5
eax: cac1e26c ebx: 00000000 ecx: f7cec400 edx: f7cec400
esi: f65f3000 edi: cac1e26c ebp: f65f3000 esp: f65f3dc0
ds: 007b es: 007b ss: 0068
Process kjournald (pid: 174, threadinfo=f65f3000 task=c2308b70)
Stack: f65f3e64 00000000 00000000 00000000 00000000 00000000 f7cec400 cda565fc
0000149a 00000004 f65f3e48 c01132d8 00000002 c202ad20 00000001 f65f3e5c
c202ad20 c202ad20 00000002 00000001 0000001e 01c1af60 f65f3e68 c0407dc0
Call Trace:
[<c01132d8>] scheduler_tick+0x468/0x470
[<c01127b5>] find_busiest_group+0x105/0x310
[<c011db8e>] del_timer_sync+0x7e/0xa0
[<c018cd4d>] kjournald+0xbd/0x230
[<c0114b10>] autoremove_wake_function+0x0/0x40
[<c0114b10>] autoremove_wake_function+0x0/0x40
[<c0103f16>] ret_from_fork+0x6/0x14
[<c018cc70>] commit_timeout+0x0/0x10
[<c018cc90>] kjournald+0x0/0x230
[<c01024bd>] kernel_thread_helper+0x5/0x18
Code: f0 ff 43 04 8b 03 83 e0 04 74 4c 8b 8c 24 b8 01 00 00 c6 81
<2>SoftDog: Initiating system reboot
</SNIP>

-------
Scenario 2: Mailservers:
Running XFS on mailqueue:

<SNIP>
Filesystem "sdb1": xfs_trans_delete_ail: attempting to delete a log item that
is not in the AIL
xfs_force_shutdown(sdb1,0x8) called from line 382 of file
fs/xfs/xfs_trans_ail.c. Return address = 0xc0216a56
@Linux version 2.6.9 (root@xxxxxxxxxxxxxxxx) (gcc version 2.96 20000731 (Red
Hat Linux 7.3 2.96-113)) #1 SMP Tue Oct 19 16:04:55 CEST 2004
</SNIP>


=======
Resolution to the mailserver problem:
2.4.28 is perfectly stable on these machines.

-------
Scenario 3: Webservers:

2.6.10, 2.6.10-ac8 (~3-12 hours uptime):

<SNIP>
Unable to handle kernel paging request
<2>SoftDog: Initiating system reboot.
<SNIP>
(No more...) :(

=======
Resolution to the webserver problem:
2.4.28/2.4.29-rc2 are stable here

-------
Scenario 4: Storageservers:
2.6.8.1:
Oopses after ~5-10 hours whith SMP on. - Cannot find the actual Oopses
anymore and 2.6.8+ havent been tested as we cannot afford anymore downtime on
these servers.


=======
Resolution to the storage server problem:
2.6.8.1 UP is stable (but oopses regularly after memory allocation
failures)



Hardware on all servers: IBM x335 and x345.

Mentioned errors seen on a total of 17 servers.

--

/ jakob

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/