Re: [OOPS] 2.6.9-rc4, dual Opteron, NUMA, 8GB

From: Randy.Dunlap
Date: Wed Oct 13 2004 - 15:43:44 EST


Brad Fitzpatrick wrote:
On Wed, 13 Oct 2004, Randy.Dunlap wrote:


Brad Fitzpatrick wrote:

On Wed, 13 Oct 2004, Jeff Garzik wrote:



Brad Fitzpatrick wrote:


I'm reporting an oops. Details follow.

I have two of these machines. I will happily be anybody's guinea pig
for debugging this (more details, access to the machines, trying
patches or kernels...). The machines aren't in production.

- Brad


Kernel: 2.6.9-rc4 vanilla (.config below)

Hardware: IBM eServer 325, Dual Opteron 8GB ram (more info below)

Pre-crash and crash:

a1:~# mke2fs /dev/mapper/raid10-data
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
25608192 inodes, 51200000 blocks
2560000 blocks (5.00%) reserved for the super user
First data block=0
1563 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872

Writing inode tables: 1091/1563
Message from syslogd@localhost at Wed Oct 13 11:46:01 2004 ...
localhost kernel: Oops: 0000 [1] SMP

Message from syslogd@localhost at Wed Oct 13 11:46:01 2004 ...
localhost kernel: CR2: 0000000000001770


What's your block device configuration? What block devices are sitting
on top of what other block devices?


/dev/mapper/raid10-data is an LV taking 200GB of a 280GB VG ("raid10") with
a single PV in it: /dev/sdb1 -- ips driver, IBM ServeRAID 6M card,
representing a RAID 10 atop 8 SCSI disks.
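
For anyone wanting to reproduce the stack: the layout described above
amounts to roughly the following LVM commands (device and volume names
are taken from this report; the exact flags are my guess, not the
original setup):

a1:~# pvcreate /dev/sdb1
a1:~# vgcreate raid10 /dev/sdb1
a1:~# lvcreate -L 200G -n data raid10    (shows up as /dev/mapper/raid10-data)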

I just made a new kernel without NUMA and made a filesystem on /dev/sdb1
directly instead of using LVM, and it worked fine, if a little slowly.

Now that I know it /can/ work, I'll try to narrow down whose fault it is:
NUMA or LVM.

Very similar to
http://marc.theaimsgroup.com/?l=linux-kernel&m=109328505204081&w=2
and its follow-up:
http://marc.theaimsgroup.com/?l=linux-kernel&m=109330259511819&w=2

but no solutions there.


Well, good to know I'm not alone? :-)

I was just about to mail and report that disabling NUMA does help:

NUMA + mke2fs on LVM: OOPS (mailed earlier)
no NUMA + mke2fs on LVM: okay
NUMA + mke2fs on sdb1: OOPS (below)
no NUMA + mke2fs on sdb1: okay

no NUMA + mount e2fs on LVM: okay
no NUMA + mount e2fs on sdb1: okay
NUMA + mount e2fs on LVM: okay
NUMA + mount e2fs on sdb1: untested, assumed okay
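
(The NUMA and no-NUMA kernels above differ only in the NUMA-related
.config options; on 2.6.9/x86-64 that should be just these two, assuming
I have the option names right:

CONFIG_NUMA=y
CONFIG_K8_NUMA=y

both unset in the "no NUMA" builds.)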


Oops when doing mke2fs on /dev/sdb1, with NUMA enabled:

Oct 13 13:24:37 localhost kernel: Unable to handle kernel paging request at 0000000000001770 RIP:
Oct 13 13:24:37 localhost kernel: <ffffffff8015efe4>{kmem_getpages+132}
Oct 13 13:24:37 localhost kernel: PML4 1f8fe6067 PGD 1f8fef067 PMD 0
Oct 13 13:24:37 localhost kernel: Oops: 0000 [1] SMP
Oct 13 13:24:37 localhost kernel: CPU 0
Oct 13 13:24:37 localhost kernel: Modules linked in: af_packet tsdev mousedev joydev usbhid ohci_hcd hw_random amd74xx evdev tg3 dm_mod ide_generic ide_cd ide_core cdrom rtc ext3 jbd mbcache sd_mod ips mptscsih mptbase scsi_mod unix
Oct 13 13:24:37 localhost kernel: Pid: 3145, comm: mke2fs Not tainted 2.6.9-rc4
Oct 13 13:24:37 localhost kernel: RIP: 0010:[kmem_getpages+132/432] <ffffffff8015efe4>{kmem_getpages+132}
Oct 13 13:24:37 localhost kernel: RSP: 0018:00000101f81b7aa8 EFLAGS: 00010213
Oct 13 13:24:37 localhost kernel: RAX: ffffffff7fffffff RBX: 00000101fffc9680 RCX: 0000000000000000
Oct 13 13:24:37 localhost kernel: RDX: 0000010000011700 RSI: 00000100000119c0 RDI: 0000010000012500
Oct 13 13:24:37 localhost kernel: RBP: 00000101fffc9680 R08: 000001016bc01000 R09: 00000101fffc96e8
Oct 13 13:24:37 localhost kernel: R10: 0000000000000000 R11: 00000000fffffffa R12: 00000101fffc9680
Oct 13 13:24:37 localhost kernel: R13: 0000000000000000 R14: 00000101fffc9728 R15: 0000000000000001
Oct 13 13:24:37 localhost kernel: FS: 0000002a95ddb4a0(0000) GS:ffffffff803df300(0000) knlGS:0000000000000000
Oct 13 13:24:37 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 13 13:24:37 localhost kernel: CR2: 0000000000001770 CR3: 0000000000101000 CR4: 00000000000006e0
Oct 13 13:24:37 localhost kernel: Process mke2fs (pid: 3145, threadinfo 00000101f81b6000, task 00000101fe9b4030)
Oct 13 13:24:37 localhost kernel: Stack: 000001016c3fa000 0000000000000000 0000000000000050 ffffffff8015ff6e
Oct 13 13:24:37 localhost kernel: 0000005000000010 000000000000003c 00000100fbf6b000 00000101fffc9680
Oct 13 13:24:37 localhost kernel: 00000101fffc96c8 00000101fffc9728
Oct 13 13:24:37 localhost kernel: Call Trace:<ffffffff8015ff6e>{cache_grow+190} <ffffffff801601c6>{cache_alloc_refill+422}
Oct 13 13:24:37 localhost kernel: <ffffffff801604b6>{kmem_cache_alloc+54} <ffffffff8017d761>{alloc_buffer_head+17}
Oct 13 13:24:37 localhost kernel: <ffffffff8017aeba>{create_buffers+42} <ffffffff8017b884>{create_empty_buffers+20}
Oct 13 13:24:37 localhost kernel: <ffffffff8017bcdf>{__block_prepare_write+175} <ffffffff80180190>{blkdev_get_block+0}
Oct 13 13:24:37 localhost kernel: <ffffffff8017c78a>{block_prepare_write+26} <ffffffff80158dc4>{generic_file_buffered_write+404}
Oct 13 13:24:37 localhost kernel: <ffffffff80193fae>{inode_update_time+158} <ffffffff801594dd>{generic_file_aio_write_nolock+765}
Oct 13 13:24:37 localhost kernel: <ffffffff801595b5>{generic_file_write_nolock+165} <ffffffff80134ef3>{__wake_up+67}
Oct 13 13:24:37 localhost kernel: <ffffffff802a46fe>{thread_return+41} <ffffffff80136890>{autoremove_wake_function+0}
Oct 13 13:24:37 localhost kernel: <ffffffff8018128a>{blkdev_file_write+26} <ffffffff801789e4>{vfs_write+228}
Oct 13 13:24:37 localhost kernel: <ffffffff80178b13>{sys_write+83} <ffffffff8011195a>{system_call+126}
Oct 13 13:24:37 localhost kernel:
Oct 13 13:24:37 localhost kernel:
Oct 13 13:24:37 localhost kernel: Code: 48 8b 91 70 17 00 00 76 07 b8 00 00 00 80 eb 0a 48 b8 00 00
Oct 13 13:24:37 localhost kernel: RIP <ffffffff8015efe4>{kmem_getpages+132} RSP <00000101f81b7aa8>
Oct 13 13:24:37 localhost kernel: CR2: 0000000000001770
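
FWIW, hand-decoding the Code: line above (so take this with a grain of
salt), the faulting instruction comes out as

48 8b 91 70 17 00 00    mov 0x1770(%rcx),%rdx

and the register dump shows RCX == 0000000000000000, i.e. a NULL pointer
dereferenced at offset 0x1770 -- which matches the CR2 value exactly. So
kmem_getpages() appears to be chasing a pointer that is NULL on the NUMA
build.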



Randy, if you're interested and you're actually at OSDL Beaverton, I'm
just across the street from you. I could carry this 1U server and 3U
drive cabinet over to you! :)

I am at the Round in Beaverton. It might be easier for me to go to
the server. :)

Who's responsible for the K8_NUMA stuff? I'd love to work with them to
narrow this down.

Andi Kleen (SUSE). Copied.

--
~Randy