Re: 2.4.28 Oops in fs/locks.c:time_out_leases()

From: Marcelo Tosatti
Date: Tue Jan 18 2005 - 16:12:35 EST


On Tue, Jan 18, 2005 at 01:32:53PM +0100, Frank van Maarseveen wrote:
> got an Oops the same time for 9 days, at the same EIP:
>
> ksymoops 2.4.9 on i686 2.4.28-x97. Options used
> -V (default)
> -k /proc/ksyms (default)
> -l /proc/modules (default)
> -o /lib/modules/2.4.28-x97/ (default)
> -m /boot/System.map-2.4.28-x97 (default)
>
> kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 0000003c
> kernel: c0153cb8
> kernel: *pde = 00000000
> kernel: Oops: 0000
> kernel: CPU: 0
> kernel: EIP: 0010:[time_out_leases+24/128] Not tainted
> kernel: EFLAGS: 00010202
> kernel: eax: c1b42a14 ebx: 00000010 ecx: 00000000 edx: c349139c
> kernel: esi: c1b42ad0 edi: c06f6000 ebp: c06f7f0c esp: c06f7f04
> kernel: ds: 0018 es: 0018 ss: 0018
> kernel: Process find (pid: 17476, stackpage=c06f7000)
> kernel: Stack: c1b42a14 00018801 c06f7f38 c0153d79 c1b42a14 00000000 00000000 c06f7f38
> kernel: 00000000 c349139c ffffffff 00018801 c1b42a14 c06f7f64 c014d8c7 c1b42a14
> kernel: 00018801 00000000 00000000 00000004 c2f51898 00018800 c160f000 080665cb
> kernel: Call Trace: [__get_lease+89/640] [open_namei+487/1376] [filp_open+47/80] [sys_open+61/160] [system_call+51/64]
> kernel: Code: f6 43 2c 20 74 23 f6 43 2d 10 74 1d 8b 53 50 85 d2 75 1d 89
> Using defaults from ksymoops -t elf32-i386 -a i386

Frank,

I strongly suspect you are hitting a physical memory problem:

> Hi Frank,
> Can you please do
> gdb vmlinux
> disassemble time_out_leases

(gdb) disas time_out_leases
Dump of assembler code for function time_out_leases:
0xc0153ca0 <time_out_leases+0>: push %ebp
0xc0153ca1 <time_out_leases+1>: mov %esp,%ebp
0xc0153ca3 <time_out_leases+3>: push %esi
0xc0153ca4 <time_out_leases+4>: push %ebx
0xc0153ca5 <time_out_leases+5>: mov 0x8(%ebp),%eax
0xc0153ca8 <time_out_leases+8>: mov 0xbc(%eax),%ebx
0xc0153cae <time_out_leases+14>: lea 0xbc(%eax),%esi
0xc0153cb4 <time_out_leases+20>: test %ebx,%ebx
0xc0153cb6 <time_out_leases+22>: je 0xc0153ce1 <time_out_leases+65>
0xc0153cb8 <time_out_leases+24>: testb $0x20,0x2c(%ebx) <---------- (1)
0xc0153cbc <time_out_leases+28>: je 0xc0153ce1 <time_out_leases+65>
0xc0153cbe <time_out_leases+30>: testb $0x10,0x2d(%ebx) <---------- (2)
0xc0153cc2 <time_out_leases+34>: je 0xc0153ce1 <time_out_leases+65>
0xc0153cc4 <time_out_leases+36>: mov 0x50(%ebx),%edx <--------- (OOPS)
0xc0153cc7 <time_out_leases+39>: test %edx,%edx

The ebx register holds the "struct file_lock *fl" pointer, which is accessed twice
a few instructions before at (1) and (2).

Suddenly %ebx contains "00000010" (zero with fifth bit flipped) and the kernel crashes.


static void time_out_leases(struct inode *inode)
{
struct file_lock **before;
struct file_lock *fl;

before = &inode->i_flock;
while ((fl = *before) && (fl->fl_flags & FL_LEASE)
&& (fl->fl_type & F_INPROGRESS)) {
if ((fl->fl_break_time == 0)
|| time_before(jiffies, fl->fl_break_time)) {
before = &fl->fl_next;
continue;
}


I suggest you to run memtest86 on this box.


>
>
> >>eax; c1b42a14 <_end+164147c/4347ac8>
> >>edx; c349139c <_end+2f8fe04/4347ac8>
> >>esi; c1b42ad0 <_end+1641538/4347ac8>
> >>edi; c06f6000 <_end+1f4a68/4347ac8>
> >>ebp; c06f7f0c <_end+1f6974/4347ac8>
> >>esp; c06f7f04 <_end+1f696c/4347ac8>
>
> Code; 00000000 Before first symbol
> 00000000 <_EIP>:
> Code; 00000000 Before first symbol
> 0: f6 43 2c 20 testb $0x20,0x2c(%ebx)
> Code; 00000004 Before first symbol
> 4: 74 23 je 29 <_EIP+0x29>
> Code; 00000006 Before first symbol
> 6: f6 43 2d 10 testb $0x10,0x2d(%ebx)
> Code; 0000000a Before first symbol
> a: 74 1d je 29 <_EIP+0x29>
> Code; 0000000c Before first symbol
> c: 8b 53 50 mov 0x50(%ebx),%edx
> Code; 0000000f Before first symbol
> f: 85 d2 test %edx,%edx
> Code; 00000011 Before first symbol
> 11: 75 1d jne 30 <_EIP+0x30>
> Code; 00000013 Before first symbol
> 13: 89 00 mov %eax,(%eax)
>
>
> Details:
> - compiled with gcc version 3.3.4 (Debian 1:3.3.4-3)
> - only ext3 and NFSv3 mounts (and automounter).
> - The "find" causing the oops appears to be started from cron.daily
> and only touches local filesystems.
> - SMP kernel running on UP (pentium II)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/