Problems with Race condition (binfmt_misc)

Richard Guenther (richard.guenther@student.uni-tuebingen.de)
Thu, 15 May 1997 10:47:35 +0200 (MESZ)


Hi!

While attempting to make binfmt_misc stress-test-safe, I managed to fix
all races but one, which causes the following oops (it's a 2.0.30 oops,
since pre2.1.37-7 gives me no oops but deadlocks :( ):

May 14 22:26:00 localhost kernel: Unable to handle kernel paging request at virtual address c28982c0
May 14 22:26:00 localhost kernel: current->tss.cr3 = 017f7000, %cr3 = 017f7000
May 14 22:26:00 localhost kernel: *pde = 0009e067
May 14 22:26:00 localhost kernel: *pte = 00000000
May 14 22:26:00 localhost kernel: Oops: 0000
May 14 22:26:00 localhost kernel: CPU: 0
May 14 22:26:00 localhost kernel: EIP: 0010:[sys_read+88/176]
May 14 22:26:00 localhost kernel: EFLAGS: 00010202
May 14 22:26:00 localhost kernel: eax: 028982bc ebx: 01a978c0 ecx: 0804ceb8 edx: 00000003
May 14 22:26:00 localhost kernel: esi: ffffffea edi: 00001000 ebp: 0186e2f4 esp: 0197efa8
May 14 22:26:00 localhost kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
May 14 22:26:00 localhost kernel: Process cat (pid: 2708, process nr: 32, stackpage=0197e000)
May 14 22:26:00 localhost kernel: Stack: 01b91414 0804ceb8 00000003 bffff384 0010a5f5 00000003 0804ceb8 00001000
May 14 22:26:00 localhost kernel: 0804ceb8 00000003 bffff384 ffffffda 0000002b 0000002b 0000002b 0000002b
May 14 22:26:00 localhost kernel: 00000003 40039520 00000023 00000206 bffff380 0000002b
May 14 22:26:00 localhost kernel: Call Trace: [system_call+85/128]
May 14 22:26:00 localhost kernel: Code: 83 78 04 00 74 31 31 f6 85 ff 7e 2b 57 8b 4c 24 1c 51 6a 01

From vmlinux:
0x1224be <sys_read+66>: movl $0xfffffff7,%esi
0x1224c3 <sys_read+71>: testb $0x1,(%ebx)
0x1224c6 <sys_read+74>: je 0x12250b <sys_read+143>
0x1224c8 <sys_read+76>: movl $0xffffffea,%esi
0x1224cd <sys_read+81>: movl 0x34(%ebx),%eax
0x1224d0 <sys_read+84>: testl %eax,%eax
0x1224d2 <sys_read+86>: je 0x12250b <sys_read+143>
0x1224d4 <sys_read+88>: cmpl $0x0,0x4(%eax)
0x1224d8 <sys_read+92>: je 0x12250b <sys_read+143>
0x1224da <sys_read+94>: xorl %esi,%esi
0x1224dc <sys_read+96>: testl %edi,%edi
0x1224de <sys_read+98>: jle 0x12250b <sys_read+143>

This corresponds to the following part of sys_read in fs/read_write.c:
	error = -EBADF;
	if (!(file->f_mode & 1))
		goto out;
	error = -EINVAL;
	if (!file->f_op || !file->f_op->read)
		goto out;
	error = 0;
	if (count <= 0)
		goto out;

The oops occurs at the !file->f_op->read test (the cmpl at sys_read+88),
i.e. file->f_op is non-NULL but invalid!
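
The register dump and the faulting address can be tied together with a little arithmetic. The sketch below assumes that on 2.0.x i386 the kernel data segment has a base of 0xc0000000, so a pointer held in a register is an offset from that base; the names KERNEL_SEGMENT_BASE, READ_MEMBER_OFFSET and fault_address are made up for illustration and do not come from the kernel source.

```c
#include <stdint.h>

/* Assumption: kernel data segment base on 2.0.x i386. */
#define KERNEL_SEGMENT_BASE 0xc0000000u

/* Offset of the member tested by "cmpl $0x0,0x4(%eax)" in the disassembly. */
#define READ_MEMBER_OFFSET 0x4u

/*
 * Given the value of file->f_op as seen in eax, return the linear
 * address that the f_op->read test would touch.
 */
uint32_t fault_address(uint32_t f_op_value)
{
	return KERNEL_SEGMENT_BASE + f_op_value + READ_MEMBER_OFFSET;
}
```

With eax = 028982bc from the oops, fault_address(0x028982bc) gives c28982c0, which is the address in the "Unable to handle kernel paging request" line, so under that assumption the bad pointer really is the f_op value.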

The stress test procedure I use is to concurrently add, remove and cat
the status of entries from binfmt_misc.
So it looks like this: cat opens the file and proc begins to fill in the
inode but is interrupted (scheduled away), so that binfmt_misc can remove
the proc entry and free its memory. Then something else reuses that
memory, so that proc, continuing to fill in the inode, writes garbage to it.
How can I avoid this race? Is this a bug in procfs (i.e. missing locking)?
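
One general way to keep an object alive while a reader is still using it is reference counting: removal only unregisters the entry, and the memory is freed when the last user drops its reference. The following is a minimal single-threaded user-space sketch of that idea, not the procfs or binfmt_misc code; struct entry, entry_create, entry_get, entry_put, entry_remove and entries_freed are all hypothetical names, and in the kernel the counter updates would additionally have to be atomic or done under a lock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a registered entry. */
struct entry {
	int refs;        /* registry reference plus one per active user */
	int registered;  /* cleared once the entry has been removed */
};

/* Counts how many entries have really been freed (for demonstration). */
static int entries_freed;

/* Allocate an entry; the registry itself holds the first reference. */
struct entry *entry_create(void)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->refs = 1;
	e->registered = 1;
	return e;
}

/* Take a reference for a reader; fails once the entry was removed. */
struct entry *entry_get(struct entry *e)
{
	if (!e->registered)
		return NULL;
	e->refs++;
	return e;
}

/* Drop a reference; the memory goes away only with the last one. */
void entry_put(struct entry *e)
{
	assert(e->refs > 0);
	if (--e->refs == 0) {
		free(e);
		entries_freed++;
	}
}

/* Unregister the entry and drop the registry's reference. */
void entry_remove(struct entry *e)
{
	e->registered = 0;
	entry_put(e);
}
```

In this sketch, a reader that called entry_get() before entry_remove() still sees valid memory until its own entry_put(), while a reader arriving after the removal gets NULL instead of a stale pointer.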

Richard.

PS: Get the latest version of binfmt_misc at
http://www.anatom.uni-tuebingen.de/~richi/linux/binfmt_misc.html
(filename extension matching is implemented now!)