1.99.13: kernel NULL ptr deref. in sbpcd end_request(): CURRENT==NULL

Gonzalo Tornaria (tornaria@cmat.edu.uy)
Thu, 6 Jun 96 20:53:06 EST


While reading a file from a CD-ROM (sbpcd), I got lots of:

end_request: I/O error, dev 19:00, sector 858852

(a different sector each time). I think the disc is just dirty; this has
happened before, and cleaning the disc fixed it.

BUT this time I also got a NULL pointer dereference in the kernel:

Unable to handle kernel NULL pointer dereference at virtual address c000000c
current->tss.cr3 = 006b1000, %cr3 = 006b1000
*pde = 00102067
*pte = 00000027
Oops: 0002
CPU: 0
EIP: 0010:[<01020012>]
EFLAGS: 00010216
eax: 00000000 ebx: 00000000 ecx: 003a1de8 edx: 006ded58
esi: 00000000 edi: 003a1f4c ebp: 00000000 esp: 003a1f0c
ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Process cat (pid: 232, process nr: 21, stackpage=003a1000)
Stack: 00000000 00000001 003a1f4c 003a1f2c 010280ab 00000000 00000202 001a4fe0
001a4fe0 00155929 00000000 0011b03c 0019a5a8 004fb000 004fb000 003c1f80
001ff018 003a1f4c 0011b604 001a4fe0 003c1f80 00001000 004fb000 08008000
Call Trace: [<010280ab>] [<00155929>] [<0011b03c>] [<0011b604>] [<00121070>] [<0010a402>]
Code: c7 46 0c 00 00 00 00 85 ed 0f 85 8e 00 00 00 8b 46 10 50 0f

EIP: 1020012 <end_request+e/144>

Trace: 10280ab <do_sbpcd_request+203/224>
Trace: 155929 <unplug_device+25/2c>
Trace: 11b03c <__wait_on_page+70/e8>
Trace: 11b604 <generic_file_read+550/5e0>
Trace: 121070 <sys_read+80/90>
Trace: 10a402 <system_call+52/80>

end_request:
pushl %ebp
pushl %edi
pushl %esi
pushl %ebx
movl 20(%esp),%ebp
movl blk_dev+1804,%esi
EIP-> movl $0,12(%esi) ! <<<<< HERE <<<<<
testl %ebp,%ebp
jne .L1261
...
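
As a cross-check, the first seven bytes of the Code: line (c7 46 0c 00 00 00 00) can be decoded by hand and compared with the listing above. The sketch below handles only this one instruction form (opcode c7 /0 with an 8-bit displacement and no SIB byte, in 32-bit code); the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Decoded form of "movl $imm32, disp8(%reg)"; illustrative names only. */
struct store_imm32 {
	unsigned base_reg;	/* ModRM r/m field; 6 encodes %esi */
	int disp;		/* signed 8-bit displacement */
	uint32_t imm;		/* 32-bit immediate, little-endian in memory */
};

/* Returns 0 on success, -1 if the bytes are not this instruction form.
 * The caller must pass at least 7 bytes. */
static int decode_store_imm32(const uint8_t *code, struct store_imm32 *out)
{
	uint8_t modrm = code[1];

	if (code[0] != 0xc7)		/* opcode C7: MOV r/m32, imm32 */
		return -1;
	if ((modrm >> 6) != 1)		/* mod = 01: an 8-bit displacement follows */
		return -1;
	if (((modrm >> 3) & 7) != 0)	/* reg field must be /0 for this opcode */
		return -1;
	if ((modrm & 7) == 4)		/* r/m = 100 would mean a SIB byte; not handled */
		return -1;

	out->base_reg = modrm & 7;
	out->disp = (int8_t)code[2];
	out->imm = (uint32_t)code[3] | ((uint32_t)code[4] << 8) |
		   ((uint32_t)code[5] << 16) | ((uint32_t)code[6] << 24);
	return 0;
}
```

For the bytes above this gives base register 6 (%esi), displacement 12 and immediate 0, i.e. movl $0,12(%esi), which agrees with the compiler output.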

The faulting instruction is in end_request() in include/linux/blk.h:

/* end_request() - SCSI devices have their own version */
/* - IDE drivers have their own copy too */

#if ! SCSI_MAJOR(MAJOR_NR)

#if defined(IDE_DRIVER) && !defined(_IDE_C) /* shared copy for IDE modules */
void ide_end_request(byte uptodate, ide_hwgroup_t *hwgroup);
#else

#ifdef IDE_DRIVER
void ide_end_request(byte uptodate, ide_hwgroup_t *hwgroup) {
	struct request *req = hwgroup->rq;
#else
static void end_request(int uptodate) {
	struct request *req = CURRENT;
#endif /* IDE_DRIVER */
	struct buffer_head * bh;

	req->errors = 0; /* <<<<< HERE <<<<< */
	if (!uptodate) {
[...]

}
#endif /* defined(IDE_DRIVER) && !defined(_IDE_C) */
#endif /* ! SCSI_MAJOR(MAJOR_NR) */

#endif /* defined(MAJOR_NR) || defined(IDE_DRIVER) */

#endif /* _BLK_H */

Now, if I understood the preprocessor magic correctly, sbpcd is neither SCSI
nor IDE, so struct request *req is actually CURRENT, and it is NULL (CURRENT is
blk_dev[25].current_request == blk_dev+1804, and %esi is 00000000 in the
register dump). So we are calling end_request() with CURRENT==NULL, and that
is bad.
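
The numbers in the oops can be tied together with a little arithmetic. The sketch below is only a consistency check: the 72-byte entry size and the 4-byte field offset are assumptions inferred from 1804 = 25*72 + 4, not values read from the headers, and the 0xc0000000 base is likewise assumed from the reported address. All names are made up for illustration.

```c
#include <stdint.h>

#define SBPCD_MAJOR		25u	/* 0x19, cf. "dev 19:00" in the I/O errors */
#define BLK_DEV_ENTRY_SIZE	72u	/* ASSUMPTION: inferred from 1804 = 25*72 + 4 */
#define CURRENT_FIELD_OFFSET	4u	/* ASSUMPTION: field sits after one 4-byte pointer */
#define REQ_ERRORS_OFFSET	12u	/* from "movl $0,12(%esi)" */
#define KERNEL_BASE		0xc0000000u	/* ASSUMPTION: base added to kernel addresses */

/* Byte offset of blk_dev[major].current_request from the start of blk_dev. */
static uint32_t current_slot_offset(uint32_t major)
{
	return major * BLK_DEV_ENTRY_SIZE + CURRENT_FIELD_OFFSET;
}

/* Address reported by the oops when req->errors is written through req. */
static uint32_t faulting_address(uint32_t req)
{
	return KERNEL_BASE + req + REQ_ERRORS_OFFSET;
}
```

Under those assumptions, current_slot_offset(SBPCD_MAJOR) is 1804, matching "movl blk_dev+1804,%esi", and faulting_address(0) is 0xc000000c, matching the address in the oops message.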

Let's look at do_sbpcd_request in sbpcd.c:

/*==========================================================================*/
/*
* I/O request routine, called from Linux kernel.
*/
static void DO_SBPCD_REQUEST(void)
{
	u_int block;
	u_int nsect;
	int i, status_tries, data_tries;

request_loop:
	INIT_REQUEST;
	sti();

	if ((CURRENT == NULL) || CURRENT->rq_status == RQ_INACTIVE)
		goto err_done;
[...]

err_done:
	busy_data=0;
	end_request(0);
	sbp_sleep(0); /* wait a bit, try again */
	goto request_loop;
}
/*==========================================================================*/

It seems that we test whether CURRENT==NULL and, if it is, jump straight to
end_request(0)!! What is the point of that test??

BUT, the expansion of INIT_REQUEST is:

#define INIT_REQUEST \
	if (!CURRENT) {\
		CLEAR_INTR; \
		return; \
	} \
	if (MAJOR(CURRENT->rq_dev) != MAJOR_NR) \
		panic(DEVICE_NAME ": request list destroyed"); \
	if (CURRENT->bh) { \
		if (!buffer_locked(CURRENT->bh)) \
			panic(DEVICE_NAME ": block not locked"); \
	}

#endif /* (MAJOR_NR != SCSI_TAPE_MAJOR) && !defined(IDE_DRIVER) */

i.e. if CURRENT==NULL, we would have returned in INIT_REQUEST!!!

It makes no sense to me. Could something be modifying CURRENT behind our
back (when we call sbp_sleep(), which calls schedule())?
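
To make that hypothesis concrete, here is a small user-space model of it. It is not the driver code: struct request is a one-field stand-in, current_req plays the role of CURRENT, and the sleep hooks are hypothetical substitutes for whatever might run while the driver sleeps. Whether anything in 1.99.13 really clears CURRENT during a sleep is not established by this sketch.

```c
#include <stddef.h>

struct request { int errors; };		/* hypothetical stand-in, one field only */

static struct request *current_req;	/* plays the role of CURRENT */

enum outcome { IDLE, ENDED, NULL_AT_END };

/* Hypothetical hook: stands in for code that runs while the driver sleeps. */
typedef void (*sleep_hook)(void);

static void sleep_quietly(void) { }
static void sleep_and_lose_request(void) { current_req = NULL; }

/* Models one pass through the request routine's error path. */
static enum outcome do_request(sleep_hook sleep_fn)
{
	if (!current_req)		/* INIT_REQUEST-style guard */
		return IDLE;

	sleep_fn();			/* a sleep between the guard and the end */

	if (!current_req)		/* without this re-check, the next line */
		return NULL_AT_END;	/* would write through a NULL pointer */

	current_req->errors = 0;	/* the first thing end_request() does */
	current_req = NULL;		/* request finished */
	return ENDED;
}
```

With sleep_quietly the pass ends normally; with sleep_and_lose_request the guard at the top has already passed, yet the request is gone by the time the error path runs, which is the situation the oops suggests.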

In any case, what is the (CURRENT==NULL) test for?? At that point we already
know that CURRENT!=NULL (we have just done INIT_REQUEST). And if CURRENT were
indeed NULL, we would be going straight to hell... (well, not so bad: I
haven't rebooted yet, the system seems stable, and the CD works).

I've looked at the asm, and it seems OK to me.

This happened only once; I have not been able to reproduce it.

I'm using gcc 2.7.2, with the kernel compiled for 486 (the processor is an
AMD DX4/120). The CD-ROM drive is a Creative Labs (Panasonic) 2x on an SB16.
Mail me if you want more details, or if I can help with anything...

Gonzalo