Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()

From: Satyam Sharma
Date: Sun Jul 22 2007 - 13:21:14 EST


Hi Walter,

Thanks for reporting this.

On 7/22/07, walter harms <wharms@xxxxxx> wrote:
hello all,
on my asus notebook tm620 there is a crash with 2.6.22 and 2.6.21

Did this happen when you were resuming from a suspend-to-ram/disk?
[ I ask because I see swsusp in the trace below, linux-pm added to Cc: ]

....
Using IPI Shortcut mode
WARNING: at block/ll_rw_blk.c:1575 blk_remove_plug()
[<c01ac87e>] blk_remove_plug+0x36/0x5a
[<c01ac8b6>] __generic_unplug_device+0x14/0x1f
[<c01ad587>] __make_request+0x39b/0x49c
[<c01abc8c>] generic_make_request+0x228/0x255
[<c01adb54>] submit_bio+0xa5/0xac
[<c013e233>] mempool_alloc+0x37/0xae
[<c01314dc>] submit+0xc2/0x11d
[<c0131585>] bio_read_page+0x24/0x27
[<c013188b>] swsusp_check+0x4f/0xaf
[<c012f6c2>] software_resume+0x5f/0x108
[<c037867e>] kernel_init+0xb0/0x212
[<c0103a16>] ret_from_fork+0x6/0x1c
[<c03785ce>] kernel_init+0x0/0x212
[<c03785ce>] kernel_init+0x0/0x212
[<c010465b>] kernel_thread_helper+0x7/0x10
=======================

Surprising, that's a WARN_ON(!irqs_disabled()) but IRQs are disabled
alright on that codepath. OTOH, __make_request() is heavily goto-driven,
uses the non-save/restore variants of spin_lock_irq, and does not even
balance locks / unlocks for some error paths ... gaah.

Freeing unused kernel memory: 272k freed
....

I attached two files with the kernel bootmessage

1. bug.txt.gz
kernel 2.6.22.1 created with netconsole, last lines were missing i added them by hand

2. out.txt.gz
simple a dmesg from kernel 2.6.18, works fine

I've reattached them in this mail, for linux-pm to see.

additional observations:
removing 'resume=/dev/hda1' from grub lets the crash disappear but the
drive still does not work because
ata_id[772]: main: HDIO_GET_IDENTITY failed for '/dev/.tmp-3-0'

If you're resuming from a suspend I don't see how you can avoid
giving the resume= parameter (unless you somehow specify some
default at build-time ...)

there is a bug in the hd detection code:
2.6.22.1
hda: 8032MB, CHS=1024/255/63
hda: hda1 hda2 hda3
hda: p2 exceeds device capacity
hda: p3 exceeds device capacity

2.6.18
hda: 39070080 sectors (20003 MB), CHS=38760/16/63, UDMA(100)
hda: cache flushes supported
hda: hda1 hda2 hda3
Probing IDE interface ide1...


here the output from fdisk:

fdisk -l

Disk /dev/hda: 20.0 GB, 20003880960 bytes
255 heads, 63 sectors/track, 2432 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/hda1 1 131 1052226 82 Linux swap / Solaris
/dev/hda2 * 132 1437 10490445 83 Linux
/dev/hda3 1438 2432 7992337+ 83 Linux



is it possible that this leads to a chain of events causing the crash in ll_rw_blk.c ?

if someone is interested in the .config please mail me directly, i have not
subscribed the lkml.

Yes, please post your .config also.

Thanks,
Satyam

Attachment: bug.txt.gz
Description: GNU Zip compressed data

Attachment: out.txt.gz
Description: GNU Zip compressed data