Probable regression: extremely high IOWAIT on system with Iomega ZIP drive (Parallel ATA interface) after 3.16->3.17 kernel upgrade

From: Sergio Callegari
Date: Mon Jun 08 2015 - 10:10:22 EST


Hi,

I am experiencing a weird issue on an AMD Phenom II system with an AsRock
N68S motherboard (NVIDIA GeForce 7025 / nForce 630a chipset). The system has
an Iomega Zip 100 drive attached via an IDE connector - not exactly recent
hardware.

Everything was working fine up to kernel 3.16.x.

After a kernel upgrade, I occasionally see the system iowait jump and then stay
consistently high (~50%). This typically happens between a few minutes and a
few hours after boot.
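For reference, this is roughly how I watch it (a minimal sketch; it assumes the
sysstat tools are installed, otherwise the "wa" field in top tells the same
story):

  # CPU utilisation every 5 seconds; once the problem has triggered,
  # the %iowait column sits around 50% and never comes back down
  iostat -c 5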

The high iowait is also accompanied by the kernel detecting hung tasks.

In fact, to make a process hang it is sufficient to try mounting a disk placed
in the zip drive: the mount command never returns. Interestingly, if I mount
the zip drive /before/ the iowait jumps high, it mounts just fine.
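Concretely, the reproducer is nothing more than the following (a sketch; zip
disks keep their data on partition 4, but the actual device letter and mount
point on my system are assumptions here):

  # with a disk in the drive, after the iowait has already gone high:
  mount /dev/sda4 /mnt/zip    # never returns; ~120 s later the hung-task
                              # warning below appears in the log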

The onset of the high iowait itself produces no output in dmesg/syslog.
However, when the mount process hangs, the following is logged:

[11877.606063] INFO: task mount:14652 blocked for more than 120 seconds.
[11877.606077] Tainted: P C OE 3.19.0-18-generic #18-Ubuntu
[11877.606082] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11877.606088] mount D ffff88006ad038f8 0 14652 14651 0x00000000
[11877.606099] ffff88006ad038f8 ffff880062ecebf0 0000000000014200 ffff88006ad03fd8
[11877.606108] 0000000000014200 ffff88011a818000 ffff880062ecebf0 ffff88011fcd4200
[11877.606115] ffff88006ad03a50 7fffffffffffffff ffff88006ad03a48 ffff880062ecebf0
[11877.606121] Call Trace:
[11877.606139] [<ffffffff817c4f99>] schedule+0x29/0x70
[11877.606149] [<ffffffff817c857c>] schedule_timeout+0x20c/0x280
[11877.606161] [<ffffffff8109ed1d>] ? ttwu_do_activate.constprop.94+0x5d/0x70
[11877.606169] [<ffffffff810a1c19>] ? try_to_wake_up+0x1e9/0x340
[11877.606178] [<ffffffff817c6954>] wait_for_completion+0xa4/0x170
[11877.606183] [<ffffffff810a1de0>] ? wake_up_state+0x20/0x20
[11877.606191] [<ffffffff8108ef1a>] flush_work+0xea/0x1c0
[11877.606200] [<ffffffff8108bb10>] ? destroy_worker+0xa0/0xa0
[11877.606206] [<ffffffff8108f0f8>] __cancel_work_timer+0x98/0x1b0
[11877.606214] [<ffffffff813949f1>] ? exact_lock+0x11/0x20
[11877.606223] [<ffffffff81509d72>] ? kobj_lookup+0x112/0x170
[11877.606230] [<ffffffff813939f0>] ? disk_map_sector_rcu+0x80/0x80
[11877.606237] [<ffffffff8108f243>] cancel_delayed_work_sync+0x13/0x20
[11877.606243] [<ffffffff81395991>] disk_block_events+0x81/0x90
[11877.606252] [<ffffffff8122d64b>] __blkdev_get+0x5b/0x490
[11877.606259] [<ffffffff8122dac1>] blkdev_get+0x41/0x390
[11877.606266] [<ffffffff8122de70>] ? blkdev_get_by_dev+0x60/0x60
[11877.606273] [<ffffffff8122decf>] blkdev_open+0x5f/0x90
[11877.606281] [<ffffffff811f0d82>] do_dentry_open+0x1d2/0x330
[11877.606288] [<ffffffff811f1049>] vfs_open+0x49/0x50
[11877.606296] [<ffffffff81201b47>] do_last+0x227/0x12c0
[11877.606305] [<ffffffff812041e8>] path_openat+0x88/0x610
[11877.606313] [<ffffffff8120598a>] do_filp_open+0x3a/0xb0
[11877.606320] [<ffffffff81212777>] ? __alloc_fd+0xa7/0x130
[11877.606328] [<ffffffff811f299a>] do_sys_open+0x12a/0x280
[11877.606334] [<ffffffff810963ef>] ? __put_cred+0x3f/0x60
[11877.606341] [<ffffffff811f1e70>] ? SyS_access+0x1c0/0x210
[11877.606348] [<ffffffff811f2b0e>] SyS_open+0x1e/0x20
[11877.606356] [<ffffffff817c990d>] system_call_fastpath+0x16/0x1b
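If more context is useful, next time it happens I can dump the backtraces of
all blocked tasks via sysrq (standard usage, assuming sysrq is enabled in the
running kernel):

  echo 1 > /proc/sys/kernel/sysrq   # enable the full sysrq interface
  echo w > /proc/sysrq-trigger      # log all tasks stuck in D state
  dmesg | tail -n 200               # collect the resulting backtraces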

When the high iowait occurs, it often becomes impossible to cleanly shut down
the machine and a hard reset is required. Similarly, with the high iowait it
becomes hard to test new kernels, since the mkinitramfs or grub update steps of
the kernel installation hang forever.

Detaching the Iomega drive from the system seems to make the issue go away.

I have verified that the issue does not exist with kernel 3.16.x by trying
3.16.7. However, the issue is present in 3.17.x, 3.18.x and 3.19.x.

I wonder if someone can point out which changes introduced in the 3.16->3.17
transition could be related to this issue, and what I should test to try to
isolate the regression.
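Failing a better pointer, my fallback plan is to bisect between the two
releases, roughly as follows (a sketch; the build/install step is simply my
usual procedure for the distro kernel, and the test is waiting for the iowait
to jump and then attempting a mount):

  git bisect start
  git bisect bad  v3.17    # first release showing the hang
  git bisect good v3.16    # last known good release
  # for each kernel git bisect proposes: build, install, reboot, test,
  # then mark the result with "git bisect good" or "git bisect bad"

The catch is that, once the iowait has gone high, even installing the next
kernel to test becomes problematic, so each step may require acting before the
problem triggers.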

Thanks!

