acpi battery: crash after inserting battery at wrong time during hibernation

From: Alan Jenkins
Date: Sun Oct 18 2009 - 10:39:48 EST


Hi

This crash happened with 2.6.32-rc4+, but I suspect it's not a regression, just a rare race condition. As normal, I initiated hibernation, plugged in my battery, and removed the mains power. I did more or less the reverse on resume.


[87672.698198] HDA Intel 0000:00:1b.0: PCI INT A disabled
[87672.711285] pci 0000:00:02.0: PCI INT A disabled
[87672.712076] ACPI: Preparing to enter system sleep state S4
[87672.732153] PM: Saving platform NVS memory
[87672.734911] power_supply BAT0: parent PNP0C0A:00 should not be sleeping

This first error message is from device_pm_add() in drivers/base/power/main.c. Its meaning is clear: BAT0 was created when the battery was inserted, even though its parent device was supposed to be suspended. In general this sounds pretty bad; I guess it means we will suspend the system without suspending the new child device. I'm not sure why it would cause the specific backtrace below, though.
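To make the check concrete, here is a minimal userspace sketch of the kind of test that produces this warning. It is not the kernel's code: struct model_device, its sleeping flag, and model_pm_add() are hypothetical names invented for illustration, and the real device_pm_add() tracks PM state differently.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the kernel's struct device. */
struct model_device {
	const char *name;
	struct model_device *parent;
	bool sleeping;	/* assumed flag: true once the device is suspended */
};

/*
 * Model of registering a device with the PM core: warn if the parent
 * is already suspended. Returns true when the warning fires.
 */
static bool model_pm_add(const struct model_device *dev)
{
	if (dev->parent && dev->parent->sleeping) {
		printf("%s: parent %s should not be sleeping\n",
		       dev->name, dev->parent->name);
		return true;
	}
	return false;
}
```

In this model the warning is only a diagnostic: the child is still added, which matches the concern that the new device may never see a suspend callback.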

[87672.763640] PM: Creating hibernation image:
[87672.764573] PM: Need to copy 56490 pages
[87672.764573] PM: Restoring platform NVS memory
[87672.764573] ACPI: Waking up from system sleep state S4

On resume, the battery was removed again, and this happened
(extracted from messages.log, which seems to miss certain standard BUG/OOPS lines):

[87673.506817] *pdpt = 00000000173b9001 *pde = 0000000000000000
[87673.507175] Modules linked in: eeepc_laptop pci_hotplug af_packet i915 drm_kms_helper drm i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect ipv6 loop joydev snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep ath5k snd_pcm_oss mac80211 uvcvideo snd_mixer_oss ath videodev snd_pcm v4l1_compat i2c_i801 cfg80211 snd_timer psmouse snd pcspkr i2c_core serio_raw rfkill snd_page_alloc battery ac processor evdev intel_agp video agpgart backlight output button thermal fan [last unloaded: pci_hotplug]
[87673.508520]
[87673.508520] Pid: 98, comm: kacpi_notify Not tainted (2.6.32-rc4eeepc-test #16) 701
[87673.508520] EIP: 0060:[<c02e5f4e>] EFLAGS: 00010246 CPU: 0
[87673.508520] EIP is at led_trigger_unregister+0x18/0x8a
[87673.508520] EAX: 00200200 EBX: dbec24a0 ECX: 00000000 EDX: 00100100
[87673.508520] ESI: dbec24a0 EDI: d7587a00 EBP: df12def4 ESP: df12dee8
[87673.508520] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[87673.508520] dbec24a0 00000000 d7587a00 df12df00 c02e5fcf d7587a0c df12df0c c02e168c
[87673.508520] <0> d7587a0c df12df18 c02e10bb d7587a00 df12df24 e008d04d d7587a00 df12df44
[87673.508520] <0> e008d2bd 000026c0 df12df54 c0198903 c0249319 00000081 df148800 df12df58
[87673.508520] [<c02e5fcf>] ? led_trigger_unregister_simple+0xf/0x19
[87673.508520] [<c02e168c>] ? power_supply_remove_triggers+0x14/0x4c
[87673.508520] [<c02e10bb>] ? power_supply_unregister+0x12/0x24
[87673.508520] [<e008d04d>] ? sysfs_remove_battery+0x1f/0x29 [battery]
[87673.508520] [<e008d2bd>] ? acpi_battery_update+0x3d/0x1e4 [battery]
[87673.508520] [<c0198903>] ? kmem_cache_free+0x7a/0xb1
[87673.508520] [<c0249319>] ? acpi_os_release_object+0x8/0xc
[87673.508520] [<e008d995>] ? acpi_battery_notify+0x1e/0x72 [battery]
[87673.508520] [<c024b4d2>] ? acpi_device_notify+0x12/0x15
[87673.508520] [<c0256142>] ? acpi_ev_notify_dispatch+0x4c/0x57
[87673.508520] [<c0249400>] ? acpi_os_execute_deferred+0x1d/0x28
[87673.508520] [<c013ca1a>] ? worker_thread+0x111/0x184
[87673.508520] [<c02493e3>] ? acpi_os_execute_deferred+0x0/0x28
[87673.508520] [<c013f601>] ? autoremove_wake_function+0x0/0x30
[87673.508520] [<c013c909>] ? worker_thread+0x0/0x184
[87673.508520] [<c013f472>] ? kthread+0x60/0x66
[87673.508520] [<c013f412>] ? kthread+0x0/0x66
[87673.508520] [<c0107aab>] ? kernel_thread_helper+0x7/0x10
[87673.517367] ---[ end trace a56e8fbd666eda59 ]---

My system was then rendered unusable by a storm of segfaults.

[87673.528512] pci 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
...
[87674.680592] Restarting tasks ... done.
[87674.758624] console-kit-dae[1757]: segfault at ac7dfff4 ip b76ff668 sp b74802c0 error 4 in libglib-2.0.so.0.2200.0[b769b000+b6000]
...
[87675.035585] in libglib-2.0.so.0.2200.0[b769b000+b6000]
[87696.282399] __ratelimit: 13 callbacks suppressed
...



So at minimum, we want to avoid the initial error message. We could easily stop the ACPI battery driver from doing anything if it's suspended (it will re-read the updated state on resume anyway). But perhaps the real problem is that the ACPI core calls notify() between suspend() and resume()? Should we fix that instead?
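The first option could look roughly like the userspace sketch below. It is only a model under stated assumptions: struct model_battery and the model_* functions are hypothetical and do not correspond to the real driver's structures or callbacks. The idea is that notify() returns early while suspended, and resume() re-reads the state.

```c
#include <stdbool.h>

/* Hypothetical model of the driver's per-battery state. */
struct model_battery {
	bool suspended;		/* set by suspend(), cleared by resume() */
	bool present;		/* what the hardware currently reports */
	bool registered;	/* whether a power_supply device exists */
};

/* Bring the registered device in line with the hardware state. */
static void model_update(struct model_battery *b)
{
	b->registered = b->present;
}

static void model_suspend(struct model_battery *b)
{
	b->suspended = true;
}

/* Notifications that arrive while suspended are ignored. */
static void model_notify(struct model_battery *b)
{
	if (b->suspended)
		return;
	model_update(b);
}

/* Resume re-reads the state, so an ignored event is not lost. */
static void model_resume(struct model_battery *b)
{
	b->suspended = false;
	model_update(b);
}
```

With this shape, inserting the battery between suspend() and resume() never registers BAT0 under a sleeping parent. It does not answer whether the ACPI core should deliver notify() in that window at all.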

Regards
Alan