Re: mmotm 2009-07-16-14-32 - sudden OOPS at boot in ACPI code

From: Lin Ming
Date: Tue Jul 21 2009 - 01:34:06 EST



> From: Hugh Dickins <hugh.dickins@xxxxxxxxxxxxx>
> Date: Tue, Jul 21, 2009 at 11:33 AM
> Subject: Re: mmotm 2009-07-16-14-32 - sudden OOPS at boot in ACPI code
> To: Valdis.Kletnieks@xxxxxx
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Bob Moore
> <robert.moore@xxxxxxxxx>, Len Brown <lenb@xxxxxxxxxx>,
> linux-kernel@xxxxxxxxxxxxxxx, linux-acpi@xxxxxxxxxxxxxxx
>
>
> On Mon, 20 Jul 2009, Valdis.Kletnieks@xxxxxx wrote:
> > On Thu, 16 Jul 2009 14:34:02 PDT, akpm@xxxxxxxxxxxxxxxxxxxx said:
> > > The mm-of-the-moment snapshot 2009-07-16-14-32 has been uploaded to
> >
> > Dies a horrid death during early boot. Dell Latitude D820, and this graphics:
> >
> > 01:00.0 VGA compatible controller: nVidia Corporation G72M [Quadro NVS 110M/GeForce Go 7300] (rev a1)
>
> Oh yes, I was getting just the same with Intel graphics (i915);
> but promptly forgot about it once I'd a workaround in place,
> and moved on to other things, sorry.
>
> >
> > Traceback (hand-copied from a very crappy cell-phone picture)
> >
> > strcmp+0x4/0x1f
> > acpi_device+probe+0xac/0x13e
> > driver_probe_device+0xc9/0x14e
> > __driver_attach+0x58/0x7c
> > ? __driver_attach+0x58/0x7c
> > ? __driver_attach+0x58/0x7c
> > bus_for_each_dev+0x54/0x89
> > driver_attach+0x19/0x1b
> > bus_add_driver+0xv4/0x1fe
> > driver_register+0xb7/0x128
> > ? acpi_video_init+0x0/0x17
> > acpi_bus_register_driver+0x3e/0x42
> > acpi_video_register+0x42/0x6e
> > acpi_video_init+0x15/0x17
> > do_one_initcall+0x56/0x130
> >
> > Analysis shows it's the following code from (inlined) acpi_device_install_notify_handler
> >
> > static int acpi_device_install_notify_handler(struct acpi_device *device)
> > {
> > acpi_status status;
> > char *hid;
> >
> > hid = acpi_device_hid(device);
> > if (!strcmp(hid, ACPI_BUTTON_HID_POWERF))
> >
> > but we never check if hid is non-trash before feeding it to strcmp. Looks
> > like something in this linux-next commit is involved:
> >
> > commit ed444824932d2a563858d82ec1ea29b0aa775e91
> > Author: Bob Moore <robert.moore@xxxxxxxxx>
> > Date: Mon Jun 29 13:39:29 2009 +0800
> >
> > I suspect something in acpi_get_object_info() is going astray, causing
> > acpi_device_set_id() to set the ->pnp.hardware_id to NULL in this code:
> >
> > if (hid) {
> > device->pnp.hardware_id = ACPI_ALLOCATE_ZEROED(strlen (hid) + 1);
> > if (device->pnp.hardware_id) {
> > strcpy(device->pnp.hardware_id, hid);
> > device->flags.hardware_id = 1;
> > }
> > } else
> > device->pnp.hardware_id = NULL;
> >
> > The else clause is new in this commit.
>
> I think pnp.hardware_id has changed from being a builtin array to
> an allocated pointer: so before there was always a zeroed array to

Yes, pnp.hardware_id and pnp.unique_id are now allocated pointer.
We made the change for acpi_get_object_info interface.

> strcmp against, whereas now there's a NULL pointer if you come to
> use acpi_device_install_notify_handler() "too early".
>
> Patch that works for me at the bottom.

Yes,
your patch can workaround the problem in
acpi_device_install_notify_handler.

But there are other places call strcmp to compare HID/UID.
So we'd better fix acpi_device_hid/_uid as below,

diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 6e83a68..6c64366 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -188,8 +188,8 @@ struct acpi_device_pnp {

#define acpi_device_bid(d) ((d)->pnp.bus_id)
#define acpi_device_adr(d) ((d)->pnp.bus_address)
-#define acpi_device_hid(d) ((d)->pnp.hardware_id)
-#define acpi_device_uid(d) ((d)->pnp.unique_id)
+#define acpi_device_hid(d) ((d)->pnp.hardware_id ? (d)->pnp.hardware_id : "\0")
+#define acpi_device_uid(d) ((d)->pnp.unique_id ? (d)->pnp.unique_id : "\0")
#define acpi_device_name(d) ((d)->pnp.device_name)
#define acpi_device_class(d) ((d)->pnp.device_class)

---
Thanks,
Lin Ming

>
> >
> > Looking at the old code, it *may* be that the ACPI code on my laptop is just
> > busticated and/or there's no _HID method for the graphics card, and the old
> > code Just Happened To Work in previous kernels because ->pnp.hardware_id
> > wouldn't actually get set *at all* in acpi_device_set_id, so we'd get random
> > stale data that was bogus, but didn't give strcmp() indigestion...
> >
> > Any wisdom on debugging this further (including how to tell if the ACPI
> > tables have a sane _HID method for the graphics card) would be appreciated...
> >
> > Or is the correct fix in fact to just add a 'if (!hid) return -EINVAL;' to
> > acpi_device_install_notify_handler()?
>
> [PATCH mmotm] acpi: work around NULL hardware_id
>
> Work around NULL pnp.hardware_id in acpi_device_install_notify_handler()
> when probing video device.
>
> Signed-off-by: Hugh Dickins <hugh.dickins@xxxxxxxxxxxxx>
> ---
> Signoff provided to handle the unlikely event that this hack
> is actually the right fix!
>
> drivers/acpi/scan.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- mmotm/drivers/acpi/scan.c 2009-07-17 12:53:20.000000000 +0100
> +++ linux/drivers/acpi/scan.c 2009-07-17 21:19:10.000000000 +0100
> @@ -376,12 +376,12 @@ static int acpi_device_install_notify_ha
> char *hid;
>
> hid = acpi_device_hid(device);
> - if (!strcmp(hid, ACPI_BUTTON_HID_POWERF))
> + if (hid && !strcmp(hid, ACPI_BUTTON_HID_POWERF))
> status =
> acpi_install_fixed_event_handler(ACPI_EVENT_POWER_BUTTON,
> acpi_device_notify_fixed,
> device);
> - else if (!strcmp(hid, ACPI_BUTTON_HID_SLEEPF))
> + else if (hid && !strcmp(hid, ACPI_BUTTON_HID_SLEEPF))
> status =
> acpi_install_fixed_event_handler(ACPI_EVENT_SLEEP_BUTTON,
> acpi_device_notify_fixed,
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/