Re: [BUG] 2.6.21-rc7 hpt366 driver broken

From: Mike Mattie
Date: Wed Apr 18 2007 - 00:31:23 EST


On Mon, 16 Apr 2007 20:25:15 -0700
Mike Mattie <codermattie@xxxxxxxxx> wrote:

I have added Sergei Shtylyov to the address list after seeing his recent posts on hpt366 issues, and the
git changelog for the hpt366.c driver. I am very confident that I have pinpointed the defect in the driver.

> On Mon, 16 Apr 2007 19:43:03 -0700
> Mike Mattie <codermattie@xxxxxxxxx> wrote:
>
> > On Mon, 16 Apr 2007 18:21:12 -0700
> > Mike Mattie <codermattie@xxxxxxxxx> wrote:
> >
> > > On Mon, 16 Apr 2007 16:36:13 +0200
> > > Adrian Bunk <bunk@xxxxxxxxx> wrote:
> > >
> > > > [ Cc's added, full bug report was in
> > > > http://lkml.org/lkml/2007/4/16/18 ]
> > > >
> > > > On Mon, Apr 16, 2007 at 04:38:22AM -0700, Mike Mattie wrote:
> > > > > On Sun, 15 Apr 2007 22:48:46 -0700
> > > > > Mike Mattie <codermattie@xxxxxxxxx> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I am testing the 2.6.21-rc7 kernel release. The IDE hpt366
> > > > > > > driver is crashing, hanging the boot. I have basically the
> > > > > > same config as 2.6.20.7 which works fine (except for
> > > > > > netconsole mentioned in a previous mail).
> > > > > >
> > > > > > here is the hand-copied info:
> > > > > >
> > > > > > * "unable to handle paging request" , null deref
> > > > > > * EIP @ init_chipset_hpt366
> > > > > >
> > > > >
> > > > > > I am running a git-bisect to see if I can resolve it to a
> > > > > > commit.
> > > > >
> > > > > This was identified as the first broken commit:
> > > > >
> > > > > commit 7b73ee05d0acb926923d43d78b61add776ea4bb1
> > > > > Author: Sergei Shtylyov <sshtylyov@xxxxxxxxxxxxx>
> > > > > Date: Wed Feb 7 18:18:16 2007 +0100
> > > > >
> > > > > hpt366: init code rewrite
> > > > >
> > > > > Reverting is conflicted so it will be a bit longer before I
> > > > > pin-point any other build-breaks.
> > > >
> > > > Thanks for your report.
> > > >
> > > > Can you use a digital camera for taking a photograph of the
> > > > crash?
> > >
> > > I can later on tonight, by about 11PM west coast. I also saw
> > > some hex offsets after the function pointed to by EIP, is there
> > > a way to decode that to a line number ? I have debugging symbols
> > > enabled.
> > >
> > > I am also doing printk breadcrumbs to pin it down to a block
> > > or a line.
> >
> > I have narrowed the crash with breadcrumbs down to these lines:
> >
> >
> > /*
> >  * Only try the DPLL if we don't have a table for the PCI clock that
> >  * we are running at for HPT370/A, always use it for anything newer...
> >  *
> >  * NOTE: Using the internal DPLL results in slow reads on 33 MHz PCI.
> >  * We also don't like using the DPLL because this causes glitches
> >  * on PRST-/SRST- when the state engine gets reset...
> >  */
> > if (info->chip_type >= HPT374 || info->settings[clock] == NULL) {
> >     u16 f_low, delta = pci_clk < 50 ? 2 : 4;
> >     int adjust;
> >
> >     printk(KERN_INFO "inside the if\n");
> >
> >     /*
> >      * Select 66 MHz DPLL clock only if UltraATA/133 mode is
> >      * supported/enabled, use 50 MHz DPLL clock otherwise...
> >      */
> >     if (info->max_mode == 0x04) {
> >         dpll_clk = 66;
> >         clock = ATA_CLOCK_66MHZ;
> >     } else if (dpll_clk) { /* HPT36x chips don't have DPLL */
> >         dpll_clk = 50;
> >         clock = ATA_CLOCK_50MHZ;
> >     }
> >
> >     if (info->settings[clock] == NULL) {
> ^^^^^^^^ crashes here
>
> Since info is dereferenced all over the place before this point, I am
> assuming it is the settings array that is blowing up.
>
> I printk'd the value of clock, which is 4. That array is either not
> set up correctly or the index is out of bounds (speculation).

Here, on line 493, the hpt302n (the chipset I have) is the only struct without
a .settings field. With no initializer that pointer is NULL, so indexing
info->settings[clock] is a null dereference. I am extremely confident this is
the exact location of the bug.

static struct hpt_info hpt302n __devinitdata = {
    .chip_type = HPT302N,
    .max_mode = HPT302_ALLOW_ATA133_6 ? 4 : 3,
    .dpll_clk = 77,
};

I do not know enough about the HPT chips to say which settings table this
field should point to. Please take a look; the fix should now be very easy.
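
To illustrate why a missing .settings initializer would produce exactly this
kind of crash, here is a minimal user-space sketch. It is not driver code:
struct demo_info, demo_chip, demo_chip_fixed, demo_tables and
has_timing_table() are names I made up for illustration, and the timing
values are dummies. It only assumes that .settings is a pointer to a
per-clock array of tables, as the info->settings[clock] expression suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's struct hpt_info. */
struct demo_info {
	int chip_type;
	int max_mode;
	int dpll_clk;
	const unsigned int **settings;	/* one timing table per clock index */
};

/*
 * No .settings initializer, like the hpt302n definition quoted above.
 * C zero-initializes members that a designated initializer omits,
 * so demo_chip.settings is NULL.
 */
static struct demo_info demo_chip = {
	.chip_type = 1,
	.max_mode  = 4,
	.dpll_clk  = 77,
};

/* Dummy timing data and a table array with an entry only at index 4. */
static const unsigned int demo_timings[] = { 0x1, 0x2 };
static const unsigned int *demo_tables[5] = {
	[4] = demo_timings,
};

/* Same chip, but with .settings pointing at a table array. */
static struct demo_info demo_chip_fixed = {
	.chip_type = 1,
	.max_mode  = 4,
	.dpll_clk  = 77,
	.settings  = demo_tables,
};

/*
 * Returns 1 if a timing table exists for `clock`, 0 otherwise.
 * Checking info->settings first avoids indexing through a NULL pointer,
 * which is what the unguarded info->settings[clock] would do.
 */
static int has_timing_table(const struct demo_info *info, int clock)
{
	if (info->settings == NULL)
		return 0;
	return info->settings[clock] != NULL;
}

/* Small self-check of the two cases. */
static void demo_self_check(void)
{
	assert(demo_chip.settings == NULL);
	assert(has_timing_table(&demo_chip, 4) == 0);
	assert(has_timing_table(&demo_chip_fixed, 4) == 1);
}
```

With the unguarded form, the first struct would fault as soon as
settings[4] is read. The guard here is only for demonstration; in the
driver the real fix is presumably to give hpt302n a proper .settings
entry rather than to add a check.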

> >         printk(KERN_ERR "%s: unknown bus timing!\n", name);
> >         kfree(info);
> >         return -EIO;
> >     }
> >
> >     printk(KERN_INFO "select DPLL clock\n");
> >
> > This is right around line 1171 (skewed by the crumbs I added). The
> > last message I receive is "inside the if"; it dies before "select
> > DPLL clock".
> >
> > Without knowing much about the structs I am not sure what to
> > print-out. I will narrow it further, and maybe even compare against
> > what the old working kernel had for variable values. That would take
> > some time though.
> >
> > >
> > > > cu
> > > > Adrian
> > > >
> > > > --
> > > >
> > > > "Is there not promise of rain?" Ling Tan asked suddenly
> > > > out of the darkness. There had been need of rain for many
> > > > days. "Only a promise," Lao Er said.
> > > > > Pearl S. Buck - Dragon Seed
> > > >
