Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered

From: Ian Kumlien
Date: Sat Dec 13 2003 - 18:22:54 EST


On Sun, 2003-12-14 at 00:16, Ross Dickson wrote:
> On Sunday 14 December 2003 08:28, you wrote:
> > On Sat, 2003-12-13 at 19:07, Ross Dickson wrote:
> > > ..APIC TIMER ack delay, reload:16701, safe:16691
> >
> > calibrating APIC timer ...
> > ..... CPU clock speed is 2079.0146 MHz.
> > ..... host bus clock speed is 332.0663 MHz.
> > NET: Registered protocol family 16
> > ..APIC TIMER ack delay, reload:20791, safe:20779
> > ..APIC TIMER ack delay, predelay count: 20769
> > ..APIC TIMER ack delay, predelay count: 20786
> > ..APIC TIMER ack delay, predelay count: 20716
> > ..APIC TIMER ack delay, predelay count: 20731
> > ..APIC TIMER ack delay, predelay count: 20747
> > ..APIC TIMER ack delay, predelay count: 20762
> > ..APIC TIMER ack delay, predelay count: 20780
> > ..APIC TIMER ack delay, predelay count: 20729
> > ..APIC TIMER ack delay, predelay count: 20740
> > ..APIC TIMER ack delay, predelay count: 20757
>
> Thanks Ian.
> From this we see your local apic is indeed counting 1.2 times faster than mine
> ratio of 333/266 fsb. So the reload:20791 - safe:20779 gives 12 counts time.
> Given 20791 is 1ms on your system then your 12 counts is 577ns
> But more importantly from the ack delay theory as your machine like mine is
> prone to lockups then a lockup could likely have occured at count:20786 having
> only 240ns time expired. Next worst case was less likely to lockup at count:20780.

I just had a lockup running with preempt, now trying with preempt
disabled. This is a clean 2.6.0-test11 with just io-apic and apic v2
patches.

> The only ones any delay would have been added to by the patch would be the
> count:20786 and count:20780 and it would have been just enough to wait until
> the counter got below the safe:20779 so the patch contributes little overhead.

> > Survived my greptest which no non patched kernel has ever done on this
> > machine.
> >
> > Has anyone got that extended ringbuffer to work? I haven't been able to
> > get a complete "boot" dmesg in ages because of all the output all the
> > drivers make... Does it need a updated dmesg?
>
> This may be what you have already tried:
> I am not sure where it is in the 2.6 config or indeed if it is different but it is
> CONFIG_LOG_BUF_SHIFT under kernel hacking on 2.4.23 maybe try 16 for 64K.
> To match dmesg output try
>
> dmesg -s65536
>
> (unless dmesg can automatically pick up the expanded ring buffer size on 2.6?)

Ahhh great!, no, it doesn't auto detect it... Maybe there is a newer
version, i hate mdk for being so nice to new versions and ignoring the
old.

--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net

Attachment: signature.asc
Description: This is a digitally signed message part