Re: 2.6.10-rc2 doesn't boot (if no floppy device)

From: Len Brown
Date: Mon Nov 22 2004 - 14:16:05 EST


On Sat, 2004-11-20 at 11:41, Linus Torvalds wrote:
>
> On Sat, 20 Nov 2004, Len Brown wrote:
> >
> > It clears the ELCR on Linux boot.
>
> I think this is _really_ wrong.
>
> Basically, you're screwing up more and more PIC state.
>
> Len, the PIC was _correct_ before ACPI touched it. We don't want to
> touch it MORE, we want to touch it LESS.
>
> I'll try this for debugging, but what I want to figure out is where
> ACPI is doing something it shouldn't be doing, and _removing_ that.
>
> We already had one patch where people tried to hide this problem by
> adding more code. Clearly, that patch was bogus. Yes, it hid the
> problem for floppies, but as shown by my other case (and as I was
> trying to say from the beginning), it's not about floppies. It's about
> _any_ non-PCI interrupt that apparently ACPI has done something
> _wrong_ for.

I agree that the system should work properly even if the legacy device
drivers are broken. Please understand, however, that the legacy device
drivers _are_ broken. The BIOS via ACPI clearly tells them if the
devices are present or not, and Linux isn't yet listening.

> So ACPI seems to assume that all interrupts are PCI interrupts, and
> that's just totally wrong. Clearing ELCR is more of this total
> wrongness. ELCR exists for a reason, namely that not all interrupts
> are PCI.

ACPI-compliant systems have three types of interrupts:
1. legacy
2. PCI
3. the ACPI SCI

The first two are described in the DSDT legacy devices and _PRT,
respectively. The third is described in the FADT. The MADT overrides
are available to handle any special cases, though that applies only to
IOAPIC mode.

If there are other interrupts, then it isn't an ACPI-compliant system
and the BIOS should not enable ACPI. If the BIOS erroneously enables
ACPI on such a system, the workaround is to boot with acpi=off. I'd be
extremely interested to know of such a system, as I've not yet
encountered one.

> Also, you seem to still totally concentrate on PIRQ routing etc.
> Totally ignoring the fact that the problematic cases are about
> interrupts that have _nothing_ to do with PCI. Not the floppy, not the
> PS/2 mouse. NOT PCI! They're both on the southbridge behind a very
> special interface that may or may not look like a PCI bus internally,
> but might quite as well be something totally special-case (ie a
> perfectly normal case is that somebody literally just bolted an old
> 8042 controller core into the system and set up special case magic irq
> routing).

If somebody bolts motherboard hardware on and doesn't tell ACPI about
it, then they need to disable ACPI, which _owns_ configuration of
motherboard devices when it is enabled.

The problem at hand has everything to do with PCI interrupts, and how
they can conflict with legacy interrupts.

PIC hardware is level-HIGH sensitive; it cannot be programmed like
APIC INTINs can. The only way to use it effectively with level-LOW
sources, such as those supplied by PCI devices, is to attach those
interrupt sources through inverters. This is what the PIRQ routers
do. I printed out the underlying PIRQ routers for the ICH in
the debug patch because all of the failures at hand seemed
to be on ICH systems, and these registers tell us the state
not of the abstract PCI Interrupt Link, but of the actual hardware that
can be driving that (legacy) interrupt input.
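
For anyone reading the debug output, here is a minimal sketch of how
the two pieces of state can be decoded. It is not the debug patch
itself. It assumes the usual ELCR layout (ports 0x4d0/0x4d1, one bit
per IRQ, set meaning level-triggered) and my reading of the ICH PIRQ
route control registers (bit 7 set means the PIRQ is not routed to the
8259, bits 3:0 select the IRQ). The helper names are made up for
illustration and do not exist in the kernel; the raw register values
are passed in rather than read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ELCR: two 8-bit registers, port 0x4d0 covers IRQ0-7, 0x4d1 covers IRQ8-15.
 * A set bit marks that IRQ as level-triggered; a clear bit means edge. */
#define ELCR_PORT_LO 0x4d0
#define ELCR_PORT_HI 0x4d1

/* Assumed ICH PIRQ route control layout (hypothetical macro names). */
#define PIRQ_DISABLED 0x80u   /* bit 7: PIRQ not routed to the 8259 */
#define PIRQ_IRQ_MASK 0x0fu   /* bits 3:0: legacy IRQ number */

/* Combine the two ELCR bytes into one 16-bit mask where bit n is IRQ n. */
static inline uint16_t elcr_combine(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(lo | ((uint16_t)hi << 8));
}

/* True if the given legacy IRQ is marked level-triggered in the ELCR. */
static inline bool elcr_is_level(uint16_t elcr, unsigned int irq)
{
    assert(irq < 16);
    return ((elcr >> irq) & 1u) != 0;
}

/* Legacy IRQ a PIRQ is steered to, or -1 if the routing is disabled. */
static inline int pirq_routed_irq(uint8_t route_reg)
{
    if (route_reg & PIRQ_DISABLED)
        return -1;
    return (int)(route_reg & PIRQ_IRQ_MASK);
}
```

With these, a route register value of 0x06 decodes as "PIRQ steered to
IRQ6", and 0x86 as "disabled, last pointed at IRQ6".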

> > ps. what I think is happening...
> >
> > To its credit, the BIOS correctly recognizes that there is
> > no floppy, and it routes a PIRQ to IRQ6. It correctly sets the
> > ELCR bit for this IRQ.
> >
> > Linux boots and disables all the PCI Interrupt Links,
> > which un-programs the PIRQ directed to IRQ6.
>
> And this is what I think is the bug. There is no reason to disable the
> PCI interrupt link unless you have a damn good reason to do so.

The damn good reason is that doing otherwise breaks systems.
This is the cset comment for the line of code disabling the links:

ChangeSet 1.1608.11.11 2004/06/17 23:21:03 len.brown@xxxxxxxxx
[ACPI] avoid spurious interrupts on VIA
http://bugzilla.kernel.org/show_bug.cgi?id=2243
drivers/acpi/pci_link.c 1.28.1.1 2004/06/11 10:38:46 len.brown@xxxxxxxxx
disable all PCI Interrupt Links to be enabled by _SRS

It would sure make my life easier if we didn't support
these VIA/Phoenix systems, but I don't think that breaking
them is what the community wants.

> > However, Linux doesn't clear the ELCR first,
> > and for some reason that causes an interrupt
> > to latch in IRQ6 -- though it is masked.
> >
> > Along comes the broken floppy driver before
> > the PCI devices probe. floppy
> > doesn't realize there is no hardware and
> > unwittingly does a request_irq(6).
>
> You are totally ignoring my other bug report which was for a
> (existing) PSAUX mouse driver on irq12.
>
> If I had had a mouse on that port, it would not have worked.
>
> So the fact is, ACPI does something WRONG.

The PS2 IRQ12 situation is exactly the same as the IRQ6 floppy
situation. If the mouse or floppy were present, the BIOS would not have
given that interrupt to PCI.
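
To make the suspected state concrete, here is a rough sketch of a
check for it: IRQs whose ELCR bit still says "level" although no PIRQ
is routed to them any more. That is the state I think we end up in
after the links are disabled without clearing the ELCR. It only models
my hypothesis and says nothing about why an interrupt would latch. The
function name is hypothetical, and the register layout (bit 7 means
not routed, bits 3:0 select the IRQ) is my reading of the ICH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIRQ_DISABLED 0x80u   /* assumed: bit 7 set means PIRQ not routed */
#define PIRQ_IRQ_MASK 0x0fu   /* assumed: bits 3:0 hold the legacy IRQ */

/*
 * Return a 16-bit mask of IRQs that the ELCR marks level-triggered but
 * that no enabled PIRQ route is currently driving.
 *
 * elcr       - combined ELCR value, bit n corresponds to IRQ n
 * pirq_route - raw PIRQ route register values
 * n          - number of entries in pirq_route
 */
uint16_t stale_level_irqs(uint16_t elcr, const uint8_t *pirq_route, size_t n)
{
    uint16_t driven = 0;

    assert(pirq_route != NULL || n == 0);

    /* Collect every IRQ that still has an enabled PIRQ steered to it. */
    for (size_t i = 0; i < n; i++) {
        if (!(pirq_route[i] & PIRQ_DISABLED))
            driven |= (uint16_t)(1u << (pirq_route[i] & PIRQ_IRQ_MASK));
    }

    /* Level-triggered per ELCR, but nothing is routed there any more. */
    return (uint16_t)(elcr & (uint16_t)~driven);
}
```

If the ELCR marks IRQ5, IRQ6 and IRQ11 as level and only the PIRQ on
IRQ5 is still enabled, this reports IRQ6 and IRQ11, which are the
inputs a legacy driver such as floppy or psaux could trip over.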

-Len

