Re: Reworking suspend-resume sequence (was: Re: PCI PM: Restore standard config registers of all devices early)

From: Linus Torvalds
Date: Tue Feb 03 2009 - 15:19:22 EST




On Tue, 3 Feb 2009, Ingo Molnar wrote:
>
> - the screaming-irq observation i had - do you consider that valid?:
>
> >> [ In theory this also solves screaming level-triggered irqs that
> >> advertise themselves as edge-triggered [due to firmware/BIOS bug -
> >> these do happen] and then keep spamming the system. ]
>
> I wanted to have a pretty much interchangeable flow method between edge
> and level triggered - so that the BIOS cannot screw us by enumerating an
> irq as edge-triggered while it's level-triggered.

Yes, if we can't be 100% sure it's really edge-triggered, I guess the mask
thing is really worth it. So maybe "handle_edge_irq()" is actually doing
everything right.

Of course, with MSI, we can fundamentally really be sure that it's
edge-triggered (since it's literally a packet on the PCI bus that
generates it), and that actually brings up another possibility: assuming
handle_edge_irq() is doing the correct "safe" thing, maybe the answer is
to just get rid of the MSI "mask()" operation as being unnecessary, and
catch it at that level.

NOTE! From a correctness standpoint I think this is all irrelevant. Even
if we have turned off the power of some device, the msi irq masking isn't
going to hurt (apart from _possibly_ causing a machine check, but that's
nothing new - architectures that enable machine checks on accesses to
non-responding PCI hardware have to handle those anyway).

So I wouldn't worry too much. I think this is interesting mostly from a
performance standpoint - MSI interrupts are supposed to be fast, and under
heavy interrupt load I could easily see something like

 - cpu1: handles interrupt, has acked it, calls down to the handler

 - the handler clears the original irq source, but another packet (or disk
   completion) happens almost immediately

 - cpu2 takes the second interrupt, but it's still IRQ_INPROGRESS, so it
   masks.

 - cpu1 gets back and unmasks etc and now really handles it because of
   IRQ_PENDING.
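
The sequence above can be sketched as a small single-threaded model. This is
only an illustration, not the kernel's handle_edge_irq(): the struct, the
SIM_* flags (loosely modeled on IRQ_INPROGRESS / IRQ_PENDING) and all function
names are hypothetical, and locking is left out entirely. It just counts how
many mask/unmask operations the contended case costs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags, loosely modeled on IRQ_INPROGRESS / IRQ_PENDING. */
enum { SIM_INPROGRESS = 1, SIM_PENDING = 2, SIM_MASKED = 4 };

struct sim_irq {
    unsigned status;
    unsigned mask_ops;     /* bus accesses spent masking */
    unsigned unmask_ops;   /* bus accesses spent unmasking */
    unsigned handler_runs; /* how often the device handler ran */
};

/* A CPU takes the interrupt. Returns true if this CPU should run the
 * handler, false if another CPU already owns it (mask and mark pending). */
static bool sim_irq_enter(struct sim_irq *d)
{
    if (d->status & SIM_INPROGRESS) {
        d->status |= SIM_PENDING | SIM_MASKED;
        d->mask_ops++;
        return false;
    }
    d->status |= SIM_INPROGRESS;
    return true;
}

/* The owning CPU finished one handler run. Returns true if it has to run
 * the handler again because something arrived in the meantime. */
static bool sim_irq_handler_done(struct sim_irq *d)
{
    d->handler_runs++;
    if (!(d->status & SIM_PENDING)) {
        d->status &= ~SIM_INPROGRESS;
        return false;
    }
    if (d->status & SIM_MASKED) {
        d->unmask_ops++;
        d->status &= ~SIM_MASKED;
    }
    d->status &= ~SIM_PENDING;
    return true;
}

/* Replays the four steps from the list above. */
static void sim_contended_scenario(struct sim_irq *d)
{
    bool cpu1_owns = sim_irq_enter(d);    /* cpu1 takes the first irq */
    assert(cpu1_owns);
    bool cpu2_owns = sim_irq_enter(d);    /* cpu2 takes the second one */
    assert(!cpu2_owns);                   /* ...and can only mask it */
    bool again = sim_irq_handler_done(d); /* cpu1 comes back, unmasks */
    assert(again);
    again = sim_irq_handler_done(d);      /* second run, nothing pending */
    assert(!again);
}
```

Starting from a zeroed struct sim_irq, sim_contended_scenario() ends with one
mask, one unmask and two handler runs. In this model the second handler run
would have happened anyway because of the pending flag; the mask/unmask pair
is the part that costs extra bus accesses.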

Note how the mask/unmask were all just costly extra overhead over the PCI
bus. If we're talking something like high-performance 10Gbit ethernet (or
even maybe fast SSD disks), driver writers actually do count PCI cycles,
because a single PCI read can be several hundred ns, and if you take a
thousand interrupts per second, it does add up.
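
A back-of-the-envelope version of that arithmetic, as a sketch only. The
function name is made up, and the figures in the note below (two bus accesses
per contended interrupt, 500 ns each) are assumptions picked from the "several
hundred ns" range, not measurements.

```c
#include <assert.h>

/* Fraction of one CPU-second spent on the extra bus accesses.
 * All three inputs are assumptions supplied by the caller. */
static double extra_bus_time_fraction(double irqs_per_sec,
                                      double accesses_per_irq,
                                      double ns_per_access)
{
    assert(irqs_per_sec >= 0.0);
    assert(accesses_per_irq >= 0.0);
    assert(ns_per_access >= 0.0);
    /* ns per second of wall time, converted to a fraction of a second */
    return irqs_per_sec * accesses_per_irq * ns_per_access * 1e-9;
}
```

With those assumed figures, extra_bus_time_fraction(1000.0, 2.0, 500.0) comes
to about 0.001, i.e. roughly 0.1% of a CPU-second, and the same formula at
100000 interrupts per second gives about 10%. The cost scales linearly with
the interrupt rate, so it only matters at the high end.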

Of course, ethernet tends to do things like interrupt mitigation to avoid
this, but that has its own downsides (longer latencies) and isn't really
considered optimal in some RT environments (wall street trading kind of
things).

I really don't know how big an issue this all is. It probably isn't really
noticeable.

Linus