Re: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message

From: Thomas Gleixner
Date: Fri Oct 23 2020 - 17:28:10 EST


On Fri, Oct 23 2020 at 11:10, David Woodhouse wrote:
> On 22 October 2020 22:43:52 BST, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> It makes the callers slightly more readable, not having to cast to uint32_t* from the struct.
>
> I did ponder defining a new struct with bitfields named along the
> lines of 'msi_addr_bits_19_to_4', but that seemed like overkill.

I did something like this in the meantime, because all of this just
sucks.

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/apic

Hot of the press and completely untested.

>>> + /*
>>> + * They're in a bit of a random order for historical reasons, but
>>> + * the IO/APIC is just a device for turning interrupt lines into
>>> + * MSIs, and various bits of the MSI addr/data are just swizzled
>>> + * into/from the bits of Redirection Table Entry.
>>> + */
>>> + entry[0] &= 0xfffff000;
>>> + entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK |
>>> + MSI_DATA_VECTOR_MASK));
>>> + entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9;
>>> +
>>> + entry[1] &= 0xffff;
>>> + entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12;
>>
>>Sorry, but this is unreviewable gunk.
>
> Crap. Sure, want to look at the I/OAPIC and MSI documentation and
> observe that it's just shifting bits around and "these bits go there,
> those bits go here..." but there's no magic which will make that go
> away. At some point you end up swizzling bits around with seemingly
> random bitmasks and shifts which you have to have worked out from the
> docs.

Yes, we can't avoid the bit swizzling at all. But it can be made more
readable.

> Now that I killed off the various IOMMU bits which also horrifically
> touch the RTE directly, perhaps there is more scope for faffing around
> with it differently, but it won't really change much. It's just
> urinating into the atmospheric disturbance.

Well, you can look at it this way, but I surely have better things to do
than taking pencil and paper and drawing up mappings when reading
through that code or a patch against it.

>> Aside of that it works magically because polarity,trigger and mask bit
>> have been set up before. But of course a comment about this is
>> completely overrated.
>
> Well yes, there's a lot about the I/OAPIC code which is a bit horrid
> but this patch is doing just one thing: making the bits get from
> e.g. cfg->dest_apicid and cfg->vector into the RTE in a generic
> fashion which does the right thing for IR too. Other cleanups are
> somewhat orthogonal, but yes it's a good spot that one of these was
> somewhat redundant in the first place. We could fix that up in a
> separate patch which comes first perhaps.

Yes, that code is horrid, but adding a comment to that effect when
changing it is not asked too much, right?

> If we're starting to clean up I/OAPIC code I'd like to take a long
> hard look at the level ack code paths, and the fact that we have
> separate ack_apic_irq and apic_ack_irq functions (or something like
> that; on phone now) which do different things. Perhaps the order (or
> the capitals on APIC which one of them has) makes it sane and
> meaningful for them both to exist and do different things?

The naming conventions are definitely not sane.

> I also note the Hyper-V "remapping" takes the IR code path and I'm not
> sure that's OK. Because of the complete lack of overall documentation
> on what it's all doing and why.

I stared into that today as well. There is a fundamental difference
between real hardware remapping and this.

Especially the affinity setting mode changes with remapping and hyper-v
goes into the remap path, but that hyperv driver does not issue
"irq_set_status_flags(virq, IRQ_MOVE_PCNTXT)", which means that you
can't change the affinity of the I/O-APIC interrupts in that hyperv mode
at all.

If that flag is not set then affinity changes are done in actual
interrupt context in ioapic_ack_level() which is only used for the
non-IR chip ...

I'm still wrapping my head around getting rid of this thing completely
because now it's just a subset of your KVM case with the only
restriction that I/O-APIC cannot be affined to any CPU with a APIC id
greater than 255.

Thanks,

tglx