Re: [PATCH] acpi, pci, irq: account for early penalty assignment

From: Sinan Kaya
Date: Mon Feb 15 2016 - 21:18:44 EST


On 2/15/2016 7:26 PM, Rafael J. Wysocki wrote:
> On Mon, Feb 15, 2016 at 5:41 PM, Sinan Kaya <okaya@xxxxxxxxxxxxxx> wrote:
>> A crash has been observed when assigning penalty on x86 systems.
>>
>> It looks like this problem happens on x86 platforms with IOAPIC and an SCI
>> interrupt override in the ACPI table with interrupt number greater than
>> 16. (22 in this example)
>>
>> The bug has been introduced by "ACPI, PCI, irq: remove interrupt count
>> restriction" commit. The code was using kmalloc to resize the interrupt
>> list. In this use case, the set penalty call is coming from early phase
>> and the heap is not initialized yet.
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
>> IP: [<ffffffff811e8b9d>] kmem_cache_alloc_trace+0xad/0x1c0
>> PGD 0
>> Oops: 0000 [#1] SMP
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc2Feb-3_RK #1
>> Hardware name: HP Superdome2 16s, BIOS Bundle: 007.006.000 SFW: 033.162.000
>> 10/30/2015
>> [<ffffffff813bc190>] acpi_irq_set_penalty+0x60/0x8e
>> [<ffffffff813bc1df>] acpi_irq_add_penalty+0x21/0x26
>> [<ffffffff813bc76d>] acpi_penalize_sci_irq+0x25/0x28
>> [<ffffffff81b8260d>] acpi_sci_ioapic_setup+0x68/0x78
>> [<ffffffff81b830fc>] acpi_boot_init+0x2cc/0x533
>> [<ffffffff810677c8>] ? set_pte_vaddr_pud+0x48/0x50
>> [<ffffffff81b828cf>] ? acpi_parse_x2apic+0x77/0x77
>> [<ffffffff81b82858>] ? dmi_ignore_irq0_timer_override+0x30/0x30
>> [<ffffffff81b77c1e>] setup_arch+0xc24/0xce9
>> [<ffffffff81b6e120>] ? early_idt_handler_array+0x120/0x120
>> [<ffffffff81b6ed94>] start_kernel+0xfc/0x506
>> [<ffffffff81b6e120>] ? early_idt_handler_array+0x120/0x120
>> [<ffffffff81b6e120>] ? early_idt_handler_array+0x120/0x120
>> [<ffffffff81b6e5ee>] x86_64_start_reservations+0x2a/0x2c
>> [<ffffffff81b6e73c>] x86_64_start_kernel+0x14c/0x16f
>>
>> Besides from the use case above, there is one more situation where
>> set_penalty is being called from the init context like. There is support
>> for setting the penalty through kernel command line.
>>
>> Adding support to be called from early context for limited number of
>> interrupts.
>>
>> Signed-off-by: Sinan Kaya <okaya@xxxxxxxxxxxxxx>
>
> This looks somewhat hackish to me to be honest.
I know.

This is the reason why I wanted to discuss this patch on the list. I hate the
fact that I broke something unintentionally (who would think that kzalloc
wouldn't work). I'm trying to restore the functionality and I don't like
what I see with the early_memxxx family of functions.

They all require a fixed size memory allocation from the system memory. Not a
general purpose early_kzalloc function for instance.

>
>> ---
>> drivers/acpi/pci_link.c | 32 ++++++++++++++++++++++++++++----
>> 1 file changed, 28 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c
>> index fa28635..24b69e1 100644
>> --- a/drivers/acpi/pci_link.c
>> +++ b/drivers/acpi/pci_link.c
>> @@ -47,6 +47,7 @@ ACPI_MODULE_NAME("pci_link");
>> #define ACPI_PCI_LINK_FILE_INFO "info"
>> #define ACPI_PCI_LINK_FILE_STATUS "state"
>> #define ACPI_PCI_LINK_MAX_POSSIBLE 16
>> +#define ACPI_PCI_LINK_MAX_EARLY_IRQINFO 1024
>>
>> static int acpi_pci_link_add(struct acpi_device *device,
>> const struct acpi_device_id *not_used);
>> @@ -470,9 +471,13 @@ struct irq_penalty_info {
>> int irq;
>> int penalty;
>> struct list_head node;
>> + bool early;
>
> Where is this field used ->
>
got rid of it.

>> };
>>
>> static LIST_HEAD(acpi_irq_penalty_list);
>> +static int early_init_done;
>> +static struct irq_penalty_info early_irq_infos[ACPI_PCI_LINK_MAX_EARLY_IRQINFO];
>> +static int early_irq_info_counter;
>>
>> static int acpi_irq_get_penalty(int irq)
>> {
>> @@ -507,10 +512,20 @@ static int acpi_irq_set_penalty(int irq, int new_penalty)
>> }
>> }
>>
>> - /* nope, let's allocate a slot for this IRQ */
>> - irq_info = kzalloc(sizeof(*irq_info), GFP_KERNEL);
>> - if (!irq_info)
>> - return -ENOMEM;
>> + if (!early_init_done) {
>> + if (early_irq_info_counter < ARRAY_SIZE(early_irq_infos)) {
>> + irq_info = &early_irq_infos[early_irq_info_counter];
>> + irq_info->early = true;
>
> -> except for being set here?
thanks, removed.

>
>> + early_irq_info_counter++;
>> + } else {
>> + return -ENOMEM;
>> + }
>> + } else {
>> + /* nope, let's allocate a slot for this IRQ */
>> + irq_info = kzalloc(sizeof(*irq_info), GFP_KERNEL);
>> + if (!irq_info)
>> + return -ENOMEM;
>> + }
>>
>> irq_info->irq = irq;
>> irq_info->penalty = new_penalty;
>> @@ -968,3 +983,12 @@ void __init acpi_pci_link_init(void)
>> register_syscore_ops(&irqrouter_syscore_ops);
>> acpi_scan_add_handler(&pci_link_handler);
>> }
>> +
>> +
>> +static int acpi_pci_link_subsys_init(void)
>> +{
>> + early_init_done = true;
>
> Why do you need yet another subsys_initcall do set this? Can't it be
> set in, say, acpi_init()?
>
> And isn't there any existing way to check that? Like checking
> acpi_gbl_permanent_mmap or something?

I'll use acpi_gbl_permanent_mmap.

>
>> + return 0;
>> +}
>> +
>> +subsys_initcall(acpi_pci_link_subsys_init)
>> --
>
> Thanks,
> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


--
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project