Re: [PATCH] arm64: kernel: Use a separate stack for irq interrupts.

From: James Morse
Date: Tue Sep 08 2015 - 12:48:20 EST


On 08/09/15 15:54, Jungseok Lee wrote:
> On Sep 7, 2015, at 11:36 PM, James Morse wrote:
>
> Hi James,
>
>> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
>> index e16351819fed..d42371f3f5a1 100644
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -190,10 +190,37 @@ tsk .req x28 // current thread_info
>> * Interrupt handling.
>> */
>> .macro irq_handler
>> - adrp x1, handle_arch_irq
>> - ldr x1, [x1, #:lo12:handle_arch_irq]
>> - mov x0, sp
>> + mrs x21, tpidr_el1
>> + adr_l x20, irq_sp
>> + add x20, x20, x21
>> +
>> + ldr x21, [x20]
>> + mov x20, sp
>> +
>> + mov x0, x21
>> + mov x1, x20
>> + bl irq_copy_thread_info
>> +
>> + /* test for recursive use of irq_sp */
>> + cbz w0, 1f
>> + mrs x30, elr_el1
>> + mov sp, x21
>> +
>> + /*
>> + * Create a fake stack frame to bump unwind_frame() onto the original
>> + * stack. This relies on x29 not being clobbered by kernel_entry().
>> + */
>> + push x29, x30
>
> It is the most challenging item to handle a frame pointer in this context.
>
> As I mentioned previously, a stack tracer on ftrace is not supported yet.
> How about decoupling "IRQ Stack feature" from "Stack Tracer Fix"?

Yes - I discovered today this was more complicated than I thought. I will
need to do some more reading.


>> +
>> +1: ldr_l x1, handle_arch_irq
>> + mov x0, x20
>> blr x1
>> +
>> + mov x0, x20
>> + mov x1, x21
>> + bl irq_copy_thread_info
>> + mov sp, x20
>> +
>> .endm
>>
>> .text
>> diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
>> index 463fa2e7e34c..10b57a006da8 100644
>> --- a/arch/arm64/kernel/irq.c
>> +++ b/arch/arm64/kernel/irq.c
>> @@ -26,11 +26,14 @@
>> #include <linux/smp.h>
>> #include <linux/init.h>
>> #include <linux/irqchip.h>
>> +#include <linux/percpu.h>
>> #include <linux/seq_file.h>
>> #include <linux/ratelimit.h>
>>
>> unsigned long irq_err_count;
>>
>> +DEFINE_PER_CPU(unsigned long, irq_sp) = 0;
>> +
>> int arch_show_interrupts(struct seq_file *p, int prec)
>> {
>> #ifdef CONFIG_SMP
>> @@ -55,6 +58,10 @@ void __init init_IRQ(void)
>> irqchip_init();
>> if (!handle_arch_irq)
>> panic("No interrupt controller found.");
>> +
>> + /* Allocate an irq stack for the boot cpu */
>> + if (alloc_irq_stack(smp_processor_id()))
>> + panic("Failed to allocate irq stack for boot cpu.");
>> }
>>
>> #ifdef CONFIG_HOTPLUG_CPU
>> @@ -117,3 +124,48 @@ void migrate_irqs(void)
>> local_irq_restore(flags);
>> }
>> #endif /* CONFIG_HOTPLUG_CPU */
>> +
>> +/* Allocate an irq_stack for a cpu that is about to be brought up. */
>> +int alloc_irq_stack(unsigned int cpu)
>> +{
>> + struct page *irq_stack_page;
>> + union thread_union *irq_stack;
>> +
>> + /* reuse stack allocated previously */
>> + if (per_cpu(irq_sp, cpu))
>> + return 0;
>
> I'd like to avoid even this simple check since CPU hotplug could be heavily
> used for power management.

I don't think its a problem:
__cpu_up() contains a call to wait_for_completion_timeout() (which could
eventually end up in the scheduler), so I don't think it could ever be on a
'really hot' path.

For really frequent hotplug-like power management, cpu_suspend() makes use
of firmware support to power-off cores - from what I can see it doesn't use
__cpu_up().


>> +
>> + irq_stack_page = alloc_kmem_pages(THREADINFO_GFP, THREAD_SIZE_ORDER);
>> + if (!irq_stack_page) {
>> + pr_crit("CPU%u: failed to allocate irq stack for cpu %u\n",
>> + smp_processor_id(), cpu);
>> + return -ENOMEM;
>> + }
>> + irq_stack = page_address(irq_stack_page);
>> +
>> + per_cpu(irq_sp, cpu) = (unsigned long)irq_stack->stack
>> + + THREAD_START_SP;
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * Copy struct thread_info between two stacks, and update current->stack.
>> + * This is used when moving to the irq exception stack.
>> + * Changing current->stack is necessary so that non-arch thread_info writers
>> + * don't use the new thread_info->task->stack to find the old thread_info when
>> + * setting flags like TIF_NEED_RESCHED...
>> + */
>> +asmlinkage int irq_copy_thread_info(unsigned long dst_sp, unsigned long src_sp)
>> +{
>> + struct thread_info *src = get_thread_info(src_sp);
>> + struct thread_info *dst = get_thread_info(dst_sp);
>> +
>> + if (dst == src)
>> + return 0;
>> +
>> + *dst = *src;
>> + current->stack = dst;
>> +
>> + return 1;
>> +}
>
> Although this is 32-byte memcpy, sizeof(struct thread_info), this function is called
> twice to handle a single interrupt. Isn's it too expensive?
>
> This is a major difference between my approach and this patch. I think we should get
> some feedbacks on struct thread_info information management for v2 preparation.

Agreed.

The other difference, as Akashi Takahiro pointed out, is the behaviour of
object_is_on_stack(). What should this return for an object on an irq stack?


Thanks,

James
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/