Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

From: Uros Bizjak
Date: Wed Oct 18 2023 - 05:05:19 EST


On Wed, Oct 18, 2023 at 9:46 AM Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
>
> On Tue, Oct 17, 2023 at 11:53 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, 17 Oct 2023 at 14:06, Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
> > >
> > > But adding the attached patch on top of both patches boots OK.
> >
> > Funky.
> >
> > Mind adding a
> >
> > WARN_ON_ONCE(!active_mm);
> >
> > to there to give a nice backtrace for the odd NULL case.
>
> [ 4.907840] Call Trace:
> [ 4.908909] <TASK>
> [ 4.909858] ? __warn+0x7b/0x120
> [ 4.911108] ? begin_new_exec+0x90f/0xa30
> [ 4.912602] ? report_bug+0x164/0x190
> [ 4.913929] ? handle_bug+0x3c/0x70
> [ 4.915179] ? exc_invalid_op+0x17/0x70
> [ 4.916569] ? asm_exc_invalid_op+0x1a/0x20
> [ 4.917969] ? begin_new_exec+0x90f/0xa30
> [ 4.919303] ? begin_new_exec+0x3ce/0xa30
> [ 4.920667] ? load_elf_phdrs+0x67/0xb0
> [ 4.921935] load_elf_binary+0x2bb/0x1770
> [ 4.923262] ? __kernel_read+0x136/0x2d0
> [ 4.924563] bprm_execve+0x277/0x630
> [ 4.925703] kernel_execve+0x145/0x1a0
> [ 4.926890] call_usermodehelper_exec_async+0xcb/0x180
> [ 4.928408] ? __pfx_call_usermodehelper_exec_async+0x10/0x10
> [ 4.930515] ret_from_fork+0x2f/0x50
> [ 4.931894] ? __pfx_call_usermodehelper_exec_async+0x10/0x10
> [ 4.933941] ret_from_fork_asm+0x1b/0x30
> [ 4.935371] </TASK>
> [ 4.936212] ---[ end trace 0000000000000000 ]---
>
> >
> > That code *is* related to 'current', in how we do
> >
> > tsk = current;
> > ...
> > local_irq_disable();
> > active_mm = tsk->active_mm;
> > tsk->active_mm = mm;
> > tsk->mm = mm;
> > ...
> > activate_mm(active_mm, mm);
> > ...
> > mmdrop_lazy_tlb(active_mm);
> >
> > but I don't see how 'active_mm' could *possibly* be validly NULL
> > here, and why caching 'current' would matter and change it.
>
> I have also added "__attribute__((optimize(0)))" to exec_mmap() to
> weed out compiler bugs. The result was the same oops in
> mmdrop_lazy_tlb.
>
> Also, when using WARN_ON instead of WARN_ON_ONCE, it triggers only
> once during the whole boot, with the above trace.
>
> Another observation: adding WARN_ON to the top of exec_mmap:
>
> WARN_ON(!current->active_mm);
> /* Notify parent that we're no longer interested in the old VM */
> tsk = current;
> old_mm = current->mm;
>
> also triggers WARN, suggesting that current does not have active_mm
> set on the entry to the function.
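
To make the failure mode concrete, here is a rough user-space model of the
sequence quoted above plus the added check. It is only a sketch, not the
kernel code: the *_model names, the reduced struct layouts and the 0/-1
return convention are made up for illustration, and IRQ disabling, locking
and refcounting are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct mm_model { int id; };
struct task_model {
	struct mm_model *mm;
	struct mm_model *active_mm;
};

/*
 * Models the quoted part of exec_mmap(): remember the old active_mm,
 * install the new mm, then hand the old one to the "drop" step.
 * Returns 0 on success, or -1 if active_mm was NULL on entry
 * (the case the WARN_ON catches).
 */
static int exec_mmap_model(struct task_model *tsk, struct mm_model *mm,
			   struct mm_model **dropped)
{
	struct mm_model *active_mm = tsk->active_mm;

	tsk->active_mm = mm;
	tsk->mm = mm;

	if (!active_mm) {
		fprintf(stderr, "WARN: active_mm was NULL on entry\n");
		return -1;
	}

	/* Stands in for mmdrop_lazy_tlb(active_mm). */
	*dropped = active_mm;
	return 0;
}
```

In this model the drop step has nothing valid to work on whenever the task
enters with active_mm == NULL, which matches the oops in mmdrop_lazy_tlb.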

Solved.

All that is needed is to patch cpu_init() from
arch/x86/kernel/cpu/common.c with:

--cut here--
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b14fc8c1c953..61b6fcdf6937 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2228,7 +2232,7 @@ void cpu_init_exception_handling(void)
  */
 void cpu_init(void)
 {
-	struct task_struct *cur = current;
+	struct task_struct *cur = this_cpu_read_stable(pcpu_hot.current_task);
 	int cpu = raw_smp_processor_id();
 
 #ifdef CONFIG_NUMA
--cut here--

This is effectively the old get_current(). Since we declare and export

+DECLARE_PER_CPU_ALIGNED(const struct pcpu_hot __percpu_seg_override,
+ const_pcpu_hot) __attribute__((alias("pcpu_hot")));
+EXPORT_PER_CPU_SYMBOL(const_pcpu_hot);

in the same file, and the "new" current is just

return const_pcpu_hot.current_task;

GCC apparently assumes too much about the const alias, over-optimizes,
and seemingly does not properly perform the assignment

cur->active_mm = &init_mm;

further down in cpu_init().
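
For illustration, the aliasing pattern looks roughly like this when reduced
to plain user-space C. This is a sketch under assumptions, not the kernel
code: all *_model names are hypothetical, there is no per-CPU segment
involved, and it needs GCC or Clang on an ELF target for the alias
attribute. The volatile accessor is only a conservative contrast and is not
what this_cpu_read_stable() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced stand-ins for task_struct and pcpu_hot. */
struct task_model { int pid; };
struct pcpu_hot_model { struct task_model *current_task; };

static struct task_model init_task_model = { .pid = 0 };

/* The writable object (a plain global here, not real per-CPU data). */
struct pcpu_hot_model pcpu_hot_model = { .current_task = &init_task_model };

/* A const-qualified second name for the very same storage. */
extern const struct pcpu_hot_model const_pcpu_hot_model
	__attribute__((alias("pcpu_hot_model")));

/* "New" style: plain load through the const alias. */
static struct task_model *get_current_const(void)
{
	return const_pcpu_hot_model.current_task;
}

/* Conservative contrast: a volatile load is re-read on every call. */
static struct task_model *get_current_volatile(void)
{
	return *(struct task_model *volatile *)&pcpu_hot_model.current_task;
}
```

The sketch deliberately never writes to pcpu_hot_model after start-up.
Writing through one name and reading through the const name in the same
translation unit is the kind of mix where I would expect the optimizer to be
able to surprise us, and that is what needs a closer look here.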

Have to run now, but this will be easy to fix.

Uros.