[BUG] (alpha) kernel thread panics due to stale PTBR settings in 2.3.47

From: Dave Anderson (anderson@missioncriticallinux.com)
Date: Fri Feb 25 2000 - 14:46:16 EST


  Hello,

  Sorry for the wide distribution -- I'm not sure who this should be
  directed to...

  We had been seeing panics in the alpha 2.3.41 stream where a kernel
  thread, typically one of the nfsd daemons or kswapd, faulted on the
  swap_info swap_map address, which is a mapped (vmalloc'd) address.
  The problem was due to the disconnect between the active_mm pgd value
  and what's actually stored in the kernel task's ptbr value -- which is
  what gets loaded into the PTBR register with each alpha context
  switch. Eventually kernel tasks will find that the physical address
  stored in their thread_struct's ptbr has become stale, as the page
  that it references is freed and re-used elsewhere.

  I note that in 2.3.47, the problem looked to have been addressed by
  the addition of the enter_lazy_tlb() call in schedule():

        if (!mm) {
                if (next->active_mm) BUG();
                next->active_mm = oldmm;
                atomic_inc(&oldmm->mm_count);
  +++ enter_lazy_tlb(oldmm, next, this_cpu);
        }

  Unfortunately the alpha enter_lazy_tlb() doesn't do anything:

static inline void enter_lazy_tlb(struct mm_struct *mm,
                                  struct task_struct *tsk, unsigned cpu)
{
}

  If this is still a work in progress, excuse my interruption, but if
  not, the alpha enter_lazy_tlb() should update the kernel task's ptbr
  with the oldmm's pgd. Right?
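
  A minimal userspace sketch of what such an update might look like,
  not the actual kernel patch. The struct layouts and the names
  pgd_to_ptbr() and model_enter_lazy_tlb() are hypothetical stand-ins.
  It assumes alpha's 8 KB pages (PAGE_SHIFT 13) and that the pgd page
  lives in the direct-mapped region starting at fffffc0000000000:

```c
#include <stdint.h>

#define PAGE_SHIFT 13                       /* assumption: 8 KB alpha pages */
#define IDENT_ADDR 0xfffffc0000000000ULL    /* assumption: direct-mapped base */

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct mm_struct     { uint64_t pgd; };     /* kernel virtual address of pgd page */
struct thread_struct { uint64_t ptbr; };    /* page frame number loaded into PTBR */
struct task_struct {
    struct mm_struct *mm;                   /* NULL for kernel threads */
    struct mm_struct *active_mm;            /* mm borrowed while running lazily */
    struct thread_struct thread;
};

/* Convert a direct-mapped pgd address to the page frame number PTBR expects. */
static uint64_t pgd_to_ptbr(uint64_t pgd)
{
    return (pgd - IDENT_ADDR) >> PAGE_SHIFT;
}

/* Model of the suggestion: point the kernel task's ptbr at the borrowed
 * mm's pgd, so the value loaded at the next context switch is current. */
static void model_enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk,
                                 unsigned cpu)
{
    (void)cpu;                              /* unused in this model */
    tsk->thread.ptbr = pgd_to_ptbr(mm->pgd);
}
```

  With this in place, a kernel thread that borrows oldmm in schedule()
  would carry a ptbr derived from the pgd it is actually running on,
  rather than whatever value it held before.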

  If you're interested in the details, here's the evidence from a 2.3.47
  crash dump, in which kswapd panicked trying to reference a swap_map
  address at fffffe0000000032:

crash> bt
PID: 2 TASK: fffffc001fd64000 CPU: 0 COMMAND: "kswapd"
 #0 [fffffc001fd67ad0] crash_save_current_state at fffffc0000336ffc
 #1 [fffffc001fd67ae0] panic at fffffc00003271f8
 #2 [fffffc001fd67b80] die_if_kernel at fffffc00003113d0
 #3 [fffffc001fd67bb0] do_page_fault at fffffc000031fecc
 #4 [fffffc001fd67bf0] entMM at fffffc000031055c
 EFRAME: fffffc001fd67c28 R24: 0000000000000cec
     R0: 0000000000000001 R25: 0000000000000007
     R1: fffffe0000000032 R26: fffffc0000350aec <__delete_from_swap_cache+0x8c>
     R2: 0000000000000003 R27: fffffc00003514c0
     R3: 0000190000000000 R28: 0000000000000000
     R4: fffffc000052d888 HAE: 0000000000000000
     R5: 0000000000000200 TRAP_A0: fffffe0000000032
     R6: fffffc00006329d0 TRAP_A1: 0000000000000001
     R7: fffffc001fd67dc0 TRAP_A2: 0000000000000000
     R8: fffffc001fd64000 PS: 0000000000000000
    R19: 0000000000000400 PC: fffffc0000351544 <__swap_free+0x84>
    R20: fffffc00005317c0 GP: fffffc0000554030
    R21: 0000000000000000 R16: 0000190000000000
    R22: 0000000000000006 R17: 0000000000000001
    R23: fffffc0000345244 R18: 0000000000000059
 #5 [fffffc001fd67d10] __swap_free at fffffc0000351544
 #6 [fffffc001fd67d50] __delete_from_swap_cache at fffffc0000350aec
 #7 [fffffc001fd67d60] shrink_mmap at fffffc0000345460
 #8 [fffffc001fd67de0] do_try_to_free_pages at fffffc000034f87c
 #9 [fffffc001fd67e20] kswapd at fffffc000034fa2c
#10 [fffffc001fd67e60] kernel_thread at fffffc00003107f0
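
  The faulting address in TRAP_A0 is the vmalloc'd swap_map address,
  while the PC and stack addresses in the trace are direct-mapped. A
  small sketch of that distinction follows. The enum and function names
  are hypothetical, and the two region bases are assumptions about the
  alpha layout in this kernel (direct-mapped from fffffc0000000000,
  page-table-mapped from fffffe0000000000). Direct-mapped addresses are
  translated without walking the page tables, which is presumably why a
  stale PTBR only bites on mapped addresses such as swap_map:

```c
#include <stdint.h>

/* Assumption: alpha kernel virtual address layout for this kernel. */
#define IDENT_ADDR    0xfffffc0000000000ULL  /* direct-mapped (KSEG) base */
#define VMALLOC_START 0xfffffe0000000000ULL  /* page-table-mapped region */

enum kaddr_kind {
    KADDR_OTHER,              /* below the kernel regions modelled here */
    KADDR_DIRECT_MAPPED,      /* translated without a page table walk */
    KADDR_PAGE_TABLE_MAPPED   /* needs a valid PTBR to translate */
};

/* Classify an address by which region it falls in. */
static enum kaddr_kind classify_kaddr(uint64_t addr)
{
    if (addr >= VMALLOC_START)
        return KADDR_PAGE_TABLE_MAPPED;
    if (addr >= IDENT_ADDR)
        return KADDR_DIRECT_MAPPED;
    return KADDR_OTHER;
}
```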

  In the case above, the kswapd's ptbr references physical address
  5bd8000, which has long since been freed and re-assigned to the
  kmem slab area:

crash> task fffffc001fd64000 | grep ptbr
    ptbr = 0x2dec,
crash> ptob 0x2dec
2dec: 5bd8000
crash> kmem -p 5bd8000
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
fffffc0000c212e0   5bd8000  0000000000000000    106   1 uptodate,slab
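
  The ptob step above is a shift by the page size. A minimal sketch,
  assuming alpha's 8 KB pages (PAGE_SHIFT 13); the helper names are
  hypothetical and are not crash or kernel functions:

```c
#include <stdint.h>

#define PAGE_SHIFT 13   /* assumption: 8 KB pages on alpha */

/* Page frame number (as stored in ptbr) to physical byte address. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
    return pfn << PAGE_SHIFT;
}

/* Physical byte address back to its page frame number. */
static uint64_t phys_to_pfn(uint64_t phys)
{
    return phys >> PAGE_SHIFT;
}
```

  For the kswapd task, pfn_to_phys(0x2dec) gives 0x5bd8000, matching the
  ptob output.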

  At the same time as the panic above, the 8 nfsd daemons and the two
  idle tasks *all* contained ptbr values referencing physical addresses
  that had been freed and re-used.
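
  A check of this kind can be sketched as a scan that compares each
  task's ptbr against the pgd of its active_mm. This is a userspace
  model only: the flattened task_struct, ptbr_is_stale() and
  count_stale_ptbr() are hypothetical, and the 8 KB page size and
  fffffc0000000000 direct-mapped base are assumptions:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 13                       /* assumption: 8 KB alpha pages */
#define IDENT_ADDR 0xfffffc0000000000ULL    /* assumption: direct-mapped base */

/* Hypothetical, flattened stand-ins for the kernel structures. */
struct mm_struct { uint64_t pgd; };         /* kernel virtual address of pgd page */
struct task_struct {
    const char *comm;                       /* command name, e.g. "kswapd" */
    struct mm_struct *active_mm;            /* mm the task is running on */
    uint64_t ptbr;                          /* page frame number saved for PTBR */
};

/* A ptbr is stale if it does not name the page holding active_mm's pgd. */
static int ptbr_is_stale(const struct task_struct *t)
{
    uint64_t expected = (t->active_mm->pgd - IDENT_ADDR) >> PAGE_SHIFT;
    return t->ptbr != expected;
}

/* Count tasks whose saved ptbr disagrees with their active_mm. */
static size_t count_stale_ptbr(const struct task_struct *tasks, size_t n)
{
    size_t stale = 0;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].active_mm == NULL)
            continue;                       /* nothing to compare against */
        if (ptbr_is_stale(&tasks[i]))
            stale++;
    }
    return stale;
}
```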

  Thanks,
     Dave Anderson
     anderson@missioncriticallinux.com



