Re: shm bugs revisited (2)

From: Ingo Molnar (mingo@chiara.csoma.elte.hu)
Date: Fri Jan 28 2000 - 13:33:38 EST


On Fri, 28 Jan 2000, Stephen C. Tweedie wrote:

> > does it happen with smp-2.3.41-A2 (attached)? Reverted the optimization of
> > reading the pagetable address from current->. (like Manfred suggested too)
>
> No effect on my init segvs. BTW, even #define'ing _PAGE_GLOBAL to
> zero doesn't help, and I still see the problem when booting with
> maxcpus=1, so it's definitely not a cross-cpu TLB invalidation problem
> (and the individual invlpg() calls you previously had in the kmap code
> should affect even global pages on a single CPU system, shouldn't
> they?)

yep they should, unfortunately. Ok, the TLB thing can be ruled out pretty
safely.

this really is a mystery. Does it happen if you use 'init=/bin/bash' or
'init=/bin/ash' as an init process?

One interesting thing to see would be if you could print out a trace of
pagefaults and compare it with a highmem-pagecache-disabled kernel (but
otherwise compiled identically). In fact it might be better if you just
did an 'if (nmi_watchdog) alloc_highmem(); else alloc_normal();' thing in
pagemap.h's page_cache_alloc(), this way you can trigger the bad behavior
via a single unrelated boot-commandline flag. (and this makes the
pagefault printouts easily comparable)

also could add a hack that 'touches' freshly faulted-in user pages in
do_page_fault()? Some hack like:

                int fault = handle_mm_fault(tsk, vma, address, write);
                if (fault < 0)
                        goto out_of_memory;
                if (!fault)
                        goto do_sigbus;
+ { volatile int dummy = *(*int)address; }

and then could you add a printk to this place that calculates the checksum
of this freshly faulted in page:

                {
                        int * ptr = (int *)(address & ~4095)
                        int csum = 0;

                        for (i=0; i<1024; i++)
                                csum += ptr[i];
                        printk("page %08x's (%p) csum: %08x.\n",
                                address, lookup_page(address));
                }

(this of course is extremely ugly and unsafe, but good debugging info)

bootmem.c clears all data pages at bootup, so this should be
deterministic. (Probably it would be better to not do it in fault.c but
memory.c so that the struct *page structure and page table entry can be
printed out as well.)

Also, can you somehow rule out the effect of IO, or could you make it
deterministic? You can make IO deterministic by adding some hack like this
to ll_rw_blk.c's make_request():

        if (work)
                queue_fn()
+ mdelay(1000); // 1 second delay after adding

this will of course make the boot process a bit slow, but the point is
clear.

-- mingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Jan 31 2000 - 21:00:22 EST