Re: [BUG]linux-2.4.26 Quad-Opteron: panic when init scsi

From: Alexander v. Buelow
Date: Thu Apr 29 2004 - 04:25:22 EST


Hi,

> It looks like you compiled this on a SuSE 9.0/64bit system, right?
> I presume this means the SuSE 2.4.21 smp kernel worked on the same
> box, right?

Yes, that's right.

> Can you perhaps try to narrow down where it broke between (mainline)
> 2.4.21 and 2.4.26 ?

We tried: 2.4.21 -> ok
2.4.22 -> ok
2.4.23 -> not ok !!

Then we tried to find the error and I changed in mm/numa.c:

--- linux-2.4.26/mm/numa.c 2001-09-18 01:15:02.000000000 +0200
+++ linux-2.4.26-recoms/mm/numa.c 2004-04-27 18:25:28.000000000
+0200
@@ -105,6 +105,11 @@
return NULL;
#ifdef CONFIG_NUMA
temp = NODE_DATA(numa_node_id());
+ if((gfp_mask & GFP_DMA) == GFP_DMA)
+ {
+ printk(KERN_WARNING "RECOMS: Umleitung DMA auf CPU 0\n");
+ temp = NODE_DATA(0);
+ }
#else
spin_lock_irqsave(&node_lock, flags);
if (!next) next = pgdat_list;

And in mm/page_alloc.c I added in void __init free_area_init_core(..):

*gmap = pgdat->node_mem_map = lmem_map;
pgdat->node_size = totalpages;
pgdat->node_start_paddr = zone_start_paddr;
pgdat->node_start_mapnr = (lmem_map - mem_map);
pgdat->nr_zones = 0;
+// Alex:
+pgdat->node_id = nid;

offset = lmem_map - mem_map;
for (j = 0; j < MAX_NR_ZONES; j++) {

This seemed to work, the scsi error didn't occur any more.

But then we ran into various other problems: Sometimes we got MCEs (GART
TLB) and different kernel errors like page fault, NULL pointer
dereference and general protection fault. These errors are not
reproducible but occur frequently.

I hope you will find a solution!

Regards,

Jürgen and Alexander

--
-----------------------------------------------------------------------
Dipl.-Ing. Alexander von Buelow http://www.rcs.ei.tum.de
Institute for Real-Time Computersystems (RCS) fon +49/89-289-23556
Technische Universitaet Muenchen, D-80290 Muenchen fax +49/89-289-23555
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/