Re: 2.4.17 SMP hangs ..

From: Andre Hedrick (andre@linux-ide.org)
Date: Thu Nov 21 2002 - 01:33:05 EST


The problem may be associated w/ the interrupt mapping over the bridge
riser. I have seen this before and can reproduce it regardless of the
system.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Wed, 20 Nov 2002, Manish Lachwani wrote:

> I am seeing system hangs with 2.4.17 SMP kernel when doing mke2fs accros 12
> drives in parallel. However, the hangs only occur when the I/O rate from
> vmstat is high:
>
>
> bash# vmstat 1
> procs memory swap io system
> cpu
> r b w swpd free buff cache si so bi bo in cs us sy
> id
> 1 0 0 0 815800 3656 102612 0 0 163 1086 129 961 6 29
> 65
> 0 0 0 0 813096 3656 102992 0 0 363 0 265 928 44 26
> 31
> 0 0 0 0 813172 3656 102996 0 0 4 0 180 99 0 1
> 99
> 5 2 1 0 809476 3656 103044 0 0 44 0 206 274 18 11
> 71
> 0 12 0 0 802040 3656 103072 0 0 18 2 296 745 47 33
> 21
> 0 8 0 0 800696 3660 103152 0 0 34 244 349 501 10 5
> 85
> 0 8 0 0 800696 3660 103152 0 0 0 0 187 106 1 0
> 99
> 2 5 1 0 795864 3660 103200 0 0 13 45 278 717 14 29
> 57
> 1 0 0 0 795592 3660 103248 0 0 107 46 502 663 7 7
> 86
> 1 0 0 0 795592 3660 103248 0 0 0 0 184 95 0 1
> 99
> 1 0 0 0 795596 3660 103244 0 0 4 1 191 139 0 1
> 99
> 6 5 3 0 756932 3660 140192 0 0 194 36721 464 232 7 87
> 6
> 11 0 1 0 723276 3660 173784 0 0 0 33718 681 1560 0 39
> 61
>
> At that point the system hangs. The system consists of a 4-port and a 8-port
> 3ware controllers on an Intel 21154 bridge with 12 maxtor drives. When the
> IO rate is lower:
>
> procs memory swap io system
> cpu
> r b w swpd free buff cache si so bi bo in cs us sy
> id
> 0 0 0 0 811888 3828 103476 0 0 0 0 172 91 0 1
> 99
> 0 0 0 0 811888 3828 103476 0 0 0 0 184 93 0 1
> 99
> 0 0 0 0 811888 3828 103476 0 0 0 0 171 153 0 1
> 99
> 0 0 0 0 811852 3828 103512 0 0 0 170 171 85 0 1
> 99
> 0 0 0 0 811852 3828 103512 0 0 0 0 174 95 1 0
> 99
> 0 0 0 0 811852 3828 103512 0 0 0 0 173 147 0 1
> 99
> 0 0 0 0 811852 3828 103512 0 0 0 0 176 98 0 1
> 99
> 0 0 0 0 811852 3828 103512 0 0 0 1 175 96 0 1
> 99
> 1 0 0 0 811852 3828 103512 0 0 0 0 173 110 1 3
> 96
> 1 0 0 0 811704 3828 103516 0 0 0 0 179 145 0 1
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 194 121 1 1
> 98
> 0 0 0 0 811688 3828 103516 0 0 0 19 186 119 1 1
> 98
> 0 0 0 0 811688 3828 103516 0 0 0 0 174 149 0 1
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 172 86 0 1
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 175 96 1 0
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 1 171 149 1 0
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 173 91 0 1
> 99
> 0 0 1 0 811688 3828 103516 0 0 0 0 173 86 1 0
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 179 154 0 1
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 1 174 91 0 1
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 179 98 1 0
> 99
> procs memory swap io system
> cpu
> r b w swpd free buff cache si so bi bo in cs us sy
> id
> 1 0 0 0 811688 3828 103516 0 0 0 0 174 100 0 1
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 184 137 0 1
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 1 171 89 1 0
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 173 95 1 0
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 176 150 0 1
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 175 121 3 0
> 97
> 0 0 0 0 811688 3828 103516 0 0 0 15 171 116 3 0
> 97
> 0 0 0 0 811688 3828 103516 0 0 0 0 179 149 0 1
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 173 88 1 0
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 171 88 0 1
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 15 172 149 1 0
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 171 88 1 0
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 174 98 0 1
> 99
> 1 0 0 0 811688 3828 103516 0 0 0 0 178 127 0 1
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 1 179 153 1 0
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 171 88 1 0
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 173 88 0 1
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 175 158 1 0
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 1 175 96 1 0
> 99
> 0 0 0 0 811688 3828 103516 0 0 0 0 171 90 0 1
> 99
> 0 0 0 0 811924 3828 103516 0 0 0 0 173 204 1 5
> 94
> procs memory swap io system
> cpu
> r b w swpd free buff cache si so bi bo in cs us sy
> id
> 1 0 0 0 811924 3828 103516 0 0 0 0 174 90 1 0
> 99
> 0 0 0 0 811924 3828 103516 0 0 0 24 182 93 1 0
> 99
> 1 0 0 0 811924 3828 103516 0 0 0 0 173 142 0 1
> 99
> 0 0 0 0 811924 3828 103516 0 0 0 0 171 93 1 0
> 99
> 0 0 0 0 811924 3828 103516 0 0 0 0 175 94 0 1
> 99
> 1 0 0 0 811924 3828 103516 0 0 0 2 173 92 1 0
> 99
> 0 0 0 0 811924 3828 103516 0 0 0 0 175 151 1 0
> 99
> 0 0 0 0 811924 3828 103516 0 0 0 0 173 87 1 0
> 99
> 0 0 1 0 811924 3828 103516 0 0 0 0 173 89 1 0
> 99
> 0 0 0 0 811924 3828 103516 0 0 0 1 171 142 0 1
> 99
>
> bash# vmstat 1
> procs memory swap io system
> cpu
> r b w swpd free buff cache si so bi bo in cs us sy
> id
> 5 0 1 0 815896 3656 102228 0 0 156 1076 129 917 6 34
> 60
> 2 0 1 0 812860 3656 102992 0 0 737 0 252 1015 36 40
> 25
> 0 0 0 0 813104 3656 102992 0 0 0 0 187 129 2 1
> 97
> 0 0 0 0 813180 3656 102996 0 0 4 0 168 93 0 1
> 99
> 6 8 1 0 802392 3656 103220 0 0 222 0 251 757 62 37
> 1
> 0 10 0 0 804588 3684 103252 0 0 133 40 214 871 50 43
> 8
> 2 8 1 0 804324 3712 103272 0 0 162 40 222 817 43 40
> 18
> 9 4 0 0 805400 3712 103288 0 0 4 0 196 276 13 8
> 79
> 9 0 1 0 804724 3812 103348 0 0 644 96 297 1255 41 59
> 0
> 14 0 1 0 804144 3816 103344 0 0 0 0 167 888 56 44
> 0
> 10 0 1 0 804448 3820 103340 0 0 0 0 171 873 57 43
> 0
> 0 1 0 0 812288 3828 103436 0 0 97 0 222 1051 53 42
> 5
> 0 0 0 0 811868 3828 103476 0 0 84 0 395 429 0 3
> 97
> 1 0 0 0 811868 3828 103476 0 0 0 0 167 91 0 1
> 99
> 1 0 0 0 811868 3828 103476 0 0 0 0 167 141 1 0
> 99
> 0 0 0 0 811868 3828 103476 0 0 0 0 177 100 1 0
> 99
> 0 0 0 0 811868 3828 103476 0 0 0 0 171 96 0 1
> 99
> 1 0 0 0 811868 3828 103476 0 0 0 0 171 132 1 0
> 99
> 0 0 0 0 811868 3828 103476 0 0 0 0 170 100 1 0
> 99
> 0 0 0 0 811868 3828 103476 0 0 0 0 167 90 1 0
> 99
> 0 0 0 0 811868 3828 103476 0 0 0 0 171 93 1 0
> 99
> procs memory swap io system
> cpu
> r b w swpd free buff cache si so bi bo in cs us sy
> id
> 0 0 0 0 811868 3828 103476 0 0 0 0 170 148 1 0
> 99
> 0 0 0 0 811868 3828 103476 0 0 0 0 176 83 0 1
> 99
> 0 0 0 0 811868 3828 103476 0 0 0 0 167 86 0 1
> 99
> 0 0 0 0 811868 3828 103476 0 0 0 0 169 146 0 1
> 99
> 0 0 0 0 811832 3828 103512 0 0 0 173 168 84 1 0
> 99
> 0 0 0 0 811832 3828 103512 0 0 0 0 178 104 1 0
> 99
> 0 0 0 0 811832 3828 103512 0 0 0 0 167 141 0 1
> 99
> 0 0 0 0 811832 3828 103512 0 0 0 0 173 92 1 2
> 97
> 0 0 0 0 811832 3828 103512 0 0 0 1 169 89 0 2
> 98
> 1 0 0 0 811832 3828 103512 0 0 0 0 173 138 2 3
> 95
> 1 0 0 0 811684 3828 103516 0 0 0 0 183 132 0 1
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 178 103 0 1
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 19 180 108 1 0
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 169 148 1 0
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 169 88 0 1
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 171 94 1 0
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 1 170 147 1 0
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 168 94 0 1
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 171 95 1 0
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 173 175 3 0
> 97
> 0 0 0 0 811668 3828 103516 0 0 0 15 169 89 1 0
> 99
> procs memory swap io system
> cpu
> r b w swpd free buff cache si so bi bo in cs us sy
> id
> 0 0 0 0 811668 3828 103516 0 0 0 0 168 87 1 0
> 99
> 1 0 0 0 811668 3828 103516 0 0 0 0 178 156 2 1
> 97
> 0 0 0 0 811668 3828 103516 0 0 0 0 175 122 0 1
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 15 167 86 1 0
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 171 92 0 1
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 177 151 0 1
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 169 93 0 1
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 1 169 86 0 1
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 167 146 0 1
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 172 95 0 1
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 167 84 1 0
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 1 178 189 1 0
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 171 89 1 0
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 0 171 89 0 1
> 99
> 1 0 0 0 811668 3828 103516 0 0 0 0 171 124 1 0
> 99
> 0 0 0 0 811668 3828 103516 0 0 0 1 183 127 0 1
> 99
>
> bash# df
> Filesystem 1k-blocks Used Available Use% Mounted on
> /dev/ram1 125011 85304 33154 73% /
> coreserver:/var/cores
> 74858752 475144 70580928 1%
> /var/.automount/cores
> /dev/sdj1 7827172 24 7429540 1% /disks/disk10.1
> /dev/sdi1 7827172 24 7429540 1% /disks/disk9.1
> /dev/sdl1 7827172 24 7429540 1% /disks/disk12.1
> /dev/sdk1 7827172 24 7429540 1% /disks/disk11.1
> /dev/sdh1 7827172 24 7429540 1% /disks/disk8.1
> /dev/sdf1 7827172 24 7429540 1% /disks/disk6.1
> /dev/sde1 7827172 24 7429540 1% /disks/disk5.1
> /dev/sdb1 7827172 24 7429540 1% /disks/disk2.1
> /dev/sdg1 7827172 24 7429540 1% /disks/disk7.1
> /dev/sda1 7827172 24 7429540 1% /disks/disk1.1
> /dev/sdd1 7827172 24 7429540 1% /disks/disk4.1
> /dev/sdc1 7827172 24 7429540 1% /disks/disk3.1
> bash#
>
>
> bash# vmstat 1
> procs memory swap io system
> cpu
> r b w swpd free buff cache si so bi bo in cs us sy
> id
> 0 0 0 0 755440 3924 105192 0 0 83 8108 353 655 5 25
> 69
> 0 0 1 0 755440 3924 105192 0 0 0 0 167 70 0 0
> 100
> 0 0 0 0 755440 3924 105192 0 0 0 24 159 82 0 1
> 99
> 0 0 0 0 755052 3924 105192 0 0 0 0 162 133 0 0
> 100
> 0 0 0 0 755052 3924 105192 0 0 0 0 156 68 0 1
> 99
> 0 0 0 0 755052 3924 105192 0 0 0 0 155 59 0 0
> 100
> 0 0 0 0 755052 3924 105192 0 0 0 1 157 124 0 0
> 100
> 0 0 0 0 755052 3924 105192 0 0 0 0 174 82 0 1
> 99
> 0 0 0 0 755052 3924 105192 0 0 0 0 161 73 0 0
> 100
> 1 0 0 0 755052 3924 105192 0 0 0 0 159 101 0 0
> 100
> 0 0 0 0 755052 3924 105192 0 0 0 1 155 92 0 0
> 100
> 0 0 0 0 755052 3924 105192 0 0 0 0 155 57 0 0
> 100
> 0 0 0 0 755052 3924 105192 0 0 0 0 155 67 1 0
> 99
> 0 0 0 0 755052 3924 105192 0 0 0 0 155 112 0 0
> 100
> 0 0 0 0 754440 3924 105192 0 0 0 6 157 67 0 0
> 100
> 0 0 0 0 754440 3924 105192 0 0 0 0 155 62 0 0
> 100
> 0 0 0 0 754440 3924 105192 0 0 0 0 157 128 0 0
> 100
> 0 0 0 0 754440 3924 105192 0 0 0 0 160 66 0 1
> 99
> 0 0 0 0 754440 3924 105192 0 0 0 1 157 72 0 0
> 100
> 0 0 0 0 754440 3924 105192 0 0 0 0 155 117 0 0
> 100
> 0 0 0 0 754440 3924 105192 0 0 0 0 155 71 0 0
> 100
> procs memory swap io system
> cpu
> r b w swpd free buff cache si so bi bo in cs us sy
> id
> 0 0 0 0 754440 3924 105192 0 0 0 0 157 66 0 0
> 100
> 1 0 0 0 754440 3924 105192 0 0 0 1 166 114 0 1
> 99
> 0 0 0 0 754440 3924 105192 0 0 0 0 158 93 0 0
> 100
>
> there are no hangs. On startup, I am doing parallel mje2fs accross all the
> drives. 3ware 4-port controller shows that LEDs are ON. I have tried
> replacing the controllers but that also does not help ...
>
> Thanks
> Manish
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Nov 23 2002 - 22:00:35 EST