Re: System freezes with high network activity

From: Jose Luis Salas
Date: Sat Dec 03 2011 - 17:05:18 EST


Hi,

Attached is the output of /proc/timer_list.

With the nohz=off option the system is stable too.

Another symptom of the problem is that network performance drops to 50% (50 Mbps).

Thanks again.

On Fri, Dec 2, 2011 at 11:28 PM, john stultz <johnstul@xxxxxxxxxx> wrote:
> On Fri, 2011-12-02 at 21:54 +0100, Jose Luis Salas wrote:
>> Hi,
>>
>> attached is the dmesg without the clocksource option.
>
> Thanks. After you're done testing nohz=off, could you also
> send /proc/timer_list output from the system with no clocksource option,
> and no nohz options?
>
>
>> the nohz=off *seems* to avoid the problem, I'm testing with NFS and Iperf now.
>
> Sounds good. Let us know how the testing goes.
>
> thanks
> -john
>
>
root@tomberi:~# cat /proc/timer_list
Timer List Version: v0.5
HRTIMER_MAX_CLOCK_BASES: 2
now at 223634459785 nsecs

cpu: 0
clock 0:
.base: c20038f8
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1322949270758217542 nsecs
active timers:
clock 1:
.base: c2003924
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <c2003988>, tick_sched_timer, S:01, hrtimer_start_range_ns, swapper/0
# expires at 223636000000-223636000000 nsecs [in 1540215 to 1540215 nsecs]
#1: <f5dfff44>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, varnishd/1718
# expires at 223646022402-223646072402 nsecs [in 11562617 to 11612617 nsecs]
#2: <f5dc9f44>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, varnishncsa/1700
# expires at 223667196113-223667246113 nsecs [in 32736328 to 32786328 nsecs]
#3: <f5e87b10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, munin-node/1795
# expires at 223705422995-223707422992 nsecs [in 70963210 to 72963207 nsecs]
#4: <f5e03b88>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, varnishd/1720
# expires at 223708346664-223709346662 nsecs [in 73886879 to 74886877 nsecs]
#5: <f67cfb88>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, php5-fpm/1181
# expires at 223733109437-223733239435 nsecs [in 98649652 to 98779650 nsecs]
#6: <f5df5f44>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, varnishd/1713
# expires at 224439802836-224439852836 nsecs [in 805343051 to 805393051 nsecs]
#7: <f5df9f44>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, varnishd/1715
# expires at 224439818760-224439868760 nsecs [in 805358975 to 805408975 nsecs]
#8: <f5dfdf44>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, varnishd/1717
# expires at 224439829376-224439879376 nsecs [in 805369591 to 805419591 nsecs]
#9: <f6575b88>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, apache2/1225
# expires at 224481716600-224484716598 nsecs [in 847256815 to 850256813 nsecs]
#10: <f6795b88>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rrdcached/1127
# expires at 224531231434-224532231432 nsecs [in 896771649 to 897771647 nsecs]
#11: <f64b3b10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, apache2/1217
# expires at 224531270266-224532270264 nsecs [in 896810481 to 897810479 nsecs]
#12: <f5d31b10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, fail2ban-server/1453
# expires at 224535982121-224536982118 nsecs [in 901522336 to 902522333 nsecs]
#13: <f5d29b10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, fail2ban-server/1444
# expires at 224536149464-224537149453 nsecs [in 901689679 to 902689668 nsecs]
#14: <f5d35b10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, fail2ban-server/1455
# expires at 224536224056-224537224054 nsecs [in 901764271 to 902764269 nsecs]
#15: <f5d27b10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, fail2ban-server/1445
# expires at 224536320160-224537320158 nsecs [in 901860375 to 902860373 nsecs]
#16: <f5d1fb10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, fail2ban-server/1452
# expires at 224536482474-224537482472 nsecs [in 902022689 to 903022687 nsecs]
#17: <f5d23b10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, fail2ban-server/1442
# expires at 224536541421-224537541419 nsecs [in 902081636 to 903081634 nsecs]
#18: <f642db10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, fail2ban-server/1450
# expires at 224536581371-224537581369 nsecs [in 902121586 to 903121584 nsecs]
#19: <f5d37b10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, fail2ban-server/1456
# expires at 224536669093-224537669091 nsecs [in 902209308 to 903209306 nsecs]
#20: <f5d33b10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, fail2ban-server/1454
# expires at 224537194867-224538194865 nsecs [in 902735082 to 903735080 nsecs]
#21: <f5d2fb10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, fail2ban-server/1448
# expires at 224537318628-224538318626 nsecs [in 902858843 to 903858841 nsecs]
#22: <f5d1db10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, fail2ban-server/1451
# expires at 224537436243-224538436241 nsecs [in 902976458 to 903976456 nsecs]
#23: <f5d39b10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, fail2ban-server/1457
# expires at 224537494072-224538494070 nsecs [in 903034287 to 904034285 nsecs]
#24: <f6f5bf44>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, powernowd/1486
# expires at 224563283840-224563333840 nsecs [in 928824055 to 928874055 nsecs]
#25: <f5d06030>, posix_timer_fn, S:01, hrtimer_start_range_ns, ntpd/1435
# expires at 224613224725-224613224725 nsecs [in 978764940 to 978764940 nsecs]
#26: <f6c2bb10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, init/1
# expires at 225879528033-225884528031 nsecs [in 2245068248 to 2250068246 nsecs]
#27: <f5dd9b88>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, varnishd/1711
# expires at 226515274520-226518273519 nsecs [in 2880814735 to 2883813734 nsecs]
#28: <f66fdb88>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rpcbind/894
# expires at 227068145926-227098145923 nsecs [in 3433686141 to 3463686138 nsecs]
#29: <f5d45b10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, nmbd/1615
# expires at 230422309607-230432309604 nsecs [in 6787849822 to 6797849819 nsecs]
#30: <f67efb10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, fail2ban-server/1411
# expires at 243848183633-243878183630 nsecs [in 20213723848 to 20243723845 nsecs]
#31: <f5ce1f44>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, cron/1599
# expires at 270838974289-270839024289 nsecs [in 47204514504 to 47204564504 nsecs]
#32: <f5d6bb10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, smbd/1619
# expires at 276733246521-276793234139 nsecs [in 53098786736 to 53158774354 nsecs]
#33: <f6fdeb84>, it_real_fn, S:01, hrtimer_start, qmgr/1598
# expires at 368656511055-368656511055 nsecs [in 145022051270 to 145022051270 nsecs]
#34: <f5dfbf44>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, varnishd/1716
# expires at 403407088880-403407138880 nsecs [in 179772629095 to 179772679095 nsecs]
#35: <f6f8e944>, it_real_fn, S:01, hrtimer_start, master/1589
# expires at 548656401260-548656401260 nsecs [in 325021941475 to 325021941475 nsecs]
#36: <f5dc5f44>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, smartd/1659
# expires at 1837849481974-1837849531974 nsecs [in 1614215022189 to 1614215072189 nsecs]
#37: <f6791dd4>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rrdcached/1142
# expires at 3625809309214-3625809359214 nsecs [in 3402174849429 to 3402174899429 nsecs]
#38: <f6f8e704>, it_real_fn, S:01, hrtimer_start, pickup/1597
# expires at 6215656362427-6215656362427 nsecs [in 5992021902642 to 5992021902642 nsecs]
#39: <f65f1b10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, smbd/1630
# expires at 10035509749904-10035609749904 nsecs [in 9811875290119 to 9811975290119 nsecs]
#40: <f64efb10>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rsyslogd/1148
# expires at 86426401619630-86426501619630 nsecs [in 86202767159845 to 86202867159845 nsecs]
.expires_next : 223636000000 nsecs
.hres_active : 1
.nr_events : 23600
.nr_retries : 2
.nr_hangs : 0
.max_hang_time : 0 nsecs
.nohz_mode : 2
.idle_tick : 223624000000 nsecs
.tick_stopped : 0
.idle_jiffies : 4294948201
.idle_calls : 207772
.idle_sleeps : 70046
.idle_entrytime : 223627956888 nsecs
.idle_waketime : 223624004081 nsecs
.idle_exittime : 223624050457 nsecs
.idle_sleeptime : 189861828990 nsecs
.last_jiffies : 4294948202
.next_jiffies : 4294948203
.idle_expires : 223628000000 nsecs
jiffies: 4294948204


Tick Device: mode: 1
Broadcast device
Clock Event Device: pit
max_delta_ns: 27461866
min_delta_ns: 12571
mult: 5124677
shift: 32
mode: 3
next_event: 9223372036854775807 nsecs
set_next_event: pit_next_event
set_mode: init_pit_timer
event_handler: tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000001
tick_broadcast_oneshot_mask: 00000000


Tick Device: mode: 1
Per CPU device: 0
Clock Event Device: lapic
max_delta_ns: 671068775
min_delta_ns: 1199
mult: 53688674
shift: 32
mode: 3
next_event: 223636000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt
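
For anyone comparing this dump against another one (for example, one taken with nohz=off), here is a minimal Python sketch that pulls the hrtimer entries out of the text and recomputes the "[in ... nsecs]" values as the expiry time minus the "now at" timestamp. The function name parse_timer_list, the dictionary keys, and the regular expressions are my own assumptions based only on the v0.5 layout shown above; other kernel versions may print a different format.

```python
import re

# Patterns assume the "Timer List Version: v0.5" layout shown in the dump above.
NOW_RE = re.compile(r"now at (\d+) nsecs")
HEAD_RE = re.compile(r"#(\d+): <[0-9a-f]+>, (\w+), S:[0-9a-f]+, \w+, (.+)/(\d+)$")
EXP_RE = re.compile(r"# expires at (\d+)-(\d+) nsecs")


def parse_timer_list(text):
    """Return (now_ns, timers) parsed from /proc/timer_list style text."""
    now = None
    timers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = NOW_RE.match(line)
        if m:
            now = int(m.group(1))
            continue
        m = HEAD_RE.match(line)
        if m:
            # First line of a timer entry: index, callback, owning task.
            current = {
                "index": int(m.group(1)),
                "function": m.group(2),
                "comm": m.group(3),
                "pid": int(m.group(4)),
            }
            continue
        m = EXP_RE.match(line)
        if m and current is not None:
            # Second line of the entry: the expiry window in nanoseconds.
            current["expires_start"] = int(m.group(1))
            current["expires_end"] = int(m.group(2))
            timers.append(current)
            current = None
    return now, timers


# A short excerpt copied from the dump above.
SAMPLE = """\
now at 223634459785 nsecs
#0: <c2003988>, tick_sched_timer, S:01, hrtimer_start_range_ns, swapper/0
# expires at 223636000000-223636000000 nsecs [in 1540215 to 1540215 nsecs]
#1: <f5dfff44>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, varnishd/1718
# expires at 223646022402-223646072402 nsecs [in 11562617 to 11612617 nsecs]
"""

if __name__ == "__main__":
    now, timers = parse_timer_list(SAMPLE)
    for t in timers:
        # Relative expiry = absolute expiry minus the "now at" timestamp.
        print(t["index"], t["comm"], t["function"],
              t["expires_start"] - now, t["expires_end"] - now)
```

To run it on a full dump, read the file contents (for instance a saved copy of the output above) and pass the string to parse_timer_list instead of SAMPLE.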