Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c

From: Yuchung Cheng
Date: Tue Sep 26 2017 - 20:13:33 EST


On Tue, Sep 26, 2017 at 6:10 AM, Roman Gushchin <guro@xxxxxx> wrote:
>> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@xxxxxx> wrote:
>> >
>> > > Hello.
>> > >
>> > > Since, IIRC, v4.11, there is some regression in TCP stack resulting in the
>> > > warning shown below. Most of the time it is harmless, but rarely it just
>> > > causes either freeze or (I believe, this is related too) panic in
>> > > tcp_sacktag_walk() (because sk_buff passed to this function is NULL).
>> > > Unfortunately, I still do not have proper stacktrace from panic, but will try
>> > > to capture it if possible.
>> > >
>> > > Also, I have custom settings regarding TCP stack, shown below as well. ifb is
>> > > used to shape traffic with tc.
>> > >
>> > > Please note this regression was already reported as BZ [1] and as a letter to
>> > > ML [2], but got neither attention nor resolution. It is reproducible for (not
>> > > only) me on my home router since v4.11 till v4.13.1 incl.
>> > >
>> > > Please advise on how to deal with it. I'll provide any additional info if
>> > > necessary, also ready to test patches if any.
>> > >
>> > > Thanks.
>> > >
>> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
>> > > [2] https://www.spinics.net/lists/netdev/msg436158.html
>> >
>> > We're experiencing the same problems on some machines in our fleet.
>> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
>> > sometimes panics in tcp_sacktag_walk().
>> >
>> > Here is an example of a backtrace with the panic log:
>
> Hi Yuchung!
>
>> Do you still see the panics if you disable RACK
>> (sysctl net.ipv4.tcp_recovery=0)?
>
> No, we haven't seen any crashes since then.
I am out of ideas as to how RACK could cause tcp_sacktag_walk() to be
passed a NULL skb :-( Do you have a stack trace, or any hint as to which
call to tcp_sacktag_walk() triggered the panic? Internally at Google we
have never seen that.
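
For anyone who wants to confirm which hosts really are running with RACK
off, here is a minimal sketch in Python. It assumes that bit 0x1 of
net.ipv4.tcp_recovery is the RACK loss-detection flag, as described in
the kernel's ip-sysctl documentation; the function names and the default
path argument are hypothetical and only for illustration.

```python
from pathlib import Path

# Assumption: bit 0x1 of the tcp_recovery bitmask enables RACK loss detection.
RACK_LOSS_DETECTION = 0x1


def rack_bit_set(raw_value):
    """Return True if the RACK bit is set in a tcp_recovery value string."""
    return bool(int(raw_value.strip()) & RACK_LOSS_DETECTION)


def rack_enabled(path="/proc/sys/net/ipv4/tcp_recovery"):
    """Read the sysctl from procfs and report whether RACK is enabled."""
    return rack_bit_set(Path(path).read_text())


if __name__ == "__main__":
    print("RACK enabled:", rack_enabled())
```

After "sysctl net.ipv4.tcp_recovery=0" the script should report False on
that host.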


>
>>
>> Also, have you experienced any SACK reneging? Could you post the output
>> of 'nstat | grep -i TCP'? Thanks.
>
> hostname TcpActiveOpens 2289680 0.0
> hostname TcpPassiveOpens 3592758 0.0
> hostname TcpAttemptFails 746910 0.0
> hostname TcpEstabResets 154988 0.0
> hostname TcpInSegs 16258678255 0.0
> hostname TcpOutSegs 46967011611 0.0
> hostname TcpRetransSegs 13724310 0.0
> hostname TcpInErrs 2 0.0
> hostname TcpOutRsts 9418798 0.0
> hostname TcpExtEmbryonicRsts 2303 0.0
> hostname TcpExtPruneCalled 90192 0.0
> hostname TcpExtOfoPruned 57274 0.0
> hostname TcpExtOutOfWindowIcmps 3 0.0
> hostname TcpExtTW 1164705 0.0
> hostname TcpExtTWRecycled 2 0.0
> hostname TcpExtPAWSEstab 159 0.0
> hostname TcpExtDelayedACKs 209207209 0.0
> hostname TcpExtDelayedACKLocked 508571 0.0
> hostname TcpExtDelayedACKLost 1713248 0.0
> hostname TcpExtListenOverflows 625 0.0
> hostname TcpExtListenDrops 625 0.0
> hostname TcpExtTCPHPHits 9341188489 0.0
> hostname TcpExtTCPPureAcks 1434646465 0.0
> hostname TcpExtTCPHPAcks 5733614672 0.0
> hostname TcpExtTCPSackRecovery 3261698 0.0
> hostname TcpExtTCPSACKReneging 12203 0.0
> hostname TcpExtTCPSACKReorder 433189 0.0
> hostname TcpExtTCPTSReorder 22694 0.0
> hostname TcpExtTCPFullUndo 45092 0.0
> hostname TcpExtTCPPartialUndo 22016 0.0
> hostname TcpExtTCPLossUndo 2150040 0.0
> hostname TcpExtTCPLostRetransmit 60119 0.0
> hostname TcpExtTCPSackFailures 2626782 0.0
> hostname TcpExtTCPLossFailures 182999 0.0
> hostname TcpExtTCPFastRetrans 4334275 0.0
> hostname TcpExtTCPSlowStartRetrans 3453348 0.0
> hostname TcpExtTCPTimeouts 1070997 0.0
> hostname TcpExtTCPLossProbes 2633545 0.0
> hostname TcpExtTCPLossProbeRecovery 941647 0.0
> hostname TcpExtTCPSackRecoveryFail 336302 0.0
> hostname TcpExtTCPRcvCollapsed 461354 0.0
> hostname TcpExtTCPAbortOnData 349196 0.0
> hostname TcpExtTCPAbortOnClose 3395 0.0
> hostname TcpExtTCPAbortOnTimeout 51201 0.0
> hostname TcpExtTCPMemoryPressures 2 0.0
> hostname TcpExtTCPSpuriousRTOs 2120503 0.0
> hostname TcpExtTCPSackShifted 2613736 0.0
> hostname TcpExtTCPSackMerged 21358743 0.0
> hostname TcpExtTCPSackShiftFallback 8769387 0.0
> hostname TcpExtTCPBacklogDrop 5 0.0
> hostname TcpExtTCPRetransFail 843 0.0
> hostname TcpExtTCPRcvCoalesce 949068035 0.0
> hostname TcpExtTCPOFOQueue 470118 0.0
> hostname TcpExtTCPOFODrop 9915 0.0
> hostname TcpExtTCPOFOMerge 9 0.0
> hostname TcpExtTCPChallengeACK 90 0.0
> hostname TcpExtTCPSYNChallenge 3 0.0
> hostname TcpExtTCPFastOpenActive 2089 0.0
> hostname TcpExtTCPSpuriousRtxHostQueues 896596 0.0
> hostname TcpExtTCPAutoCorking 547386735 0.0
> hostname TcpExtTCPFromZeroWindowAdv 28757 0.0
> hostname TcpExtTCPToZeroWindowAdv 28761 0.0
> hostname TcpExtTCPWantZeroWindowAdv 322431 0.0
> hostname TcpExtTCPSynRetrans 3026 0.0
> hostname TcpExtTCPOrigDataSent 40976870977 0.0
> hostname TcpExtTCPHystartTrainDetect 453920 0.0
> hostname TcpExtTCPHystartTrainCwnd 11586273 0.0
> hostname TcpExtTCPHystartDelayDetect 10943 0.0
> hostname TcpExtTCPHystartDelayCwnd 763554 0.0
> hostname TcpExtTCPACKSkippedPAWS 30 0.0
> hostname TcpExtTCPACKSkippedSeq 218 0.0
> hostname TcpExtTCPWinProbe 2408 0.0
> hostname TcpExtTCPKeepAlive 213768 0.0
> hostname TcpExtTCPMTUPFail 69 0.0
> hostname TcpExtTCPMTUPSuccess 8811 0.0
>
> Thanks!
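
To pull the SACK-related counters out of a dump like the one above, a
small Python sketch along these lines may help. It assumes each line ends
in "name value rate", optionally preceded by a hostname column and mail
quote markers as in this thread; the function names are hypothetical, and
the choice of counters is just one reasonable selection for the reneging
question, not an authoritative list.

```python
def parse_nstat(text):
    """Map counter name -> integer value from nstat-style lines."""
    counters = {}
    for line in text.splitlines():
        # Drop mail quote markers ("> ") before splitting into columns.
        fields = line.lstrip("> ").split()
        # Skip blanks, "#kernel" headers, and lines without name/value/rate.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        # The last three columns are name, value, rate; anything before
        # them (such as a hostname prefix) is ignored.
        name, value = fields[-3], fields[-2]
        if value.isdigit():
            counters[name] = int(value)
    return counters


def sack_summary(counters):
    """Pick out a few SACK counters; missing ones are reported as 0."""
    keys = (
        "TcpExtTCPSACKReneging",
        "TcpExtTCPSackRecovery",
        "TcpExtTCPSackFailures",
        "TcpExtTCPSackRecoveryFail",
    )
    return {key: counters.get(key, 0) for key in keys}


if __name__ == "__main__":
    import sys

    # Usage: nstat -a | python3 this_script.py
    for name, value in sack_summary(parse_nstat(sys.stdin.read())).items():
        print(f"{name:28} {value}")
```

Fed the output quoted above, this would list TcpExtTCPSACKReneging as
12203 alongside the three recovery counters.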