Intermittent Network Slowness/Complete loss of Connectivity
I have a network stood up with vSphere. Over the past couple of years I have been experiencing occasional network slowness (latency spikes), or a complete loss of connectivity, between servers. The interesting part is that it's always the same servers that seem to have the issue. For example, I wrote a script to detect network instability between one host and many others; grepping through months of that data, several servers have upwards of 200 detected events, while others have 0.
I have been trying desperately to determine the source of these network issues. Recently, I wrote a script that fires at the end of the cron job that detects the network events. The script tests ssh latency between one host and many others. Before it starts the latency test, it starts a packet trace with tshark, filtering on traffic between the host being tested and the host the script runs on. It also filters on port 22, since the latency test uses ssh commands.
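For reference, the capture step described above could be set up roughly like the sketch below. This is not the actual script: the function names, the 192.0.2.x addresses, the interface name eth0, the output path, and the 60-second capture limit are all assumptions made for illustration. It only builds the tshark command line; it does not start the capture.

```python
import shlex


def build_capture_filter(local_host, remote_host, port=22):
    """Return a capture filter for TCP traffic between two hosts on one port."""
    # Capture-filter (BPF) syntax: both hosts must appear in the packet,
    # and either the source or the destination port must match.
    return f"host {local_host} and host {remote_host} and tcp port {port}"


def build_tshark_command(interface, capture_filter, out_file, seconds=60):
    """Return the tshark argument list for a time-limited capture to a file."""
    return [
        "tshark",
        "-i", interface,              # interface to capture on (assumed name)
        "-f", capture_filter,         # capture filter, applied while capturing
        "-a", f"duration:{seconds}",  # stop automatically after N seconds
        "-w", out_file,               # write raw packets to this file
    ]


if __name__ == "__main__":
    # Placeholder addresses; substitute the two hosts involved in the test.
    flt = build_capture_filter("192.0.2.10", "192.0.2.20")
    cmd = build_tshark_command("eth0", flt, "/tmp/ssh-latency.pcap")
    print(shlex.join(cmd))
```

Writing the capture to a file with -w keeps the full packets, so the same event can be reopened later with different display filters.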
Any help on this would be greatly appreciated. I'm a software engineer, not a network guy; I just have enough knowledge to get this far.
Here is the tshark output I collected when a server was experiencing network degradation:
10 0.574704304 host -> client TCP 74 56104 > ssh [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=443780045 TSecr=0 WS=128
33 0.622798903 client -> host TCP 74 ssh > 56104 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=443784521 TSecr=443780045 WS=128
34 0.622823233 host -> client TCP 66 56104 > ssh [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=443780093 TSecr=443784521
35 0.622872658 host -> client SSH 87 Client Protocol: SSH-2.0-OpenSSH_7.4\r
36 0.671884861 client -> host TCP 66 ssh > 56104 [ACK] Seq=1 Ack=22 Win=29056 Len=0 TSval=443784570 TSecr=443780093
530 16.713179167 client -> host SSHv2 527 Server: Key Exchange Init
531 16.713203567 host -> client TCP 66 56104 > ssh [ACK] Seq=22 Ack=462 Win=30336 Len=0 TSval=443796184 TSecr=443800624
532 16.714881821 host -> client SSHv2 1314 Client: Diffie-Hellman Key Exchange Init
533 16.715060860 client -> host TCP 66 ssh > 56104 [ACK] Seq=462 Ack=1270 Win=31872 Len=0 TSval=443800639 TSecr=443796185
534 16.717441079 client -> host SSHv2 474 Server: New Keys
535 16.718751241 host -> client SSHv2 146 Client: New Keys
536 16.719009654 client -> host TCP 130 [TCP segment of a reassembled PDU]
537 16.719097445 host -> client TCP 146 56104 > ssh [PSH, ACK] Seq=1350 Ack=934 Win=31360 Len=80 TSval=443796189 TSecr=443800643[Reassembly error, protocol TCP: New fragment past old data limits]
538 16.724163039 client -> host TCP 1514 [TCP segment of a reassembled PDU]
539 16.724173258 client -> host TCP 74 [TCP segment of a reassembled PDU]
540 16.724179307 host -> client TCP 66 56104 > ssh [ACK] Seq=1430 Ack=2390 Win=34304 Len=0 TSval=443796195 TSecr=443800648
541 16.724268787 host -> client TCP 450 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443796195 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
547 16.957149181 host -> client TCP 450 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443796428 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
551 17.190125831 host -> client TCP 450 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443796661 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
565 17.657163520 host -> client TCP 450 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443797128 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
566 17.657491865 client -> host TCP 78 [TCP Previous segment not captured] ssh > 56104 [ACK] Seq=2742 Ack=1814 Win=34432 Len=0 TSval=443801581 TSecr=443797128 SLE=1430 SRE=1814
567 17.704925741 client -> host TCP 418 [TCP Retransmission] [TCP segment of a reassembled PDU]
568 17.707069100 host -> client TCP 738 56104 > ssh [PSH, ACK] Seq=1814 Ack=2742 Win=37248 Len=672 TSval=443797177 TSecr=443801629[Reassembly error, protocol TCP: New fragment past old data limits]
569 17.707270598 client -> host TCP 66 ssh > 56104 [ACK] Seq=2742 Ack=2486 Win=36864 Len=0 TSval=443801631 TSecr=443797177
597 18.685946777 client -> host TCP 642 [TCP segment of a reassembled PDU]
598 18.725132574 host -> client TCP 66 56104 > ssh [ACK] Seq=2486 Ack=3318 Win=40064 Len=0 TSval=443798196 TSecr=443802610
599 18.747191756 host -> client TCP 146 56104 > ssh [PSH, ACK] Seq=2486 Ack=3318 Win=40064 Len=80 TSval=443798218 TSecr=443802610[Reassembly error, protocol TCP: New fragment past old data limits]
604 18.973126314 host -> client TCP 146 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=2486 Ack=3318 Win=40064 Len=80 TSval=443798444 TSecr=443802610[Reassembly error, protocol TCP: New fragment past old data limits]
608 19.199141784 host -> client TCP 146 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=2486 Ack=3318 Win=40064 Len=80 TSval=443798670 TSecr=443802610[Reassembly error, protocol TCP: New fragment past old data limits]
626 19.652156589 host -> client TCP 146 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=2486 Ack=3318 Win=40064 Len=80 TSval=443799123 TSecr=443802610[Reassembly error, protocol TCP: New fragment past old data limits]
627 19.652531210 client -> host TCP 66 ssh > 56104 [ACK] Seq=3318 Ack=2566 Win=36864 Len=0 TSval=443803576 TSecr=443799123
628 19.652554872 client -> host TCP 130 [TCP segment of a reassembled PDU]
629 19.652563170 host -> client TCP 66 56104 > ssh [ACK] Seq=2566 Ack=3382 Win=40064 Len=0 TSval=443799123 TSecr=443803576
630 19.652762859 host -> client TCP 210 56104 > ssh [PSH, ACK] Seq=2566 Ack=3382 Win=40064 Len=144 TSval=443799123 TSecr=443803576[Reassembly error, protocol TCP: New fragment past old data limits]
631 19.657257053 client -> host TCP 178 [TCP segment of a reassembled PDU]
632 19.657447700 host -> client TCP 130 56104 > ssh [PSH, ACK] Seq=2710 Ack=3494 Win=40064 Len=64 TSval=443799128 TSecr=443803581[Reassembly error, protocol TCP: New fragment past old data limits]
633 19.677567903 client -> host TCP 130 [TCP segment of a reassembled PDU]
634 19.677893785 host -> client TCP 130 56104 > ssh [PSH, ACK] Seq=2774 Ack=3558 Win=40064 Len=64 TSval=443799148 TSecr=443803601[Reassembly error, protocol TCP: New fragment past old data limits]
635 19.678241891 client -> host TCP 130 [TCP segment of a reassembled PDU]
636 19.678488520 host -> client TCP 130 56104 > ssh [PSH, ACK] Seq=2838 Ack=3622 Win=40064 Len=64 TSval=443799149 TSecr=443803602[Reassembly error, protocol TCP: New fragment past old data limits]
637 19.678689498 client -> host TCP 130 [TCP segment of a reassembled PDU]
638 19.678894272 host -> client TCP 130 56104 > ssh [PSH, ACK] Seq=2902 Ack=3686 Win=40064 Len=64 TSval=443799149 TSecr=443803602[Reassembly error, protocol TCP: New fragment past old data limits]
639 19.679095118 client -> host TCP 130 [TCP segment of a reassembled PDU]
640 19.679313200 host -> client TCP 130 56104 > ssh [PSH, ACK] Seq=2966 Ack=3750 Win=40064 Len=64 TSval=443799150 TSecr=443803603[Reassembly error, protocol TCP: New fragment past old data limits]
641 19.679619877 client -> host TCP 130 [TCP segment of a reassembled PDU]
642 19.679949334 host -> client TCP 130 56104 > ssh [PSH, ACK] Seq=3030 Ack=3814 Win=40064 Len=64 TSval=443799150 TSecr=443803603[Reassembly error, protocol TCP: New fragment past old data limits]
643 19.680344884 client -> host TCP 130 [TCP segment of a reassembled PDU]
644 19.680551898 host -> client TCP 130 56104 > ssh [PSH, ACK] Seq=3094 Ack=3878 Win=40064 Len=64 TSval=443799151 TSecr=443803604[Reassembly error, protocol TCP: New fragment past old data limits]
645 19.680901329 client -> host TCP 130 [TCP segment of a reassembled PDU]
646 19.681128070 host -> client TCP 130 56104 > ssh [PSH, ACK] Seq=3158 Ack=3942 Win=40064 Len=64 TSval=443799152 TSecr=443803605[Reassembly error, protocol TCP: New fragment past old data limits]
647 19.681431225 client -> host TCP 130 [TCP segment of a reassembled PDU]
648 19.681629690 host -> client TCP 130 56104 > ssh [PSH, ACK] Seq=3222 Ack=4006 Win=40064 Len=64 TSval=443799152 TSecr=443803605[Reassembly error, protocol TCP: New fragment past old data limits]
649 19.681870571 client -> host TCP 130 [TCP segment of a reassembled PDU]
650 19.682068642 host -> client TCP 130 56104 > ssh [PSH, ACK] Seq=3286 Ack=4070 Win=40064 Len=64 TSval=443799152 TSecr=443803606[Reassembly error, protocol TCP: New fragment past old data limits]
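To make the trace above easier to scan, here is a small sketch that parses tshark's one-line summaries and reports long pauses and retransmissions. The regular expression assumes the column layout shown above (frame number, relative time, source, "->", destination, protocol, length, info); the 1-second threshold and all function names are assumptions chosen for illustration. SAMPLE holds six lines copied from the trace.

```python
import re

# Assumed layout: frame, time, src, "->", dst, protocol, length, info.
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\d+\.\d+)\s+(\S+)\s+->\s+(\S+)\s+(\S+)\s+(\d+)\s+(.*)$"
)


def parse_line(line):
    """Parse one summary line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    frame, t, src, dst, proto, length, info = m.groups()
    return {"frame": int(frame), "time": float(t), "src": src, "dst": dst,
            "proto": proto, "length": int(length), "info": info}


def parse_trace(text):
    """Parse every matching line of a pasted trace, skipping the rest."""
    return [r for r in map(parse_line, text.splitlines()) if r is not None]


def find_gaps(records, threshold=1.0):
    """Return (previous frame, next frame, seconds) for pauses >= threshold."""
    gaps = []
    for prev, cur in zip(records, records[1:]):
        delta = cur["time"] - prev["time"]
        if delta >= threshold:
            gaps.append((prev["frame"], cur["frame"], delta))
    return gaps


def retransmissions(records):
    """Return the records that tshark flagged as TCP retransmissions."""
    return [r for r in records if "[TCP Retransmission]" in r["info"]]


SAMPLE = """\
36 0.671884861 client -> host TCP 66 ssh > 56104 [ACK] Seq=1 Ack=22 Win=29056 Len=0 TSval=443784570 TSecr=443780093
530 16.713179167 client -> host SSHv2 527 Server: Key Exchange Init
541 16.724268787 host -> client TCP 450 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443796195 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
547 16.957149181 host -> client TCP 450 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443796428 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
551 17.190125831 host -> client TCP 450 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443796661 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
565 17.657163520 host -> client TCP 450 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443797128 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
"""

if __name__ == "__main__":
    records = parse_trace(SAMPLE)
    for prev_frame, next_frame, delta in find_gaps(records):
        print(f"pause of {delta:.3f}s between frames {prev_frame} and {next_frame}")
    for r in retransmissions(records):
        print(f"retransmission: frame {r['frame']} at {r['time']:.3f}s")
```

On these sample lines it flags the roughly 16-second pause between frames 36 and 530, and the three retransmissions of the same segment (Seq=1430) in frames 547, 551 and 565.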