
RST + ACK + CWR message, beginner needs help (FIX application)


Good day,

I am trying to debug a problem in which a FIX message is lost on a regular basis. There are two applications, a FIX client and a FIX server. They establish a connection and start exchanging Heartbeat messages. When the client application finally decides to send an order message (this message is longer and cannot fit in one packet; it normally takes 2-3 packets), the connection drops, and the network API (Windows) of both applications reports "An existing connection was forcibly closed by the remote host."

I have recorded a pcap file.

I wonder what the possible reason could be or, if that is unclear, what I should review or check next.

Thank you for your help.

Update 1: Adding previous day pcap file as requested by Christian_R in comments.

Update 2: Adding both the server-side and client-side pcap files. The client-side pcap shows something very interesting at around frame #325. What does the "TCP Out-Of-Order" on a packet sent by the client itself mean?

topoden
asked 2019-02-12 16:13:27 +0000, updated 2019-02-14 14:11:38 +0000

Comments

Interesting case. But can you capture a whole session: session setup, Heartbeat, start of sending, session drop?

I have some question marks in mind; in particular, the IP flags used are not clear to me at the moment.

Christian_R (2019-02-12 16:49:51 +0000)

Unfortunately, I started recording this session too late and do not have all the messages from the moment the TCP connection was established. So even if I upload the entire pcap file, all it would contain is a continuous series of Heartbeat messages between server and client. Do you think this will help?

Otherwise I can provide the pcap file from the previous day, where you can see the initial session setup, but not the error. I must say that the error happens on most days, but not all; there are (rare) days when the issue does not happen at all. It is also worth mentioning that when the issue happens, it tends to hit the first (longer) order message of the day. All the remaining order messages are delivered without problems after that (once the new connection and new session are set up). Let me know if the pcap file from day ... (more)

topoden (2019-02-12 18:37:10 +0000)

A better trace would help.

Christian_R (2019-02-12 19:30:11 +0000)

But in the meantime the session initiation of the old trace could help, too.

Christian_R (2019-02-12 19:31:10 +0000)

OK, I've updated the question with the link to the previous day's pcap file. It contains the messages from the moment the connection was established until the first successful order message arrives.

topoden (2019-02-12 19:43:59 +0000)

1 Answer


Answer for the trace from Update 1:

I guess this trace was taken at the server side, too. Overall, the session looks a little strange in some details.

But I would guess there is something inside the order packet which causes the application to crash.

Another hint is that, before the session is finally initiated, the SYN often gets an RST as an answer.

=================================================

Answer for Update2 traces:

First of all we see differences in the 3-way handshake of client side and server side.

Handshake at client side:
- Client advertises 1460 MSS
- Server advertises 1460 MSS

Handshake at server side:
- Client advertises 1398 MSS
- Server advertises 1460 MSS

Packets 325-330 are too big for the tunnel and didn't make it through. In the end the client resets the session. From there we must switch to the server-side trace, as it is longer. After that reset, a few session retries happen, and in the end the client tries a session with fragmentation allowed, which mostly won't work well on tunnels.
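To put numbers on the MSS mismatch, here is a rough sketch of the arithmetic. It assumes IPv4 with 20-byte IP and TCP headers and no options, and it assumes that the rewritten client MSS of 1398 reflects what the path can really carry; neither assumption is confirmed by the traces, and the variable names are only illustrative.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def packet_size_for_mss(mss: int) -> int:
    """IP packet size of a full-sized TCP segment for a given MSS."""
    return mss + IPV4_HEADER + TCP_HEADER


# MSS values quoted from the two handshakes
client_mss_seen_by_client = 1460
client_mss_seen_by_server = 1398
server_mss_seen_by_client = 1460

# How much a device on the path reduced the client's MSS in transit
mss_reduction = client_mss_seen_by_client - client_mss_seen_by_server

# Largest packet the client may send when it honours the server's MSS
largest_client_packet = packet_size_for_mss(server_mss_seen_by_client)

# Largest packet the path seems to carry, if the rewritten MSS is accurate
assumed_path_limit = packet_size_for_mss(client_mss_seen_by_server)

excess = largest_client_packet - assumed_path_limit

print(f"MSS reduced in transit by {mss_reduction} bytes")
print(f"Client may send packets of up to {largest_client_packet} bytes")
print(f"Path limit implied by the rewritten MSS: {assumed_path_limit} bytes")
print(f"Full-sized client packets are {excess} bytes too large")
```

Under these assumptions a short Heartbeat fits easily, while a full-sized segment of a longer order message does not, which would match the symptom that only the long message triggers the drop.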

So my recommendation is: please try to advertise an adjusted server MSS to the client, just as the client's MSS appears adjusted in the server-side trace.

Some routers are able to do so. See here: https://www.cisco.com/c/en/us/support...

Here you can find an explanation about MSS in general: https://crnetpackets.com/2016/01/27/t...

Christian_R
answered 2019-02-13 20:08:02 +0000, updated 2019-02-15 10:41:14 +0000

Comments

"...But I would guess there is something inside the order packet which causes the application to crash...."

Unfortunately, searching for the issue in the software (applications) is where we started before we decided to move to packet capture. There is no indication that either side (client or server) crashes or has any errors. Both applications keep working (no restarts involved; they keep running in the same thread) and, in fact, re-establish the connection within several seconds (as you can see in the initial pcap I provided). After the new connection is re-established and a new FIX session is set up, the client (upon server request, following the FIX protocol) re-sends the lost 'order' message, and this time it is delivered without problems.

Just to clarify, I am not saying the issue is not in the applications; rather, we are trying to see if packet capture gives us ... (more)

topoden (2019-02-13 20:23:37 +0000)

"...Another hint is that, before the session is finally initiated, the SYN often gets an RST as an answer."

Could you please elaborate a little more on what you mean by this?

topoden (2019-02-13 20:24:37 +0000)

That means that the port was not ready to establish a session, most likely because the service was down.

Christian_R (2019-02-13 22:08:36 +0000)
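For illustration, this is roughly how a SYN answered with an RST looks from the application side. The sketch below uses plain Python sockets rather than either FIX application, and `probe` is a hypothetical helper name; on common operating systems a connection attempt to a port with no listener fails with a "connection refused" error.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Return 'open', 'refused' or 'no answer' for one TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"        # handshake completed
    except ConnectionRefusedError:
        return "refused"         # typically: the SYN was answered with an RST
    except OSError:
        return "no answer"       # timeout, unreachable host, or another error
```

A client that starts before the server would see "refused" on each attempt until the service begins listening, and "open" afterwards.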

"...That means that the port was not ready to establish a session, most likely because the service was down...."

Ah, yes, that is true. The client starts up a little before the server's scheduled start-up time, so the client keeps making connection attempts until the server is there and starts accepting new connections. So this is just the, maybe funny, approach the two applications use now. I do not think this is related to the issue, do you?

topoden (2019-02-13 22:29:13 +0000)

While capturing, did you have a capture filter applied? Packets 324 and 325 (client trace) have the same sequence number (not progressing), but #325 is sent with a reduced MTU. This is why #325 is called "out-of-order".

There is only a 0.2 ms delay between the two packets, which means the client had received an (ICMP?) instruction to reduce the MTU. But we don't see it in the trace.

It looks like ICMP is filtered out of the trace.

The second question is: why didn't even the reduced packets get through? Do you have double tunnel encapsulation on the path, performed on two different routers?

Packet_vlad (2019-02-14 14:28:56 +0000)
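As a rough illustration of why a re-sent segment with a non-progressing sequence number can be labelled "out-of-order" instead of "retransmission", here is a simplified sketch. It is not Wireshark's actual analysis code: it assumes a fixed 3 ms threshold (Wireshark uses the initial RTT when it knows it and applies further conditions), and the sequence numbers and lengths are illustrative values, not read from the trace. Only the 0.2 ms gap is taken from the comment above.

```python
from dataclasses import dataclass

OOO_THRESHOLD_S = 0.003  # assumed 3 ms threshold for this sketch


@dataclass
class Segment:
    frame: int
    time: float   # capture time in seconds
    seq: int      # relative sequence number
    length: int   # TCP payload bytes


def classify(segments):
    """Label data segments of one direction of a TCP stream (simplified)."""
    labels = {}
    next_expected = None      # highest sequence number seen so far, plus one
    last_advance_time = None  # when next_expected last moved forward
    for s in segments:
        if next_expected is None or s.seq >= next_expected:
            labels[s.frame] = "new data"
        elif s.time - last_advance_time < OOO_THRESHOLD_S:
            labels[s.frame] = "out-of-order"
        else:
            labels[s.frame] = "retransmission"
        end = s.seq + s.length
        if next_expected is None or end > next_expected:
            next_expected = end
            last_advance_time = s.time
    return labels


# Illustrative values: same sequence number, smaller payload, 0.2 ms apart
trace = [
    Segment(frame=324, time=0.0000, seq=1, length=1460),
    Segment(frame=325, time=0.0002, seq=1, length=1398),
]
print(classify(trace))
```

With these inputs frame 325 is labelled "out-of-order" simply because it repeats already-seen sequence space so soon after frame 324; the same segment arriving much later would be labelled a retransmission.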
