Network connection is broken (HDR vs RSS)
Steve Nixon
2015-03-16 22:19:46 UTC
I have an Informix instance that is replicated to two other servers.

PRIMARY
IDS Version 11.50.FC9W2XJ -- On-Line (Prim) -- Up

HDR
IDS Version 11.50.FC9W2XJ -- Read-Only (Sec) -- Up

RSS
IDS Version 11.50.FC9W2XJ -- Read-Only (RSS) -- Up


Every once in a while, I get the following error in the log on the primary:

listener-thread: err = -25582: oserr = 0: errstr = : Network connection is broken.

Now as I understand it, this happens as a result of a temporary loss of connectivity on the TCP port when it tries to send the logical log info to one of the other two servers.

My question is:

Is there any way to tell WHICH of the two servers it was not able to connect to?

The log on the primary shows only that error message, with no accompanying server name or host name. And the logs on the HDR and RSS don't seem to report any network issues when the problem occurs on the primary.

Is there a way to tell which one had the issue based on any other messages in the logs? In case it matters, here are the messages from the time frame in question of my most recent "broken" message.


16:27:14 Logical Log 428205 Complete, timestamp: 0xe8c073ee.
16:27:16 Logical Log 428205 - Backup Started
16:27:16 Logical Log 428205 - Backup Completed
16:30:22 Logical Log 428206 Complete, timestamp: 0xe8c6da8a.
16:30:26 Logical Log 428206 - Backup Started
16:30:26 Logical Log 428206 - Backup Completed
16:30:32 Checkpoint Completed: duration was 0 seconds.
16:30:32 Mon Mar 16 - loguniq 428207, logpos 0x44338, timestamp: 0xe8c72572 Interval: 601499

16:30:32 Maximum server connections 734
16:30:32 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 10549, Llog used 4044

16:32:05 listener-thread: err = -25582: oserr = 0: errstr = : Network connection is broken.

16:35:32 Checkpoint Completed: duration was 0 seconds.
16:35:32 Mon Mar 16 - loguniq 428207, logpos 0x7da834, timestamp: 0xe8ceb0be Interval: 601500

16:35:32 Maximum server connections 734




Thanks in advance,

Steve N.
Justin Killen
2015-03-16 23:24:03 UTC
Not sure that this helps, but in 12.10 the error happens for any lost connection, not just ones between servers. The log is more descriptive, so they must have gotten this complaint and fixed it. Example log from 12.10:

16:18:05 listener-thread: err = -25582: oserr = 0: errstr = from dozer.sg1.allamericanasphalt.com to server moeaix : Network connection is broken.

This log often gets paired with one like this:

16:17:12 listener-thread: err = -25580: oserr = -1: errstr = : System error occurred in network function.
System error = -1.
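
If it helps once you are on 12.10, here is a rough sketch of pulling the host and server name out of those listener-thread lines. The regular expression is based only on the two sample lines above, so treat the format as an assumption; the function name and the field names ("client", "server") are my own labels, not Informix terminology.

```python
import re

# Pattern inferred from the sample log lines in this thread only (an assumption,
# not a documented format). The "from <host> to server <name>" part is optional,
# because 11.50 leaves errstr empty.
LISTENER_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2})\s+listener-thread:\s+"
    r"err = (?P<err>-?\d+):\s+oserr = (?P<oserr>-?\d+):\s+"
    r"errstr = (?:from (?P<client>\S+) to server (?P<server>\S+) )?:\s*(?P<msg>.*)$"
)


def parse_listener_error(line):
    """Return a dict of fields for a listener-thread error line, else None."""
    match = LISTENER_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric codes so they can be compared as integers.
    fields["err"] = int(fields["err"])
    fields["oserr"] = int(fields["oserr"])
    return fields
```

On an 11.50 line the "client" and "server" fields simply come back as None, which matches what Steve is seeing: the information is not in the log at all.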


-Justin

Eric Vercelletto
2015-03-17 09:04:12 UTC
Steve,

your issue is very likely a network issue, involving some problem at the TCP layer.

I have a customer who had this error once or twice per month; after we installed HDR, the error occurred far more frequently, sometimes causing disruptions in the HDR pair.

We finally found that moving the HDR port to another network interface was an effective way to take load off the original port/interface, and the error disappeared for good. Pings against this server/port had shown some unexpected load peaks.

The root cause was that the Connection Manager did not redirect the client connections to the right port, which was not supposed to happen.

Do you use CM?
Steve Nixon
2015-03-17 17:40:08 UTC
Hi Eric,

No, we don't use CM.

And thanks for the heads-up, Justin. We will probably be moving to IDS 12 later this year.

One of the other tech gurus here PM'd me and suggested that the message may actually come from a failed attempt by someone to connect INTO the database, rather than from sending data over to the HDR or RSS, so I am looking into that. It makes sense, since the error came from the "listener" thread, which I would expect to handle inbound connections.

Thanks for all the suggestions.

Steve
Justin Killen
2015-03-18 21:13:02 UTC
Steve,

We get this error frequently, as we have a network monitor in place that regularly attempts to open the socket. To duplicate the error, just telnet to your socket port and then disconnect - you'll see the entry in the log on disconnect.
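
If you would rather script the test than use telnet, something along these lines should behave the same way: it opens a plain TCP connection and closes it without sending anything. This is only a sketch; the function name is made up, and the host and port you pass in are whatever your own listener uses.

```python
import socket


def probe_listener(host, port, timeout=3.0):
    """Open a TCP connection to host:port and close it without sending data.

    Returns True if the connection was accepted, False if it was refused
    or timed out. This mimics "telnet, then disconnect"; it does not speak
    the database protocol at all.
    """
    try:
        # The socket is closed as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling it as, say, probe_listener("your-primary-host", 9088) (both values are placeholders) and then checking the online log at the same timestamp should show whether a bare connect-and-drop is what produces the -25582 entry on your system.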

-Justin
