HDR+Connection Manager in intermittent network scenario

Discussion:

Nate Woodward

2010-12-03 21:46:14 UTC

I'm trying to achieve high availability without data loss. I have an HDR
pair configured with a connection manager on each box for redundancy.
Clients are pointed at an sqlhosts group with the connection managers in
them, and the connection managers are configured to redirect the clients
to the current primary server in the pair. The connection managers are
also configured for failover, with FOC = HDR,10, so that the secondary
takes over the primary's job if it fails.

Now, I'm worried about this scenario:
- primary gets disconnected from network
- secondary is brought to primary mode
- primary regains network connectivity
- two primaries are on the network -- what now?

I'm considering writing an ALARMPROGRAM script to bring down the primary
if it loses network connectivity, so that HDR can be re-established with
the old secondary as the new primary. Is there a better way to do what
I'm trying to accomplish?
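
A minimal sketch of the self-fencing idea described above, written in Python for illustration rather than as a finished ALARMPROGRAM script. Several things here are assumptions and not part of the original setup: the "witness" host (a third machine used to tell "I am isolated" apart from "my peer died"), the host/port pairs passed in, and the helper names. The only Informix command used is `onmode -ky`, which takes the local server offline without prompting. How the check gets triggered (alarm program, cron, a daemon) is left open.

```python
import socket
import subprocess


def reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_fence(peer_ok, witness_ok):
    """Fence only when this node reaches neither its HDR peer nor the witness.

    If the witness is still reachable, the peer is more likely the one that
    failed, so this node should keep running.
    """
    return not peer_ok and not witness_ok


def fence_if_isolated(peer, witness, shutdown_cmd=("onmode", "-ky")):
    """peer and witness are (host, port) tuples (hypothetical values).

    Returns True if the shutdown command was run.
    """
    if should_fence(reachable(*peer), reachable(*witness)):
        subprocess.run(list(shutdown_cmd), check=True)
        return True
    return False
```

Checking a witness as well as the peer is a design choice: without it, a primary cannot distinguish its own isolation from a crash of the secondary, and would shut itself down in both cases.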

More info:

Informix 11.70.UC1GE (Linux 32-bit, although it'll be 64-bit in
production if I get a working setup)

sqlhosts (same on both boxes):

barmgr group - -
primbar1 onsoctcp host1 foobar_1526 g=barmgr
primbar2 onsoctcp host2 foobar_1526 g=barmgr

barcluster group - - i=10
#foobar onsoctcp host1 foobar_1526 g=barcluster
foobar1 onsoctcp host1 foobar1_1527 g=barcluster
foobar2 onsoctcp host2 foobar2_1528 g=barcluster

cmsm.cfg on host1:

NAME barmgr1
SLA primbar1=primary
DEBUG 1
LOGFILE connmgr.log
FOC HDR,10

cmsm.cfg on host2:

NAME barmgr2
SLA primbar2=primary
DEBUG 1
LOGFILE connmgr.log
FOC HDR,10

'onstat -g dri' on host1:

IBM Informix Dynamic Server Version 11.70.UC1GE -- On-Line (Prim) -- Up 02:23:24 -- 354840 Kbytes

Data Replication at 0x841a81a0:
  Type           State   Paired server   Last DR CKPT (id/pg)   Supports Proxy Writes
  primary        on      foobar2         417 / 40               NA

DRINTERVAL -1
DRTIMEOUT 30
DRAUTO 3
DRLOSTFOUND /usr/informix/etc/dr.lostfound
DRIDXAUTO 1
ENCRYPT_HDR 0
Backlog 0

'onstat -g dri' on host2:

IBM Informix Dynamic Server Version 11.70.UC1GE -- Read-Only (Sec) -- Up 01:46:53 -- 354840 Kbytes

Data Replication at 0x841ae1a0:
  Type           State   Paired server   Last DR CKPT (id/pg)   Supports Proxy Writes
  HDR Secondary  on      foobar1         417 / 40               N

DRINTERVAL -1
DRTIMEOUT 30
DRAUTO 3
DRLOSTFOUND /usr/informix/etc/dr.lostfound
DRIDXAUTO 1
ENCRYPT_HDR 0
Backlog 0
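
For checking the state of the pair after a network blip, output like the above can be parsed mechanically. The following is a rough sketch, assuming the banner and parameter layout shown in these two samples; the function names are made up for illustration, and other server modes (fast recovery, quiescent, and so on) are simply reported as unknown. A banner reading "On-Line" with no "(Prim)" or "(Sec)" suffix is treated here as standard mode, which is an assumption.

```python
import re

# Matches the mode part of the onstat banner, e.g. "-- On-Line (Prim) --".
BANNER = re.compile(r"--\s*(On-Line|Read-Only)(?:\s*\((Prim|Sec)\))?\s*--")
# Matches parameter lines such as "DRAUTO 3" or "Backlog 0".
PARAM = re.compile(
    r"^[ \t]*(DR[A-Z]+|ENCRYPT_HDR|Backlog)[ \t]+(\S+)[ \t]*$", re.MULTILINE
)


def parse_dri(text):
    """Return (role, params) from the text of 'onstat -g dri'.

    role is "Prim", "Sec", "Standard", or None if the banner is not recognized.
    params maps parameter names to their values as strings.
    """
    m = BANNER.search(text)
    if m is None:
        role = None
    elif m.group(2):
        role = m.group(2)
    else:
        role = "Standard" if m.group(1) == "On-Line" else None
    return role, dict(PARAM.findall(text))


def split_brain(outputs):
    """True if more than one of the given onstat outputs reports a primary."""
    roles = [parse_dri(text)[0] for text in outputs]
    return roles.count("Prim") > 1
```

Feeding it the output collected from both hosts gives a quick "are there two primaries?" check that a monitoring job could alert on.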

Cesar Inacio Martins

2010-12-08 17:52:46 UTC

Nate, you will find information about this in the manuals.
Check DRAUTO, FAILOVER_CALLBACK...

Just sharing an issue I ran into with IFX 11.50 FC7W1GE + CM 3.50 FC7
on Linux.

In an environment with a similar configuration, using HDR, I had trouble
with the CM FOC and DRAUTO configuration.
For example, if the primary fails and comes back later, it doesn't detect
that the secondary has taken over, and the whole configuration gets
messed up.
(Setting DRAUTO to 0 or 3 just didn't work the way I expected...)

The only configuration I found that works with HDR without having to
rebuild the "lost instance" is DRAUTO=1, and for this I needed to disable
FOC on the CM.
With DRAUTO=1 and FOC disabled, in my tests, if the primary goes down,
the secondary takes over (as the new primary) immediately; when the
original primary comes back, they automatically resynchronize and then
go back to the original configuration.

With all the other configurations (I tried a lot), something always went
wrong: the "lost instance" doesn't recognize the new primary, or they
don't synchronize, or it just doesn't come up, or it comes up in
standard mode (the worst situation)...

I'm still not sure whether I missed something when I configured this
environment or whether this is the default behavior when HDR is used
with CM FOC (not detecting that the primary is back and handling the
situation).

Just out of curiosity: with this same IFX version, in another
environment using *SDS*, everything works fine with CM, including all
switchovers and the identification of the new primary by the old primary
and the other secondaries...

Regards
Cesar

Pravin Kedia

2010-12-08 18:18:02 UTC

Hi Nate,

I think manual failover (DRAUTO=0) is the best configuration for a HADR
and CM environment to handle the conditions you describe in your mail
below (dealing with network issues).

Hi Cesar,

This looks a little different from the behavior in the earlier version (I
think it was 11.50.FC5 or thereabouts). In that version, DRAUTO=0 with FOC
on was the only way I could get failover to work properly, but what you
describe below looks very different.

Thanks & Regards,
Pravin

m***@wellsfargo.com

2010-12-08 19:46:39 UTC

It is mainly because of dangerous situations like this (dual primaries) that I maintain that there should be a *third* server to monitor both the primary and secondary. Besides monitoring, that third server should also be the one to execute the switchover of the primary duties to the secondary server (when the old primary becomes unresponsive), and should also be the one to redirect the client traffic to the new primary db server. This is the very successful model that we put in place here in 2004 for a highly critical HDR pair -- obviously, this was *before* MACH 11 and Connection Manager. (For more info, see my presentation from the 2006 NA IIUG/IDUG Conference, available in the member area of iiug.org.)

My recommendation would be to put the Connection Manager on a small *third* server, ideally situated very close network-wise to the majority of your client connections. The redundant CM should ideally be on a fourth server, also located close to the client servers. The main point here, of course, is to *NOT* have your CM on the same server as your DBMS.
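
The decision such a third "arbiter" server has to make can be sketched roughly as follows. This is not the 2004 implementation mentioned above and not how Connection Manager works internally; it is an illustration under stated assumptions. The names (`Action`, `decide`, `required_failures`) are hypothetical, and the probing itself and the actual promotion command are deliberately left out. Requiring several consecutive failed probes is one way to avoid promoting on a brief network blip.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    PROMOTE_SECONDARY = "promote_secondary"
    ALERT = "alert"


def decide(primary_ok_history, secondary_ok, required_failures=3):
    """Decide what the arbiter should do.

    primary_ok_history: probe results for the primary, newest last
                        (True means the probe succeeded).
    secondary_ok:       result of the latest probe of the secondary.
    """
    recent = primary_ok_history[-required_failures:]
    primary_down = len(recent) == required_failures and not any(recent)
    if not primary_down:
        # Primary answered at least once recently, or too few samples yet.
        return Action.NONE
    if secondary_ok:
        return Action.PROMOTE_SECONDARY
    # Neither server answers: the arbiter itself may be cut off, so it
    # should not switch anything over, only raise an alarm.
    return Action.ALERT
```

Because the arbiter sits outside both database hosts, a primary that merely lost its link to the secondary but still answers the arbiter is never demoted.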

HTH,
Paul Mosser

Art Kagel

2010-12-08 20:19:31 UTC

I agree with Paul, especially if you have the CM managing failover, but even
if not it's a very good idea!

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
IIUG Board of Directors (***@iiug.org)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions and
do not reflect on my employer, Advanced DataTools, the IIUG, nor any other
organization with which I am associated either explicitly, implicitly, or by
inference. Neither do those opinions reflect those of other individuals
affiliated with any entity with which I am affiliated nor those of the
entities themselves.

Nate Woodward

2010-12-09 18:03:29 UTC

Cesar, I must have missed the FAILOVER_CALLBACK parameter in the docs.
Thanks for pointing it out, I'll take a look.

Speaking more generally to both you and Pravin, I must confess, I
haven't experimented much with DRAUTO set to 0 or 1. The impression I
got from the documentation was that DRAUTO=3 was the correct way to let
the CM handle failover. Guess I'll have to re-check that assumption.

Thanks for the info,
-Nate

Nate Woodward

2010-12-09 18:04:23 UTC

Thanks for the advice, I'll give it a try.

-Nate

PS: I'm new to this list, and the Informix-list Digest emails I get say
to include the message number in square brackets in the subject line. I
don't see this number anywhere -- am I doing something wrong?
