Discussion:
HDR+Connection Manager in intermittent network scenario
Nate Woodward
2010-12-03 21:46:14 UTC
I'm trying to achieve high availability without data loss. I have an HDR
pair configured with a connection manager on each box for redundancy.
Clients are pointed at an sqlhosts group with the connection managers in
them, and the connection managers are configured to redirect the clients
to the current primary server in the pair. The connection managers are
also configured for failover, with FOC = HDR,10, so that the secondary
takes over the primary's job if it fails.

Now, I'm worried about this scenario:
- primary gets disconnected from network
- secondary is brought to primary mode
- primary regains network connectivity
- two primaries are on the network -- what now?

I'm considering writing an ALARMPROGRAM script to bring down the primary
if it loses network connectivity, so that HDR can be re-established with
the old secondary as the new primary. Is there a better way to do what
I'm trying to accomplish?
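One way to make the ALARMPROGRAM idea less trigger-happy is to have the primary declare itself isolated only after several consecutive checks in which it cannot reach any of a few reference hosts. A minimal Python sketch, assuming it runs periodically on the primary host (for example from cron) rather than being driven by a particular alarm event; the endpoint list, the threshold, and the names `IsolationDetector`, `tcp_reachable` and `fence_local_instance` are hypothetical. `onmode -ky` is the usual command to take an instance offline immediately, and the sketch only runs it when `dry_run` is False.

```python
import socket
import subprocess
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical reference endpoint: (hostname, TCP port), e.g. the HDR peer
# or a Connection Manager host that the primary should normally reach.
Endpoint = Tuple[str, int]


def tcp_reachable(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """True if a TCP connection to the endpoint succeeds within `timeout`."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


@dataclass
class IsolationDetector:
    """Hypothetical helper: reports isolation only after `threshold`
    consecutive checks in which none of the reference endpoints answered."""
    endpoints: List[Endpoint]
    threshold: int = 3
    probe: Callable[[Endpoint], bool] = tcp_reachable
    misses: int = field(default=0, init=False)

    def check(self) -> bool:
        if any(self.probe(ep) for ep in self.endpoints):
            self.misses = 0  # one successful probe resets the counter
        else:
            self.misses += 1
        return self.misses >= self.threshold


def fence_local_instance(dry_run: bool = True) -> List[str]:
    """Take the local instance offline immediately with `onmode -ky`.
    With dry_run=True (the default) the command is only returned."""
    cmd = ["onmode", "-ky"]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

A single dropped probe should not take down a healthy primary, hence the counter. What the sketch cannot tell from the primary's own vantage point is whether the secondary was actually promoted: if the primary fences itself while the secondary is still a secondary, both servers end up unavailable.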

More info:

Informix 11.70.UC1GE (Linux 32-bit, although it'll be 64-bit in
production if I get a working setup)

sqlhosts (same on both boxes):

barmgr group - -
primbar1 onsoctcp host1 foobar_1526 g=barmgr
primbar2 onsoctcp host2 foobar_1526 g=barmgr

barcluster group - - i=10
#foobar onsoctcp host1 foobar_1526 g=barcluster
foobar1 onsoctcp host1 foobar1_1527 g=barcluster
foobar2 onsoctcp host2 foobar2_1528 g=barcluster
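To make the group layout above explicit: a client that connects to `barmgr` is resolved through that group's members (`primbar1` and `primbar2`, the SLA names served by the two Connection Managers), while `barcluster` lists the two database servers. A minimal Python sketch that parses sqlhosts-style lines (dbservername, nettype, hostname, servicename, options) and collects each group's members in file order, which is normally the order a client tries them in. It assumes one entry per line and looks only at the `g=` option; it is an illustration rather than a complete sqlhosts parser, and `parse_sqlhosts`/`group_members` are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class SqlhostsEntry(NamedTuple):
    name: str      # dbservername (or group name)
    nettype: str   # e.g. "onsoctcp", or "group" for a group line
    host: str
    service: str
    options: Dict[str, str]


def parse_sqlhosts(text: str) -> List[SqlhostsEntry]:
    """Parse sqlhosts-style lines; blank and commented (#) lines are skipped."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"incomplete sqlhosts entry: {line!r}")
        name, nettype, host, service = fields[:4]
        # Options such as g=barmgr or i=10 become {"g": "barmgr"}, {"i": "10"}.
        options = dict(opt.split("=", 1) for opt in fields[4:] if "=" in opt)
        entries.append(SqlhostsEntry(name, nettype, host, service, options))
    return entries


def group_members(entries: List[SqlhostsEntry]) -> Dict[str, List[str]]:
    """Map each group name to the names of its members (g=...), in file order."""
    groups: Dict[str, List[str]] = {e.name: [] for e in entries if e.nettype == "group"}
    for entry in entries:
        group = entry.options.get("g")
        if group in groups:
            groups[group].append(entry.name)
    return groups
```

Fed the sqlhosts file above, the commented-out `#foobar` line is ignored, so `barmgr` resolves to `primbar1` and `primbar2`, and `barcluster` to `foobar1` and `foobar2`.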



cmsm.cfg on host1:

NAME barmgr1
SLA primbar1=primary
DEBUG 1
LOGFILE connmgr.log
FOC HDR,10



cmsm.cfg on host2:

NAME barmgr2
SLA primbar2=primary
DEBUG 1
LOGFILE connmgr.log
FOC HDR,10



'onstat -g dri' on host1:

IBM Informix Dynamic Server Version 11.70.UC1GE -- On-Line (Prim) -- Up 02:23:24 -- 354840 Kbytes

Data Replication at 0x841a81a0:
  Type           State   Paired server   Last DR CKPT (id/pg)   Supports Proxy Writes
  primary        on      foobar2         417 / 40               NA

DRINTERVAL -1
DRTIMEOUT 30
DRAUTO 3
DRLOSTFOUND /usr/informix/etc/dr.lostfound
DRIDXAUTO 1
ENCRYPT_HDR 0
Backlog 0



'onstat -g dri' on host2:

IBM Informix Dynamic Server Version 11.70.UC1GE -- Read-Only (Sec) -- Up 01:46:53 -- 354840 Kbytes

Data Replication at 0x841ae1a0:
  Type           State   Paired server   Last DR CKPT (id/pg)   Supports Proxy Writes
  HDR Secondary  on      foobar1         417 / 40               N

DRINTERVAL -1
DRTIMEOUT 30
DRAUTO 3
DRLOSTFOUND /usr/informix/etc/dr.lostfound
DRIDXAUTO 1
ENCRYPT_HDR 0
Backlog 0
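The banner line of the `onstat -g dri` output already states each instance's role: `(Prim)` on host1 and `(Sec)` on host2. A monitor, ideally on a host separate from both servers, could collect that output from each server (the collection step, for example over ssh, is left out) and flag the dual-primary case from the scenario above. A minimal Python sketch under those assumptions: it relies on the banner format shown here, treats a banner with no parenthesized tag as "Standard" (an assumption), and the function names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Matches the mode field of the banner, e.g. "-- On-Line (Prim) --" or
# "-- Read-Only (Sec) --", and captures the tag inside the parentheses.
_MODE_TAG = re.compile(r"--\s*[\w-]+\s*\((\w+)\)\s*--")


def role_from_onstat(output: str) -> Optional[str]:
    """Hypothetical helper: return the banner's mode tag ("Prim", "Sec", ...),
    "Standard" when the banner has no tag (an assumption), or None if no
    banner line is found."""
    for line in output.splitlines():
        if "Informix Dynamic Server" in line:
            match = _MODE_TAG.search(line)
            return match.group(1) if match else "Standard"
    return None


def servers_claiming_primary(outputs: Dict[str, str]) -> List[str]:
    """Given onstat output keyed by server name, return (sorted) the servers
    whose banner says (Prim); two or more means a dual-primary situation."""
    return sorted(
        name for name, out in outputs.items() if role_from_onstat(out) == "Prim"
    )
```

Checking only the banner keeps the sketch independent of the column layout of the replication table, which is the part most likely to differ between versions.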
Cesar Inacio Martins
2010-12-08 17:52:46 UTC
Nate, you will find information about this in the manuals.
Check DRAUTO, FAILOVER_CALLBACK...

Just sharing some issues I had with IFX 11.50 FC7W1GE + CM 3.50 FC7 on
Linux.

In an environment with a similar configuration, using HDR, I had trouble
with the CM FOC and DRAUTO settings.
For example, if the primary failed and came back later, it didn't detect
that the secondary had taken over, and the whole configuration got messed up.
(Setting DRAUTO to 0 or 3 just didn't work the way I expected...)

The only configuration I found that works with HDR without having to
rebuild the "lost instance" is DRAUTO=1, and for that I had to disable
FOC on the CM.
With DRAUTO=1 and FOC disabled, in my tests, if the primary goes down,
the secondary takes over (as the new primary) immediately; when the
original primary comes back, they automatically resynchronize and return
to the original configuration.

With every other configuration (and I tried a lot), something always
went wrong: the "lost instance" didn't recognize the new primary, or they
didn't synchronize, or it just didn't come up, or it came up in standard
mode (the worst situation)....

I'm still not sure whether I was missing something when I configured
this environment, or whether this is the default behavior when using HDR
with CM FOC (it doesn't detect that the primary is back and handle the
situation).


Just out of curiosity: with this same IFX version, in another
environment using *SDS*, everything works fine with CM, including all
switchovers and the identification of the new primary by the old
primary and the other secondaries...


Regards
Cesar
_______________________________________________
Informix-list mailing list
http://www.iiug.org/mailman/listinfo/informix-list
Pravin Kedia
2010-12-08 18:18:02 UTC
Hi Nate,

I think manual failover (DRAUTO=0) is the best configuration for an HADR
and CM environment for handling the conditions you described in your mail
(dealing with network issues).

Hi Cesar,

This looks like somewhat different behavior from the earlier version (I
think it was 11.50.FC5 or thereabouts). In that version, DRAUTO=0 with FOC on
was the only way I could get failover to work properly, but what you
described looks very different.

Thanks & Regards,
Pravin


m***@wellsfargo.com
2010-12-08 19:46:39 UTC

It is mainly because of dangerous situations like this (dual primaries) that I maintain that there should be a *third* server to monitor both the primary and secondary. Besides monitoring, that third server should also be the one to execute the switchover of the primary duties to the secondary server (when the old primary becomes unresponsive), and should also be the one to re-direct the client traffic to the new primary db server. This is the very successful model that we put in place here in 2004 for a highly critical HDR pair -- obviously, this was *before* MACH 11 and Connection Manager. (for more info, see my presentation from the 2006 NA IIUG/IDUG Conference, available in the member area of iiug.org)

My recommendation would be to put the Connection Manager on a small *third* server, ideally situated very close network-wise to the majority of your client connections. The redundant CM should ideally be on a fourth server, but also located close by client servers. The main point here, of course, is to *NOT* have your CM on the same server as your DBMS.
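A minimal Python sketch of the decision logic such a third-server monitor might use, assuming it can only observe whether each server answers (how it checks, and how it actually promotes the secondary and redirects clients, are left out). The class and action names and the consecutive-miss threshold are hypothetical; this is not the implementation described above, which predates MACH 11 and the Connection Manager.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "no action"
    PROMOTE_SECONDARY = "promote the secondary and redirect clients to it"
    FENCE_OLD_PRIMARY = "keep the returning old primary away from clients"


@dataclass
class Arbitrator:
    """Hypothetical decision logic for a monitor on a third host watching
    one HDR pair; it only sees whether each server answers."""
    threshold: int = 3           # consecutive misses required before failover
    primary_misses: int = 0
    failed_over: bool = False

    def observe(self, primary_ok: bool, secondary_ok: bool) -> Action:
        if self.failed_over:
            # After a failover the old primary must not get clients back;
            # it has to be re-established as the new secondary first.
            return Action.FENCE_OLD_PRIMARY if primary_ok else Action.NONE
        if primary_ok or not secondary_ok:
            # Either the primary answered, or the monitor cannot see the
            # secondary either (it may be the one cut off): do not fail over.
            self.primary_misses = 0
            return Action.NONE
        self.primary_misses += 1
        if self.primary_misses >= self.threshold:
            self.failed_over = True
            return Action.PROMOTE_SECONDARY
        return Action.NONE
```

The two choices that matter for the dual-primary scenario are that a failover requires the monitor to still see the secondary, so a monitor that is itself cut off does nothing, and that after failing over it never sends clients back to the old primary on its own; rebuilding HDR with the old primary as the new secondary stays a deliberate step.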

HTH,
Paul Mosser
Art Kagel
2010-12-08 20:19:31 UTC
I agree with Paul, especially if you have the CM managing failover, but even
if not it's a very good idea!

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
IIUG Board of Directors (***@iiug.org)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions and
do not reflect on my employer, Advanced DataTools, the IIUG, nor any other
organization with which I am associated either explicitly, implicitly, or by
inference. Neither do those opinions reflect those of other individuals
affiliated with any entity with which I am affiliated nor those of the
entities themselves.
Nate Woodward
2010-12-09 18:03:29 UTC
Cesar, I must have missed the FAILOVER_CALLBACK parameter in the docs.
Thanks for pointing it out, I'll take a look.

Speaking more generally to both you and Pravin, I must confess, I
haven't experimented much with DRAUTO set to 0 or 1. The impression I
got from the documentation was that DRAUTO=3 was the correct way to let
the CM handle failover. Guess I'll have to re-check that assumption.

Thanks for the info,
-Nate


Nate Woodward
2010-12-09 18:04:23 UTC
Thanks for the advice, I'll give it a try.

-Nate

PS: I'm new to this list, and the Informix-list Digest emails I get say
to include the message number in square brackets in the subject line. I
don't see this number anywhere -- am I doing something wrong?

