High Availability with DAGs in Exchange 2010

When people running Exchange 2003 or 2007 ask me what I think is the strongest reason to move to Exchange 2010, I tell them it’s the high availability features delivered through Database Availability Groups (DAGs). For those running Exchange 2003, moving to 2010 is a no-brainer. For those running 2007, which already has some of the rudimentary pieces of DAGs, the case for moving right away may be less obvious.

The underlying concept behind high availability in Exchange 2010 is a process called “continuous replication”. A copy of the active database is created (called a passive copy), and the 1 MB transaction logs are then shipped over and replayed into that passive copy. Exchange 2003 did not offer this at all. Exchange 2007 offered several flavors of continuous replication (LCR, CCR and SCR), but all of these have been eliminated in 2010, which uses a DAG to provide high availability instead.
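To make the ship-and-replay idea concrete, here is a minimal Python sketch. It is a toy model and not how Exchange implements replication: the `DatabaseCopy` class and `ship_logs` function are hypothetical names invented for illustration, and each "log" is just a string standing in for a closed 1 MB transaction log file.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseCopy:
    """Hypothetical stand-in for a passive database copy."""
    name: str
    applied: list = field(default_factory=list)  # logs replayed so far, in order

    def replay(self, generation, changes):
        # Logs must be replayed strictly in sequence, with no gaps.
        expected = len(self.applied) + 1
        if generation != expected:
            raise ValueError(f"expected log {expected}, got {generation}")
        self.applied.append(changes)


def ship_logs(active_logs, passive):
    """Ship every log the passive copy has not seen yet and replay it.

    Returns the number of logs shipped in this pass.
    """
    shipped = 0
    for generation, changes in enumerate(active_logs, start=1):
        if generation > len(passive.applied):
            passive.replay(generation, changes)
            shipped += 1
    return shipped


active_logs = ["create mailbox", "deliver message", "move item"]
passive = DatabaseCopy("DB1 passive copy")
ship_logs(active_logs, passive)   # first pass ships all three logs
active_logs.append("delete item")
ship_logs(active_logs, passive)   # second pass ships only the new log
```

After both passes the passive copy has replayed the same sequence of changes as the active database, which is what allows it to take over.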

To set up a DAG, you first create it for the Exchange organization. The group itself starts out as an empty container to which members can be added. You then add servers as members, and a server can belong to only one DAG at a time. After that you can create passive copies of databases from one DAG member to another. A DAG can hold up to 16 servers, so multiple passive copies can be in play. The value here is that you can keep passive copies in your local site as well as in a secondary site in case of a site failure. You can also have lagged copies, which are passive copies where replay of the transaction logs is delayed for a specified period of time, so that a database corruption or a newly discovered virus is not immediately carried over.
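The membership rules and the lagged-copy idea described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this post: `Dag`, `add_member` and `logs_ready_for_replay` are hypothetical names, and the real work is done with the Exchange management tools rather than code like this.

```python
from datetime import datetime, timedelta

MAX_DAG_MEMBERS = 16   # limit stated in the text above
_membership = {}       # hypothetical registry: server name -> Dag


class Dag:
    def __init__(self, name):
        self.name = name
        self.members = []  # a new DAG starts out empty


def add_member(dag, server):
    """Add a server to a DAG, enforcing the two rules from the text."""
    if server in _membership:
        raise ValueError(f"{server} already belongs to {_membership[server].name}")
    if len(dag.members) >= MAX_DAG_MEMBERS:
        raise ValueError(f"{dag.name} already has {MAX_DAG_MEMBERS} members")
    dag.members.append(server)
    _membership[server] = dag


def logs_ready_for_replay(logs, replay_lag, now):
    """Return the log generations a lagged copy may replay.

    logs is a list of (generation, received_at) tuples; a log becomes
    eligible only once it has waited at least replay_lag.
    """
    return [gen for gen, received_at in logs if now - received_at >= replay_lag]


dag = Dag("DAG1")
add_member(dag, "MBX1")
add_member(dag, "MBX2")

now = datetime(2011, 6, 1, 12, 0)
logs = [(1, now - timedelta(hours=30)), (2, now - timedelta(hours=2))]
ready = logs_ready_for_replay(logs, timedelta(hours=24), now)  # only log 1
```

With a 24 hour lag, the log received 2 hours ago is held back, which leaves a window to notice a problem before it reaches the lagged copy.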

The interesting and often confusing side of high availability with DAGs in Exchange 2010 is the importance of quorum. A majority of the voting members must be available for quorum to be maintained. This prevents a problem with clustered systems called “split brain syndrome”, where one server tries to take control because it perceives a failure on the part of the active server. Without a referee to decide quorum, you might end up with two active copies up at the same time. The referee is a third vote: either another Mailbox server that is part of the DAG or a witness server that hosts a shared folder for use by the DAG. The witness is what breaks the tie when the DAG has an even number of members.
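The majority rule is easy to check with a little arithmetic. The Python sketch below is a simplified model, assuming one vote per DAG member plus one witness vote when the member count is even; `has_quorum` is a hypothetical helper and ignores everything else the underlying cluster service does.

```python
def has_quorum(members_up, total_members, witness_up=False):
    """Return True if the reachable voters form a strict majority.

    Simplified model: every member has one vote, and the witness only
    votes when the DAG has an even number of members.
    """
    witness_votes = total_members % 2 == 0
    total_votes = total_members + (1 if witness_votes else 0)
    votes = members_up + (1 if witness_votes and witness_up else 0)
    return votes > total_votes // 2


# Two-member DAG split by a network failure: only the side that can
# reach the witness keeps quorum, so only one active copy stays up.
side_a = has_quorum(1, 2, witness_up=True)    # True
side_b = has_quorum(1, 2, witness_up=False)   # False

# Three-member DAG: no witness vote needed, two of three is a majority.
three_node = has_quorum(2, 3)                 # True
```

Because only one side of a split can ever hold a strict majority, the two halves can never both decide to mount the same database.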

DAG members, as mentioned, can be local to a single site or span sites, and spanning sites gives you more to consider. One important facet of DAGs that span sites is the need to enable DAC mode (Datacenter Activation Coordination mode). This helps with the problems that arise when a site loses power and the WAN link goes down as well. Another important consideration is that a site failure calls for a manual switchover to the other site rather than an automatic failover.

If it sounds like a lot to learn, well, that’s true. But it’s worth knowing what is built right into Exchange 2010 to help provide greater high availability. With DAGs in Exchange 2010 it is possible to eliminate the “disaster” in disaster recovery.

Some great articles to consider from TechNet and the Exchange Team Blog include the following:

Exchange 2010 High Availability Misconceptions Addressed

http://blogs.technet.com/b/exchange/archive/2011/05/31/exchange-2010-high-availability-misconceptions-addressed.aspx

Planning for High Availability and Site Resilience

http://technet.microsoft.com/en-us/library/dd638104

Database Availability Group Design Examples

http://technet.microsoft.com/en-us/library/dd979781.aspx

