My SQL : Replication for High Availability - Procedures (part 3) - Dual Masters - Shared disks

11/24/2013 6:36:24 PM

2. Dual Masters

One frequently mentioned setup for high availability is the dual masters topology. In this setup, two masters replicate each other to keep both current. This setup is very simple to use since it is symmetric. Failing over to the standby master does not require any reconfiguration of the main master, and failing back to the main master again when the standby master fails in turn is very easy.

Servers can be either active or passive. If a server is active it means that the server accepts writes, which are likely to be propagated elsewhere using replication. If a server is passive, it does not accept writes and is just following the active master, usually to be ready to take over when it fails.

When using dual masters, there are two different setups, each serving a different purpose:


In an active-active setup, writes go to both servers, which then transfer changes to the other master.


In this setup, one of the masters, called the active master, handles writes while the other server, called the passive master, just keeps current with the active master.

This is almost identical to the hot standby setup, but since it is symmetric, it is easy to switch back and forth between the masters, each taking turns being the active master.

Note that this setup does not necessarily let the passive master answer queries. For some of the solutions that you’ll see in this section, the passive master is a cold standby.

These setups do not necessarily mean that replication is used to keep the servers synchronized—there are other techniques that can serve that purpose. Some techniques can support active-active masters, while other techniques can only support active-passive masters.

The most common use of an active-active dual masters setup is to have the servers geographically close to different sets of users, for example, in offices at different places in the world. The users can then work with the local server, and the changes will be replicated over to the other master so that both masters are kept in sync. Since the transactions are committed locally, the system will be perceived as more responsive. It is important to understand that the transactions are committed locally, meaning that the two masters are not consistent in the sense that they have the same information. The changes committed to one master will be propagated to the other master eventually, but until that has been done, the masters have inconsistent data.

This has two main consequences that you need to be aware of:

  • If the same information is updated on the two masters—for example, a user is accidentally added to both masters—there will be a conflict between the two updates and it is likely that replication will stop.

  • If a crash occurs while the two masters are inconsistent, some transactions will be lost.

To some extent, you can avoid the problem with conflicting changes by allowing writes to only one of the servers, thereby making the other master a passive master. This is called an active-passive setup—where the active server is called the primary and the passive server is called the secondary.

Losing transactions when the server crashes is an inevitable result of using asynchronous replication, but depending on the application, it does not necessarily have to be a serious problem. You can limit the number of transactions that are lost when the server crashes by using a new feature in MySQL 5.5 called semisynchronous replication. The idea behind semisynchronous replication is that the thread committing a transaction will block until at least one slave acknowledges that it has received the transaction. Since the events for the transaction are sent to the slave after the transaction has been committed to the storage engine, the number of lost transactions can be kept down to at most one per thread.

Similar to the active-active approach, the active-passive setup is symmetrical and therefore allows you to switch easily from the main master to the standby and back. Depending on the way you handle the mirroring, it may also be possible to use the passive master for administrative tasks such as upgrading the server and use the upgrade server as the active master once the upgrade is finished without any downtime at all.

One fundamental problem that has to be resolved when using an active-passive setup is the risk of both servers deciding that they are the primary master—this is called the split-brain syndrome. This can occur if network connectivity is lost for a brief period, long enough to have the secondary promote itself to primary, but then the primary is brought online again. If changes have been made to both servers while they are both in the role of primary, there may be a conflict. In the case of using a shared disk, simultaneous writes to the disks by two servers are likely to cause “interesting” problems with the database—that is, probably disastrous and difficult to pinpoint.

2.1. Shared disks

A straightforward dual masters approach is shown in Figure 4, where a pair of masters is connected using a shared disk architecture such as a SAN (storage area network). In this approach, both servers are connected to the same SAN and are configured to use the same files. Since one of the masters is passive, it will not write anything to the files while the active master is running as usual. If the main server fails, the standby will be ready to take over.

Figure 4. Dual masters using a shared disk

The advantage of this approach is that since the binlog files are stored on a shared disk, there is no need for translating binlog positions. The two servers are truly mirror images of each other, but they are running on two different machines. This means that switching over from the main master to the standby is very fast. There is no need for the slaves to translate positions to the new master; all that is necessary is to note the position where the slave stopped, issue a CHANGE MASTER command, and start replication again.

When you fail over using this technique, you have to perform recovery on the tables, since it is very likely updates were stopped midstream. Each storage engine behaves differently in this situation. For example, InnoDB has to perform a normal recovery from the transaction log, as it would in the event of a crash, whereas if you use MyISAM you probably have to repair the tables before being able to continue operation. Of these two choices, InnoDB is preferred because recovery is significantly faster than repairing a MyISAM table.

Example 2 shows a Python script for handling such a failover using the Replicant library. Notice that the position uses the server ID of the main server, but since both servers are using the same files, the standby server is really a mirror image of the main server. Since the position contains the server ID as well, this will also catch any mistakes made by the user, such as passing a master that is not a mirror image of the main master.

Example 2. Procedure to remaster a slave when using a shared disk
def remaster_slave(slave, master):
position = fetch_slave_position(slave)
change_master(slave, master, position)

The ability to set up dual masters using shared disks is dependent on the shared storage solution used.

The problem with using shared storage is that since the two masters are using the same files for storing data, you have to be very careful when doing any administrative tasks on the passive master. Overwriting the configuration files, even by mistake, can be fatal.

  •  My SQL : Replication for High Availability - Redundancy, Planning
  •  SQL Server 2012 : Exploring SQL CLR - Deployment (part 2) - Deploying Your Stored Procedures, Testing Your Stored Procedures
  •  SQL Server 2012 : Exploring SQL CLR - Deployment (part 1) - Deploying Your Assembly
  •  SQL Server 2012 : Exploring SQL CLR - Your First SQL CLR Stored Procedure, CLR Stored Procedures and Server-Side Data Access
  •  SQL Server 2012 : Exploring SQL CLR - Visual Studio/SQL Server Integration
  •  SQL Server 2008 : DBCC validation - Removing corruption
  •  SQL Server 2008 : Mirroring in action (part 3) - Suspending and resuming mirroring, Initiating failover
  •  SQL Server 2008 : Mirroring in action (part 2) - Monitoring database mirroring
  •  SQL Server 2008 : Mirroring in action (part 1) - Mirroring setup
  •  SQL Server 2008 : High availability with database mirroring - Failover options
    Top 10
    SG50 Ferrari F12berlinetta : Prancing Horse for Lion City's 50th
    The latest Audi TT : New angles for TT
    Era of million-dollar luxury cars
    Game Review : Hearthstone - Blackrock Mountain
    Game Review : Battlefield Hardline
    Google Chromecast
    Keyboards for Apple iPad Air 2 (part 3) - Logitech Ultrathin Keyboard Cover for iPad Air 2
    Keyboards for Apple iPad Air 2 (part 2) - Zagg Slim Book for iPad Air 2
    Keyboards for Apple iPad Air 2 (part 1) - Belkin Qode Ultimate Pro Keyboard Case for iPad Air 2
    Michael Kors Designs Stylish Tech Products for Women
    - First look: Apple Watch

    - 3 Tips for Maintaining Your Cell Phone Battery (part 1)

    - 3 Tips for Maintaining Your Cell Phone Battery (part 2)
    Popular Tags
    Video Tutorail Microsoft Access Microsoft Excel Microsoft OneNote Microsoft PowerPoint Microsoft Project Microsoft Visio Microsoft Word Active Directory Exchange Server Sharepoint Sql Server Windows Server 2008 Windows Server 2012 Windows 7 Windows 8 Adobe Flash Professional Dreamweaver Adobe Illustrator Adobe Photoshop CorelDRAW X5 CorelDraw 10 windows Phone 7 windows Phone 8 Iphone