2. Dual Masters
One frequently mentioned setup for high availability is the dual masters
topology. In this setup, two masters replicate each other to keep both
current. Because the setup is symmetric, it is very simple to use:
failing over to the standby master does not require any reconfiguration
of the main master, and failing back to the main master when the standby
master fails in turn is just as easy.
Servers can be either active or passive. An active server accepts
writes, which are typically propagated to the other servers using
replication. A passive server does not accept writes; it just follows
the active master, usually so that it is ready to take over when the
active master fails.
When using dual masters, there are two different setups, each
serving a different purpose:
Active-active
In an active-active setup, writes go to both servers, each of
which then transfers its changes to the other master.
Active-passive
In this setup, one of the masters, called the active master,
handles writes while the other server, called the passive master,
just keeps current with the active master.
This is almost identical to the hot standby setup, but since
it is symmetric, it is easy to switch back and forth between the
masters, each taking turns being the active master.
Note that this setup does not necessarily let the passive
master answer queries. For some of the solutions that you’ll see
in this section, the passive master is a cold standby.
These setups do not necessarily mean that replication is used to
keep the servers synchronized; there are other techniques that can serve
that purpose. Some techniques support active-active masters, while
others support only active-passive masters.
The most common use of an active-active dual masters setup is to have the servers
geographically close to different sets of users, for example, in offices
at different places in the world. The users can then work with the local
server, and the changes will be replicated over to the other master so
that both masters are kept in sync. Since the transactions are committed
locally, the system will be perceived as more responsive. It is important
to understand, however, that local commits mean the two masters are not
guaranteed to be consistent in the sense of having the same information
at all times. The changes committed to one master will be propagated to
the other master eventually, but until that has happened, the two masters
hold inconsistent data.
This has two main consequences that you need to be aware
of:
If the same information is updated on the two masters—for
example, a user is accidentally added to both masters—there will be
a conflict between the two updates and it is likely that replication
will stop.
If a crash occurs while the two masters are inconsistent, some
transactions will be lost.
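To make the first consequence less likely in practice, a common
safeguard with active-active pairs is to give each master its own
AUTO_INCREMENT sequence so that rows inserted independently on the two
masters cannot collide on generated keys. The sketch below shows the
idea; it assumes MySQL Connector/Python is available, and the host
names and credentials are placeholders for your own environment.

import mysql.connector

def stagger_auto_increment(host, offset, n_masters=2,
                           user="root", password=""):
    # Give this master the key sequence offset, offset + n_masters,
    # offset + 2 * n_masters, ... so that generated keys never collide
    # between the masters. SET GLOBAL affects connections opened
    # afterwards; add the settings to the configuration file as well
    # so that they survive a restart.
    conn = mysql.connector.connect(host=host, user=user, password=password)
    cursor = conn.cursor()
    cursor.execute("SET GLOBAL auto_increment_increment = %s", (n_masters,))
    cursor.execute("SET GLOBAL auto_increment_offset = %s", (offset,))
    cursor.close()
    conn.close()

# Master A generates keys 1, 3, 5, ... and master B 2, 4, 6, ...
stagger_auto_increment("master-a.example.com", offset=1)
stagger_auto_increment("master-b.example.com", offset=2)

Note that this only prevents collisions on automatically generated
keys; conflicting updates to existing rows are still possible, which is
why restricting writes to one of the servers, as described next, is a
more complete solution.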
To some extent, you can avoid the problem with conflicting changes
by allowing writes to only one of the servers, thereby making the other
master a passive master. This is called an active-passive setup, where
the active server is called the primary and the passive server is called
the secondary.
Losing transactions when the server crashes is an inevitable
result of using asynchronous replication, but depending on the
application, it does not necessarily have to be a serious problem. You
can limit the number of transactions that are lost when the server
crashes by using a new feature in MySQL 5.5 called semisynchronous replication. The
idea behind semisynchronous replication is that the thread committing a transaction will block until at least
one slave acknowledges that it has received the transaction. Since the
events for the transaction are sent to the slave after the transaction
has been committed to the storage engine, the number of lost
transactions can be kept down to at most one per thread.
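As a sketch of what enabling this looks like in practice, the
statements below load and enable the semisynchronous plugins on a
master and a slave. The example assumes MySQL 5.5 with the bundled
plugins (semisync_master.so and semisync_slave.so on Unix-like
systems), MySQL Connector/Python, and placeholder host names and
credentials.

import mysql.connector

def run(host, statements, user="root", password=""):
    # Execute a list of SQL statements on the given server.
    conn = mysql.connector.connect(host=host, user=user, password=password)
    cursor = conn.cursor()
    for stmt in statements:
        cursor.execute(stmt)
    cursor.close()
    conn.close()

# On the master: load the plugin, enable it, and fall back to plain
# asynchronous replication if no slave acknowledges within one second.
run("master.example.com", [
    "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'",
    "SET GLOBAL rpl_semi_sync_master_enabled = 1",
    "SET GLOBAL rpl_semi_sync_master_timeout = 1000",  # milliseconds
])

# On the slave: load the slave-side plugin and restart the I/O thread
# so that it reconnects with semisynchronous replication enabled.
run("slave.example.com", [
    "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'",
    "SET GLOBAL rpl_semi_sync_slave_enabled = 1",
    "STOP SLAVE IO_THREAD",
    "START SLAVE IO_THREAD",
])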
Similar to the active-active approach, the active-passive setup is
symmetrical and therefore allows you to switch easily from the main
master to the standby and back. Depending on the way you handle the
mirroring, it may also be possible to use the passive master for
administrative tasks such as upgrading the server, and then use the
upgraded server as the active master once the upgrade is finished,
without any downtime at all.
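A minimal switchover procedure might look like the sketch below. It
assumes the pair is kept in sync using replication, that both servers
are run with the read_only option so that the roles can be flipped by
toggling it, and that the connection details are placeholders. Note
that read_only does not restrain accounts with the SUPER privilege, so
application accounts must not have it.

import mysql.connector

def connect(host, user="root", password=""):
    return mysql.connector.connect(host=host, user=user, password=password)

def switch_active(active_host, passive_host):
    # Flip the roles of an active-passive pair that replicate each other.
    active_conn, passive_conn = connect(active_host), connect(passive_host)
    active, passive = active_conn.cursor(), passive_conn.cursor()

    # Stop accepting writes on the old active master.
    active.execute("SET GLOBAL read_only = 1")

    # Note how far the binary log of the old active master has gotten.
    active.execute("SHOW MASTER STATUS")
    log_file, log_pos = active.fetchone()[0:2]

    # Wait until the passive master has applied everything up to there.
    passive.execute("SELECT MASTER_POS_WAIT(%s, %s)", (log_file, log_pos))
    passive.fetchone()

    # Open the new active master for writes. Since the two masters
    # already replicate each other, no CHANGE MASTER is needed.
    passive.execute("SET GLOBAL read_only = 0")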
One fundamental problem that has to be resolved when using an
active-passive setup is the risk of both servers deciding that they are
the primary master—this is called the split-brain syndrome. This can occur
if network connectivity is lost just long enough for the secondary to
promote itself to primary, after which the original primary is brought
back online. If changes have been made to both servers while both were
acting as primary, there may be a conflict. In the case of
using a shared disk, simultaneous writes to the disks by two servers are
likely to cause “interesting” problems with the database—that is,
probably disastrous and difficult to pinpoint.
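The first line of defense is to avoid promoting the secondary at the
first sign of trouble. The deliberately naive sketch below requires
several consecutive failed checks before even considering a promotion.
This reduces the chance of a spurious failover but cannot rule out
split-brain on its own, since the primary may be alive yet unreachable,
so real deployments combine such checks with some form of fencing that
forces the old primary to stay down. The host name and thresholds are
placeholders.

import socket
import time

PRIMARY = ("primary.example.com", 3306)
CHECKS, INTERVAL, TIMEOUT = 5, 2.0, 1.0

def primary_reachable():
    # Try to open a TCP connection to the primary's MySQL port.
    try:
        socket.create_connection(PRIMARY, timeout=TIMEOUT).close()
        return True
    except OSError:
        return False

def should_promote():
    # Require several consecutive failed checks so that a brief
    # network hiccup does not trigger a failover and a split-brain.
    for _ in range(CHECKS):
        if primary_reachable():
            return False
        time.sleep(INTERVAL)
    return True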
2.1. Shared disks
A straightforward dual masters approach is shown in Figure 4, where a pair of masters
is connected using a shared disk architecture such as a SAN (storage
area network). In this approach, both servers are connected to the
same SAN and are configured to use the same files. Since one of the
masters is passive, it will not write anything to the files while the
active master is running as usual. If the main server fails, the
standby will be ready to take over.
The advantage of this approach is that since the binlog files are stored on a shared disk, there is no
need for translating binlog positions. The two servers are truly
mirror images of each other, but they are running on two different
machines. This means that switching over from the main master to
the standby is very fast. There is no need for the slaves to translate
positions to the new master; all that is necessary is to note the
position where the slave stopped, issue a CHANGE MASTER
command, and start replication again.
When you fail over using this technique, you have to perform
recovery on the tables, since it is very likely that updates were stopped
midstream. Each storage engine behaves differently in this situation.
For example, InnoDB has to perform a normal recovery from the
transaction log, as it would in the event of a crash, whereas if you
use MyISAM you probably have to repair the tables before being able to
continue operation. Of these two choices, InnoDB is preferred because
recovery is significantly faster than repairing a MyISAM table.
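As a sketch of the MyISAM case, the following finds all MyISAM tables
through information_schema after the standby has taken over the shared
disk and repairs them; InnoDB is left to recover on its own from the
transaction log at startup. The example assumes MySQL Connector/Python
and placeholder connection details.

import mysql.connector

def repair_myisam_tables(host, user="root", password=""):
    # Repair every MyISAM table after taking over the shared disk.
    conn = mysql.connector.connect(host=host, user=user, password=password)
    cursor = conn.cursor()
    cursor.execute(
        "SELECT table_schema, table_name"
        "  FROM information_schema.tables"
        " WHERE engine = 'MyISAM'"
    )
    for schema, table in cursor.fetchall():
        # Identifiers cannot be passed as parameters, so quote them here.
        cursor.execute("REPAIR TABLE `%s`.`%s`" % (schema, table))
        cursor.fetchall()  # drain the result set from REPAIR TABLE
    cursor.close()
    conn.close()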
Example 2
shows a Python script for handling such a failover using the
Replicant library. Notice that the position fetched from the slave uses
the server ID of the main server, but since both servers are using the
same files, the standby server is really a mirror image of the main
server and the position remains valid for it. Because the position
contains the server ID, this will also catch any mistakes made by the
user, such as passing a master that is not a mirror image of the main
master.
Example 2. Procedure to remaster a slave when using a shared
disk
def remaster_slave(slave, master):
    # The position where the slave stopped is valid for the new master
    # as well, since the two masters share binlog files on the shared disk.
    position = fetch_slave_position(slave)
    change_master(slave, master, position)
The ability to set up dual masters using shared disks is
dependent on the shared storage solution used.
The problem with using shared storage is that since the two
masters are using the same files for storing data, you have to be very
careful when doing any administrative tasks on the passive master.
Overwriting the configuration files, even by mistake, can be
fatal.