Better Crash-safe replication for MySQL
At Booking.com, we use MySQL replication extensively for read scaling. However, because we have so many slaves (we have thousands), sooner or later, one of them will fail. In MySQL 5.5 and earlier, replication was not crash-safe. MySQL 5.6 provides crash-safe replication when modifying the default configuration. But this safety comes with several downsides. Read on to find out more about crash-safe replication in MySQL 5.6 and learn how to mitigate its downsides.
MySQL replication involves two tasks on a slave: the IO thread and the SQL thread:
To keep track of its progress, each thread needs to store a state:
In MySQL 5.5 and earlier, those states are persisted in files (see the master-info-fileand relay-log-info-file options for more information about those files). However, the IO thread and the SQL thread cannot make progress and update their state files at the same time (atomically). This means that inconsistencies can be introduced by a crash:
Those two inconsistencies are the cause of the infamous "Duplicate entry for key PRIMARY" errors when restarting a slave after a crash. This error happens when running an INSERT twice. Other errors can also happen on UPDATE and DELETE statements depending on many factors, the list of which is too long and complex to give here. In the worst cases silent data corruption will occur.
To solve inconsistency #2 above, MySQL 5.6 introduces the option to persist the state of the SQL thread in the database. By setting relay-log-info-repository to TABLE (this is not the default option), the SQL thread saves its state at the same time as it commits a transaction. If the storage engine is atomic (like InnoDB), inconsistency in the state of the SQL thread is prevented.
However, it is not possible to avoid inconsistency #1. The IO thread has no way to synchronize its progress (appending events to the relay logs) and its state (saving data in a FILE or a TABLE). The consequence is that it is impossible, once a crash has occurred, to trust both the content of the relay log and the state of the IO thread.
If we cannot trust those, how can MySQL 5.6 provide crash-safe replication? The implemented solution is to:
In other words, all the relay logs are considered un-proper for SQL thread consumption. This behaviour is not enabled by default. To benefit from it, the optionrelay-log-recovery must be set to 1 (the default value is 0).
In summary, to obtain crash-safe replication in MySQL 5.6, one must:
More information about MySQL 5.6 crash-safe replication can be found in the manual.
The problems with MySQL 5.6 Crash-safe replicationSo far, everything looks good, so what are the downsides? We identified a number of problems:
Problem #1 means that all restarts must be done with extra care. If a lagging slave is restarted, it can trigger problems #2, #3, #4, and #5. Having those problems after a crash could be acceptable, but having those after a simple restart is harder to accept.
Problem #2 is the one that worries us most at Booking.com. As presented in a previous post, the network load on our master is a concern for us. The idea of a master network interface being overloaded as a result of a restart/crash of a slave is hard to accept.
A special case of problem #2 is delayed replication. In the case of a restart/crash of a delayed slave the amount of the binary log that needs to be re-downloaded is proportional to the delay of the slave. We have delayed slaves that store tens of GBs of unexecuted relay logs. A restart/crash of those would overload the network interface or a master for minutes. This is unbearable: we must find something better.
Again on problem #2, the restarted/crashed slave could be remote from the master. If the bandwidth of the WAN link is limited, it can be saturated by the slave downloading binary logs from the master. This is also an undesirable effect.
Problem #3 is less of a concern for us, but still not an enjoyable proposition (it might be more of a concern for others).
Problem #4 means that before purging binary logs, a DBA must make sure that they have been executed by all slaves. This is an additional task that DBAs could do without.
Problem #5 is more subtle since it is dependent upon the preference of our DBAs to keep a copy of the relay logs on the slave (setting relay-log-purge to 0). With relay-log-recovery=1, this copy of the relay logs cannot be trusted after a restart or a crash:
The following bugs/feature requests have been logged to trace those problems:
Do not get us wrong. Having a way to make replication crash-safe is a great step forward since MySQL 5.5. Still, some other small steps need to be carried out to have ideal crash-safe replication. For us, ideal crash-safe replication should:
Our wish #2 above can be implemented by adding four variables to the IO thread state. Those variables would indicate the last trusted (synced) position in the relay logs and the corresponding position in the master binary logs. When restarting MySQL, the relay logs would be truncated at that last synced position and the IO thread would be initialized at the corresponding position in the master binary logs. (Note that if the IO thread was stopped correctly, nothing is truncated, which satisfies our wish #1 above.) After the truncation, if the position of the SQL thread is above (or at) the truncation point (IO thread ahead of or at the same position as the SQL thread), nothing special needs to be done. If the position of the SQL thread is after the truncation point, the behaviour will depend on relay_log_purge:
Until ideal crash-safe replication is implemented in a future version of MySQL, we need to mitigate the problems caused by the current implementation (for our deployment at Booking.com, the biggest problem is unnecessarily downloading binary logs from the master). The solution we are currently investigating is to deploy aBinlog Server on our slaves:
This way, we avoid the downsides of MySQL relay log recovery by pushing the problem to the Binlog Server. And because making the Binlog Server crash-safe is trivial, we successfully work-around the downsides of MySQL relay log recovery.
But if we are not careful with this approach, we could end up consuming twice the disk space that is necessary by storing binary logs in the Binlog Server directory, and storing the relay logs in the MySQL directory.
The relay-log-space-limit option can be used to solve this problem. By setting this option to twice the max-relay-log-size (as recommended in the documentation), we put an upper limit on the extra space that can be used. But when using this option, care must be taken to set it right as changing the value will necessitate restarting MySQL [1]. There is a feature request (bug #68236) to make the relay-log-space-limit option dynamic. Hopefully this will be implemented soon.
ConclusionAs promised in our last post, we presented another case for using the Binlog Server. Hopefully, this will be a temporary use that will become obsolete. When a better relay log recovery is implemented in MySQL, we will not need to deploy additional software on our MySQL servers.
We have yet another instance where using the Binlog Server is beneficial. I will present this at Percona Live London in my talk: High Availability, Disaster Recovery and Extreme Read Scaling using Binlog Servers.
[1]: When restarting MySQL to change the option, care must be taken not to restart MySQL with relay-log-recovery=1 as this could cause the unnecessary re-downloading of binary logs. With a local Binlog Server, this becomes less of a concern.
如果觉得我的文章对您有用,请随意打赏。你的支持将鼓励我继续创作!