Yoshinori Matsunobu’s blog: Semi-Synchronous Replication at Facebook
Yoshinori, this is excellent work. Thank you for posting. Do you know if your patches will be picked up by MariaDB, Percona, or Oracle MySQL any time soon?
After intensive testing and hacking, we started using Semi-Synchronous MySQL Replication in Facebook production environments. Semi-Synchronous Replication itself has been available since MySQL 5.5 (GA was released 3.5 years ago!), but I’m pretty sure not many people have used it in production so far. Here is a summary of our objective, enhancements and usage patterns. If you want to hear more in depth, please feel free to ask me at Percona Live this week.
The objective of Semi-Synchronous Replication is simple: master failover without data loss, without requiring full durability.
covers both fully automated and semi-automated MySQL failover solutions. Fully automated means both failure detection and slave promotion are done automatically. Semi-automated means failure detection is not automatic, but slave promotion is done with one command. Time to detect a failure is approximately 10 seconds, and the actual failover takes around 5 to 20 seconds, depending on what you do during failover (e.g. forcing the crashed master to power off takes at least a few seconds). Total downtime can be less than 30 seconds if failover works correctly. I’m using the term “Fast Failover” in this post, which includes both automated and semi-automated master failover.
Both mysqlfailover and MHA rely on MySQL replication, and MySQL replication is asynchronous. So there is a very serious disadvantage: the risk of data loss on master failover. If you use normal MySQL replication and do automated master failover with MHA/mysqlfailover, you can fail over quickly (a few seconds with MHA), but you always risk losing recently committed data.
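To make the data loss window concrete, here is a toy Python model. It is not MySQL code: the `ToyMaster` class, its methods and the `simulate` helper are all made up for illustration, and it assumes a single slave and ignores timeouts and the fallback to asynchronous mode. It only shows the difference in when the client gets its "committed" response.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Toy model of a master with one slave (hypothetical, not MySQL internals)."""
    semi_sync: bool
    binlog: list = field(default_factory=list)           # events written on the master
    slave_relay: list = field(default_factory=list)      # events the slave has received
    acked_to_client: list = field(default_factory=list)  # commits reported as successful

    def commit(self, event):
        """Commit one event and report success to the client."""
        self.binlog.append(event)
        if self.semi_sync:
            # Semi-sync: the slave must have received the event
            # before the client is told the commit succeeded.
            self.slave_relay.append(event)
        # Async: the client is answered right away; shipping happens later.
        self.acked_to_client.append(event)

    def ship_pending(self):
        """Asynchronous shipping: the slave catches up with the master's binlog."""
        self.slave_relay = list(self.binlog)

    def lost_on_failover(self):
        """Events the client saw as committed but the promoted slave does not have."""
        return [e for e in self.acked_to_client if e not in self.slave_relay]


def simulate(semi_sync):
    master = ToyMaster(semi_sync=semi_sync)
    master.commit("t1")
    master.ship_pending()   # the slave catches up once
    master.commit("t2")
    master.commit("t3")
    # The master crashes here, before the next asynchronous shipment.
    return master.lost_on_failover()


print("async lost:", simulate(semi_sync=False))
print("semi-sync lost:", simulate(semi_sync=True))
```

In this sketch the asynchronous run loses the two transactions committed after the last shipment, while the semi-synchronous run loses nothing, because a commit is never reported to the client before the slave has received it.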