PECL/mysqlnd_ms: how failed XA transactions get fixed | Ulf WendelUlf Wendel
XA transactions are an open standard for distributed transactions. A distributed or global transaction can spawn multiple databases. XA transactions come handy, for example, when data sets are sharded over multiple servers and a business transaction affects multiple shards. The free MySQL Fabric tools help to setup and manage a sharded MySQL cluster. The development version of PECL/mysqlnd_ms 1.6 helps with XA transactions. It abstracts SQL details and acts as a transaction manager. The PHP MySQL driver kicks in when things go wrong and XA transactions show their nasty side: blocked servers. Good news: this is a rare case. Bad news: a deep dive below.
XA transactions use the two-phase commit protocol, which is a blocking protocol. Please, see also my previous blog post on the nature of the protocol and MySQL implementation limitation. If the client that drives the XA transaction, your PHP script, crashes at a certain point, some XA participants (MySQL servers) cannot make any progress. In the worst case, they end up waiting for a decision on the global transactions outcome endlessly. No, there is no timeout. As they wait, they block resources. That can be memory used for the transaction or some lock on some table.
Any serious user of XA transactions will therefore have to implement some mechanism that ensures progress in case of crashes. After a crash, it must be possible to learn which participant is blocked, connect to the participant and tell it to either commit or roll back the open transaction. This housekeeping job is rather annoying, yet important. PECL/mysqlnd_ms can do it for you, it can act as a transaction manager. (On an aside: the academic world does distinguish between a transaction manager and coordinator. I am using the terms interchangeably here.)
Upon request, PECL/mysqlnd_ms can record the state of each global transaction in a MySQL database. Should your PHP script (and with it PECL/mysqlnd_ms) crash or be interrupted in another way with an XA transaction being unfinished, then the next PHP script that runs can check the database and “garbage collect” the unfinished global transaction. The “next” PHP script could be run on the same server or another one, as long as all servers use the same MySQL database to track XA transactions, the garbage collection will work flawless.
The illustration below shows a scenario with two PHP servers that use PECL/mysqlnd_ms. A script on one of the servers runs and begins a distributed (XA) transactions that involves two MySQL servers M1, M2. As you can see, a simple sequence of mysqlnd_ms_xa_begin(); … ; mysqli_query(); mysqli_query() causes a lot of background activity inside PECL/mysqlnd_ms.