I'm studying how two-phase commit works across a distributed transaction. It is indeed my knowning that within the last area of the phase the transaction coordinator asks each node whether it's prepared to commit. If everybody agreed, it informs these to proceed and commit.

What prevents the next failure?

  1. All nodes respond that they're prepared to commit
  2. The transaction coordinator informs these to "proceed and commit" but among the nodes crashes before receiving this message
  3. Other nodes commit effectively, however the distributed transaction is corrupt
  4. It is indeed my knowning that once the crashed node returns, its transaction may have been folded back (because it never got the commit message)

I'm presuming each node is managing a normal database that does not know anything about distributed transactions. What did I miss?

No, they aren't expected to roll back because within the original poster's scenario, a few of the nodes have previously committed. Ultimately once the crashed node opens up, the transaction coordinator informs it to commit again.

Since the node responded positively within the "prepare" phase, it's needed to have the ability to "commit", even if it returns from the crash.

No. Point 4 is incorrect. Each node records in stable storage it could commit or rollback the transaction, to ensure that it'll have the ability to do as commanded even across crashes. Once the crashed node returns up, it has to understand that it features a transaction in pre-commit condition, reinstate any relevant locks or any other controls, after which make an effort to contact the coordinator site to gather the status from the transaction.

The issues only occur when the crashed node never returns up (then anything else thinks the transaction was OK, or is going to be once the crashed node returns).

Two phase commit is not foolproof and it is just made to operate in the 99% of times cases.

"The protocol assumes that there's stable storage each and every node having a write-ahead log, that no node crashes forever, the data within the write-ahead log isn't lost or corrupted inside a crash, which any two nodes can contact one another."


You will find many different ways to fight the issues with two-phase commit. The majority of them find yourself as some variant from the Paxos three-phase commit formula. Mike Burrows, who designed the Chubby lock service at Google which is dependant on Paxos, stated that you will find two kinds of distributed commit calculations - "Paxos, and incorrect ones" - inside a lecture I saw.

One factor the crashed node could do, if this reawakes, is say "I never learned about this transaction, should it happen to be committed?" towards the coordinator, that will tell it exactly what the election was.

Keep in mind this is one particular more general problem: the crashed node could miss many transactions before it rebounds. Therefore it's terribly essential that upon recovery it will talk with the idea to the coordinator or any other replica prior to making itself available. When the node itself can't tell whether it's crashed, then things have more involved but nonetheless tractable.

If you are using a quorum system for database reads, the inconsistency is going to be masked (making recognized to the database itself).

Outlining everyone's solutions:

  1. One cannot use normal databases with distributed transactions. The database must clearly support a transaction coordinator.

  2. The nodes aren't expected to roll back because a few of the nodes have previously committed. Ultimately that after the crashed node returns, the transaction coordinator informs it to commit again.