A transaction is a single operation or set of operations
that succeed or fail together as a whole, thereby ensuring consistency
of data should unforeseen circumstances arise. A classic scenario is
the typical financial transaction. For example, let’s say you buy a
car. The single transaction of buying a car consists of three distinct
operations:
You select a car.
You pay for it.
You drive the car off the lot.
Skipping
any of these steps could cause major angst to one or more of the
parties involved. This is a simple example of a set of steps that must
always occur together in a consistent manner.
Transactions
allow you to ensure consistency in your data through four basic
principles. These principles provide a set of rules that must be
followed for a transaction to succeed. The four principles help ensure
that the state of your data is atomic, consistent, isolated, and
durable, regardless of the success or failure of the transaction. Let’s
examine these properties now.
Understanding the ACID Properties
Consider
an ATM transaction where you withdraw $100 from your bank account. The
data for this transaction can be represented in a database as a table
with two columns, one holding your AccountId and the other holding your AccountBalance.
To
begin with, your account balance is $100, so after withdrawing $100,
your updated balance should be zero. Also, it makes logical sense that
the system must ensure that you have the funds available in your
account before the cash can be dispensed. What this means in database
terms is that two database queries must be run in this transaction. The
first query checks the account balance, as shown here:
SELECT AccountBalance FROM Account WHERE AccountId = @AccountId
If the query returns an AccountBalance value greater than or equal to the requested withdrawal, you can withdraw the cash.
After withdrawing the cash, you must update the account record with the updated balance. To do so, you run an UPDATE query:
UPDATE Account
SET AccountBalance -= 100
WHERE AccountId = @AccountId
The
two distinct operations in this transaction are the two database
queries that support the one operation of withdrawing cash. Both must
succeed or fail together in an atomic manner, or the transaction should
not be considered complete. Atomicity is the first ACID property.
Now
let’s change the nature of the transaction a bit. Let’s say that the
original account balance is $150 and that within the same transaction,
the user requests to withdraw $100 and to transfer another $75 to a
second account. The first update query will succeed, changing the
account balance to $150 – $100 = $50. But the second operation will
fail because there won’t be enough money left in the account to
transfer the $75. You therefore need a way to undo the cash withdrawal
and return the database to its original state. You cannot leave the
database midway through a transaction because it is in an inconsistent
state. In a real-world scenario, you would not normally wrap a
withdrawal and a transfer in the same transaction, but this was just a
simple example to show how data can end up in an inconsistent state
between multiple operations. Rolling back operations that cannot be
completed in this manner demonstrates the second ACID property, consistency.
Let’s say now that the withdrawal and transfer operations are separated into two distinct transactions
instead of one, but that they happen to run simultaneously. Each
transaction will have to check the current balance by attempting to
execute a query like this:
SELECT AccountBalance FROM Account WHERE AccountId = @AccountId
Unless
your system has explicit checks blocking concurrent reads, both
transactions will get the same result: $150. Thus they will both assume
that the account has enough funds for the transaction. One transaction
will disburse $100, and the other will transfer $75. The result will be
an overall deduction of $100 + $75 = $175, even though the account
actually has only $150 available. In many systems, especially financial
applications, such transactions must be isolated from each other to
prevent what is known as a “dirty read.” A dirty read happens when data
is read at one point in a transition state and the result of the query
doesn’t reflect the data’s true state at the end of the current
operation. This leads to the third ACID property, isolation.
Isolation
means that other transactions attempting to request a common resource
will be blocked. Blocking, in turn, seriously affects the response
times of your application. As it turns out, you’ll often want to relax
this blocking behavior to suit your application architecture.
Lastly,
when you have successfully completed all your operations within a
transaction, you don’t want to lose the changes made. In other words,
system failures must not affect the transactional integrity of your
operations. This relates to the fourth ACID property, durability. Durability
means that the systems involved with the transaction will faithfully
retain the correct transacted state even if the system sustains
catastrophic failure nanoseconds after the transaction completes.
Conversely, if the transaction isn’t completed because of system
failure, it won’t be allowed to complete (or be undone) until the
system is reset and the application restarted. Transacted steps are
retained and the application can resume operations where it left off
with no ill effects (at least from a data consistency perspective).
Let’s quickly summarize the four ACID properties:
Atomicity Operations succeed or fail together. Unless all steps succeed, the transaction cannot be considered complete.
Consistency
Operations leave the database in a consistent state. The transaction
takes the underlying database from one stable state to another, with no
rules violated before the beginning or after the end of the transaction.
Isolation
Every transaction is an independent entity. One transaction will not
affect any other transaction that is running at the same time.
Durability
Every transaction is persisted to a reliable medium that cannot be
undone by system failures. Furthermore, if a system failure does occur
in the middle of a transaction, either the completed steps must be
undone or the uncompleted steps must be executed to finish the
transaction. This typically happens by use of a log that can be played
back to return the system to a consistent state.
Important
A
transaction can work with a single resource, such as a database, or
multiple resources, such as multiple databases or message queues.
Transactions limited to a single resource are referred to as local transactions, and transactions that span multiple resources are called distributed transactions.