I cannot possibly address the intricacies of each of the
many platforms developers use in building web applications,
but most of the issues you will face have nothing to do with your
underlying choice of platform. Whether written in .NET, Ruby, Java, PHP,
or anything else, web applications share a similar general
architecture—and architecture makes or breaks an application in the
cloud.
Figure 1
illustrates the generic application architecture that web applications
share.
You may move around or combine the boxes a bit, but you are
certain to have some kind of (most often scripting) language that
generates content from a combination of templates and data pulled from a
model backed by a database. The system updates the model through actions
that execute transactions against the model.
1. System State and Protecting Transactions
The defining issue in moving to the cloud is how your
application manages its state. Let’s look at the problem of booking a
room in a hotel.
The architecture from Figure 1 suggests that
you have represented the room and the hotel in a model. For the
purposes of this discussion, it does not matter whether you have a
tight separation between model, view, and data, or have mixed them to
some degree. The key point is that there is some representation of the
hotel and room data in your application space that mirrors their
respective states in the database.
How does the application state in the application tier change
between the time the user makes the booking request and the time
the transaction commits in the database?
The process might look something like this basic
sequence:
1. Lock the data associated with the room.
2. Check the room to see whether it is currently available.
3. If currently available, mark it as “booked” and thus no longer available.
1.1. The problem with memory locks
You can implement this logic in many different ways, not
all of which will succeed in the cloud. A common Java approach that works well in a single-server
environment but fails in a multiserver context might use the
following code:
public void book(Customer customer, Room room, Date[] days)
    throws BookingException {
    synchronized( room ) { // synchronized "locks" the room object
        if( !room.isAvailable(days) ) {
            throw new BookingException("Room unavailable.");
        }
        room.book(customer, days);
    }
}
Because the code uses the Java locking keyword synchronized, no other threads in the current process can make changes
to the room object. If you are on a single server, this code will work
under any load supported by the server. Unfortunately, it will fail
miserably in a multiserver context.
The problem with this example is the memory-based lock that
the application grabs. If you had two clients making two separate
booking requests against the same server, Java would allow only one
of them to execute the synchronized block at a time. As a result,
you would not end up with a double booking.
On the other hand, if you had each customer making a request
against different servers (or even distinct processes on the same
server), the synchronized blocks on each server could execute
concurrently. As a result, the first customer to reach the room.book() call would lose his
reservation because it would be overwritten by the second. Figure 2 illustrates the
double-booking problem.
The non-Java way of expressing the problem is that if your
transactional logic uses memory-based locking to protect the
integrity of a transaction, that transaction will fail in a
multiserver environment—and thus it won’t be able to take advantage
of the cloud’s ability to dynamically scale application
processing.
One way around this problem is to use clustering technologies
or cross-server shared memory systems. Another way to approach the
problem is to treat the database as the authority on the state of
your system.
I can hear a lot of you—especially those of you who have
massively multithreaded applications—cursing me right now. If you
find yourself in a situation in which you use memory-based locking
and reworking the application away from that model is impractical,
you can still move into the cloud. You simply won’t be able to
scale your application across multiple application servers. The way around this concern is to lock everything using a
shared locking mechanism—typically your database engine. The only
other common alternative is to write some kind of home-grown
distributed transaction management system. But why do that when
your database already has one?
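To make the idea concrete, here is a minimal sketch of what a
database-held lock can look like in Java. It assumes JDBC, a
DataSource, the BookingException from the earlier example, and the
booking table used later in this section; SELECT ... FOR UPDATE asks
the database, rather than the JVM, to hold the lock on the row until
the transaction ends:

// A minimal sketch: a database row lock replaces the synchronized block.
// The table and column names (booking, room_id, booking_date, customer)
// are assumptions borrowed from the stored procedure example below.
public void book(long customerId, long roomId, java.sql.Date night,
                 javax.sql.DataSource ds)
        throws BookingException, java.sql.SQLException {
    try (java.sql.Connection conn = ds.getConnection()) {
        conn.setAutoCommit(false);
        try (java.sql.PreparedStatement lock = conn.prepareStatement(
                "SELECT customer FROM booking " +
                "WHERE room_id = ? AND booking_date = ? FOR UPDATE")) {
            lock.setLong(1, roomId);
            lock.setDate(2, night);
            try (java.sql.ResultSet rs = lock.executeQuery()) {
                if (!rs.next()) {
                    conn.rollback();
                    throw new BookingException("No such room or date.");
                }
                long current = rs.getLong(1);
                if (!rs.wasNull() && current != customerId) {
                    conn.rollback(); // someone else already holds this night
                    throw new BookingException("Room unavailable.");
                }
            }
        }
        try (java.sql.PreparedStatement update = conn.prepareStatement(
                "UPDATE booking SET customer = ? " +
                "WHERE room_id = ? AND booking_date = ?")) {
            update.setLong(1, customerId);
            update.setLong(2, roomId);
            update.setDate(3, night);
            update.executeUpdate();
        }
        conn.commit(); // committing releases the row lock
    }
}

Because the lock lives in the database, two clients on different
application servers contending for the same room simply queue up
behind one another instead of double booking.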
1.2. Transactional integrity through stored procedures
I am not a huge fan of stored procedures. A key benefit of
stored procedures, however, is that they enable you to leverage the
database to manage the integrity of your transactions. After all,
data integrity is the main job of your database engine!
Instead of doing all of the booking logic in Java, you could
leverage a MySQL stored procedure:
DELIMITER |
CREATE PROCEDURE book
(
    IN customerId BIGINT,
    IN roomId BIGINT,
    IN startDate DATE,
    IN endDate DATE,
    OUT success CHAR(1)
)
BEGIN
    DECLARE n DATE;
    DECLARE cust BIGINT;
    SET success = 'Y';
    SET n = startDate;
    bookingAttempt:
    REPEAT
        -- Walk through each night in the requested range
        SELECT customer INTO cust FROM booking
        WHERE room_id = roomId AND booking_date = n;
        IF cust IS NOT NULL AND cust <> customerId THEN
            -- Another customer already holds this night, so give up
            SET success = 'N';
            LEAVE bookingAttempt;
        END IF;
        UPDATE booking SET customer = customerId
        WHERE room_id = roomId AND booking_date = n;
        SET n = DATE_ADD(n, INTERVAL 1 DAY);
    UNTIL n > endDate
    END REPEAT;
    IF success = 'Y' THEN
        COMMIT;
    ELSE
        ROLLBACK;
    END IF;
END
|
This stored procedure walks through each date in the requested range
and marks the corresponding row of the booking table in your MySQL
database as booked by the specified customer. If it encounters a date
on which the room is already booked by another customer, the
transaction fails and rolls back.
The following Python code shows the stored procedure in use:
def book(customerId, roomId, startDate, endDate):
    conn = getConnection()  # helper (defined elsewhere) returning a MySQL connection
    c = conn.cursor()
    c.execute("CALL book(%s, %s, %s, %s, @success)",
              (customerId, roomId, startDate, endDate))
    c.execute("SELECT @success")
    row = c.fetchone()
    success = row[0]
    if success == "Y":
        return 1
    else:
        return 0
Even if you have two different application servers running two
different instances of your Python application, this transaction
will fail, as desired, for the second customer, regardless of the
point at which the second customer’s transaction begins.
1.3. Two alternatives to stored procedures
As I noted earlier, I am not a fan of stored procedures. They
have the advantage of executing faster than the same logic in an
application language. Furthermore, multiserver transaction
management through stored procedures is very elegant. But I have
three key objections:
Stored procedures are not portable from one database to
another.
They require an extended understanding of database
programming—something that may not be available to all
development teams.
They don’t completely solve the problem of scaling
transactions across application servers under all scenarios. You
still need to write your applications to use them wisely, and
the result may, in fact, make your application more
complicated.
In addition to these core objections, I personally strongly
prefer a very strict separation of presentation, business modeling,
business logic, and data.
The last objection is subjective and perhaps a nasty personal
quirk. The first two objections, however, are real problems. After
all, how many of you reading this book have found yourselves stuck
with Oracle applications that could very easily work in MySQL if it
weren’t for all the stored procedures? You are paying a huge Oracle
tax just because you used stored procedures to build your
applications!
The second objection is a bit more esoteric. If you have the
luxury of a large development staff with a diverse skill set, you
don’t see this problem. If you are in a small company that needs
each person to wear multiple hats, it helps to have an application
architecture that requires little or no database programming
expertise.
To keep your logic at the application server level while still
maintaining multiserver transactional integrity, you must either
create protections against dirty writes or create a lock in the
database.
One of the
techniques I feature—and generally recommend for its speed benefits,
whatever your overall architecture—is the use of a last
update timestamp and modifying agent in your updates.
The booking logic from the stored procedure was essentially an
update to the booking table:
UPDATE booking SET customer = ? WHERE booking_id = ?;
If you add last_update_timestamp and
last_update_user fields, that SQL would operate
more effectively in a multiserver environment:
UPDATE booking
SET customer = ?, last_update_timestamp = ?, last_update_user = ?
WHERE booking_id = ? AND last_update_timestamp = ? AND last_update_user = ?;
In this situation, the first client will attempt to book the
room for the specified date and succeed. The second client then
attempts to update the row but gets no matches since the timestamp
it reads—as well as the user ID of the user on the client—will not
match the values updated by the first client. The second client
realizes it has updated zero rows and subsequently displays an error
message. No double booking!
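In Java, checking for that condition is simply a matter of looking at
the row count the database reports for the update. The following
sketch assumes JDBC and assumes the caller has already read the row's
current last_update_timestamp and last_update_user values along with
the booking itself:

// A sketch of the optimistic update, assuming a JDBC connection and the
// last_update_timestamp/last_update_user columns described above.
public boolean book(java.sql.Connection conn, long customerId, long bookingId,
                    java.sql.Timestamp readTimestamp, String readUser,
                    String thisUser) throws java.sql.SQLException {
    String sql = "UPDATE booking " +
        "SET customer = ?, last_update_timestamp = ?, last_update_user = ? " +
        "WHERE booking_id = ? AND last_update_timestamp = ? AND last_update_user = ?";
    try (java.sql.PreparedStatement stmt = conn.prepareStatement(sql)) {
        stmt.setLong(1, customerId);
        stmt.setTimestamp(2, new java.sql.Timestamp(System.currentTimeMillis()));
        stmt.setString(3, thisUser);
        stmt.setLong(4, bookingId);
        stmt.setTimestamp(5, readTimestamp); // the values read earlier guard
        stmt.setString(6, readUser);         // against a dirty write
        // Zero rows updated means another client changed the row since we read it.
        return stmt.executeUpdate() == 1;
    }
}

When the method returns false, the application reports the failure to
the user instead of silently overwriting the other client's
reservation.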
This approach works well as long as you do not end up
structuring transactions in a way that will create
deadlocks. A deadlock occurs between two transactions when each transaction
is waiting on the other to release a lock. Our reservations system
example is an application in which a deadlock is certainly
possible.
Because we are booking a range of dates in the same
transaction, poorly structured application logic could cause two
clients to wait on each other as one attempts to book a date already
booked by the other, and vice versa. For example, if you and I are
looking to book both Tuesday and Wednesday, but for whatever reason
your client first tries Wednesday and my client first tries Tuesday,
we will end up in a deadlock where I wait on you to commit your
Wednesday booking and you wait on me to commit my Tuesday
booking.
This somewhat contrived scenario is easy to address by making
sure that you move sequentially through each day. Other application
logic, however, may not have as obvious a solution.
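For the hotel example, enforcing that sequential order can be as
simple as sorting the requested dates before booking them, so that
any two competing transactions acquire their row locks in the same
order. A minimal sketch, assuming a hypothetical bookSingleDay()
helper that updates one booking row per night:

// Book nights in ascending date order so that competing transactions
// always acquire their row locks in the same sequence and cannot deadlock.
java.util.Date[] ordered = days.clone();
java.util.Arrays.sort(ordered); // earliest night first
for (java.util.Date night : ordered) {
    bookSingleDay(customerId, roomId, night); // hypothetical per-night update
}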
Another alternative is to create a field for managing your
locks. The room table, for example, might have two extra columns for
booking purposes: locked_by and
locked_timestamp. Before starting
the transaction that books the rooms, update the room table and
commit the update. Once your booking transaction completes, release
the lock by nulling out those fields prior to committing that
transaction.
Because this approach requires two different database
transactions, you are no longer executing the booking as a single
atomic transaction. Consequently, you risk leaving an open lock that
prevents others from booking the room on any dates. You can
eliminate this problem through two tricks, illustrated in the sketch
after this list:
The room is considered unlocked not only when the fields are
NULL, but also when the locked_timestamp is older than some
reasonable threshold; that is, a lock held for too long is
treated as stale.
When updating the lock at the end of your booking
transaction, use the locked_by and locked_timestamp fields in the
WHERE clause. Thus, if
someone else steals a lock out from under you, you only end up
rolling back your transaction.
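The following Java sketch shows one way those two tricks might look
in code. The locked_by and locked_timestamp columns come from the
discussion above; everything else, including the five-minute
staleness threshold, is an assumption made for illustration:

// A sketch of database-backed lock management, assuming the locked_by and
// locked_timestamp columns on the room table and autocommit disabled.
public java.sql.Timestamp acquireRoomLock(java.sql.Connection conn, long roomId,
                                          String agent) throws java.sql.SQLException {
    long now = System.currentTimeMillis();
    // Truncate to whole seconds so the value round-trips through a MySQL TIMESTAMP.
    java.sql.Timestamp stamp = new java.sql.Timestamp((now / 1000) * 1000);
    String sql = "UPDATE room SET locked_by = ?, locked_timestamp = ? " +
        "WHERE room_id = ? AND (locked_by IS NULL OR locked_timestamp < ?)";
    try (java.sql.PreparedStatement stmt = conn.prepareStatement(sql)) {
        stmt.setString(1, agent);
        stmt.setTimestamp(2, stamp);
        stmt.setLong(3, roomId);
        // Trick #1: a lock older than five minutes is treated as abandoned.
        stmt.setTimestamp(4, new java.sql.Timestamp(now - 5 * 60 * 1000));
        boolean locked = (stmt.executeUpdate() == 1);
        conn.commit(); // the lock is its own transaction, committed before booking begins
        return locked ? stamp : null;
    }
}

public boolean releaseRoomLock(java.sql.Connection conn, long roomId, String agent,
                               java.sql.Timestamp stamp) throws java.sql.SQLException {
    // Trick #2: name the holder and timestamp in the WHERE clause so that a
    // stolen lock shows up as zero rows updated.
    String sql = "UPDATE room SET locked_by = NULL, locked_timestamp = NULL " +
        "WHERE room_id = ? AND locked_by = ? AND locked_timestamp = ?";
    try (java.sql.PreparedStatement stmt = conn.prepareStatement(sql)) {
        stmt.setLong(1, roomId);
        stmt.setString(2, agent);
        stmt.setTimestamp(3, stamp);
        return stmt.executeUpdate() == 1;
    }
}

The caller acquires the lock and commits it before starting the
booking transaction; if releaseRoomLock() later reports that the lock
was stolen, the caller rolls back the booking rather than completing
it.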
Both of these approaches are admittedly more complex than
taking advantage of stored procedures. Regardless of what approach
you use, however, the important key for the cloud is simply making
sure that you are not relying on memory locking to maintain your
application state integrity.
2. When Servers Fail
The ultimate architectural objective for the cloud is to
set up a running environment where the failure of any given
application server ultimately doesn’t matter. If you are running just
one server, that failure will obviously matter at some level, but it
will still matter less than losing a physical server.
One trick people sometimes use to get around the problems
described in the previous section is data segmentation—also known
as
sharding. Figure 3 shows how you
might use data segmentation to split processing across multiple
application servers.
In other words, each application server manages a subset of
data. As a result, there is never any risk that another server will
overwrite the data. Although segmentation has its place in scaling
applications, that place is not at the application server in a cloud
cluster. A segmented application server cluster ultimately has a very
low availability rating, as the failure of any individual server does
matter.
The final corollary to all of this discussion of application
state and server failure is that application servers in a cloud cannot
store any state data beyond caching data. In other words, if you need
to back up your application server, you have failed to create a solid
application server architecture for the cloud. All state
information—including binary data—belongs in the database, which must
be on a persistent system.