1. Privacy in the Cloud
The key to privacy in the cloud—or any other environment—is the
strict separation of sensitive data from nonsensitive data followed by
the encryption of sensitive elements. The simplest example is storing
credit cards. You may have a complex e-commerce application storing
many data relationships, but you need to separate out the credit card
data from the rest of it to start building a secure e-commerce
infrastructure.
Note: When I say you need to separate the data, what I mean is that
access to either of the two pieces of your data cannot compromise
the privacy of the data. In the case of a credit card, you need to
store the credit card number on a different virtual server in a
different network segment and encrypt that number. Access to the
first set of data provides only customer contact info; access to the
credit card number provides only an encrypted credit card
number.
Figure 1 provides an application architecture in which credit card data can be securely managed.
It’s a pretty simple design that is very hard to compromise as
long as you take the following precautions:
The application server and credit card server sit in two
different security zones with only web services traffic from the
application server being allowed into the credit card processor
zone.
Credit card numbers are encrypted using a customer-specific
encryption key.
The credit card processor has no access to the encryption
key, except for a short period of time (in memory) while it is
processing a transaction on that card.
The application server never has the ability to read the
credit card number from the credit card server.
No person has administrative access to both servers.
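The first precaution amounts to a default-deny rule between security zones. As a minimal sketch, assuming hypothetical zone names and HTTPS on port 443 for the web services traffic (the text specifies neither), the rule can be modeled as an allowlist:

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """A network flow between two security zones (zone names are illustrative)."""
    source_zone: str
    dest_zone: str
    dest_port: int


# Assumption: web services traffic runs over HTTPS on port 443.
# Any flow not listed here is denied.
ALLOWED_FLOWS = {
    Flow("internet", "app", 443),
    Flow("app", "card_processor", 443),
}


def is_allowed(flow: Flow) -> bool:
    """Default deny: a flow passes only if it is explicitly allowlisted."""
    return flow in ALLOWED_FLOWS
```

In a real deployment the same table would be expressed as firewall or security group rules; the point is that nothing reaches the credit card processor zone except web services calls from the application server.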
Under this architecture, a hacker has no use for the data on any
individual server; he must hack both servers to gain access to credit
card data. Of course, if your web application is poorly written, no
amount of structure will protect you against that failing.
You therefore need to minimize the ability of a hacker to use
one server to compromise the other.
Make sure the two servers have different attack vectors. In
other words, they should not be running the same software. By
following this guideline, you make it far less likely that whatever
exploit compromised the first server can also be used to compromise
the second server.
Make sure that neither server contains credentials or other
information that will make it possible to compromise the other
server. In other words, use key-based authentication rather than
passwords for user logins, and don’t store any private SSH keys on
either server.
1.1. Managing the credit card encryption
In order to charge a credit card, you must provide the credit card number, an
expiration date, and a varying number of other data
elements describing the owner of the credit card. You may also be
required to provide a security code.
This architecture separates the basic capture of data from the
actual charging of the credit card. When a person first enters her
information, the system stores contact info and some basic credit
card profile information with the e-commerce application and sends
the credit card number over to the credit card processor for
encryption and storage.
The first trick is to create a password on the e-commerce
server and store it with the customer record. It’s not a password
that any user will ever see or use, so you should generate something
complex using the strongest password guidelines. You should also
create a credit card record on the e-commerce server that stores
everything except the credit card number. Figure 2 shows a sample
e-commerce data model.
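A minimal sketch of this step on the e-commerce side, using Python's standard `secrets` module. The record layout, the field names, and the choice to keep the last four digits are assumptions for illustration; they are not taken from the data model in the figure.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class CardProfile:
    """What the e-commerce database keeps: everything except the card number."""
    card_id: str        # unique ID shared with the credit card processor
    card_password: str  # machine-generated, never shown to any user
    last_four: str      # assumed profile field, useful for display only
    expiration: str     # for example "2030-01"
    holder_name: str


def new_card_profile(card_number: str, expiration: str, holder_name: str) -> CardProfile:
    """Build the e-commerce record; the full card number is not stored in it."""
    return CardProfile(
        card_id=secrets.token_hex(16),            # 128-bit random identifier
        card_password=secrets.token_urlsafe(32),  # 32 random bytes as URL-safe text
        last_four=card_number[-4:],
        expiration=expiration,
        holder_name=holder_name,
    )
```

Because no person ever types this password, there is no reason to make it memorable; a long random token from a cryptographically secure generator is the simplest way to meet "the strongest password guidelines."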
With that data stored in the e-commerce system database, the
system then submits the credit card number, credit card password,
and unique credit card ID from the e-commerce system to the credit
card processor.
The credit card processor does not store the password.
Instead, it derives an encryption key from the password, uses that
key to encrypt the credit card number, stores the encrypted credit
card number, and associates it with the credit card ID. Figure 3
shows the credit card processor data model.
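One way to turn the password into a per-card encryption key is a key derivation function. The sketch below uses PBKDF2 from Python's standard `hashlib`; the iteration count, the separate random `kdf_salt` stored next to the ciphertext, and the record layout are all assumptions. The cipher itself is passed in as a hypothetical `encrypt` callable, because the Python standard library does not include a symmetric cipher; in practice it would be an authenticated cipher from a vetted library.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Callable

PBKDF2_ITERATIONS = 600_000  # assumed work factor, not taken from the text


@dataclass(frozen=True)
class StoredCard:
    """What the processor persists: no password and no key."""
    card_id: str
    kdf_salt: bytes    # random, not secret; needed to re-derive the key
    ciphertext: bytes  # output of whatever cipher `encrypt` wraps


def derive_card_key(card_password: str, kdf_salt: bytes) -> bytes:
    """Derive a 256-bit key from the per-card password."""
    return hashlib.pbkdf2_hmac(
        "sha256", card_password.encode("utf-8"), kdf_salt,
        PBKDF2_ITERATIONS, dklen=32,
    )


def store_card(
    card_id: str,
    card_number: str,
    card_password: str,
    encrypt: Callable[[bytes, bytes], bytes],  # hypothetical hook: (key, plaintext) -> ciphertext
) -> StoredCard:
    """Encrypt the number under a derived key; the key lives only in this call."""
    kdf_salt = secrets.token_bytes(16)
    key = derive_card_key(card_password, kdf_salt)
    return StoredCard(card_id, kdf_salt, encrypt(key, card_number.encode("ascii")))
```

The password and the derived key exist only as local variables while `store_card` runs, which matches the requirement that the processor holds the key in memory only briefly.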
Neither system stores a customer’s security code, because the
credit card companies do not allow you to store this code.
1.2. Processing a credit card transaction
When it comes time to charge the credit card, the e-commerce
service submits a request to the credit card processor to charge the
card for a specific amount. The e-commerce system refers to the
credit card on the credit card processor using the unique ID that
was created when the credit card was first inserted. It passes over
the credit card password, the security code, and the amount to be charged. The credit
card processor then decrypts the credit card number for the
specified credit card using the specified password. The unencrypted
credit card number, security code, and amount are then passed to the
bank to complete the transaction.
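The charge flow can be sketched as a request object plus one function. Everything here is illustrative: `ChargeRequest`, `load_card_number`, and `send_to_bank` are hypothetical names, and the two callables stand in for the processor's decryption step and the bank's API, neither of which the text specifies.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class ChargeRequest:
    """What the e-commerce service sends to the credit card processor."""
    card_id: str
    amount_cents: int
    # repr=False keeps these values out of log lines that print the object.
    card_password: str = field(repr=False)
    security_code: str = field(repr=False)


def charge(
    request: ChargeRequest,
    load_card_number: Callable[[str, str], str],  # (card_id, password) -> card number
    send_to_bank: Callable[[str, str, int], bool],  # (number, code, amount) -> approved?
) -> bool:
    """Decrypt, forward to the bank, and return; nothing sensitive is persisted."""
    card_number = load_card_number(request.card_id, request.card_password)
    return send_to_bank(card_number, request.security_code, request.amount_cents)
```

The decrypted number and the security code are only ever local values inside `charge`; they are never written back to either database.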
1.3. If the e-commerce application is compromised
If the e-commerce application is compromised, the attacker has
access only to the nonsensitive customer contact info. There is no
mechanism by which he can download that database and access credit
card information or otherwise engage in identity theft. That would
require compromising the credit card processor separately.
Having said all of that, if your e-commerce application is
insecure, an attacker can still assume the identity of an existing
user and place orders in their name with deliveries to their
address. In other words, you still need to worry about the design of
each component of the system.
Note: Obviously, you don’t want intruders gaining access to your
customer contact data either. In the context of this section, I
call customer contact data “nonsensitive” only in a relative
sense. Your objective should be to keep an intruder from
getting to either set of data.
1.4. If the credit card processor is compromised
Compromising the credit card processor is even less useful
than compromising the e-commerce application. If an attacker gains
access to the credit card database, all he has are random unique IDs
and strongly encrypted credit card numbers, each encrypted with a
unique encryption key. The attacker can copy the database and
attempt to brute-force the numbers offline, but each number will
take a lot of time to crack and will ultimately provide the hacker
with a credit card number that has no individually identifying
information to use in identity theft.
Another attack vector would be to figure out how to stick a
Trojan application on the compromised server and listen for
decryption passwords.
2. When the Amazon Cloud Fails to Meet Your Needs
The architecture I described in the previous section matches
traditional noncloud deployments fairly closely. You may run into
challenges deploying in the Amazon cloud, however, because of a couple
of critical issues involving the processing of sensitive data:
Some laws and specifications impose conditions on the
political and legal jurisdictions where the data is stored. In
particular, companies doing business in the EU may not store
private data about EU citizens on servers in the U.S. (or any
other nation falling short of EU privacy standards).
Some laws and specifications were not written with
virtualization in mind. In other words, they specify physical
servers in cases where virtual servers would do equally well,
simply because a server meant a physical server at the time the
law or standard was written.
The first problem has a pretty clear solution: if you are doing
business in the EU and managing private data on EU citizens, that data
must be handled on servers with a physical presence in the EU, stored
on storage devices physically in the EU, and must not pass through
infrastructure managed outside the EU.
Amazon provides a presence in both the U.S. and EU. As a result,
you can solve the first problem by carefully architecting your Amazon
solution. It requires, however, that you associate the provisioning of
instances and storage of data with your data management
requirements.
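One way to tie provisioning to data management requirements is to check every placement against a policy table before any instance or volume is created. This is a minimal sketch: the classification names are hypothetical, and `eu-west-1` and `us-east-1` are AWS region names used only for illustration.

```python
# Assumed policy table: data classification -> regions where it may live.
ALLOWED_REGIONS = {
    "eu_private": {"eu-west-1"},
    "public": {"eu-west-1", "us-east-1"},
}


def check_placement(data_class: str, region: str) -> None:
    """Raise before provisioning if the region violates the data policy."""
    allowed = ALLOWED_REGIONS.get(data_class, set())  # unknown classes are denied
    if region not in allowed:
        raise ValueError(f"{data_class!r} data may not be placed in {region!r}")
```

Calling `check_placement` at the top of whatever code launches instances or creates storage turns a legal requirement into a failure you see at provisioning time rather than in an audit.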
The second issue is especially problematic for solutions such as
Amazon that rely entirely on virtualization. In this case, however,
it’s for fairly stupid reasons. You can live up to the spirit of the
law or specification, but because the concept of virtualization was
not common at the time, you cannot live up to the letter of the law or
specification. The workaround for this scenario is similar to the
workaround for the first problem.
In solving these challenges, you want to do everything to
realize as many of the benefits of the cloud as possible without
running private data through the cloud and without making the overall
complexity of the system so high that it just isn’t worth it. Cloud
providers such as Rackspace and GoGrid tend to make such solutions easier
than attempting a hybrid solution with Amazon and something
else.
When it comes to laws and standards, what you may think of as
sensitive may not be what the law has in mind. In some cases, for
example, EU privacy laws consider IP addresses to be personally
identifying information. I am not a legal expert and therefore am
not even going to make any attempts to define what information is
private. But once you and your legal advisers have determined what
must be protected as private, follow the general rule in this
section: private data goes into the privacy-protected server and
nonprivate data can go into the cloud.
To meet this challenge, you must route and store all private
information outside the cloud, but execute as much application logic
as possible inside the cloud. You can accomplish this goal by
following the general approach I described for credit card processing
and abstracting the concepts out into a privacy server and a web
application server:
The privacy server sits outside the cloud and has the
minimal support structures necessary to handle your private
data.
The web application server sits inside the cloud and holds
the bulk of your application logic.
Because the objective of a privacy server is simply to
physically segment out private data, you do not necessarily need to
encrypt everything on the privacy server. Figure 4 illustrates how
the e-commerce system might evolve into a privacy architecture
designed to store all private data outside of the cloud.
As with the cloud-based e-commerce system, you store credit card
data on its own server in its own network segment. The only difference
for the credit card processor is that this time it is outside of the
cloud.
The new piece to this puzzle is the customer’s personally
identifying information. This data now exists on its own server
outside of the cloud, but still separate from credit card data. When
a user saves profile information, the write executes against the
privacy server instead of the main web application. Under no
circumstances does the main web application have any access to
personally identifying information, unless that data is aggregated
before being presented to the web application.
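The split can be sketched as two small functions. The field list is an assumption (your legal advisers decide what counts as private), and `split_profile` and `customers_per_city` are hypothetical names, not part of any framework.

```python
from collections import Counter

# Assumption: these are the fields your legal review classified as private.
PRIVATE_FIELDS = {"name", "email", "street_address", "city", "ip_address"}


def split_profile(profile: dict) -> tuple[dict, dict]:
    """Return (private_part, cloud_part); both keep user_id so they can be joined."""
    private_part = {"user_id": profile["user_id"]}
    cloud_part = {"user_id": profile["user_id"]}
    for key, value in profile.items():
        if key == "user_id":
            continue
        target = private_part if key in PRIVATE_FIELDS else cloud_part
        target[key] = value
    return private_part, cloud_part


def customers_per_city(private_records: list[dict]) -> dict[str, int]:
    """Runs on the privacy server; only the aggregated counts go to the cloud."""
    return dict(Counter(record["city"] for record in private_records))
```

The first function decides where each field of a profile is written; the second shows the one exception the text allows, where the web application receives aggregated values rather than any individual's data.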
How useful this architecture is depends heavily on how much
processing you are doing that has nothing to do with private data. If
all of your transactions involve the reading and writing of private
data, you gain nothing by adding this complexity. On the other hand,
if the management of private data is just a tiny piece of the
application, you can gain all of the advantages of the cloud for the
other parts of the application while still respecting any requirements
around physical data location.