Two indirect benefits of the cloud are:
Thanks to the way virtualized servers launch from machine images, your first step in moving
into any cloud infrastructure is to create a repeatable deployment
process that handles all the issues that could come up as the system
starts up. To ensure that it does, you need to do some deployment
planning.
The machine image (in Amazon, the AMI) is a raw copy of your
operating system and core software for a particular environment on a
specific platform. When you start a virtual server, it copies its
operating environment from the machine image and boots up. If your
machine image contains your installed application, deployment is nothing
more than the process of starting up a new virtual instance.
1. Amazon Machine Image Data Security
When you create an Amazon machine image, it is encrypted and stored in
an Amazon S3 bundle. One of two keys can subsequently decrypt the AMI:
Your Amazon key
A key that Amazon holds
Only your user credentials have access to the AMI. Amazon needs
the ability to decrypt the AMI so it can actually boot an instance
from the AMI.
Even though your AMI is encrypted, I strongly recommend never
storing any sensitive information in an AMI. Not only does Amazon
have theoretical access to decrypt the AMI, but there also are
mechanisms that enable you to make your AMI public and thus perhaps
accidentally share whatever sensitive data you were maintaining in
the AMI. Legal action is another risk: if one company sues another Amazon
customer, a court may subpoena the other Amazon customer’s data.
Unfortunately, it is not uncommon for courts to step outside the
bounds of common sense and require a company such as Amazon to make
available all Amazon customer data. If you want to make sure your
data is never exposed as the result of a third-party subpoena, you
should not store that data in an AMI. Instead, encrypt it separately and load it into your instance
at launch. Amazon then does not hold the decryption keys, so the data cannot be accessed unless you are
a party to the subpoena.
2. What Belongs in a Machine Image?
A machine image should include all of the software necessary for
the runtime operation of a virtual instance based on that image
and nothing more. The starting point is obviously
the operating system, but the choice of components is absolutely
critical. The full process of establishing a machine image consists of the following steps:
Create a component model that identifies what components and
versions are required to run the service that the new machine
image will support.
Separate out stateful data in the component model. You will
need to keep it out of your machine image.
Identify the operating system on which you will
deploy.
Search for an existing, trusted baseline public machine
image for that operating system.
Harden your system using a tool such as Bastille.
Install all of the components in your component
model.
Verify the functioning of a virtual instance using the
machine image.
Build and save the machine image.
The starting point is to know exactly what components are
necessary to run your service. Figure 1 shows a
sample model describing the runtime components for a MySQL
database server.
In this case, the stateful data exists in the MySQL directory,
which is externally mounted as a block storage device. Consequently,
you will need to make sure that your startup scripts mount your block
storage device before starting MySQL.
Because the stateful data is assumed to be on a block storage
device, this machine image is useful for starting
any MySQL database, not just a specific set of
them.
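A component model can be as simple as a small data structure that marks which components are stateful. The following is a minimal sketch under stated assumptions, not the model from Figure 1: the class names, the component names, and the version strings are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """One piece of software or data the service needs at runtime."""
    name: str
    version: str
    stateful: bool = False  # True if the data must outlive the instance


@dataclass
class ComponentModel:
    """Everything required to run one service on one operating system."""
    service: str
    operating_system: str
    components: list[Component] = field(default_factory=list)

    def image_contents(self) -> list[Component]:
        """Components that are safe to install into the machine image."""
        return [c for c in self.components if not c.stateful]

    def external_state(self) -> list[Component]:
        """Stateful components that must be kept out of the image."""
        return [c for c in self.components if c.stateful]


# Hypothetical model for the MySQL example; all values are placeholders.
mysql_model = ComponentModel(
    service="mysql-database",
    operating_system="hardened-linux",
    components=[
        Component("mysql-server", "x.y"),
        Component("startup-scripts", "x.y"),
        Component("mysql-data-directory", "n/a", stateful=True),
    ],
)

print([c.name for c in mysql_model.image_contents()])
print([c.name for c in mysql_model.external_state()])
```

Splitting the model this way makes the second step of the process explicit: anything returned by external_state() has to be mounted or fetched at startup rather than saved in the image.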
The services you want to run on an instance generally dictate
the operating system on which you will base the machine image. If you
are deploying a .NET application, you probably will use one of the
Amazon Windows images. A PHP application, on the other hand, probably will be targeting a Linux
environment. Either way, I recommend searching some of the more
trusted prebuilt, basic AMIs for your operating system of choice and
customizing from there.
Warning: Avoid using “kitchen sink” Linux distributions. Each machine
image should be a hardened operating system and have only the tools
absolutely necessary to serve its function.
Hardening an operating system is the act of minimizing attack
vectors into a server. Among other things, hardening involves the
following activities:
Removing unnecessary services.
Removing unnecessary accounts.
Running all services as a role account (not root) when
possible.
Running all services in a restricted jail when
possible.
Verifying proper permissions for necessary system
services.
The best way to harden your Linux system is to use a proven
hardening tool such as Bastille.
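To make one of the activities above concrete, verifying permissions, here is a small sketch that lists world-writable files under a directory. It is an illustration only and no substitute for a hardening tool; the function name find_world_writable is hypothetical, and it assumes a POSIX-style permission model.

```python
import os
import stat


def find_world_writable(root: str) -> list[str]:
    """Return regular files under root that any user may write to."""
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or cannot be inspected; skip it
            # Only flag regular files whose "other" write bit is set.
            if stat.S_ISREG(mode) and mode & stat.S_IWOTH:
                offenders.append(path)
    return sorted(offenders)
```

Running this against a service's configuration directory before you build the image gives a quick list of files whose permissions deserve a second look.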
Now that you have a secure base from which to operate, it is
time to actually install the software that this system will support.
In the case of the current example, it’s time to install MySQL.
When installing your server-specific services, you may have to
alter the way you think about the deployment thanks to the need to
keep stateful data out of the machine image. For a MySQL server, you
would probably keep stateful data on a block device and mount it at
system startup. A web server, on the other hand, might store stateful
media assets in a cloud storage system such as Amazon S3 and pull
them over into the runtime instance on startup.
Different applications will definitely require different
approaches based on their unique requirements. Whatever the situation,
you should structure your deployment so that the machine image has the
intelligence to look for its stateful data upon startup and provide
your machine image components with access to that data before they
need it.
Once you have the deployment structured the right way, you will
need to test it. That means testing the system from launch through
shutdown and recovery. Therefore, you need to take the following
steps:
Build a temporary image from your development
instance.
Launch a new instance from the temporary image.
Verify that it functions as intended.
Repeat until the process is robust and reliable.
At some point, you will end up with a functioning instance from
a well-structured machine image. You can then build a final machine image
and go have a beer (or coffee).
3. A Sample MySQL Machine Image
The trick to creating a machine image that supports database servers
is knowing how your database engine of choice stores its data. In the
case of MySQL, the database engine has a data directory for its stateful data.
This data directory may actually be called any number of things
(/usr/local/mysql/data, /var/lib/mysql, etc.), but it is the only
thing other than the configuration file that must be separated from
your machine image. In a typical custom build, the data directory is
/usr/local/mysql/data.
Note: If you are going to be supporting multiple machine images, it
often helps to first build a hardened machine image with no services
and then build each service-oriented image from that base.
Once you start an instance from a standard image and harden it,
you need to create an elastic block storage volume and mount
it. The standard Amazon approach is to mount the volume off of /mnt
(e.g., /mnt/database). Where you
mount it is technically unimportant, but it can help reduce confusion
to keep the same directory for each image.
You can then install MySQL, making sure to install it within the
instance’s root filesystem (e.g., /usr/local/mysql). At that point, move the
data over into the block device using the following steps:
Stop MySQL if the installation process automatically started
it.
Move your data directory over into your mount and give it a
name more suited to mounting on a separate device (e.g., /mnt/database/mysql).
Change your my.cnf file
to point to the new data directory.
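The last step above can be scripted. The sketch below rewrites the datadir setting in the [mysqld] section of a my.cnf file's text. It is a simplified, line-based approach that assumes a plain option file; the function name set_datadir is hypothetical, and the example paths are the ones used in this section.

```python
def set_datadir(cnf_text: str, new_datadir: str) -> str:
    """Return cnf_text with datadir under [mysqld] set to new_datadir."""
    out = []
    in_mysqld = False
    replaced = False
    for line in cnf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [mysqld] without having seen datadir: add it first.
            if in_mysqld and not replaced:
                out.append(f"datadir = {new_datadir}")
                replaced = True
            in_mysqld = stripped.lower() == "[mysqld]"
        elif in_mysqld and stripped.split("=")[0].strip() == "datadir":
            line = f"datadir = {new_datadir}"
            replaced = True
        out.append(line)
    if in_mysqld and not replaced:
        # [mysqld] was the last section and had no datadir line.
        out.append(f"datadir = {new_datadir}")
        replaced = True
    if not replaced:
        # No [mysqld] section at all: create one.
        out.extend(["[mysqld]", f"datadir = {new_datadir}"])
    return "\n".join(out) + "\n"


original = "[mysqld]\ndatadir = /usr/local/mysql/data\nport = 3306\n"
print(set_datadir(original, "/mnt/database/mysql"))
```

The function works on text rather than on a file path so you can inspect the result before writing it back. Commented-out datadir lines are left untouched.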
You now have a curious challenge on your hands: MySQL cannot
start up until the block device has been mounted, but a block device
under Amazon EC2 cannot be attached to an instance of a virtual
machine until that instance is running. As a result, you cannot start
MySQL through the normal boot-up procedures. However, you can end up
where you want by enforcing the necessary order of events: boot the
virtual machine, mount the device, and finally start MySQL. You should
therefore carefully alter your MySQL startup scripts so that the
system will no longer start MySQL on startup, but will still shut the
MySQL engine down on shutdown.
Warning: Do not simply axe MySQL from your startup scripts. Doing so
will prevent MySQL from cleanly shutting down when you shut down
your server instance. You will thus end up with a corrupt database
on your block storage device.
The best way to effect this change is to edit the MySQL startup
script to wait for the presence of the MySQL data directory before
starting the MySQL executable.
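The waiting logic itself is small. The following sketch shows one way to express it, assuming the startup script can call out to a helper before launching MySQL. The function name wait_for_data_dir and the default timeout and polling values are assumptions for illustration, not part of any MySQL distribution.

```python
import os
import time


def wait_for_data_dir(path: str, timeout: float = 300.0,
                      poll_interval: float = 2.0) -> bool:
    """Block until path exists as a directory or timeout seconds pass.

    Returns True if the directory appeared, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.isdir(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Example path from this section; adjust to your own mount point.
    if wait_for_data_dir("/mnt/database/mysql", timeout=5.0):
        print("data directory present; safe to start MySQL")
    else:
        print("data directory never appeared; not starting MySQL")
```

Because only the start path waits, the shutdown portion of the script is unchanged and MySQL still stops cleanly when the instance shuts down.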
4. Amazon AMI Philosophies
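In approaching AMI design, you can follow one of two core
philosophies:
A minimalist approach, in which you build a few multipurpose
machine images.
A comprehensive approach, in which you build many
purpose-specific machine images.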
I am a strong believer in the minimalist approach. The
minimalist approach has the advantage of being easier for rolling out
security patches and other operating-system-level changes. On the flip
side, it takes a lot more planning and EC2 skills to structure a
multipurpose AMI capable of determining its function after startup and
self-configuring to support that function. If you are just getting
started with EC2, it is probably best to take the comprehensive
approach and use cloud management tools to eventually help you evolve
into a library of minimalist machine images.
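As a rough sketch of what "determining its function after startup" can look like, the code below picks a set of setup steps from text passed to the instance at launch. Everything here is an assumption for illustration: the key=value format, the role names, and the step names are hypothetical, and fetching the launch data from the instance is left out.

```python
def parse_user_data(user_data: str) -> dict[str, str]:
    """Parse key=value lines, ignoring blank lines and # comments."""
    settings = {}
    for raw in user_data.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical mapping from a role name to the setup steps for that role.
ROLE_STEPS = {
    "database": ["attach-volume", "mount-volume", "start-mysql"],
    "web": ["fetch-media-assets", "start-web-server"],
}


def steps_for_instance(user_data: str) -> list[str]:
    """Return the setup steps for the role named in the launch data."""
    role = parse_user_data(user_data).get("role")
    if role not in ROLE_STEPS:
        raise ValueError(f"unknown or missing role: {role!r}")
    return ROLE_STEPS[role]


print(steps_for_instance("# launch settings\nrole = database\n"))
```

Failing loudly on an unknown role is a deliberate choice: a multipurpose image that guesses its function is harder to debug than one that refuses to configure itself.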
For a single application installation, you are unlikely to need
many machine images, and thus the difference between a comprehensive
approach and a minimalist approach is negligible. SaaS
applications, on the other hand, especially ones that are not
multitenant, require a runtime deployment of application software.
Runtime deployment means uploading the application software—such
as the MySQL executable discussed in the previous section—to a newly
started virtual instance after it has started, instead of embedding it
in the machine image. A runtime application deployment is more complex
(and hence the need for cloud management tools) than simply including
the application in the machine image, but it does have a number of
major advantages:
You can deploy and remove applications from a virtual
instance while it is running. As a result, in a multiapplication
environment, you can easily move an application from one cluster
to another.
You end up with automated application restoration. The
application is generally deployed at runtime using the latest
backup image. When you embed the application in an image, on the
other hand, your application launch is only as good as the most
recent image build.
You can avoid storing service-to-service authentication
credentials in your machine image and instead move them into the
encrypted backup from which the application is deployed.