Performing
trustworthy backups is a critical process in any Exchange Server
environment. One of the simplest ways to ensure that your backups are
done properly is to document your requirements and your processes.
A mechanism needs to be in
place to track the success of backups, along with a process to follow
if a backup fails. Sticking to this process and adhering to the set
policies helps ensure that backups are valid and recoverable if a
failure occurs.
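As a rough illustration of such a tracking mechanism, the following
Python sketch flags jobs whose most recent run failed or is too old.
The record fields, the job names, and the 24-hour threshold are
assumptions made for the example; a real environment would read
results from its own backup software.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupResult:
    """One backup job outcome; the field names are illustrative."""
    job_name: str
    finished_at: datetime
    succeeded: bool


def jobs_needing_escalation(results, now, max_age=timedelta(hours=24)):
    """Return names of jobs whose latest run failed or is older than max_age."""
    latest = {}
    for result in results:
        current = latest.get(result.job_name)
        # Keep only the most recent result for each job.
        if current is None or result.finished_at > current.finished_at:
            latest[result.job_name] = result
    return sorted(
        name
        for name, result in latest.items()
        if not result.succeeded or now - result.finished_at > max_age
    )
```

The returned list is the input to whatever escalation path the backup
policy defines.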
Publicly traded
companies must follow a set of rules around documenting processes and
proving that those processes are followed. These rules are primarily
dictated by the Sarbanes-Oxley Act, or SOX. Privately held companies are
not legally required to follow SOX standards, but the standards
nonetheless serve as an excellent example of best practices for
maintaining an IT environment and should be strongly considered.
Documenting Backup Policy and Procedures
When building your
backup documentation, it is best to start with a policy that not only
supports the service-level agreements (SLAs) for your Exchange Server
environment but also complies with any existing rules from your
Information Security group or Regulatory Compliance group.
Management should
review and approve your backup policies to ensure that they are in line
with any established SLAs. Policies should include items such as the
following:
Frequency and type of backups
Acceptable standards for offsite storage and retrieval
Escalation path for failed backups
Decision criteria for overrun jobs
Clear statement of what is and isn’t backed up
Whether the backups are password protected
Data retention periods
In this way, everyone
knows what is and isn’t covered by Exchange Server backups, and there
are no surprises in the future. Having this policy documented is also
helpful if you are required to pass any audits or verify regulatory
compliance.
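One way to keep a policy reviewable is to record it as structured data
and check that every item in the list above has been filled in. The
sketch below is only an illustration; the key names are hypothetical
labels for the listed items, not part of any Exchange Server tool.

```python
# Hypothetical keys, one per policy item listed in the text.
REQUIRED_POLICY_ITEMS = (
    "frequency_and_type",
    "offsite_storage",
    "escalation_path",
    "overrun_criteria",
    "backup_scope",
    "password_protected",
    "retention_period",
)


def missing_policy_items(policy):
    """Return the required items that are absent or left empty."""
    missing = []
    for item in REQUIRED_POLICY_ITEMS:
        value = policy.get(item)
        # False is a valid answer (for example, password_protected=False),
        # so only None and empty strings count as missing.
        if value is None or value == "":
            missing.append(item)
    return missing
```

An empty result means the policy document addresses every item and is
ready for management review.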
Maintaining Documentation on the Exchange Server Environment
Systems such as
Exchange Server often outlast the employees who built them. This means
that it’s easy to lose track of exactly how systems are deployed, where
various roles are located, and the specific needs of each participating
system. For this reason, it is extremely important
to maintain accurate documentation for the server configurations, the
network, and the path of mail flow. In addition, you need to track the
configuration of firewalls and switches that could affect the overall
Exchange Server environment if they failed and had to be replaced.
Server Configuration Documentation
Server documentation is
essential for any environment regardless of size, number of servers, or
disaster recovery budget. A server configuration document contains a
server’s name, network configuration information, hardware and driver
information, disk and volume configuration, and information about the
installed applications. A complete server configuration document
contains all the configuration information a qualified administrator
needs when the server must be restored and the operating system cannot
be restored efficiently. A server configuration document
can also be used as a reference when server information needs to be
collected.
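Part of a server configuration document can be gathered automatically.
The following Python sketch collects a few of the facts mentioned above
using only the standard library. It is a starting point under stated
assumptions: the dictionary keys are illustrative, only one volume is
inspected, and driver and application details would need other tools.

```python
import json
import os
import platform
import shutil
import socket


def collect_server_facts(volume_path=None):
    """Collect basic facts for a server configuration document."""
    if volume_path is None:
        # Default to the root of the current drive or file system.
        volume_path = os.path.abspath(os.sep)
    usage = shutil.disk_usage(volume_path)
    return {
        "server_name": socket.gethostname(),
        "os": platform.system(),
        "os_version": platform.version(),
        "architecture": platform.machine(),
        "volume": volume_path,
        "volume_total_bytes": usage.total,
        "volume_free_bytes": usage.free,
    }


if __name__ == "__main__":
    # Print the facts as JSON so they can be pasted into the document.
    print(json.dumps(collect_server_facts(), indent=2))
```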
The Server Build Document
A server build document
contains step-by-step instructions on how to build a particular type of
server for an organization. The details of this document should be
tailored to the skill of the person intended to rebuild the server. For
example, if this document were created for disaster recovery purposes,
it might be detailed enough that anyone with basic computer skills could
rebuild the server. This type of information can also be used to help
information technology (IT) staff follow a particular server build
process to ensure that when new servers are added to the network, they
all meet company server standards.
Hardware Inventory
Documenting the
hardware inventory of an entire network might not be necessary. If the
entire network does need to be inventoried, and if the organization is
large, Microsoft System Center Configuration Manager can help
automate the hardware inventory task. If the entire network does not
need to be inventoried, hardware inventory can be collected for all the
production and lab servers and networking hardware, including
specifications such as serial numbers, amount of memory, disk space,
processor speed, and operating system platform and version. Knowing
all the hardware involved makes the restore process much simpler,
especially in situations in which hardware needs to be replaced as part
of the restoration.
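For a small inventory that is maintained by hand, a simple CSV file is
often enough. The sketch below writes inventory records with the
specifications named above; the column names and the sample values in
the usage note are assumptions chosen for the example.

```python
import csv
import io

# Illustrative column names based on the specifications listed in the text.
INVENTORY_FIELDS = [
    "name", "serial_number", "memory_gb", "disk_gb", "cpu_ghz", "os",
]


def inventory_to_csv(records):
    """Render a list of inventory dictionaries as CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=INVENTORY_FIELDS)
    writer.writeheader()
    for record in records:
        # Missing fields are written as empty cells; unknown fields raise
        # ValueError, which catches typos in column names.
        writer.writerow(record)
    return buffer.getvalue()
```

Calling inventory_to_csv with a record such as
{"name": "EX01", "serial_number": "ABC123", "memory_gb": 64} produces a
header row followed by one row for the server, with empty cells for the
fields that were not supplied.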
Network Configurations
Network
configuration documentation is essential when network outages occur.
Current, accurate network configuration documentation and network
diagrams simplify troubleshooting and help isolate the problem when a
failure occurs.
WAN Connection
WAN connectivity
should be documented for enterprise networks that contain many sites to
help IT staff understand the enterprise network topology. This document
is helpful when a server is restored and data must be synchronized
enterprise-wide after the restore. Knowing the link performance between
sites helps administrators understand how long an update made in Site A
will take to reach Site B. The document should contain information
about each WAN link, including circuit numbers, Internet service
provider (ISP) contact names, ISP technical support phone numbers, and
the network configuration on each end of the connection. It can also be
used to troubleshoot and isolate WAN connectivity issues.
A strong
understanding of the network is also critical to the process of
initially creating the backups. By understanding the impact of running
backups over the network, or how bandwidth is affected while reseeding a
replaced database availability group (DAG) replica, you can plan for
periods in which the environment lacks the level of redundancy it was
designed for and the backup schedule might need to be adjusted.
For example, an
organization that uses database availability groups to place replicas
of mailbox data in two locations might feel protected against system
failures and, combined with a 30-day deleted item retention period,
perform traditional backups only once a month. If a DAG replica failed
and took two days to reseed because the failed replica had to be
completely replaced, the organization would be at risk for those two
days because only one copy of the mailbox databases would be available.
During this period, the organization might alter its backup schedule to
perform backups nightly until the additional replica was returned to
service.
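The reasoning in this example can be reduced to simple arithmetic. The
Python sketch below estimates how long a reseed would occupy a link and
chooses a backup interval based on the number of healthy copies. The
function names, the 70 percent link efficiency, and the default
intervals are assumptions for illustration; the estimate covers only
the data copy, not the time needed to replace hardware.

```python
def reseed_hours(database_gb, link_mbps, efficiency=0.7):
    """Estimate hours needed to copy database_gb over a link_mbps link.

    efficiency is the assumed fraction of the link actually usable.
    """
    if database_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and rates must be positive")
    megabits = database_gb * 8 * 1000  # decimal gigabytes to megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def backup_interval_days(healthy_copies, required_copies=2,
                         normal_days=30, degraded_days=1):
    """Back up more often while fewer copies exist than the design requires."""
    if healthy_copies < required_copies:
        return degraded_days
    return normal_days
```

Under these assumptions, reseeding 500 GB over a 100 Mbps link takes
roughly 16 hours, and an environment that has dropped to one healthy
copy would switch from monthly to nightly backups until the second copy
is restored.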
Router, Switch, and Firewall Configurations
Firewalls, routers, and,
sometimes, switches can run proprietary operating systems with a
configuration that is exclusive to the device. During a system recovery,
certain gateway connections, routing configuration information, routing
table data, and other information might need to be reset on the
restored server. Information should be collected from these devices,
including logon passwords and current configurations. When a
configuration change is planned for any of these devices, the proposed
configuration should be created using a text or graphical editor and
approved before it is applied to the production device. A rollback plan
should be created first to ensure that the device can be restored to
its original state if the change does not deliver the desired results.
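When device configurations are kept as text, the proposed change can be
presented to approvers as a diff against the current configuration. The
following Python sketch uses the standard difflib module to produce
one. The function name and the device label are illustrative
assumptions.

```python
import difflib


def config_change_report(current, proposed, device="device"):
    """Return a unified diff between the current and proposed configurations."""
    diff = difflib.unified_diff(
        current.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"{device} (current)",
        tofile=f"{device} (proposed)",
    )
    # An empty string means the two configurations are identical.
    return "".join(diff)
```

The saved current configuration doubles as the rollback target:
reapplying it returns the device to its original state if the change
does not behave as expected.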
Updating Documentation
One of the most
important, yet sometimes overlooked, areas around documentation is
maintaining accuracy as changes are applied to server systems.
Maintaining documentation is tedious, but outdated documentation can be
worthless if a server’s software configuration has changed since the
document was created. For example, if a server configuration document
were used to re-create a server from scratch but many changes were
applied to the server after the document was created, the correct
security patches might not be applied, applications might be configured
incorrectly, or data restore attempts could be unsuccessful. Whenever a
change is planned for a network device, printer, or server,
documentation outlining the previous configuration, proposed changes,
and rollback plans should be created before the change is approved and
carried out on the production device. After the change is carried out
and the device is functioning as needed, the documentation associated
with that device or server should be updated.
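A periodic check can catch documentation that has fallen behind. The
sketch below compares the date each device was last changed with the
date its documentation was last updated. Both inputs are assumed to be
dictionaries keyed by device name, which in practice would come from a
change log and a document repository.

```python
from datetime import date


def stale_documents(doc_updated, last_changed):
    """Return devices changed more recently than their documentation.

    Devices with no documentation date at all are also reported.
    """
    stale = []
    for device, changed_on in last_changed.items():
        documented_on = doc_updated.get(device)
        if documented_on is None or documented_on < changed_on:
            stale.append(device)
    return sorted(stale)
```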