
High Availability in Exchange Server 2010 : Exchange Server database technologies

2/8/2011 9:06:20 AM
You may feel that my coverage of non-mailbox High Availability is going to be pretty brief. This is because configuring High Availability for these other server roles has not significantly changed since Exchange 2007, so I will just give an overview of these requirements. However, before we start talking about High Availability on the Mailbox Server role, we have to discuss some of the database technologies used in Exchange Server 2010. Exchange Server 2010 uses a database to store the primary data, i.e. the messages you send and receive. This database technology is a transactional system, which is pretty common, but Exchange Server uses its own technology built on the Extensible Storage Engine (ESE), sometimes referred to as a JET database.

When installing an Exchange Server 2010 Mailbox Server, the initial mailbox database is, by default, stored on the local C:\ drive; more specifically on C:\Program Files\Microsoft\Exchange Server\V14\Mailbox\Mailbox Database <<random number>>\. This random number is generated by Exchange Server during the initial configuration because the database names on Exchange 2010 and higher servers must be unique within the Exchange organization.

Figure 1. By default the database and log files are placed on the C:\ drive.

A number of files make up the Exchange 2010 database environment:

  • "mailbox database 0242942819.edb"

  • E00.log

  • E000000003A.log, E000000003B.log, E000000003C.log, etc.

  • E00.chk

  • E00res00001.jrs and E00res00002.jrs

  • E00tmp.log

  • Tmp.edb.

NOTE

The random number in this example is 0242942819, hence the file name of the Mailbox Database is "mailbox database 0242942819.edb".

All log file and checkpoint file names in the list above start with the same three characters: E00. This is called the database prefix. The first database in the Exchange organization has a prefix of E00, the second database has a prefix of E01, and so on.

All of these files play a crucial role in the correct functioning of Exchange Server.

A crucial step in understanding Exchange database technology is understanding the flow of data between the Exchange Server and the database itself. Data is processed in 32 KB blocks, also called "pages." When Exchange has finished processing a page that was updated, the change is immediately written to a log file. The page is still kept in memory until Exchange needs this memory again, but when the page isn't used for some time, or when Exchange needs to force an update during a checkpoint, the page is written to the database file. So, the data in the log files is always ahead of the data in the database. This is an important point to remember when troubleshooting database issues!

NOTE

Exchange Server 2010 uses 32KB pages, Exchange Server 2007 uses 8KB pages, Exchange Server 2003 and earlier use 4KB pages when processing data. The parts of the server memory that are used by these pages are referred to as the "cache buffers."

As data is written to the database, a pointer called the checkpoint is updated to reflect the new or updated page that was written to the database. The checkpoint is stored in a special file called the checkpoint file, which Exchange Server uses to make sure it knows what data has been written to the database, and what data is in the log files and not yet written to the database. So, in short:

  1. Mail data is initially processed in memory, separated into pages.

  2. Updated pages are written to the log file.

  3. If pages are no longer needed by Exchange these pages are written to the database.

  4. The checkpoint file is updated to reflect the new location of the checkpoint.

Figure 2. Processing of mail data in Exchange Server 2010.
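
To make the four steps above concrete, here is a minimal toy model in Python. It is only a sketch of the idea and not how ESE is implemented: the class name ToyStore, its fields, and the use of a simple sequence number as the checkpoint are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy model of the page flow; every name here is hypothetical."""
    cache: dict = field(default_factory=dict)     # page number -> data held in memory
    log: list = field(default_factory=list)       # append-only (sequence, page number, data)
    dirty: list = field(default_factory=list)     # logged pages not yet in the database
    database: dict = field(default_factory=dict)  # page number -> data in the database file
    checkpoint: int = 0                           # sequence of the last page flushed

    def update_page(self, page_no, data):
        # Steps 1 and 2: process the page in memory, then log the update at once.
        self.cache[page_no] = data
        sequence = len(self.log) + 1
        self.log.append((sequence, page_no, data))
        self.dirty.append((sequence, page_no))

    def lazy_flush(self, count=1):
        # Steps 3 and 4: write the oldest dirty pages from memory (not from
        # the log) to the database and advance the checkpoint.
        for _ in range(min(count, len(self.dirty))):
            sequence, page_no = self.dirty.pop(0)
            self.database[page_no] = self.cache[page_no]
            self.checkpoint = sequence


store = ToyStore()
for page_no, data in [(1, "mail A"), (2, "mail B"), (3, "mail C")]:
    store.update_page(page_no, data)
store.lazy_flush(count=2)
print(len(store.log), sorted(store.database), store.checkpoint)  # 3 [1, 2] 2
```

After the flush, the log holds all three updates while the database holds only two of them, which is exactly the situation in which the log is ahead of the database.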

1 Extensible Storage Engine

The database engine used by Exchange Server is quite special, and is built on the Extensible Storage Engine, or ESE. ESE exists in several flavors:

  • ESE97 for Exchange Server 5.5

  • ESE98 for Exchange Server 2000/2003

  • ESENT for Active Directory

  • ESE for Exchange Server 2007 and Exchange Server 2010.

ESE is a low-level database engine. This means it knows all about "base types," such as short, string, long, longlong, systime, etc., but it has no knowledge of any structure or schema. The schema is defined by the application, in this case the Information Store. This is in contrast to a relational database like Microsoft SQL Server, where all the database structures are just metadata (i.e. are part of the database itself).

ESE is optimized for handling large amounts of semi-structured data, as it is impossible for an Exchange Server to predict what kind of data will be received, how large the data will be, or what attachments messages will have.

NOTE

Ever since the early days of Exchange, rumors have been going around about the use of Microsoft SQL Server as the database engine for Exchange Server. Microsoft tried this for Exchange Server 2010 and actually got it working. However, the decision was made to stay on the ESE database. More information about this can be found on the Microsoft Exchange Product Group blog: HTTP://TINYURL.COM/ESEDB.

2 Log files

When Exchange Server is working with a page, and that page's status changes from clean to dirty (that is, the page has been modified), the change is written to the log file almost immediately. Data held in memory is fast to access, but volatile; all it takes is a minor hiccup in the server, and data in memory is lost. Once it is saved in the log file, the whole server could burn down, and as long as you keep the disk, you also keep the data. Thankfully, saving to the log file is normally a matter of milliseconds. The log files are numbered internally, and this number (referred to as the lGeneration number) is used for identifying the log files, and for naming them on disk when they are completely filled with data.

The current log file, or the "log file in use" is E00.log, and while Exchange is filling this log file with data, a temporary E00tmp.log file is already created (or is in the process of being created) in the background. When the E00.log is eventually filled with data, it is saved under another name. The name is derived from the log file's prefix (E00, E01, E02, etc.) and the lGeneration number, which is a sequential hexadecimal notation. So, for example, when the lGeneration number is 1, the E00.log is saved as E0000000001.log. Alternatively, the last time this process happened in Figure 1, the lGeneration number was 3E, so the log file was saved as E000000003E.log. Since the lGeneration number is a sequential number, we know that the next lGeneration number of the E00.log must be 3F, and the next time this log file roll-over process takes place, the log file will be saved as E000000003F.log.
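
The naming rule can be sketched in a few lines of Python. The function name rolled_over_name is hypothetical, and the width of eight hexadecimal digits is an assumption inferred from the file names quoted above.

```python
def rolled_over_name(prefix: str, l_generation: int) -> str:
    """Build the name a full log file is saved under.

    Assumption: the name is the prefix followed by the lGeneration
    number as eight upper-case hexadecimal digits.
    """
    return f"{prefix}{l_generation:08X}.log"


print(rolled_over_name("E00", 0x1))   # E0000000001.log
print(rolled_over_name("E00", 0x3E))  # E000000003E.log
print(rolled_over_name("E00", 0x3F))  # E000000003F.log
```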

Although it's not directly visible, the lGeneration number is stored inside the log file, and can be checked by dumping the header information of the log file with the ESEUTIL utility (ESEUTIL /ML followed by the name of the log file).

The lGeneration number is listed on the third line of the header, both in decimal and hexadecimal notation. Unfortunately, this is very confusing, and sooner or later an Exchange administrator will mix up these notations and start working with the wrong log file.
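
One way to avoid mixing up the two notations is to convert a file name back into both forms before touching any file. The following is a small sketch; the helper name describe_log and the file name pattern (a prefix of E plus two digits, then eight hexadecimal digits) are assumptions based on the examples in this article.

```python
import re

# Assumed pattern: prefix such as E00, then eight hexadecimal digits.
LOG_NAME = re.compile(r"^(E\d{2})([0-9A-F]{8})\.log$", re.IGNORECASE)


def describe_log(name: str) -> str:
    """Return the prefix and the lGeneration number in decimal and hex."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a rolled-over log file name: {name}")
    prefix, hex_digits = match.groups()
    generation = int(hex_digits, 16)  # the file name uses hexadecimal
    return f"{name}: prefix {prefix}, lGeneration {generation} (0x{generation:X})"


print(describe_log("E000000003E.log"))
# E000000003E.log: prefix E00, lGeneration 62 (0x3E)
```

Note that the file name always uses the hexadecimal form, so a decimal lGeneration of 62 corresponds to the file ending in 3E, not 62.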

After the pages are written to the log file, they are kept in memory, thereby saving an expensive disk read when Exchange Server needs the page again. When the Mailbox Server needs that memory for other pages, or when the page stays in memory for a long time, it is written to the database file. This is also known as the "lazy writer mechanism." A common misconception is that data is read from the log files and written to the database file, but this is not the case. It is written directly from memory to the database, and log files are only read in recovery scenarios, for example, after an improper shutdown of the server. Under normal circumstances, the log files are 100% write, whereas the database is a random mix of read and write actions.
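
The recovery scenario can be illustrated with a toy replay routine in Python. This is only a sketch under simplifying assumptions: the function name replay, the record layout (sequence number, page number, data) and the integer checkpoint are all hypothetical, and real ESE recovery is considerably more involved.

```python
def replay(database: dict, log_records: list, checkpoint: int) -> dict:
    """Re-apply logged page updates that are newer than the checkpoint."""
    recovered = dict(database)  # leave the caller's copy untouched
    for sequence, page_no, data in log_records:
        if sequence > checkpoint:  # older records are already in the database
            recovered[page_no] = data
    return recovered


# Three logged updates; only the first reached the database before the crash.
log_records = [(1, 10, "a"), (2, 11, "b"), (3, 10, "c")]
database = {10: "a"}
print(replay(database, log_records, checkpoint=1))  # {10: 'c', 11: 'b'}
```

This is the only situation in the sketch where the log is read; during normal operation the log is written to and the database is updated from memory.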

To be honest, it would be possible to write an entire book just about the storage technologies involved, but I think that level of detail generally isn't necessary for the average SysAdmin. However, if you're feeling particularly advanced, I can recommend the book "Mission-Critical Microsoft Exchange 2003: Designing and Building Reliable Exchange Servers" by Jerry Cochran. You can find it on Amazon, and Jerry has an article on WindowsITPro.com which also covers the topic: HTTP://TINYURL.COM/JERRYCOCHRAN.

3 Checkpoint files

The relationship between writing data in the log files and writing data into the database itself is managed by the checkpoint file, E00.chk. The checkpoint file points to the page in the database that was last written, and is advanced as soon as Exchange writes another page from memory to the database.

The difference between the data in the database and the data in the log files is referred to as checkpoint depth. This checkpoint depth can be several log files; in fact, the default checkpoint depth is 20 log files. By using the checkpoint, Exchange waits before writing to the database, and tries to combine several write actions so that the database write operations can be performed more efficiently.

Figure 3. All data below the checkpoint is written to the database.

Checkpoint depth is also a per-database setting. Log files in Exchange Server 2010 are 1 MB each, so when a database's checkpoint depth is 20 log files, roughly 20 MB of data is kept in memory for that specific database. When using 30 databases in Exchange Server 2010, each at its maximum checkpoint depth, approximately 600 MB of Exchange data is kept in memory.
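
The arithmetic behind these figures is simple enough to write down. The sketch below assumes a log file size of 1 MB, which is what the 20 MB figure implies; the function name checkpoint_memory_mb is hypothetical.

```python
def checkpoint_memory_mb(databases: int, depth_logs: int = 20, log_file_mb: int = 1) -> int:
    """Estimate the memory held for data that is logged but not yet in the database.

    Assumptions: every database sits at the same checkpoint depth and
    every log file is log_file_mb megabytes in size.
    """
    return databases * depth_logs * log_file_mb


print(checkpoint_memory_mb(1))   # 20
print(checkpoint_memory_mb(30))  # 600
```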

4 The Mailbox Database

The "mailbox database 0242942819.edb" file is the primary repository of the Exchange Server 2010 Mailbox Server role. In Exchange Server 2007 this file was called "mailbox database.edb", whereas in Exchange 2003 and Exchange 2000 the database consisted of two files: priv1.edb and priv1.stm. In Exchange Server 2010, a Mailbox Server can now hold up to 100 databases.

The maximum size of an ESE database can be huge. The theoretical upper limit of a file on NTFS is 16 exabytes, and this is generally considered sufficient to host large Mailbox Database files. The Microsoft-recommended maximum file size of the Mailbox Database on Exchange Server 2010 is 2 TB. Compared to the 200 GB file-size limit in Exchange 2007 (using Cluster Continuous Replication) this is a tremendous increase. Bear in mind that a prerequisite for using this sizing is that you configure multiple database copies to achieve a High Availability solution.
