programming4us

Commercial Backup Utilities : Backup of Very Large Filesystems and Files, Aggressive Requirements

2/16/2015 8:53:20 PM

Backup of Very Large Filesystems and Files

Large filesystems and files caught many backup products by surprise. For many years, 4 GB filesystems with a maximum of 2 GB files ruled the land. This is because none of the operating systems allowed anything larger. Then came 32-bit and 64-bit operating systems and the multiterabyte filesystem. Not long after that, some vendors were announcing the ability to create multiterabyte files. This caused a major problem with some backup vendors because many design decisions were made assuming there was a 4 GB limit.
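The 2 GB and 4 GB figures match what 32-bit arithmetic allows: a signed 32-bit offset tops out just under 2 GiB, and an unsigned one just under 4 GiB. The Python sketch below is illustrative only. The function name is hypothetical, and the idea that a product stores sizes in 32-bit fields is an assumption about one way such a limit can arise, not a description of any particular product.

```python
# Largest values representable in 32-bit size or offset fields.
MAX_SIGNED_32 = 2**31 - 1      # 2,147,483,647 bytes, just under 2 GiB
MAX_UNSIGNED_32 = 2**32 - 1    # 4,294,967,295 bytes, just under 4 GiB


def fits_in_32_bits(size_bytes: int, signed: bool = True) -> bool:
    """Return True if size_bytes can be stored in a 32-bit field."""
    limit = MAX_SIGNED_32 if signed else MAX_UNSIGNED_32
    return 0 <= size_bytes <= limit


if __name__ == "__main__":
    three_gib = 3 * 2**30
    # A 3 GiB file overflows a signed field but fits an unsigned one.
    print(fits_in_32_bits(three_gib))
    print(fits_in_32_bits(three_gib, signed=False))
```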

Tip

It’s inexcusable that a product would still have this problem at this point.

There are a few things to consider when investigating whether a given product can handle large files and filesystems. Does the vendor have any hardcoded limits that say a file can’t be any bigger than N bytes? Do they have problems if a filesystem (or file) is bigger than a volume? Do they have any automated way to create multiple simultaneous backups of a single filesystem, without requiring you to manually divide that filesystem into many pieces?
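As a rough illustration of the last question, dividing one filesystem into several simultaneous backups amounts to spreading its directories across streams so that each stream carries a similar amount of data. The sketch below uses a simple greedy approach (largest directory first, always into the least-loaded stream). The function name and the directory sizes are hypothetical, and this is not how any specific product does it.

```python
import heapq
from typing import Dict, List, Tuple


def plan_streams(dir_sizes: Dict[str, int], streams: int) -> List[List[str]]:
    """Assign directories to backup streams, largest first, always adding
    to the stream that currently holds the least data."""
    if streams < 1:
        raise ValueError("streams must be at least 1")
    # Heap entries are (bytes assigned so far, stream index).
    heap: List[Tuple[int, int]] = [(0, i) for i in range(streams)]
    heapq.heapify(heap)
    plan: List[List[str]] = [[] for _ in range(streams)]
    by_size = sorted(dir_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for path, size in by_size:
        assigned, idx = heapq.heappop(heap)
        plan[idx].append(path)
        heapq.heappush(heap, (assigned + size, idx))
    return plan


if __name__ == "__main__":
    # Hypothetical directory sizes in gigabytes.
    sizes = {"/data/a": 500, "/data/b": 300, "/data/c": 200, "/data/d": 100}
    for number, paths in enumerate(plan_streams(sizes, 2), start=1):
        print(f"stream {number}: {paths}")
```

A product with automated multistreaming would do something like this for you; the question for the vendor is whether you have to work out the split by hand.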

Switching Backup Products

You should never change your backup product just because you’re having problems with it. First, the problems with any given backup system are almost always misconfiguration, misunderstanding, lack of defined processes, not enough hardware, or too much hardware. It’s almost never the product itself. Second, switching backup products affects all three important business factors: cost, risk, and service levels. There’s the cost of acquiring the product, legacy restores, new product training, and implementation services. Then there’s the risk of data loss while you are learning the new product. Finally, there will be an apparent drop in service levels as you try to figure out how the new system works.

Therefore, if you’re having problems with your backup system, switching products should be the last thing on the list. Before you do that, hire a specialist in your backup product to make sure that it’s performing optimally and that you’ve taken advantage of any useful extra features that might help solve your problem. They’ll probably solve your problem, it will cost a fraction of what a new backup product will cost, and your knowledge of your current product will increase. Therefore, there’s only one reason that you should be considering changing your backup product: you have requirements that your current backup product cannot meet.

Aggressive Requirements

These requirements include your recovery time objective, recovery point objective, consistency groups, and backup window.

Your recovery time objective, or RTO, is how quickly you want the system to be recovered. RTOs can range from zero seconds to many days, or even weeks. Each piece of information serves a business function, so the question is how long you can live without that function. If the answer is that you can’t live without it for one second, you have an RTO of zero seconds. If the answer is that you can live without it for two weeks, you have an RTO of two weeks.

The recovery point objective, or RPO, is determined by how much data you can afford to lose. If you can lose three days’ worth of a given set of data, that set of data has an RPO of three days. If it’s real-time customer orders, however, you may decide you can’t afford to lose any of them; you have an RPO of zero for that application.

There can also be an RPO for a group of machines. If you have several systems that are related to each other, you may need to recover them to the same point in time. They are referred to as a consistency group. To meet such a requirement, you have to back up all related systems at exactly the same time, or you have to give each system a very small RPO. Having an RPO for a group of machines basically makes the RPO for each machine in that group the same as the lowest RPO of any machine in that group.
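The rule in the last sentence can be written down directly. This is a minimal sketch with hypothetical system names and RPO values expressed in minutes; it only shows that every member of a consistency group inherits the lowest RPO in the group.

```python
from typing import Dict


def group_rpo_minutes(rpo_by_system: Dict[str, int]) -> int:
    """Effective RPO for a consistency group: the lowest RPO of any member."""
    if not rpo_by_system:
        raise ValueError("a consistency group needs at least one system")
    return min(rpo_by_system.values())


def effective_rpos(rpo_by_system: Dict[str, int]) -> Dict[str, int]:
    """Every member of the group inherits the group's RPO."""
    group = group_rpo_minutes(rpo_by_system)
    return {name: group for name in rpo_by_system}


if __name__ == "__main__":
    # Hypothetical group: individual RPOs of one day, one hour, four hours.
    group = {"web": 1440, "orders-db": 60, "queue": 240}
    print(effective_rpos(group))
```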

Once you’ve determined an RTO and RPO for each system and disaster type, you need to agree on when you can back up a system, how long you can take to back it up, and how much you are allowed to impact the production system while it is being backed up. These values are collectively and generally referred to as the backup window.

Once you’ve determined your requirements, you may find that they are too aggressive. We’ll consider a requirement aggressive if a traditional LAN-based recovery method can’t meet it. This typically means that the bandwidth is too small, the amount of data to move is too large, or the amount of time you’re given isn’t reasonable. This tends to happen in one of three scenarios:

Remote office backup

Remote offices have long been the elephant in the room at many data protection planning sessions. They’re typically handled with remote tape drives or tape libraries that aren’t being managed by skilled personnel, or they’re not being managed at all. I recently met with an oil-and-gas company whose remote offices included off-shore drilling rigs. Imagine the fun they have getting a vaulting company to stop by!

However, an increasing number of people want to fix this problem by backing up their remote office across the network. How do you back up a remote office with hundreds of gigabytes of data if it’s on the other side of a WAN connection? A typical backup and recovery system would not be able to meet any reasonable RTO, RPO, or backup window requirements.

Very large applications

I’ve recently seen a 150 TB Oracle database. Try backing that thing up! While you may not have a 150 TB application, it’s highly possible that you have an application that’s too large to back up and recover within an acceptable time. As of this writing, that size seems to be individual servers that have gone beyond several terabytes. Anything a few terabytes or smaller can be backed up and recovered within a few hours, which is within range of most RTOs and RPOs. However, what do you do if you’ve got a 10 TB application and a one-hour RTO?

Very critical applications

Some applications are so critical to the business that their owners simply won’t accept any downtime or loss of data. How do you meet a one-minute RTO or a zero-second RPO, regardless of how large your application is? Applications that fall into this category are the hardest applications to design for and require very advanced data protection systems.
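To judge whether a requirement is aggressive in the sense used above, a back-of-the-envelope calculation is usually enough: divide the amount of data by the sustained throughput and compare the result with the time you are allowed. The sketch below does exactly that. The function names are hypothetical, and the 100 MB/s figure is an assumed sustained rate chosen for illustration, not a measurement.

```python
def transfer_hours(data_bytes: float, throughput_bytes_per_sec: float) -> float:
    """Hours needed to move data_bytes at a sustained throughput."""
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return data_bytes / throughput_bytes_per_sec / 3600


def is_aggressive(data_bytes: float, throughput_bytes_per_sec: float,
                  allowed_hours: float) -> bool:
    """True if the data cannot be moved within the allowed time."""
    return transfer_hours(data_bytes, throughput_bytes_per_sec) > allowed_hours


if __name__ == "__main__":
    ten_tb = 10e12            # the 10 TB application from the text
    assumed_rate = 100e6      # assumption: 100 MB/s sustained
    print(f"{transfer_hours(ten_tb, assumed_rate):.1f} hours to move 10 TB")
    print("one-hour RTO aggressive?", is_aggressive(ten_tb, assumed_rate, 1.0))
```

At the assumed rate, 10 TB takes roughly 27.8 hours to move, so a one-hour RTO is far out of reach for a method that has to copy all of the data.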

The following technologies can help you meet aggressive requirements, and they are listed in order of their ability to meet such requirements. The farther down the list they appear, the more aggressive requirements they can meet.

Test Your Restores

At a large insurance company, a decision had been made to back up our NetApp Filer by backing up the shares rather than purchasing an NDMP key and dedicated tape drive. We used the utility from the Windows resource pack to mount the share at boot time. For an unknown reason, the share did not get mounted, and because we specified soft in the mount configuration file, there was no notice or error message given.

Since the share was never mounted, there was no notice given during backups. However, if the owner of the share logged on to the system, the share would be mounted at that time and dismounted when he signed off. This went on for several months before the owner of the share corrupted the data and asked for a restore from the previous night. It was then that we found out there were no backups for three months and that the share he was using had expired.

Harry Tirrell
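One inexpensive guard against the failure in this story is to refuse to back up a path unless it really is a mount point and actually contains something. The sketch below uses Python’s standard os.path.ismount for that. The function name is hypothetical, the check is a POSIX-style assumption about how the share is attached, and it is no substitute for test restores.

```python
import os


def safe_to_back_up(path: str) -> bool:
    """Return True only if path is a mount point that contains something."""
    if not os.path.ismount(path):
        # The share never got mounted; backing up now would silently
        # capture an empty directory.
        return False
    with os.scandir(path) as entries:
        return any(True for _ in entries)


if __name__ == "__main__":
    share = "/mnt/filer_share"   # hypothetical mount point
    if not safe_to_back_up(share):
        raise SystemExit(f"refusing to back up {share}: not mounted or empty")
```

A pre-backup script that exits with an error in this situation turns a silent failure into one that shows up in the backup report.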

LAN-Free Backup

The first advanced backup and recovery technique is to send the backup data across a SAN instead of the LAN, thus earning the term LAN-free. Since LAN-free backups are faster than LAN-based backups, they can be used to help meet more aggressive RTOs and RPOs than you could meet with LAN-based backups. If you’ve got a large server that’s having trouble meeting its RTO or RPO, you may consider making it a LAN-free backup client.

LAN-free backups require the backup device to be connected via a storage area network, or SAN, running either Fibre Channel or iSCSI. A Fibre Channel SAN would use Fibre Channel HBAs and a Fibre Channel switch. An iSCSI SAN would use regular HBAs with iSCSI drivers (or perhaps iSCSI HBAs), and can route their traffic across any IP network—although it’s a good idea to separate this traffic from regular IP traffic.


Figure 1. A basic storage area network (SAN)

As illustrated in Figure 1, backup servers connected to a SAN have what amounts to physical access to all peripherals in the SAN. Once the peripherals are attached and the SAN is configured, all SCSI/Fibre Channel/iSCSI traffic is routed through the SAN, and each server “thinks” that the library is locally attached to it. This allows the server to take advantage of the recent advancements in backup technology that allow backups to be sent across the SAN. Such backups are much faster (and much easier on the CPU) than LAN-based backups.

Tip

When people first see a SAN drawing, they don’t see much difference between it and a LAN, and the line between the two gets more blurred every day. Historically, a SAN used Fibre Channel, and a LAN used Ethernet and IP. Now with iSCSI, a SAN can also use IP as a transport mechanism. Just remember that a SAN is talking SCSI. That SCSI may be running on top of Fibre Channel or IP—but it’s still SCSI.

Commercial backup applications can dynamically configure which servers have access to each peripheral. For example, when it is time for a particular server’s full backup, the backup application can configure the router in such a way that the server has access to every available backup drive. Of course, it can do this for a critical restore as well.

Server-Free (or Serverless) Backup

LAN-free backups do make backups go much faster, and they do require less CPU than LAN-based backups, but they also still require some CPU. If you have a very large server, even that reduced CPU load may be more than the server can handle. It may impact the application too much, or it may require more CPU than you have available. What if you could back up the application without actually sending the data through the server that is using the database? If you’re willing to accept a very different type of backup design, this is now possible. Consider the drawing in Figure 2.


Figure 2. Serverless backup

At the top of Figure 2, there are two database servers that are connected to a large, multihost-attachable disk array. The backup server also is attached to this array. The databases actually are sitting on top of mirrored partitions inside the large disk array. To accomplish a serverless backup, the backup server tells the database server that it needs to do a backup of the database. The database server splits off one of the mirrors or makes a virtual snapshot of its volumes, then tells the backup server that it can back it up. The backup server then backs up the data via a path that doesn’t include the original database server, hence the term serverless. (Of course, the backup server is still involved. It’s just the original server that no longer has to move the data.) The entire application can then be backed up without transferring the data through the client that is using it. The data may take one of two paths:

  • It can back up the data via another dedicated server that can access the split-off mirror or snapshot.

  • It can back up the data via a SAN router that accepts the SCSI X-COPY command. This moves the data directly from the disk device to tape without going through any server.

If the data being backed up is a split mirror (instead of a virtual snapshot), it also offers another advantage that traditional backup methods cannot. This second mirror can be left disconnected until it is time to back it up again. At that point, it can be quickly resynced to the other side of the mirror. Leaving it disconnected like this gives you an instantly available backup of the entire database. If something were to happen to the production database, you could run a few commands and remount the database using the mirror. All you would need to do is to replay your transaction logs since the mirror was split off. Without this standby mirror, you would have to restore the database before you could replay your transaction logs. This technology could be used to meet very aggressive RTOs and RPOs.

There are a few disadvantages to this method, starting with the fact that it is extremely complex. If everything goes right, you’re fine. If something goes wrong, you’ve got logs on the backup server, media server, client, storage array, and SAN router. It can take a really long time to figure out why it’s not working. The second disadvantage is that most serverless backup products don’t offer serverless restores. Make sure to look into that when investigating this option. Finally, this is also a very expensive option.

Tip

Unix Backup & Recovery spoke more highly of serverless backup. A lot of things have changed since then, starting with the three technologies that will be covered next. Many people now consider them the preferred methods for meeting very aggressive RTOs and RPOs.

De-Duplication Backup Systems

The developers of de-duplication backup systems asked themselves a few questions. If only a few bytes change in a file, why are we backing up the entire file? If the same file resides in two places on the same system, why do we back it up twice? Why don’t we just store a reference to the second file? Some even asked why we’re backing up the same file across multiple systems. Doesn’t that waste server and network resources?

The answers to all of these questions, of course, rest in the limitations of traditional backup systems. If we don’t back up the file every time we see it, we’re going to need to load a lot more tapes when we have to restore. In addition, if we only back up the changed bytes in a file, we might need multiple tapes just to restore a single file.

However, if you back up any given file only once, and back up the changed bytes only when a file changes, it is actually possible to meet more demanding backup window requirements. The tape issues mentioned here are mitigated by backing up to disk. Tape-based copies of the disk-based backups can be created at any time, depending on the requirements of the customer. Some de-duplication products can also meet aggressive RTO requirements by restoring only the blocks that have changed since the file was last backed up. The RPO abilities of these products are based on how often you back up, but it is common to use such products to back up hourly, allowing you to meet a one-hour RPO. The same is true for your consistency group requirements.
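The core idea, storing each piece of data once and keeping references to it everywhere else, can be sketched in a few lines. The example below is a toy, not any vendor’s design: it assumes fixed-size chunks identified by SHA-256 digests, keeps everything in memory, and uses hypothetical names throughout. Real products differ in how they divide and index data.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept only once."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}        # digest -> chunk data
        self.catalog: Dict[str, List[str]] = {}   # backup name -> digests

    def backup(self, name: str, data: bytes) -> int:
        """Store data under name; return the number of new bytes stored."""
        new_bytes = 0
        digests: List[str] = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # First time this content has been seen anywhere.
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            digests.append(digest)
        self.catalog[name] = digests
        return new_bytes

    def restore(self, name: str) -> bytes:
        """Rebuild a backup by following its chunk references."""
        return b"".join(self.chunks[d] for d in self.catalog[name])


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    original = b"AAAABBBBCCCC"
    print(store.backup("host1:/etc/app.conf", original))  # all chunks are new
    print(store.backup("host2:/etc/app.conf", original))  # same file elsewhere
    print(store.backup("host1:/etc/app.conf@v2", b"AAAAXXXXCCCC"))  # one changed chunk
```

The second and third backups add little or nothing to the store, which is why this approach shortens backup windows and reduces the data sent over the network.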

Tip

De-duplication backup systems use techniques similar to those used by disk targets with de-duplication features. Where those systems de-duplicate at the target level, a complete de-duplication backup system eliminates the redundancy at the client level, reducing the amount of data that has to be sent from a remote office or laptop.

The biggest advantage to de-duplication products is that, from the user adoption perspective, they’re the closest to what users already know. Their interfaces are similar, and they often have database agents similar to those found in traditional backup software. They’re simply able to back up faster and more often, and they use much less bandwidth.

Snapshots

Another alternative backup method is a snapshot. The most common type of snapshot is a virtual copy of a device or filesystem that relies on the original volume to actually present the data to you. This reliance on the original volume is why snapshots must be backed up to provide recovery from physical failures. Snapshot functionality may be found in a number of places, including advanced filesystems and volume managers, enterprise storage arrays, NAS filers, and backup software.
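To see why a virtual snapshot depends on the original volume, consider copy-on-write, one common way of implementing snapshots (the text above does not name a specific technique, so treat this as an assumption). The snapshot saves a block only when that block is about to be overwritten; every other read is served from the live volume. The class names below are hypothetical.

```python
from typing import Dict, List


class Volume:
    """Toy block volume that supports copy-on-write snapshots."""

    def __init__(self, blocks: List[bytes]) -> None:
        self.blocks = list(blocks)
        self.snapshots: List["Snapshot"] = []

    def snapshot(self) -> "Snapshot":
        # Creating a snapshot copies no data, which is why it is so fast.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index: int, data: bytes) -> None:
        # Preserve the old block in every snapshot that has not saved it yet.
        for snap in self.snapshots:
            snap.saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data


class Snapshot:
    """Point-in-time view that holds only blocks changed since creation."""

    def __init__(self, volume: Volume) -> None:
        self.volume = volume
        self.saved: Dict[int, bytes] = {}

    def read(self, index: int) -> bytes:
        if index in self.saved:
            return self.saved[index]
        # Unchanged blocks still live only on the original volume.
        return self.volume.blocks[index]


if __name__ == "__main__":
    vol = Volume([b"a", b"b", b"c"])
    snap = vol.snapshot()
    vol.write(1, b"B")
    print(snap.read(1), vol.blocks[1], len(snap.saved))
```

Because unchanged blocks exist only on the original volume, losing that volume loses the snapshot as well.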

Snapshots can help meet aggressive backup requirements. For example, some snapshots can meet an RTO of a few seconds by simply changing a pointer. You also can create several snapshots per day, allowing for an aggressive RPO. Since snapshots can be created in seconds, you can meet aggressive backup window requirements as well. You can create a stable, virtual backup of a multiterabyte database in seconds, reducing the impact on the application to potentially nothing. Then you’ve got hours to perform a backup of that snapshot. The next section discusses how replication is a great way to do that. Finally, creating synchronized snapshots on multiple systems is also relatively easy, so you can meet aggressive synchronicity requirements as well.

One interesting development in the snapshot world is the development of APIs that allow other vendors to interface with snapshots. NDMP and Microsoft’s VSS are examples. NDMP allows for backup vendors to schedule the creation of a snapshot, as well as catalog and restore from its contents. Restores are performed using the same interface that you would use for “normal” backups, but they are actually performed by the filer using snapshot technology. VSS allows storage vendors with snapshot capabilities to have the files in those snapshots listed in and restored from the Previous Versions tab in Windows Server 2003 and later. Hopefully, this functionality will be added to workstation versions of Windows, and more NAS vendors will support it as well.

Another interesting development with snapshots is the creation of database agents that work with snapshots. The database agent communicates with the database so that it believes it’s being backed up when all that’s really happening is the creation of a snapshot. Recoveries are also sometimes integrated, allowing for incredibly fast recoveries that are controlled by the database application.

Replication

Replication is the practice of continually copying from a source system to a target system all files or blocks that have changed on the source system. Replication used to be what companies implemented after everything was completely backed up and redundant, which meant that very few people used replication. However, many people are now using replication as their first line of defense for providing both backup and disaster recovery.

Replication by itself is not a good backup strategy; it copies everything, including bad things, such as viruses and file deletions. Therefore, a replication-based backup system must be able to provide history by either occasionally backing up the replicated destination or through the use of snapshots. It’s usually preferable to make a snapshot on the source and replicate that snapshot to the destination. That way, you can prepare database applications for backup, take a snapshot, then have that snapshot replicated.

Near-Continuous Data Protection Systems

Replication, when coupled with snapshots, is called near-continuous data protection (or near-CDP). Since you can take snapshots hourly (or even more often in some systems), and the replication is occurring continuously, snapshots and replication are closer to CDP than traditional backup, hence the term near-CDP. Some near-CDP products replicate first, then take snapshots on the target system. Others take snapshots on the source system and replicate those snapshots to the target system.

One advantage of near-CDP systems is that snapshots take just seconds to create, and replication is a very easy way to get the data to another device. You can also cascade replication to provide multiple copies, such as an on- and off-site copy, without touching a tape. If you then want to provide a tape copy of the replicated snapshot, you simply back up one of the destination devices.

The biggest disadvantage when compared to true CDP products is that when you cause logical corruption on the source system, such as deletion or corruption of a file, that corruption immediately overwrites the current backup, and you have to recover the file as of the most recent snapshot. A true CDP product would be able to recover the file to just before you fat-fingered it. Another disadvantage to near-CDP systems is that they often require you to change your primary storage system to support them because they are usually storage-array-based.

Continuous Data Protection Systems

A true continuous data protection (CDP) system is fundamentally an asynchronous replication-based backup system that doesn’t overwrite the target with the most recent data. The software is continuously running, and every time a file changes, the new bytes in that file are sent to the backup server within seconds or minutes. Unlike replication, however, a continuous data protection system stores the changes in a log instead of overwriting the target system with the most recent blocks; it is therefore able to roll back any changes at any time.
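The difference between overwriting a target and logging changes can be shown with a small sketch. The journal below keeps every change with its timestamp, so the state as of any moment can be rebuilt by replaying entries up to that moment. The class and method names are hypothetical, blocks are reduced to dictionary entries, and real products store and index their logs very differently.

```python
from typing import Dict, List, Tuple


class ChangeJournal:
    """Toy CDP log: every change is kept, nothing is overwritten."""

    def __init__(self) -> None:
        # Each entry is (timestamp, block number, new contents).
        self.entries: List[Tuple[float, int, bytes]] = []

    def record(self, timestamp: float, block: int, data: bytes) -> None:
        if self.entries and timestamp < self.entries[-1][0]:
            raise ValueError("changes must be recorded in time order")
        self.entries.append((timestamp, block, data))

    def state_at(self, timestamp: float) -> Dict[int, bytes]:
        """Replay the log up to and including timestamp."""
        state: Dict[int, bytes] = {}
        for when, block, data in self.entries:
            if when > timestamp:
                break
            state[block] = data
        return state


if __name__ == "__main__":
    journal = ChangeJournal()
    journal.record(1.0, 0, b"good data")
    journal.record(2.0, 1, b"more data")
    journal.record(3.0, 0, b"fat-fingered")
    # Recover to just before the mistake at t=3.0.
    print(journal.state_at(2.9))
```

A plain replica would hold only the state after the last change; the journal can return the state from just before it.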

Different CDP products transfer data to the backup server in different ways. Some transfer changed blocks immediately while others batch up changed blocks and send them every few minutes. They also differ in how they do recoveries. Some products do quick restores by restoring only the blocks that have changed since the point in time you are recovering from; others recover in a more traditional manner, recovering the entire file or filesystem that you asked to be recovered. Obviously, the first method allows you to meet much more aggressive RTOs and RPOs than the second method.

Either It’s Continuous or It’s Not!

Some vendors are referring to their near-CDP products as CDP. They’re doing this to ride the market momentum that CDP has built. Some even defend that they’re actually continuous because they are continually replicating. I heard one vendor say, “We’re continuously copying all the data. We’re just not keeping it all!” It reminds me of the Seinfeld episode on rental cars. “Oh, you’re good at taking reservations... You’re just not so good at holding reservations. And that’s really the important part...the holding.”

Yes, they’re replicating continuously. That means if you fat-finger a document, your mistake could be immediately replicated onto the backup. The only backup that you will have is the last snapshot. That is not continuous protection; it’s replication with snapshots—also referred to as near-CDP. A truly continuous product would restore your file right up to the point when you fat-fingered it.

They want to differentiate themselves from the “old” way of doing backups and draw attention to the fact that they’re making these snapshots throughout the day, usually hourly. That’s great; just don’t call it continuous. It doesn’t matter if the snapshots are being done once an hour, once a minute, or even once a second. Each of those would be called a time period, and period is an antonym of continuous in the thesaurus I use.

You either save every change, or you don’t. If you’re taking snapshots, you’re not saving every change. It’s as simple as that.

This is not to say that I am not a fan of near-CDP products. Some of the coolest data protection things I’ve ever done have been with snapshots and replication. I also think that most people’s requirements would easily be met by an hourly snapshot. I just don’t want these products calling themselves CDP products.

It’s like a Fibre Channel array calling itself NAS because it is storage attached to a network. Come on, people!

A CDP system has an unnoticeable backup window because it’s copying only changed bytes as they change throughout the day. If they support the block-level recoveries discussed earlier, they also have incredibly fast RTOs. They likewise have infinitely granular RPOs because they can recover any file or filesystem to any point in time. This means that they can meet any type of synchronicity requirement, as they can recover 1, 10, or 100 systems to any synchronized point in time that you would like.

Different CDP products also back up different things. Some are filesystem-based, enabling you to back up and recover any files within that filesystem. Others are database-centric, providing CDP functionality only to a particular database, such as Exchange or SQL Server.

Unlike traditional backup products, file-based CDP products are not going to provide interfaces for your database applications, and their vendors believe such interfaces are unnecessary. Such vendors say that they copy blocks to the backup destination in the same order that they are changed on the client. They can therefore put the files back to literally any point in time that you want. Restarting your database after a CDP recovery causes it to go into the same mode that it would go into if the server were to crash. It examines the datafiles, figures out what’s inconsistent, rolls backward or forward any necessary transactions or blocks, and your database is up. (By the way, if this crash recovery process didn’t work, your database vendor would be out of business. Servers crash, and they have to prepare for that.) If the CDP product puts the blocks back in the same exact order they were changed, the database should be able to recover from any point in time. They also say that if for some extreme reason the database can’t perform crash recovery from the 12:03:57:01 p.m. image, you can recover to 12:03:57:00 or 12:03:56:59. Some products can even present a logical unit number (LUN) or volume to your database that it can mount and test before you do the recovery.

Your database vendor may have a different opinion about CDP. They may feel that if you’re not using their supported backup method, you shouldn’t call them for support if something goes wrong. If you’re considering a CDP product to back up your database, you should have that conversation with your database vendor and then make your own decision. In addition, your DBAs have to be sold on CDP as well. Some may think it’s revolutionary; others will think it’s scary. If you like the idea, keep pushing both your database vendor and your DBAs to consider it. Times change. It wasn’t that long ago that Oracle didn’t support NAS and snapshots; now it loves them.

Remote Office Backup

Figure 3 shows a remote office backing up to a central office using either de-duplication, near-CDP, or CDP software. If the clients on the left of the drawing are too large to meet their RTO if recovering from the central office, you can back up to a local recovery server that is used to facilitate nondisaster major recoveries. That server then replicates its backups to a centralized server.


Figure 3. Backing up a remote office

