programming4us

Commercial Backup Utilities : Simultaneous Backup of Many Clients to One Drive, Disk-to-Disk-to-Tape Backup, Simultaneous Backup of One Client to Many Drives

2/16/2015 8:54:44 PM

Simultaneous Backup of Many Clients to One Drive

There are now backup drives that can write much faster than many disk drives. Backup drives now use Fibre Channel and large buffers, and sometimes write multiple streams of data to the drive at once. The result is that many backup drives can write at 100 MB/s or faster. The difficulty with most of these drives, though, is that they are streaming tape drives. This means that you must supply them with a steady stream of incoming data equivalent to their maximum throughput. If you don’t, the drive does not stream, resulting in a significantly slower transfer rate than it is capable of. This problem, of course, affects only streaming tape drives. It does not affect backup drives that simulate a disk drive, such as magneto-optical, CD, DVD, or virtual tape drives.

All streaming tape drives are designed to write most effectively at their optimum speed. If they are supplied with data at a slower rate than that, the result may be surprising. The drive runs out of data, stops the tape, backs it up, and restarts once more data arrives. This is referred to as repositioning. A drive that is fed too slowly spends most of its time repositioning and has less time to actually write data, so it writes at a small fraction of its maximum data rate.

When would this happen? Consider a single-threaded backup process such as dump or tar. It encounters many potential bottlenecks before getting to the backup drive. The first obstacle is the disk itself, which may not be able to supply data at a sufficient rate. Another is the SCSI bus and the system to which the disk is connected. They may be busy satisfying other I/O requests, which reduces the data rate available across the bus. The next, of course, is the network. If it’s a 100 Mb/s network, the theoretical maximum is 12.5 MB/s, and the rate achieved in practice is usually closer to 10 MB/s. If it’s a gigabit network, any individual server can typically handle only about 60–75 MB/s. (Hopefully this limit will go away soon.) The final possible bottleneck is the system to which the backup drive is connected. Any one of these bottlenecks could slow the data rate to less than the optimal speed for the backup drive. If that happens, a tape drive stops streaming and begins repositioning in order to keep pace with the incoming data. When this happens, the drive itself becomes the new bottleneck.
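The chain of bottlenecks can be pictured as a pipeline whose throughput is set by its slowest stage. The following is a minimal sketch of that idea, not a measurement tool: the stage names and rates are made-up illustration values, and only the 100 MB/s drive speed echoes the text.

```python
# Hypothetical stage rates in MB/s; these numbers are illustration values only.
def effective_rate(stage_rates_mb_s):
    """A single-threaded pipeline moves data no faster than its slowest stage."""
    return min(stage_rates_mb_s.values())


def drive_streams(stage_rates_mb_s, drive_rate_mb_s):
    """True if the pipeline can feed the tape drive at its streaming speed."""
    return effective_rate(stage_rates_mb_s) >= drive_rate_mb_s


pipeline = {
    "source disk": 40.0,
    "client bus": 80.0,
    "gigabit network": 70.0,
    "backup server": 150.0,
}

# Name of the stage that limits the whole backup.
slowest = min(pipeline, key=pipeline.get)

print(slowest, effective_rate(pipeline), drive_streams(pipeline, 100.0))
```

With these assumed numbers the source disk caps the stream at 40 MB/s, so a 100 MB/s tape drive would not stream and would start repositioning.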

What can be done about this? Most commercial backup products solve this problem by implementing multiplexing. Multiplexed backups read data from several systems and several disks at one time and supply the data from all of these threads to the backup drive simultaneously. The data from these different sources is then interleaved onto the volume. The backup drive is therefore always being supplied with a sufficient stream of data so that it can write at its maximum speed. A cautious administrator learning of this feature for the first time might be worried. Some administrators don’t like the fact that their data is spread out all over the volume. However, this feature is really the only way to stream a backup drive that can read or write at speeds faster than a disk. This is why most commercial backup products implement multiplexing.
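To make the interleaving concrete, here is a small sketch of multiplexing under simplifying assumptions: each client's data is already split into blocks, blocks are taken round-robin, and each block is tagged with its client name. The function names and the tagging scheme are hypothetical; real products use their own on-media formats.

```python
from itertools import zip_longest


def multiplex(streams):
    """Interleave blocks from several clients round-robin onto one volume.

    `streams` maps a client name to its list of data blocks. Each block is
    stored as a (client, block) pair so it can be found again on restore.
    """
    tagged = [
        [(client, block) for block in blocks]
        for client, blocks in streams.items()
    ]
    volume = []
    for row in zip_longest(*tagged):
        # Clients that have run out of blocks contribute None; skip those.
        volume.extend(item for item in row if item is not None)
    return volume


def restore(volume, client):
    """Scan the whole volume and keep only the requested client's blocks."""
    return [block for tag, block in volume if tag == client]


streams = {"alpha": ["a1", "a2", "a3"], "beta": ["b1"], "gamma": ["g1", "g2"]}
volume = multiplex(streams)

print(volume)
print(restore(volume, "gamma"))
```

Note that `restore` has to read past every other client's blocks to collect one client's data, which hints at why restores from a multiplexed volume tend to be slower.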

Tip

Different backup software vendors use different terms for this feature. It is sometimes referred to as parallelism or interleaving.

The downside to multiplexing is that it slows down the speed of restores. Therefore, multiplexing was considered by many to be a necessary evil. The feature that has replaced it is disk-to-disk-to-tape backup, where backups are first sent to disk, then sent to tape.




Disk-to-Disk-to-Tape Backup

A very important methodology in today’s backup systems is to first back up data to disk, and then copy it to tape. This is referred to as disk to disk to tape, or D2D2T. As long as the backup product can automatically copy data from one media type to another, it can be used to support D2D2T backups. A product that can also automate staging based on how full the disk is makes D2D2T much easier to manage.

For example, suppose you back up to a large disk array or virtual tape library (VTL), and you would like backups to stay on disk as long as possible before being moved to tape. Ideally, the backup software would automatically copy the backups from disk to tape, and would automatically expire the disk-based backups to make room for more backups. This kind of functionality is often called disk staging and is usually available only if you are backing up to a filesystem (i.e., not using a VTL).
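One way to picture capacity-based disk staging is a watermark policy: once disk usage passes a high threshold, the oldest backups are copied to tape and expired from disk until usage drops to a low threshold. The sketch below is an illustration under that assumption only. The class names, the 80% and 50% thresholds, and the `on_tape` flag are all hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    name: str
    size_gb: int
    on_tape: bool = False


@dataclass
class StagingDisk:
    capacity_gb: int
    high_watermark: float = 0.8  # hypothetical: start staging above 80% full
    low_watermark: float = 0.5   # hypothetical: stop once at or below 50% full
    backups: list = field(default_factory=list)  # oldest first

    def used_gb(self):
        return sum(b.size_gb for b in self.backups)

    def add(self, backup):
        """Store a new backup on disk, then stage older ones if needed."""
        self.backups.append(backup)
        return self.stage_if_needed()

    def stage_if_needed(self):
        """Copy the oldest backups to tape and expire them from disk."""
        expired = []
        if self.used_gb() <= self.capacity_gb * self.high_watermark:
            return expired
        while self.backups and self.used_gb() > self.capacity_gb * self.low_watermark:
            oldest = self.backups.pop(0)
            oldest.on_tape = True  # stand-in for the real disk-to-tape copy
            expired.append(oldest.name)
        return expired


disk = StagingDisk(capacity_gb=1000)
mon, tue, wed = Backup("mon", 300), Backup("tue", 300), Backup("wed", 300)
staged = [disk.add(b) for b in (mon, tue, wed)]

print(staged)
print([b.name for b in disk.backups])
```

In this run nothing is staged until the third backup pushes usage to 900 GB, at which point the two oldest backups move to tape and only the newest stays on disk.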

Different backup software companies support D2D2T in a number of different ways, and some of them are more fond of it than others. If you are a fan of D2D2T, you should definitely look into how well the backup software you are considering supports the features in this area.

Simultaneous Backup of One Client to Many Drives

How does a backup product get a very large database (VLDB) or very large system (VLS) onto backup media? This is a slightly different scenario from the one described previously. Multiplexing allows 5 or 10 systems to share the same device, so that backup devices are constantly streaming, and backups can be done in a timely manner. In contrast, the situation that calls for multistreaming is a single system that could not possibly back up its files to a single backup drive in one night. Even if one had a backup drive capable of 100 MB/s, that is only 360 GB per hour, or 2.88 TB in an eight-hour period, assuming the drive could be streamed for that long. What if the system contains 20 TB of data? What if the drive is capable only of 25 MB/s because of the network it’s running on?
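The arithmetic is easy to check. At a sustained 100 MB/s, one drive writes 360 GB per hour, or 2.88 TB in eight hours (decimal units). The short sketch below works out that figure and the number of drives a 20 TB system would need in an eight-hour window; the function names are hypothetical and the calculation assumes every drive streams at its full rate the whole time.

```python
import math


def capacity_tb(rate_mb_s, hours):
    """Data one drive can write in `hours` at a sustained rate (decimal units)."""
    return rate_mb_s * 3600 * hours / 1_000_000


def drives_needed(data_tb, rate_mb_s, hours):
    """Smallest number of drives that can move `data_tb` within the window."""
    return math.ceil(data_tb / capacity_tb(rate_mb_s, hours))


print(capacity_tb(100, 8))        # TB per drive in eight hours at 100 MB/s
print(drives_needed(20, 100, 8))  # drives for 20 TB at 100 MB/s
print(drives_needed(20, 25, 8))   # drives for 20 TB at 25 MB/s
```

Under these assumptions, 20 TB needs 7 drives at 100 MB/s and 28 drives at 25 MB/s.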

The only way to get that kind of speed is to use several drives simultaneously. (Other combinations would work, of course, but they all require multiple simultaneous backup drives.) However, in order to make that happen, the backup software needs to be able to take the one system and send it to many devices simultaneously. Many products are able to do this. What is important is how they do it.

The most common way that backup software companies tell you to back up a VLDB or VLS is to configure multiple backup definitions, each of which is a subset of the entire system, then run them simultaneously. For example, there could be one backup definition that backs up everything, excluding /home1 and /home2. Then there could be a second backup definition that backs up just /home1 and /home2. The software then would be told to back them up at the same time. That way, two different devices would be operating at the same time—potentially cutting the backup time by as much as half.

This method is not desirable for a few reasons. The first is that it is difficult to figure out where to divide the filesystems up, and you’ll never get it exactly right. Even if you get it right one day, things will change. The second problem is that it requires you to maintain an include list that must be updated every time someone adds a new filesystem.

Warning

Backups should never be defined this way. If backup definitions must be split up, always start with an “everything except” backup definition. That way, when a new filesystem is added, it will get backed up automatically.
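The value of an “everything except” definition can be shown with a small sketch. The function and group names below are hypothetical, and the matching is by exact filesystem name rather than by pattern. Anything not claimed by an explicit group falls into the catch-all.

```python
def plan_definitions(filesystems, explicit_groups):
    """Build backup definitions: explicit groups plus an "everything except" catch-all."""
    claimed = {fs for group in explicit_groups.values() for fs in group}
    plan = {name: sorted(group) for name, group in explicit_groups.items()}
    # Whatever no explicit group claims is picked up here automatically.
    plan["everything-except"] = sorted(fs for fs in filesystems if fs not in claimed)
    return plan


groups = {"homes": ["/home1", "/home2"]}

before = plan_definitions(["/", "/usr", "/home1", "/home2"], groups)
after = plan_definitions(["/", "/usr", "/home1", "/home2", "/home3"], groups)

print(before["everything-except"])
print(after["everything-except"])
```

When `/home3` is added, nobody has to edit a list: it lands in the catch-all definition on the next run.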

Glad Nobody Asked for That One!

I was using a backup product that required me to create separate backup definitions if I wanted to do one-to-many. The host was so big and my tape drives were so slow that I had to define five separate backup definitions. First, I had a backup definition that included everything except /data1-/data8. Then I had four more, each of which backed up two of the /data<x> filesystems. When the machine’s administrator added /data9, it was backed up automatically by the “everything except /data1-8” backup. However, when /data10 and /data11 were added, they didn’t get backed up at all.

One day I noticed that I couldn’t find any history for the /data10 filesystem. It was only after a frantic call to tech support that I found out the error was mine. When I excluded /data1, what I actually said was, “exclude all filesystems that match the regular expression /data1.” Unfortunately, /data1 also matches /data10. For more than two months, there were two filesystems that weren’t getting backed up. Luckily, nobody needed a restore. This is why I say to be very careful when excluding filesystems using regular expressions.
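The mistake is easy to reproduce. The sketch below uses Python's `re` module purely for illustration; the backup product in the story had its own pattern syntax, which is not shown in the text, so the anchored pattern here is an assumption about how one might express an exact match.

```python
import re

filesystems = ["/", "/data1", "/data9", "/data10", "/data11"]

# Unanchored pattern: "/data1" also matches "/data10" and "/data11".
loose = re.compile(r"/data1")
excluded_loose = [fs for fs in filesystems if loose.search(fs)]

# Anchored at both ends: only the exact name "/data1" matches.
strict = re.compile(r"^/data1$")
excluded_strict = [fs for fs in filesystems if strict.search(fs)]

print(excluded_loose)
print(excluded_strict)
```

The unanchored pattern silently excludes three filesystems where only one was intended, which is exactly how `/data10` and `/data11` went missing.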

A better method is for the backup product to create the multiple jobs for you automatically. Products that support this usually do it at the filesystem level, creating a job for each filesystem. One challenge with this approach is a single large filesystem. Thanks to modern advancements in filesystems, a single filesystem can now hold several terabytes. So now, not only are there several terabytes of data to back up from one system, but all the data resides in one filesystem.
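A rough sketch of per-filesystem job creation shows both the benefit and the limit. It assumes a simple greedy scheduler that hands each filesystem, largest first, to the least-loaded drive. The function name, the sizes, and the scheduling rule are all hypothetical; the text does not say how any product actually distributes its jobs.

```python
import heapq


def assign_jobs(fs_sizes_gb, drive_count):
    """Greedy: give each filesystem, largest first, to the least-loaded drive."""
    heap = [(0, drive) for drive in range(drive_count)]  # (load in GB, drive id)
    assignment = {drive: [] for drive in range(drive_count)}
    for fs, size in sorted(fs_sizes_gb.items(), key=lambda kv: -kv[1]):
        load, drive = heapq.heappop(heap)
        assignment[drive].append(fs)
        heapq.heappush(heap, (load + size, drive))
    loads = {drive: load for load, drive in heap}
    return assignment, loads


# Made-up sizes: one multiterabyte filesystem and three small ones.
sizes = {"/data": 4000, "/home": 500, "/var": 300, "/usr": 200}
assignment, loads = assign_jobs(sizes, 2)

print(assignment)
print(loads)
```

With these numbers one drive carries 4,000 GB and the other 1,000 GB. Because a job cannot be smaller than a filesystem, the single large filesystem sets the length of the whole backup no matter how many drives are available.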

There is no way to back up a multiterabyte filesystem to a single backup drive in one night. The only way to get that kind of speed is to use several drives simultaneously, each on its own channel. However, in order to make that happen, the backup software must be able to take the one filesystem and send it to many devices simultaneously. As of this writing, this is an area that only two products have properly addressed.
