A custom backup format is one that is readable
only by a particular commercial backup utility. Backups made in a custom backup format cannot be read by native backup
utilities such as
dump. Some backup products use a
custom format that is published or freely available. Although their backups can’t be read
with native utilities, a programmer theoretically could write a program that would read
their backups. Some backup products are completely proprietary. In fact, some products are
so proprietary that even their own product can’t read a volume if the indexes for that
volume have expired or been deleted. A standard backup format would be a format that is
readable by a standard utility. There are two schools of thought on this subject.
There are those who feel that using proprietary or custom backup formats is dangerous.
If the backup volumes can’t be read by native utilities, what do you do when the
commercial backup product is broken? They prefer to use utilities that back up using
industry-standard backup formats such as
tar. These formats provide a sense of security that is just
not possible when using a custom backup format. Companies often switch backup products,
and when that happens, their old volumes are not readable by the new product. If the
volumes were readable by standard utilities, however, they still could be used for
restores. What is commonly done in this scenario is to keep the old backup system running
for restores only.
Vendors are on both sides of this debate. What follows is my best effort to explain
the pros and cons of each side. You have to choose which pros and cons are most
important to you.
Some say that old backup formats are just that—old. They served their purpose, but it
is time to move on to more sophisticated utilities. There are two problems with native
utilities. The first problem is that they are always changing. The
dump command is filesystem-specific, and different versions of
dump are incompatible. The
tar and cpio commands also have changed
their formats over time and are not always compatible between different operating systems.
(ntbackup has remained the same over the years.) The
second problem is that native utilities have significant limitations, such as pathname
lengths and the inability to handle open files. The most significant limitation, though,
is their inability to generate or receive multiple backup streams.
On the other hand, longtime system administrators have learned how to use
these native utilities. There is a familiarity there that just isn’t going to be possible with a unique format. Also,
some people have been burned by commercial utilities that have come and gone. There have
even been a few that have changed their own formats, making their old volumes unreadable
by a new version of their own software! This means that concerns about custom and
proprietary formats are valid. Since restricting yourself to native utilities
significantly reduces the number of available products, make sure that you properly
examine the limitations of these native utilities before doing so. The limitations of
each of these utilities are discussed in the following sections.
The dump command is a Berkeley contribution
and as such is not included on some pure System V Release 4 systems. (It sure
surprised me the first time that happened!)
dump backs up a filesystem via the raw device, not through the filesystem. It therefore
must know the structure of the filesystem that it is backing up, so each new type of
filesystem requires a new version of
dump. Also, a
dump backup of one filesystem type will not
necessarily be readable by the
restore utility of
another filesystem type. There have been a number of new filesystem types over the
years. Each new filesystem usually comes with its own version of
dump, and many of the newer versions are not backward
compatible with the older versions.
Some new filesystem types don’t come with a traditional
dump command at all.
A backup tool should not rely on a native utility that changes from filesystem to
filesystem. The backup volumes are not compatible between platforms, and sometimes
not even within the same platform (one example is
xfsdump on SGI). Also,
dump is not always available.
The tar, ditto, and cpio utilities
tar, ditto, and cpio access files through the filesystem just as a user
does. Since they are not filesystem-dependent, they change much less over time than
dump does. This may be
their greatest advantage. However, there are different
versions of cpio for each platform, and
not all of them are compatible. It also should be noted that most of the commercial
products that write in a
cpio-compatible format do not use the actual
cpio command;
they have their own command that writes in a format that is readable by
cpio. (This is how
ditto works.) That way, the commercial
product can overcome some of the native utilities’ limitations, such as
cpio’s 255-character limit and
tar’s 100-character limit on pathnames.
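To get a feel for whether these pathname limits matter on a given system, one could scan a directory tree for paths that exceed them. The following is only a rough sketch: the function name `paths_over_limit` and the two constants are hypothetical, the limits are simply the figures quoted above, and the check counts characters in the path relative to the backup root, which is an approximation of how an archive would store the name.

```python
import os

# Limits as quoted in the text; treat them as illustrative, not authoritative.
TAR_NAME_LIMIT = 100
CPIO_NAME_LIMIT = 255


def paths_over_limit(root, limit):
    """Yield paths (relative to root) that are longer than `limit` characters."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if len(rel) > limit:
                yield rel


if __name__ == "__main__":
    # Report anything under the current directory that exceeds the tar figure.
    for path in paths_over_limit(".", TAR_NAME_LIMIT):
        print(len(path), path)
```

If the scan prints nothing, the limit is unlikely to be a practical concern for that tree; a long list suggests the limitation deserves a closer look before committing to the native format.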
As stated earlier, the camps are divided into those backup products that use a
standard format and those products that do not. Further, products that do not use a
standard format should themselves be divided into two groups—those that publish their
format and those that do not. Theoretically, a programmer who knows the format of the
volume could write a program to read it. Most products depend quite heavily on a
database that tracks the location of each file or piece of file on the volume. If the
database is corrupted or lost, they may not be able to read the volume at all.
Should someone use a product that has a custom
backup format? Before purchasing such a product, be sure to ask a few questions. Is the
format of this volume completely proprietary, or is there a document explaining how it
was written? Is there a standalone utility that allows me to read these volumes even if
the catalog is down? If this product made a volume but then later did not know what was
on it, could it reread the volume and determine the file sets that went to that
volume?
Some backup programs that use custom formats come with a standalone utility that can
read the volume without the use of the backup database, providing essentially the same
functionality as a native command. This is a beautiful thing, but it’s harder to come by
than you might imagine.
Some readers may remember the System Independent Data Format (SIDF) that was first
proposed back in 1993 as an international volume-interchange format. It was used on a
limited basis by a small number of backup products. If a product followed this format
completely, not only would it have completely platform-independent volumes, but its
volumes would be readable by other backup software products. The format barely gained
acceptance. Any questions on the status of SIDF are answered by going to http://www.sidf.org: “www.sidf.org not found. Please
check the name and try again.”
Suppose you had a bunch of volumes that were written in
tar format, and your backup software has been keeping track of them all. If
that software is not functioning properly, how will you know what is on the hundreds, or
even thousands, of backup volumes that you have? I suppose you could do a
tar tvf on all of them and create your own “minicatalog.”
That’s not an easy task. Suppose you had 500 or so tapes. It would take you more than a
month to read them all. This is just to get a table of contents of these volumes.
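The "minicatalog" idea can be sketched in a few lines of Python using the standard tarfile module, which does roughly what tar tvf does for each volume. This is a minimal illustration, not a recommended procedure: the function names `catalog_archive` and `build_minicatalog` are hypothetical, and it assumes each volume is available as a seekable file (for example, an image copied off tape). It does nothing to reduce the time needed to read every volume.

```python
import csv
import tarfile


def catalog_archive(archive_path):
    """Return one (archive, member, size, mtime) row per entry in a tar archive."""
    rows = []
    # "r:*" asks tarfile to detect any compression transparently.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            rows.append((str(archive_path), member.name, member.size, member.mtime))
    return rows


def build_minicatalog(archive_paths, catalog_csv):
    """Write a CSV table of contents for all archives; return the entry count."""
    count = 0
    with open(catalog_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["archive", "member", "size_bytes", "mtime"])
        for path in archive_paths:
            for row in catalog_archive(path):
                writer.writerow(row)
                count += 1
    return count
```

Even with such a script, every volume still has to be mounted and read end to end, which is where the weeks of effort described above come from.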
A much better solution would be to get a backup system that you trust. Learn how to
check the database for inconsistencies. Run those checks every day, and if any
inconsistencies are found that can’t be fixed, recover the database to the point in
time before it became corrupted. If the backup software allows it, you can then have it
reread any volumes that have been written to since then.