A custom backup format is one that is readable only by a particular commercial backup
utility. Backups made in a custom backup format cannot be read by native backup utilities
such as tar, cpio, or dump. Some backup products use a custom format that is published or
freely available. Although their backups can’t be read with native utilities, a programmer
theoretically could write a program that would read their backups. Other backup products
are completely proprietary. In fact, some are so proprietary that even the product itself
can’t read a volume if the indexes for that volume have expired or been deleted. A standard
backup format, by contrast, is one that is readable by a standard utility. There are two
schools of thought on this subject.
There are those who feel that using proprietary or custom backup formats is dangerous.
If the backup volumes can’t be read by native utilities, what do you do when the
commercial backup product is broken? They prefer to use utilities that back up using
industry-standard backup formats such as cpio or tar. Standard formats provide a sense of
security that is just not possible when using a custom backup format. Companies often
switch backup products, and when that happens, their old volumes are not readable by the
new product. If the volumes were readable by standard utilities, however, they still could
be used for restores. With custom formats, what is commonly done in this scenario is to
keep the old backup system running for restores only.
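Whether an old volume is in a standard format can often be judged from its first block. The
following is a minimal sketch, not a definitive tool: it assumes you have already copied the
first 512 bytes of the volume into memory, and the function name sniff_archive_format is
hypothetical. It relies only on the well-known magic values of POSIX tar and cpio headers.
Very old tar archives carry no magic string, so "unknown" does not prove that a volume is in
a custom format.

```python
def sniff_archive_format(header: bytes) -> str:
    """Guess the archive format from the first block of a volume image.

    `header` is assumed to hold at least the first 512 bytes of the volume.
    """
    # POSIX tar (ustar) and GNU tar store the string "ustar" at byte offset 257.
    if len(header) >= 262 and header[257:262] == b"ustar":
        return "tar (ustar)"
    # ASCII cpio headers begin with a six-digit octal magic number.
    if header[:6] in (b"070701", b"070702"):
        return "cpio (newc)"
    if header[:6] == b"070707":
        return "cpio (odc)"
    # Old binary cpio stores octal 070707 as a 16-bit integer, in either byte order.
    if header[:2] in (b"\xc7\x71", b"\x71\xc7"):
        return "cpio (binary)"
    return "unknown"
```

To try it, open a disk image of the volume in binary mode, read 512 bytes, and pass them to
the function. Reading directly from a tape device would also need the correct block size,
which this sketch does not handle.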
Tip
Vendors are on both sides of this debate. What follows is my best effort to explain
the pros and cons of each side. You have to choose which pros and cons are most
important to you.
Some say that old backup formats are just that: old. They served their purpose, but it
is time to move on to more sophisticated utilities. There are two problems with native
utilities. The first problem is that they are always changing. The dump command is
filesystem-specific, and different versions of dump are incompatible. The tar and cpio
commands also have changed their formats over time and are not always compatible between
different operating systems. (ntbackup has remained the same over the years.) The second
problem is that native utilities have significant limitations, such as limits on pathname
length and the inability to handle open files. The most significant limitation, though,
is their inability to generate or receive multiple backup streams.
On the other hand, longtime system administrators have learned how to use ditto, dump,
ntbackup, tar, and cpio. That familiarity just isn’t going to be possible with a unique
format. Also, some people have been burned by commercial utilities that have come and
gone. There have even been a few that have changed their own formats, making their old
volumes unreadable by a new version of their own software! This means that concerns about
custom and proprietary formats are valid. Since restricting yourself to native utilities
significantly reduces the number of available products, make sure that you properly
examine the limitations of these native utilities before doing so. The limitations of
each of these utilities are discussed in the following sections.
The dump command is a Berkeley contribution and as such is not always included on pure
System V Release 4 systems. (It sure surprised me the first time that happened!) dump
backs up a filesystem via the raw device, not through the filesystem. It therefore
must know the structure of the filesystem that it is backing up, so each new type of
filesystem requires a new version of dump. Also, a dump backup of one filesystem type will
not necessarily be readable by the restore utility of another filesystem type. There have
been a number of new filesystem types over the years. Each new filesystem usually comes
with its own version of dump, and many of the newer versions are not backward compatible
with the older versions.
Some new filesystem types don’t come with a traditional dump command at all.
A backup tool should not rely on a native utility that changes from filesystem to
filesystem. dump volumes are not compatible between platforms, and sometimes not even
within the same platform; (efs)dump and xfsdump on SGI are one example. Also, dump is not
always available.
The tar, ditto, and cpio utilities
Unlike dump, the ditto, tar, and cpio utilities access files through the filesystem just
as a user does. Since they are not filesystem-dependent, they change much less over time
than dump does. This may be their greatest advantage. However, there are different
versions of ditto, tar, and cpio for each platform, and not all of them are compatible. It
also should be noted that most of the commercial backup products that write in a tar- or
cpio-compatible format do not use the actual tar or cpio command; they have their own
command that writes in a format that is readable by tar or cpio. (This is the way ditto
works.) That way, the commercial product can overcome some of tar’s and cpio’s
limitations, such as cpio’s 255-character limitation and tar’s 100-character limit on
pathnames.
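Before committing to a native utility, it can help to find out how many of your pathnames
would run into these limits. The sketch below is illustrative only. The two constants are
simply the figures quoted above, the function names are hypothetical, and counting encoded
bytes rather than characters is an assumption about how a given implementation measures a
name. Check the documentation for the tar or cpio on your platform before relying on the
numbers.

```python
import os

TAR_NAME_LIMIT = 100   # tar pathname limit quoted in the text
CPIO_NAME_LIMIT = 255  # cpio pathname limit quoted in the text


def paths_over_limit(paths, limit):
    """Return the paths whose encoded length exceeds `limit` bytes."""
    return [p for p in paths if len(os.fsencode(p)) > limit]


def walk_paths(root):
    """Yield every file and directory under `root`, as a path relative to `root`."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)
```

For example, paths_over_limit(walk_paths("/home"), TAR_NAME_LIMIT) would list every path
under /home that is longer than 100 bytes when written relative to /home.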
As stated earlier, the camps are divided into those backup products that use a
standard format and those products that do not. Further, products that do not use a
standard format should themselves be divided into two groups—those that publish their
format and those that do not. Theoretically, a programmer who knows the format of the
volume could write a program to read it. Most products depend quite heavily on a
database that tracks the location of each file or piece of file on the volume. If the
database is corrupted or lost, they may not be able to read the volume at all.
Tip
Should someone use a product that has a custom
backup format? Before purchasing such a product, be sure to ask a few questions. Is the
format of this volume completely proprietary, or is there a document explaining how it
was written? Is there a standalone utility that allows me to read these volumes even if
the catalog is down? If this product made a volume but then later did not know what was
on it, could it reread the volume and determine the file sets that went to that
volume?
Some backup programs that use custom formats come with a standalone utility that can
read the volume without the use of the backup database, providing essentially the same
functionality as a native command. This is a beautiful thing, but it’s harder to come by
than you might imagine.
Some readers may remember the System Independent Data Format (SIDF) that was first
proposed back in 1993 as an international volume-interchange format. It was used on a
limited basis by a small number of backup products. If a product followed this format
completely, not only would it have completely platform-independent volumes, but its
volumes would be readable by other backup software products. The format barely gained
acceptance. Any questions on the status of SIDF are answered by going to http://www.sidf.org: “www.sidf.org not found. Please
check the name and try again.”
Suppose you had a bunch of volumes that were written in tar format, and your backup
software had been keeping track of them all. If that software is not functioning properly,
how will you know what is on the hundreds, or even thousands, of backup volumes that you
have? I suppose you could do a tar tvf on all of them and create your own “minicatalog.”
That’s not an easy task. Suppose you had 500 or so tapes. It would take you more than a
month to read them all, and that is just to get a table of contents of these volumes.
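To make the minicatalog idea concrete, here is a rough sketch using Python’s standard
tarfile module. It assumes each volume can be opened as a file-like object containing a
single tar archive. The function names and the volume labels are hypothetical, and a real
tape may hold several archives separated by file marks, which this sketch does not handle.

```python
import io
import tarfile


def catalog_volume(label, fileobj):
    """Return (label, path, size, mtime) rows for one tar volume."""
    rows = []
    # Mode "r|*" reads the archive as a forward-only stream, much as a tape is read.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tf:
        for member in tf:
            rows.append((label, member.name, member.size, member.mtime))
    return rows


def build_minicatalog(volumes):
    """Map each path to the list of volume labels that contain it."""
    index = {}
    for label, fileobj in volumes:
        for _, path, _, _ in catalog_volume(label, fileobj):
            index.setdefault(path, []).append(label)
    return index


def make_test_volume(files):
    """Build an in-memory tar archive from {path: bytes}; stands in for a tape."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for path, data in files.items():
            info = tarfile.TarInfo(path)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf
```

Even with a script like this, every volume still has to be mounted and read from end to end,
so the time cost described above does not go away.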
A much better solution would be to get a backup system that you trust. Learn how to
check the database for inconsistencies. Run those checks every day, and if any
inconsistencies are found that can’t be fixed, recover the database to the point in
time before it became corrupted. If the backup software allows it, then have it reread
any volumes that have been written to since then.