I will explain technical concepts and related terminology to help you design a Backup System for use in business or home.
Backup Methods Classified by Viewpoint
We can categorise backup processes into:
- Backups that target files and folders (file level backups)
- Backups that target full disks or partitions (block level backups)
For the most part, these types of backups are distinct in technology used, limitations, risks, and most importantly, outcomes. I will try to clarify the differences and why they should be taken into account when developing a backup system.
File level backups aim to make a copy of discrete files, such as your photos or documents. This type of backup focuses on each file as a discrete unit of data. When you copy your documents across to an external HDD, you are implementing a file level backup.
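As a rough illustration of the idea, a file level backup can be as little as walking a folder and copying each file. The sketch below is a minimal example under assumptions: the function name `mirror_files` and the folder arguments are hypothetical, and a real backup tool does much more (scheduling, versioning, verification).

```python
import shutil
from pathlib import Path


def mirror_files(source_dir, dest_dir):
    """Copy every file under source_dir into dest_dir, keeping the folder layout.

    Returns a list of (relative path, error text) for files that could not
    be copied, for example because another program has them locked.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    failed = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue  # folders are created as needed below
        rel = src.relative_to(source_dir)
        target = dest_dir / rel
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # copy2 also preserves timestamps
        except OSError as err:
            failed.append((str(rel), str(err)))
    return failed
```

Each file is treated as its own unit, so a file that cannot be read is skipped and reported rather than stopping the whole run.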
Block level backups are a little different and often misunderstood, resulting in backup designs that fail to protect your data. The aim of a block level backup is to copy all the blocks of data contained on a partition. It is important to gain an understanding of how files are stored, and what we mean by a disk, partition, and a block in order to make appropriate decisions regarding your block level backup design, so let’s cover the basics.
A disk stores data in small chunks, which can be referred to as blocks. When you save a file, it will be cut up into small pieces that will fit in the blocks available on the disk. The locations of blocks used by a file are recorded in the index along with the name of the file. When a file needs to be opened, the index is used to find out which blocks need to be read and stitched back together. In the above image, you might consider that a big file has been split across all the blue blocks with other squares representing other blocks on the disk.
A “block”, in the context of a block level backup, refers to one of those same sized portions of your disk. A block may, or may not, contain a piece of a file. It may in fact be blank or contain old data from a file you have deleted (this is why deleted files can sometimes be retrieved).
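The relationship between files, blocks, and the index can be sketched with a toy model. Everything here is an assumption made for illustration: `ToyDisk` and the tiny `BLOCK_SIZE` are hypothetical, and real file systems are far more involved.

```python
BLOCK_SIZE = 4  # bytes per block; a toy value, real blocks are much larger


class ToyDisk:
    def __init__(self, block_count):
        self.blocks = [b""] * block_count  # the raw storage
        self.index = {}                    # file name -> list of block numbers

    def write_file(self, name, data):
        self.index.pop(name, None)  # overwriting frees the file's old blocks
        used = {n for numbers in self.index.values() for n in numbers}
        free = [n for n in range(len(self.blocks)) if n not in used]
        # Cut the file into block-sized pieces.
        chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        if len(chunks) > len(free):
            raise OSError("disk full")
        self.index[name] = free[:len(chunks)]
        for n, chunk in zip(self.index[name], chunks):
            self.blocks[n] = chunk

    def read_file(self, name):
        # Use the index to find the blocks and stitch them back together.
        return b"".join(self.blocks[n] for n in self.index[name])

    def delete_file(self, name):
        # Only the index entry goes; the old data stays in the blocks.
        del self.index[name]
```

After `delete_file`, the blocks still hold the old contents until something else overwrites them, which is the reason deleted files can sometimes be retrieved.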
You will encounter the term “partition” or “disk partition” when setting up a block level backup. A partition is the name given to the part of a physical disk that is set up with its own index and set of blocks to contain files. It is possible to set up many partitions on a single physical disk, but often each disk will only have one visible partition and so people tend to use the terms disk and partition interchangeably. C: for example, is a partition but it also might be called a drive or disk.
The below image shows two physical disks and the partitions located on each disk. Note that the disk containing C: also has two hidden partitions, the first of which helps with booting the Windows installation located on C:. The second disk has just the one partition, represented by D:. The critical information is in the C: and D: partitions, but it is normally best to back up the lot to make system recovery easier.
Block level backups don’t worry about the content of the blocks. They just copy all the blocks of data and, in doing so, capture all files on the backed up partition. Most often the backup will skip blocks that the index says hold no current files, to save time and space. While this method sounds more complex, it is pretty simple from the user’s perspective, and it is comprehensive where file level backups are more prone to miss important data.
Block level backups introduce some issues that can result in backup failures, but also advantages, such as the ability to back up open files using another layer of technology. That leads us into a funky new(ish) Microsoft technology called VSS.
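To make the "copy the blocks, skip the unused ones" idea concrete, here is a minimal sketch. It assumes a disk modelled as a plain list of blocks plus an index mapping file names to block numbers; the function name `block_backup` and the `skip_unused` flag are hypothetical, not taken from any particular product.

```python
def block_backup(blocks, index, skip_unused=True):
    """Return {block number: data} for the blocks that should be backed up.

    blocks: list of bytes objects, one per block on the partition.
    index:  dict mapping file name -> list of block numbers the file uses.
    """
    in_use = {n for numbers in index.values() for n in numbers}
    return {
        n: data
        for n, data in enumerate(blocks)
        if not skip_unused or n in in_use
    }


# Block 1 holds leftovers from a deleted file, block 3 is blank.
blocks = [b"AAAA", b"old!", b"BBBB", b""]
index = {"report.doc": [0, 2]}
image = block_backup(blocks, index)  # only blocks 0 and 2 are copied
```

The backup never opens `report.doc` as a file. It captures the file only because it copies every block the index marks as in use.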
How to Backup Open Files including System State – VSS
Have you ever tried copying a file that was in use? In most cases, Windows won’t let you copy an open file; it is “locked” while open to prevent potential damage to the file. When you start Windows or any program, many files will be opened and cannot be copied safely using basic file level methods. For this reason, most file level backups will fail to back up open files, including program and Windows files.
Block level backups support a technology that allows blocks to be copied while their related files are in use. This means they can back up your operating system and program files and allow a complete restore of your system state and personal files.
The technology used to backup open files is implemented by the Volume Snapshot Service under Windows systems (VSS, also called Volume Shadow Copy). A snapshot refers to the state of the disk at a moment in time, as the technology attempts to maintain access to the data at that moment. Once the snapshot is made, usually taking only moments, the system can continue to read and write files, so you can keep using the computer. The system will preserve any data that would otherwise be overwritten after the snapshot, so it is accessible to the continuing backup process, and new data is tracked and not backed up, preventing any inconsistency. This is not a trivial process and things can go wrong.
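The "preserve anything that would otherwise be overwritten" behaviour is often described as copy-on-write. The toy model below sketches that general idea only. The class `SnapshotVolume` is hypothetical and is not a description of how VSS is implemented internally.

```python
class SnapshotVolume:
    def __init__(self, blocks):
        self.blocks = list(blocks)  # the live data
        self.preserved = None       # block number -> data as it was at snapshot time

    def take_snapshot(self):
        # Nothing is copied yet, which is why taking a snapshot is quick.
        self.preserved = {}

    def write(self, n, data):
        # Before the first overwrite of a block, save its snapshot-time content.
        if self.preserved is not None and n not in self.preserved:
            self.preserved[n] = self.blocks[n]
        self.blocks[n] = data

    def read_snapshot(self, n):
        # The backup reads the preserved copy if the block changed, else the live one.
        if self.preserved is None:
            raise RuntimeError("no snapshot is active")
        return self.preserved.get(n, self.blocks[n])

    def release_snapshot(self):
        self.preserved = None


vol = SnapshotVolume([b"a", b"b", b"c"])
vol.take_snapshot()
vol.write(1, b"NEW")  # the computer keeps working during the backup
snapshot_view = [vol.read_snapshot(n) for n in range(3)]  # still a, b, c
```

The backup sees the disk as it was at the moment of the snapshot, while the live blocks already contain the newer data.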
VSS incorporates a number of parts, including “writers” specific to various programs designed to ensure their disk writes are complete and consistent at the point the snapshot is taken. For example, a database writer would ensure all transactions that might be just partly written are complete before the snapshot, removing the risk of a corrupt database on restoring the backup. Certain types of programs need a specific writer to support them, and if that writer fails, a “successful” backup can contain damaged files. Sometimes, part of the VSS service itself can fail.
VSS is mature and works well in most cases, but you will still find that a VSS backup is more likely to fail than a simple file copy backup.
Some backup tools that focus on backing up particular files and folders can also use VSS. This blurs my definitions, since these use similar technology to block level backup but with the focus of a file level backup. This hybrid approach is worthwhile in cases where you want the advantages of a block level backup but want to exclude some files, such as large videos you don’t care about. You should still keep in mind that VSS adds complexity, but it’s OK to use VSS only backups where you have the technology available and you are careful verifying your backups.
Backup Archives Classified by Archive Type
Now we have an idea of the technology behind block level backups, I will go over the rudiments of backup archive types. These concepts can apply to file or block level backups, but they tend to be more related to block level processes.
When you set up a new backup process, the first backup you typically perform is a “full” backup, including all data present on the source. Subsequent backups can vary. You can back up all your files each time, or copy just those files that have changed or are new. There are more options than you might realise. I will address the terminology that refers to these methods and outline typical use.
Backup Set/Archive: A set of files that, when considered as a whole, includes a complete set of backed up files over a period of time.
A backup set created by a dedicated backup program will often generate one file per backup, containing all files or data captured during the backup. A backup set will then normally contain a number of files over time, but they won’t look like the original files. It is important that you check these sets and ensure they actually contain your files. Don’t just assume all your stuff is in there. Size is a good hint: if the set is far smaller than your files, something is wrong.
If you have set up a simple file level backup, the archive set might be included in dedicated container files, such as a zipped file, or might be a simple mirror of the original files. I like methods that result in a simple mirror of your files for home backups, as it lets you quickly check what is in your backup set and is less prone to mistakes.
Full Backup: Takes a complete, new copy of all files or data on your source and adds them to your destination storage.
A word of caution here: a “Full Backup” does not mean you have backed up all your data. It depends on what you have selected for the process to back up. Make sure you know where your data is before setting up your backup system, and do not trust programs that try to select your data for you, as they may miss important files. This comes back to concepts in my first article: make sure you have visibility concerning your backups.
Incremental Backup: Backs up new and changed files since the last backup.
An incremental backup will normally be much smaller than the full backup, and commensurately faster to complete. Using incremental backups is recommended where you add or change relatively few files over time.
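One simple way a file level tool might decide what belongs in an incremental backup is to compare each file’s modification time with the time of the last backup. The sketch below assumes that approach. The function name `files_changed_since` is hypothetical, and real tools may instead rely on archive flags, change journals, or content hashes.

```python
from pathlib import Path


def files_changed_since(root, last_backup_time):
    """List files under root modified after last_backup_time.

    last_backup_time is a Unix timestamp (seconds), as returned by time.time().
    Paths are returned relative to root, sorted for a stable order.
    """
    root = Path(root)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime > last_backup_time
    )
```

If only a handful of files come back from a call like this, the incremental backup will be small and quick, which is the situation where incrementals pay off.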
When you make an incremental backup, it is dependent on any prior incremental backups as well as the original full backup, so if any of the files in the chain are damaged or lost, you will lose data. In theory you could take one full backup and then nothing but incrementals for years. Don’t: create a new full backup every now and then.
As a safety precaution, if a backup program tries to create a new incremental backup and can’t find the full backup it depends on, it will normally try to create a new full backup.
Differential Backup: Like an incremental backup, but backs up all new and changed files since the last full backup
Differential backups are less commonly used than incrementals. They play a role where your incremental backups are relatively large: because each differential depends only on the last full backup, you can delete some older incremental backups to manage space without needing a new full backup.
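The difference in dependencies is easiest to see by working out which backups are needed for a restore. The sketch below is illustrative only: `restore_chain` is a hypothetical helper, and it assumes an incremental builds on whichever backup ran immediately before it, while a differential builds directly on the most recent full backup. Actual products may chain backups differently.

```python
def restore_chain(backups, target):
    """Return the names of the backups needed to restore backups[target].

    backups: chronological list of (name, kind), where kind is
             "full", "incremental", or "differential".
    """
    chain = []
    i = target
    while i >= 0:
        name, kind = backups[i]
        chain.append(name)
        if kind == "full":
            return list(reversed(chain))  # oldest first, ready to apply
        if kind == "differential":
            # Skip straight back to the most recent full backup.
            i -= 1
            while i >= 0 and backups[i][1] != "full":
                i -= 1
        else:
            i -= 1  # an incremental needs the backup just before it
    raise ValueError("no full backup found for this restore point")


week = [
    ("Mon", "full"),
    ("Tue", "incremental"),
    ("Wed", "incremental"),
    ("Thu", "differential"),
    ("Fri", "incremental"),
]
print(restore_chain(week, 2))  # Wednesday needs Mon, Tue, Wed
print(restore_chain(week, 4))  # Friday needs only Mon, Thu, Fri
```

Once Thursday’s differential exists, the Tuesday and Wednesday incrementals are no longer needed to restore Thursday or Friday, so they can be deleted to save space if those older restore points are not required.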
Continuous Backup: A misleading term that normally refers to a chain of frequent incremental backups that are later stitched into a single backup archive.
Continuous backups are a more advanced function only available on business grade backup solutions. Incremental backups are incorporated into the original full backup by a cleanup process, and the oldest data may be stripped out to keep the size under control.
Continuous backups add complexity with the consolidation and cleanup process, but they have significant advantages by avoiding the load placed on systems by running full backups, and are ideal for immediate offsite backups where small incremental backups can be transferred via the internet to give you an immediate offsite copy.
Advanced systems can go one step further and use the backup image to mount an instance of backed up systems running as a virtual machine in the cloud. Great for continuity and disaster recovery.
Common Backup Settings and Options
Once you have decided on the type of backup archive, or more likely a combination of archive types, you need to determine how the process will operate, and when.
Backup Schedule: Set an automatic schedule for your backups
A backup schedule usually involves a combination of archive types set to appropriate frequency. It is important to schedule backups at times when your backup destination will be available and where the computer will be on. If you miss an automated backup, you can always trigger a manual one as needed to cover it.
There are many different interfaces used in backup programs and it is usually worth looking at the advanced or custom options to ensure your schedule is set correctly, rather than going with default settings.
A common schedule would be a daily incremental backup, with a new full backup about every month or three.
Retention and Cleanup: Manage your backup archives to remove old backups in order to maintain space for new backups.
It is very important to consider how long you need access to backed up files. For example, if you delete a file today, or perhaps a file is damaged and you may not notice, how many days or months do you want to keep the old version in the backup archive? Forever is great, except you need to consider how much space that requires!
You should also consider possible problems or damage that might impact your backups. When operating with full backups, it’s best to keep the old backup set until after a new one has been created, just in case the new one fails and you are left with no backups. You can bet that’s when your HDD will die.
Given that a typical backup system might involve an infrequent and very large full backup and a more frequent and smaller incremental backup, then carefully considering your retention plan can save a lot of space. Below is an example of how you might set retention (I suggest more time between full backups than this example, a month is probably reasonable, but depends on your situation)
In the above example, a full backup is run on Mondays with all other days set to incremental backups. Disk space is limited on the backup hard disk, and let’s assume that it can’t fit more than two full backups and some incrementals. With the retention period set to six days, backups will sometimes be kept for more than six days where backups within the six day period are dependent on older incremental or full backups.
In the above example we have 12 days of backups stored and two full backups. If the system deleted all backups before Sunday, then the Sunday backup would be useless. The system will be smart enough not to (hopefully!). At this point, the backup disk will be near full with inadequate space available to create a new full backup, but consider what happens just before the third full backup is due.
The idea above is once the older incremental backup is no longer within the retention period, we can clean up (delete) all of the oldest backup set in one go.
In this way the old set is kept as long as possible, but is deleted before the next full is due, so the backup program does not run out of space on the following Monday.
See any possible issues with this retention? Any mistakes?
There are a number you should consider. Setting a tight schedule like this may not work as expected. How does the program interpret 6 day retention? Is it inclusive or exclusive when it counts back? What happens if you set it to 5 or 7 days? What happens if the cleanup task runs before, or after, the backup task on a given day? (That’s particularly important and a common mistake.)
You must check that the system works as planned by manually checking that backups clean up the way you plan on the days you plan. Failure to verify your system will inevitably result in a flaw you may fail to notice and leave you vulnerable.
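The retention behaviour discussed above can be simulated in a few lines, which is a cheap way to reason about it before trusting a real program. This is a sketch under assumptions: the dates (a schedule starting Monday 1 January 2024), the helper names, and the rule "delete a whole set only once its newest backup has left the retention window" are mine, and your backup program may count differently.

```python
from datetime import date, timedelta


def group_into_sets(backups):
    """Split a chronological list of (day, kind) into sets, each led by a full."""
    sets = []
    for day, kind in backups:
        if kind == "full" or not sets:
            sets.append([])
        sets[-1].append((day, kind))
    return sets


def cleanup(backups, today, retention_days, inclusive=True):
    """Return the backups kept after cleanup runs on `today`.

    A set is kept while its newest backup is inside the retention window.
    `inclusive` decides whether a backup exactly retention_days old still counts.
    """
    cutoff = today - timedelta(days=retention_days)
    kept = []
    for backup_set in group_into_sets(backups):
        newest = backup_set[-1][0]
        still_needed = newest >= cutoff if inclusive else newest > cutoff
        if still_needed:
            kept.extend(backup_set)
    return kept


def history(days):
    """Hypothetical schedule: full backup every Monday, incrementals otherwise."""
    start = date(2024, 1, 1)  # a Monday
    return [
        (start + timedelta(days=i), "full" if i % 7 == 0 else "incremental")
        for i in range(days)
    ]


print(len(cleanup(history(12), date(2024, 1, 12), 6)))  # both sets still kept
print(len(cleanup(history(14), date(2024, 1, 14), 6)))  # oldest set removed
```

In this sketch the first set survives through Saturday and is removed on Sunday 14 January, the day before the third full backup. Switching `inclusive` to `False` removes it a day earlier, which is exactly the kind of difference worth checking against your own program.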
Compression: A mathematical process to reduce the space used by your backups.
When setting up the most basic file level backup, you probably won’t use compression, but every other backup will typically compress your files to save space. This is a good idea and you normally want to go with default settings.
Most photos and videos are compressed as part of their file standard, and additional compression won’t help. For some files that are inefficiently stored where their information content is much less than their file size, various compression schemes can save a tremendous amount of space.
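A quick experiment shows the difference. The sketch below uses Python’s standard `zlib` module. The sample data is made up: a repetitive text block stands in for an inefficiently stored file, and random bytes stand in for an already compressed photo or video.

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


# Highly repetitive content, such as a log or simple spreadsheet export.
repetitive = b"customer,order,status\n" * 5000

# Random bytes behave much like data that has already been compressed.
random_like = os.urandom(100_000)

print(f"repetitive data:  {compression_ratio(repetitive):.3f}")
print(f"random-like data: {compression_ratio(random_like):.3f}")
```

The repetitive data shrinks to a tiny fraction of its size, while the random-like data does not shrink at all and may even grow slightly from the compression overhead.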
Encryption: A mathematical process based around a password that scrambles the file so its information is not available unless the password is used as the key to unscramble the file.
Modern encryption cannot practically be broken as long as a secure and appropriate algorithm and password are used. Passwords like “abc123” can easily be guessed or “brute forced” but passwords like “Iam@aPass66^49IHate!!TypingsTuff” are not going to be broken unless the attacker can find the password through other means.
Encryption is dangerous! If you lose the password, your backups are useless. If part of the file is damaged, you probably won’t be able to get back any of it. It adds another layer of things that can go wrong. These risks are relatively small, so encryption is a good idea where your backups contain sensitive information, but if it’s just a backup of the family photo album, I suggest you don’t encrypt.
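Some rough arithmetic shows why the two example passwords are worlds apart. The sketch below estimates the size of the brute force search space in bits. It is a simplified estimate under assumptions: the function name `keyspace_bits` is hypothetical, the attacker is assumed to try every combination of the character types present, and real attackers guess common passwords first, so weak passwords fall even faster than this suggests.

```python
import math
import string


def keyspace_bits(password):
    """Estimate brute force search space as bits: length * log2(alphabet size)."""
    pools = [
        string.ascii_lowercase,
        string.ascii_uppercase,
        string.digits,
        string.punctuation,
    ]
    # Count only the character types the password actually uses.
    alphabet = sum(len(pool) for pool in pools if any(ch in pool for ch in password))
    if alphabet == 0:
        return 0.0  # empty password, or characters outside the pools above
    return len(password) * math.log2(alphabet)


for pw in ["abc123", "Iam@aPass66^49IHate!!TypingsTuff"]:
    print(f"{pw}: about {keyspace_bits(pw):.0f} bits")
```

Each extra bit doubles the work for an attacker, so a gap of well over a hundred bits between the two passwords is the difference between seconds and never.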
VSS: A technology that is related to block level backups and allows files to be backed up while open and in use
You should normally enable VSS. However, if you find errors with VSS causing backups to fail, it is OK to turn it off in some situations. Make sure you understand what you lose if you turn off VSS, e.g. database backups may fail.
Intelligent Sector Backup: You may see this idea under a number of terms for partition and full disk backups. The option prevents blocks holding only deleted files or blank space from being backed up, and so saves a lot of space. You normally want this on.
Archive splitting: Many backup programs can split up backup archives into smaller files.
This was traditionally used where backups were split across limited disk media such as CD-ROMs and is not usually relevant where we are backing up to external HDDs, NAS boxes, or other storage with plenty of space.
Notifications: Most backup programs will send you an email on success or on fail of a backup process.
It is best to have the program send you a message on each backup, but you will find they are annoying, and you just delete them. That’s OK, at least you will notice after a while if the messages stop. Understand that a message that the backup failed is handy and you are more likely to notice, but the program can always fail in various ways so you never get that message.
Do not rely on fail messages or assume their lack means your backups are running. Manually verify backups from time to time.
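For a simple mirror style backup, part of that manual verification can be scripted by comparing checksums of the originals against the copies. The sketch below is one possible approach, not a feature of any particular backup program. The names `file_digest` and `verify_mirror` are hypothetical, and it only applies where the backup is a plain copy of the files rather than a proprietary archive.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_mirror(source_dir, backup_dir):
    """Return (relative path, problem) for files missing or different in the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file():
            problems.append((str(rel), "missing"))
        elif file_digest(src) != file_digest(copy):
            problems.append((str(rel), "differs"))
    return problems
```

An empty result means every source file has an identical copy in the backup. A file changed since the last backup will show as "differs", so run the check straight after a backup completes.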
So, when do we get to the nitty gritty?
Sorry. We are getting there!
In the next article I will outline the hardware and common tools that may form part of your backup system, then in the final article I will go through the nitty gritty and some examples of home and small business backup.