Guide to Personal and Small Business Backups – Technical Concepts

This article builds on the high level conceptual framework introduced in our previous backup article.

I will explain technical concepts and related terminology to help you design a Backup System for use in business or home.

Related Articles:

01 GUIDE TO PERSONAL AND SMALL BUSINESS BACKUPS – CONCEPTUAL FRAMEWORK

 

Backup Methods Classified by Viewpoint

We can categorise backup processes into:

  • Backups that target files and folders (file level backups)
  • Backups that target full disks or partitions (block level backups)

For the most part, these types of backups are distinct in technology used, limitations, risks, and most importantly, outcomes.  I will try to clarify the differences and why they should be taken into account when developing a backup system.

File level backups aim to make a copy of discrete files, such as your photos or documents.  This type of backup focuses on each file as a discrete unit of data.  When you copy your documents across to an external HDD, you are implementing a file level backup.

Block level backups are a little different and often misunderstood, resulting in backup designs that fail to protect your data. The aim of a block level backup is to copy all the blocks of data contained on a partition.  It is important to gain an understanding of how files are stored, and what we mean by a disk, partition, and a block in order to make appropriate decisions regarding your block level backup design, so let’s cover the basics.

blocks

A disk stores data in small chunks, which can be referred to as blocks.  When you save a file, it will be cut up into small pieces that will fit in the blocks available on the disk.  The locations of blocks used by a file are recorded in the index along with the name of the file.  When a file needs to be opened, the index is used to find out which blocks need to be read and stitched back together.  In the above image, you might consider that a big file has been split across all the blue blocks with other squares representing other blocks on the disk.

A “block”, in the context of a block level backup, refers to one of those same sized portions of your disk.  A block may, or may not, contain a piece of a file.  It may in fact be blank or contain old data from a file you have deleted (this is why deleted files can sometimes be retrieved).
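
The idea of blocks and an index can be sketched in a few lines of Python.  This is only a toy model under my own assumptions (a tiny 4 byte block size, and made-up names such as write_file and free_blocks); real file systems are far more involved, but the principle is the same.

```python
BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the split is easy to see


def free_blocks(disk, index):
    """Blocks that no file in the index currently points at."""
    used = {n for blocks in index.values() for n in blocks}
    return [n for n in range(len(disk)) if n not in used]


def write_file(disk, index, name, data):
    """Cut data into block sized pieces, store them, record where they went."""
    pieces = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    free = free_blocks(disk, index)
    if len(free) < len(pieces):
        raise OSError("disk full")
    locations = free[:len(pieces)]
    for n, piece in zip(locations, pieces):
        disk[n] = piece
    index[name] = locations


def read_file(disk, index, name):
    """Look up the blocks in the index and stitch them back together."""
    return b"".join(disk[n] for n in index[name])


def delete_file(index, name):
    """Deleting only removes the index entry; the blocks keep their old data."""
    del index[name]


disk = [b""] * 8   # a very small disk with 8 blank blocks
index = {}
write_file(disk, index, "note.txt", b"hello world")
restored = read_file(disk, index, "note.txt")
delete_file(index, "note.txt")
leftover = disk[0]  # the old contents are still sitting in the block
print(restored, leftover)
```

Note that after the delete, the first block still holds part of the old file, which is the reason deleted files can sometimes be retrieved.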

You will encounter the term “partition” or “disk partition” when setting up a block level backup.  A partition is the name given to the part of a physical disk that is set up with its own index and set of blocks to contain files.  It is possible to set up many partitions on a single physical disk, but often each disk will only have one visible partition and so people tend to use the terms disk and partition interchangeably.  C: for example, is a partition but it also might be called a drive or disk.

The image below shows two physical disks and the partitions located on each disk.  Note that the disk holding C: also has two hidden partitions, the first of which helps with the boot process for Windows, which is installed on C:.  The second disk has just the one partition, represented by D:.  The critical information is in the C: and D: partitions, but it is normally best to back up the lot to make system recovery easier.

Backup_Partitions

Block level backups don’t worry about the content of the blocks, they just copy all the blocks of data and in doing so just happen to capture all files on the backed up partition.  Most often the backup will skip blocks that the index says have no current files in them to save time and space.  While this method sounds more complex, it is pretty simple from the user’s perspective and is comprehensive where file level backups are more prone to miss important data.
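
As a rough illustration, here is a toy block level backup in Python that can either copy every block or skip the blocks the index says are unused.  The disk contents, the index layout, and the function names are all assumptions made up for this sketch, not how any particular backup product works.

```python
def used_blocks(index):
    """Block numbers that the index says belong to a current file."""
    return sorted({n for blocks in index.values() for n in blocks})


def block_backup(disk, index, skip_unused=True):
    """Copy blocks without caring what is inside them."""
    wanted = used_blocks(index) if skip_unused else range(len(disk))
    return {n: disk[n] for n in wanted}


# Block 2 holds leftover data from a deleted file; blocks 3 and 5 are blank.
disk = ["doc-1", "doc-2", "old", "", "pic-1", ""]
index = {"report.doc": [0, 1], "photo.jpg": [4]}

everything = block_backup(disk, index, skip_unused=False)
smart = block_backup(disk, index)
print(len(everything), "blocks copied without skipping,", len(smart), "with skipping")
```

Both versions capture every current file, but the second one copies half as many blocks in this example.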

Block level backups introduce some issues that can result in backup failures, but also advantages, such as the ability to backup open files using another layer of technology.  That leads us into a funky new(ish) Microsoft technology called VSS.

 

How to Backup Open Files including System State – VSS

Have you ever tried copying a file that was in use?  In most cases, Windows won’t let you copy an open file; it is “locked” while open to prevent potential damage to the file.  When you start Windows or any program, many files will be opened and cannot be copied safely using basic file level methods.  For this reason, most file level backups will fail to backup open files including programs and Windows files.

Block level backups support a technology that allows blocks to be copied while their related files are in use.  This means they can backup your operating system and program files and allow a complete restore of your system state and personal files.

The technology used to backup open files on Windows systems is the Volume Shadow Copy Service (VSS, also called the Volume Snapshot Service).  A snapshot refers to the state of the disk at a moment in time, as the technology attempts to maintain access to the data at that moment.  Once the snapshot is made, usually taking only moments, the system can continue to read and write files, so you can keep using the computer.  The system will preserve any data that would otherwise be overwritten after the snapshot, so it is accessible to the continuing backup process, and new data is tracked and not backed up, preventing any inconsistency.  This is not a trivial process and things can go wrong.
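
One common way to build a snapshot is “copy on write”: the old contents of a block are set aside the first time that block is changed after the snapshot.  The Python sketch below is a toy model of that idea only.  The Volume class and its methods are invented for illustration and are not the real VSS interface.

```python
class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.preserved = None  # block number -> contents at snapshot time

    def take_snapshot(self):
        # Nothing is copied yet, which is why a snapshot only takes moments.
        self.preserved = {}

    def write(self, n, data):
        # Set aside the old contents the first time a block changes.
        if self.preserved is not None and n not in self.preserved:
            self.preserved[n] = self.blocks[n]
        self.blocks[n] = data

    def read_snapshot(self, n):
        # The backup sees the preserved copy if the block has since changed.
        return self.preserved.get(n, self.blocks[n])


vol = Volume(["A", "B", "C"])
vol.take_snapshot()
backup = [vol.read_snapshot(0)]   # the backup starts copying
vol.write(1, "B2")                # meanwhile the user keeps working
vol.write(1, "B3")
backup += [vol.read_snapshot(1), vol.read_snapshot(2)]
print(backup, vol.blocks)
```

The backup ends up with the disk as it was at the moment of the snapshot, while the live disk carries the newer data.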


VSS incorporates a number of parts, including “writers” specific to various programs designed to ensure their disk writes are complete and consistent at the point the snapshot is taken.  For example, a database writer would ensure all transactions that might be just partly written are complete before the snapshot, removing the risk of a corrupt database on restoring the backup.  Certain types of programs need a specific writer to support them and if they fail a “successful” backup can contain damaged files.  Sometimes, part of the VSS service can fail.

VSS is mature and works well in most cases, but you will still find that a VSS backup is more likely to fail than a simple file copy backup.

Some backup tools that focus on backing up particular files and folders can also use VSS.  This blurs my definitions, since these use similar technology to block level backup but with the focus of a file level backup.  This hybrid approach is worthwhile in cases where you want the advantages of a block level backup but want to exclude backing up some files, such as large videos you don’t care about.  You should still keep in mind that VSS adds complexity, but it’s OK to use VSS-only backups where you have the technology available and you are careful verifying your backups.

 

Backup Archives Classified by Archive Type

Now we have an idea of the technology behind block level backups, I will go over the rudiments of backup archive types.  These concepts can apply to file or block level backups, but they tend to be more related to block level processes.

When you set up a new backup process, the first backup you typically perform is a “full” backup, including all data present on the source.  Subsequent backups can vary.  You can back up all your files each time, or copy just those files that have changed or are new.  There are more options than you might realise.  I will address the terminology that refers to these methods and outline typical use.

 

Backup Set/Archive:  A set of files, that when considered as a whole, include a complete set of backed up files over a period of time.

A dedicated backup program will often generate one file per backup, containing all files or data captured during the backup.  A backup set will then normally contain a number of files over time, but they won’t look like the original files.  It is important that you check these sets and ensure they actually contain your files.  Don’t just assume all your stuff is in there.  Size is a good hint: if they are far smaller than your files, something is wrong.

If you have set up a simple file level backup, the archive set might be included in dedicated container files, such as a zipped file, or might be a simple mirror of the original files.  I like methods that result in a simple mirror of your files for home backups, as it lets you quickly check what is in your backup set and is less prone to mistakes.

Full Backup:   Takes a complete, new copy of all files or data on your source and adds it to your destination storage.

A word of caution here, a “Full Backup” does not mean you have backed up all your data.  It depends on what you have selected for the process to backup.  Make sure you know where your data is before setting up your backup system and do not trust programs that try to select your data for you, they may miss important files.  This comes back to concepts in my first article, make sure you have visibility concerning your backups.

 

Incremental Backup: Backs up new and changed files since the last backup.

An incremental backup will normally be much smaller than the full backup, and commensurately faster to complete.  Using incremental backups is recommended where you add or change relatively few files over time.

When you make an incremental backup, it is dependent on any prior incremental backups as well as the original full backup, so if any of the files in the chain are damaged or lost, you will lose data.  In theory you could take one full backup and then nothing but incrementals for years, but don’t.  Create a new full backup every now and then.

As a safety precaution, if a backup program tries to create a new incremental backup and can’t find the full backup it depends on, it will normally try to create a new full backup.


Differential Backup:  Like an incremental backup, but backs up all new and changed files since the last full backup

Differential backups are less commonly used than incrementals.  They play a role where you have relatively large incremental backups to help manage space as they let you delete some older incremental backups without needing a new full backup.
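
The practical difference between incremental and differential backups shows up when you restore.  The sketch below lists which backups you would need to get back to a given point.  It assumes a schedule that uses either incrementals or differentials after each full backup, not a mix, and the function name is my own.

```python
def needed_for_restore(kinds, target):
    """Return the positions of the backups needed to restore backup `target`.

    kinds is a list such as ["full", "incremental", ...], oldest first.
    Raises ValueError if there is no full backup at or before the target.
    """
    last_full = max(i for i in range(target + 1) if kinds[i] == "full")
    if kinds[target] == "full":
        return [target]
    if kinds[target] == "differential":
        # Everything changed since the last full is in this one backup.
        return [last_full, target]
    # Incremental: every link in the chain since the last full is required.
    return list(range(last_full, target + 1))


incremental_week = ["full", "incremental", "incremental", "incremental"]
differential_week = ["full", "differential", "differential", "differential"]

print(needed_for_restore(incremental_week, 3))
print(needed_for_restore(differential_week, 3))
```

With incrementals, losing any one backup in the chain breaks every later restore point.  With differentials, only the full backup and the differential you want are needed, at the cost of each differential being larger.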

 

Continuous Backup: A misleading term that normally refers to a chain of frequent incremental backups that are later stitched into a single backup archive.

Continuous backups are a more advanced function only available on business grade backup solutions.  Incremental backups are incorporated into the original full backup by a cleanup process, and the oldest data may be stripped out to keep the size under control.

Continuous backups add complexity with the consolidation and cleanup process, but they have significant advantages by avoiding the load placed on systems by running full backups, and are ideal for immediate offsite backups where small incremental backups can be transferred via the internet to give you an immediate offsite copy.

Advanced systems can go one step further and use the backup image to mount an instance of backed up systems running as a virtual machine in the cloud.  Great for continuity and disaster recovery.

Common Backup Settings and Options

Once you have decided on the type of backup archive, or more likely a combination of archive types, you need to determine how the process will operate, and when.

 

Backup Schedule: Set an automatic schedule for your backups

A backup schedule usually involves a combination of archive types set to appropriate frequency.  It is important to schedule backups at times when your backup destination will be available and where the computer will be on.  If you miss an automated backup, you can always trigger a manual one as needed to cover it.

There are many different interfaces used in backup programs and it is usually worth looking at the advanced or custom options to ensure your schedule is set correctly, rather than going with default settings.

A common schedule would be a daily incremental backup, with a new full backup about every month or three.
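
As a sketch of that schedule, the Python below decides each day whether to run a full or an incremental backup.  The 30 day interval and the start date are assumptions picked for the example; a real backup program will have its own scheduler.

```python
from datetime import date, timedelta


def backup_type(day, last_full, full_every_days=30):
    """Run a full backup if there has never been one, or the last one is too old."""
    if last_full is None or (day - last_full).days >= full_every_days:
        return "full"
    return "incremental"


start = date(2024, 1, 1)
last_full = None
plan = []
for offset in range(65):          # simulate 65 days of daily backups
    day = start + timedelta(days=offset)
    kind = backup_type(day, last_full)
    if kind == "full":
        last_full = day
    plan.append(kind)

print(plan.count("full"), "full backups,", plan.count("incremental"), "incrementals")
```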

 

Retention and Cleanup:  Manage your backup archives to remove old backups in order to maintain space for new backups.

It is very important to consider how long you need access to backed up files.  For example, if you delete a file today, or perhaps a file is damaged and you may not notice, how many days or months do you want to keep the old version in the backup archive?  Forever is great, except you need to consider how much space that requires!

You should also consider possible problems or damage that might impact your backups.  When operating with full backups, it’s best to keep the old backup set till after a new one has been created, just in case the new one fails and you are left with no backups.  You can bet that’s when your HDD will die.

Given that a typical backup system might involve an infrequent and very large full backup and a more frequent and smaller incremental backup, then carefully considering your retention plan can save a lot of space.  Below is an example of how you might set retention (I suggest more time between full backups than this example, a month is probably reasonable, but depends on your situation)

Retention_02

In the above example, a full backup is run Mondays with all other days set to incremental backups.  Disk space is limited on the backup hard disk, and let’s assume that it can’t fit more than two full backups and some incrementals.  With the retention period set to six days, backups will sometimes be kept for more than six days where backups within the six day period are dependent on older incremental or full backups.

In the above example we have 12 days of backups stored and two full backups.  If the system deleted all backups before Sunday, then the Sunday backup would be useless.  The system will be smart enough not to (hopefully!).  At this point, the backup disk will be near full with inadequate space available to create a new full backup, but consider what happens just before the third full backup is due.

Retention_03

The idea above is once the older incremental backup is no longer within the retention period, we can clean up (delete) all of the oldest backup set in one go.

In this way the old set is kept as long as possible, but is deleted before the next full is due, so the backup program does not run out of space on the following Monday.

See any possible issues with this retention? Any mistakes?

There are a number you should consider.  Setting a tight schedule like this may not work as expected.  How does the program interpret 6 day retention?  Is it inclusive or exclusive when it counts back?  What happens if you set it to 5 or 7 days?  What happens if the cleanup task runs before, or after, the backup task on a given day?  (That’s particularly important and a common mistake.)

You must check that the system works as planned by manually checking that backups clean up the way you plan on the days you plan.  Failure to verify your system will inevitably result in a flaw you may fail to notice and leave you vulnerable.
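
To make those questions concrete, here is a small Python model of a cleanup task.  It is a sketch under my own assumptions: days are plain numbers, a full backup runs on days 1 and 8 (the Mondays), and a backup set is only deleted once every backup in it is older than the retention period.  Your backup program may well count differently, which is exactly the point.

```python
def backups_to_delete(backups, today, retention_days):
    """Return the days of the backups that are safe to delete.

    backups is a list of (day, kind) pairs, oldest first.  A set is a full
    backup plus the incrementals that depend on it, and is deleted as a whole.
    """
    sets = []
    for day, kind in backups:
        if kind == "full" or not sets:
            sets.append([])
        sets[-1].append(day)

    cutoff = today - retention_days
    expired = []
    for backup_set in sets[:-1]:      # never touch the newest set
        # Using < here rather than <= is the inclusive/exclusive question.
        if max(backup_set) < cutoff:
            expired.extend(backup_set)
    return expired


# Two weeks of backups: full on days 1 and 8, incrementals on the other days.
backups = [(d, "full" if d in (1, 8) else "incremental") for d in range(1, 15)]

on_saturday = backups_to_delete(backups, today=13, retention_days=6)
on_sunday = backups_to_delete(backups, today=14, retention_days=6)
print(on_saturday, on_sunday)
```

In this model nothing is deleted on day 13, and the whole first week is cleared on day 14, just before the third full backup is due.  Change the comparison, the retention period, or the day the cleanup runs and the old set may be deleted a day early or a day too late to make room.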

 

Compression: A mathematical process to reduce the space used by your backups.

When setting up the most basic file level backup, you probably won’t use compression, but every other backup will typically compress your files to save space.  This is a good idea and you normally want to go with default settings.

Most photos and videos are compressed as part of their file standard, and additional compression won’t help.  For some files that are inefficiently stored where their information content is much less than their file size, various compression schemes can save a tremendous amount of space.
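
You can see the difference with Python’s built-in zlib module.  In this sketch, a block of repeated text stands in for an inefficiently stored file, and random bytes stand in for an already compressed photo or video (an approximation I am making for the example, since compressed data looks much like random data to a compressor).

```python
import os
import zlib

repetitive = b"backup " * 10_000          # lots of repeated information
random_like = os.urandom(len(repetitive)) # stand-in for a compressed photo

ratio_repetitive = len(zlib.compress(repetitive)) / len(repetitive)
ratio_random = len(zlib.compress(random_like)) / len(random_like)

print(f"repetitive data shrinks to {ratio_repetitive:.1%} of its size")
print(f"random-like data ends up at {ratio_random:.1%} of its size")
```

The repetitive data shrinks to a tiny fraction of its original size, while the random-like data does not shrink at all.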


Encryption:  A mathematical process based around a password that scrambles the file so its information is not available unless the password is used as the key to unscramble the file.

Modern encryption cannot practically be broken as long as a secure and appropriate algorithm and a strong password are used.  Passwords like “abc123” can easily be guessed or “brute forced” but passwords like “Iam@aPass66^49IHate!!TypingsTuff” are not going to be broken unless the attacker can find the password through other means.
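
Some rough arithmetic shows why.  The sketch below counts how many passwords an attacker would have to try in the worst case.  The guess rate of ten billion per second is purely an assumption for illustration, and the character set sizes (36 for lowercase letters and digits, 94 for printable keyboard characters) are approximations.

```python
GUESSES_PER_SECOND = 10_000_000_000   # assumed attacker speed, for illustration
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def combinations(password, alphabet_size):
    """Number of possible passwords of this length from this character set."""
    return alphabet_size ** len(password)


weak = combinations("abc123", 36)
strong = combinations("Iam@aPass66^49IHate!!TypingsTuff", 94)

weak_seconds = weak / GUESSES_PER_SECOND
strong_years = strong / GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(f"weak password: {weak:,} combinations, under {weak_seconds:.2f} seconds")
print(f"strong password: roughly {strong_years:.1e} years")
```

Keep in mind this only measures brute force.  A password like “abc123” will fall even faster because it appears on every list of common passwords.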

Encryption is dangerous!  If you lose the password, your backups are useless.  If part of the file is damaged, you probably won’t be able to get back any of it.  It adds another layer of things that can go wrong.  These risks are relatively small, so encryption is a good idea where your backups contain sensitive information, but if it’s just a backup of the family photo album, I suggest you don’t encrypt.

 

VSS:  A technology that is related to block level backups and allows files to be backed up while open and in use

You should normally enable VSS, however, if you find errors with VSS causing backups to fail, it is OK to turn it off in some situations.  Make sure you understand what you lose if you turn off VSS, e.g. database backups may fail.

 

Intelligent Sector Backup:  You may see this idea under a number of terms for partition and full disk backups.  The option prevents blocks with deleted files or blank space from being backed up and so saves a lot of space.  You normally want this on.

 

Archive splitting:  Many backup programs can split up backup archives into smaller files.

This was traditionally used where backups were split across limited disk media such as CD-ROMs and is not usually relevant where we are backing up to external HDDs, NAS boxes, or other storage with plenty of space.

 

Notifications:  Most backup programs will send you an email on success or failure of a backup process.

It is best to have the program send you a message on each backup, but you will find they are annoying, and you just delete them.  That’s OK, at least you will notice after a while if the messages stop.  Understand that a message that the backup failed is handy and you are more likely to notice, but the program can always fail in various ways so you never get that message.

Do not rely on fail messages or assume their lack means your backups are running.  Manually verify backups from time to time.
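
For a simple mirror style backup, part of that manual check can be scripted.  The Python sketch below compares every file in a source folder against the mirror and reports anything missing or different.  The function names are my own, and the demonstration uses temporary folders with made-up files rather than real data.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path):
    """SHA-256 of a file, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_mirror(source, mirror):
    """Return (relative path, problem) for files missing or changed in the mirror."""
    source, mirror = Path(source), Path(mirror)
    problems = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = mirror / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif file_hash(src) != file_hash(dst):
            problems.append((rel.as_posix(), "different"))
    return problems


# Demonstration with throwaway folders: one file never copied, one damaged.
with tempfile.TemporaryDirectory() as tmp:
    source, mirror = Path(tmp, "source"), Path(tmp, "mirror")
    source.mkdir()
    mirror.mkdir()
    (source / "notes.txt").write_bytes(b"hello")
    (source / "photo.jpg").write_bytes(b"original")
    (mirror / "photo.jpg").write_bytes(b"damaged!")
    problems = verify_mirror(source, mirror)

print(problems)
```

A script like this is no substitute for occasionally opening a few restored files yourself, and a file that changed after the backup ran will also show up as different, but it catches missing files that a success email would never mention.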

 

So, when do we get to the nitty gritty?

Sorry.  We are getting there!

In the next article I will outline the hardware and common tools that may form part of your backup system, then in the final article I will go through the nitty gritty and some examples of home and small business backup.

Guide to Personal and Small Business Backups – Conceptual Framework

Too often I see our techs consoling a despondent customer, in tears, having irretrievably lost precious files.  Family photos.  Business records.  Blog articles (!). All gone.  Yet some of those people have been “Backing up”.

A simple definition of “Backing Up” is a process that makes a copy of data onto a second device that can be used to restore that data if your primary copy is deleted or damaged.  A broader definition is any process that reduces your risk of losing data (files) or your system state (Windows, settings).  I prefer to use a more global term, Backup System, a collection of backup processes or other elements working together to reduce risk of data loss and related harm.

You might reasonably believe that backing up is a simple process.  Before you run this process, your files are at risk of being lost, and afterwards, they are safe.  Run a backup, and it’s all good.  This type of binary thinking is prevalent even among IT professionals – Black and White, True and False, Risky and Safe.  Unfortunately, applying a binary worldview to backups will only get you into trouble by giving you a false sense of security.  Backups are not Black and White, they are Grey.

This article will disabuse you of false assumptions relating to backups, and introduce a conceptual framework you can use to design a Backup System and to protect your precious data.

Developing a Backup System is easy and effective if you use the right approach.  Clicking a button that says “backup” and hoping for the best is only good for gamblers!


Backup Systems are about Risk Management

The key concept here is risk.  Most people have a decent, if subconscious, understanding of risk.  The subconscious mind has a habit of simplifying complex concepts and can mislead if you don’t consciously interrogate the concept.  So let’s consider what we mean when we refer to risk.  Risk relates to:

  • the Harm you will take if you lose some or all files or system state, and
  • the Probability of losing some or all files or system state.

In a business context, you might add other “harm” that can relate to backups, such as downtime, or files finding their way to unauthorised people.

So Risk = Harm * Probability.  That seems simple.

But how do you quantify Harm?  Say you look at a tender you are working on, perhaps you know it will cost $500 to rewrite it, so you can assign a cost of losing the file with some accuracy.  What about the family photo album?  Hard to assign a $ amount to that.  You can probably make some rough estimate, but it is not possible to assign an exact value.  Priceless, perhaps.

What about the second element in the equation, the Probability (chance) of loss?  Probability can be very difficult to quantify.  What is the chance of your HDD failing, being infected by a virus that wipes your drive, throwing the whole thing out the nearest window when it’s misbehaving, and tougher still, what about disasters you have not even thought of?  Again, you can only apply a ballpark figure on the likelihood of data loss.

The difficulty of determining the Risk Level that you are exposed to leads to another concept that is implicit with backups, but not often addressed explicitly.  Uncertainty.  Uncertainty, inherent in assessing risk, means that you can’t quantify your level of risk with accuracy, it necessitates a fudge factor, some safety margin to make sure you are not taking on too much risk.
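
A quick worked example shows how uncertainty turns a single risk figure into a range.  The $500 harm comes from the tender example above; the two probabilities are numbers I have made up to represent an optimistic and a pessimistic guess at the yearly chance of losing the file.

```python
def risk(harm, probability):
    """Risk = Harm * Probability."""
    return harm * probability


harm = 500                      # cost to rewrite the tender
p_low, p_high = 0.02, 0.10      # hypothetical yearly chance of loss

risk_low = risk(harm, p_low)
risk_high = risk(harm, p_high)

print(f"Risk is somewhere between ${risk_low:.0f} and ${risk_high:.0f} per year")
```

Since you cannot know which end of the range is right, the safer choice is to plan around the higher figure.  That is the fudge factor at work.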

Risk Level and Uncertainty lead us to our final concept, Acceptable Risk.

No backup system can reduce your risk of losing data to zero.  No such system is possible in our world.  Beware of anyone who tells you that their system is 100%!  Instead of aiming for zero risk, you should consider what your level of Acceptable Risk is, and weigh that against the cost to reduce your actual Risk Level.

Finally to the good news.  It is usually possible, with a little thought and attention, to vastly reduce your Risk Level inexpensively.  Developing an effective Backup System for a home or SME environment is about using available tools intelligently rather than spending a fortune.

Before we go into the How, we need to cover more abstract concepts that you can use to assess the backup methods you choose.  Again, without applying these concepts to critique your Backup System, it’s likely you will run into trouble and find your backups are not doing their job, inevitably when it is too late.

 

Develop your Backup System with Desirable Attributes

Certain attributes of a backup system tend to increase the likelihood that it will perform as desired.   When developing or assessing the quality of a backup system, you may want to consider the following attributes.

To make life that little bit more difficult (this is about computers, after all), some of these characteristics contradict one another, so you must apply some common sense where a trade-off is necessary.

  1. Simple – Never add complexity for marginal benefit.

Convoluted backup systems fail more often than simple systems, because, by their nature, there is more to go wrong, with less visibility in how the system works.  Simplicity leads to our second attribute.

  2. Visible – Know where your stuff is and how the backup system works.

The first step is knowing where your important files are.  The second is knowing what process is used to backup those files.  The third step is being able to locate your files at your backup locations and verify that they are complete and viable.

  3. Automated – Make it work without human intervention.

Most data loss I encounter where there are no backups is followed by the line “I used to do it, just have not got around to it recently”.  The best systems should work even if you neglect them, but a word of warning, automated does not mean you can skip manually verifying that the system works.

  4. Independent – Multiple backup processes and data locations should be unrelated.

Processes that are less dependent on the same factors are less likely to fail on you at the same time.  You might use an image backup and a simple file copy backup on the same data, since a failure with one method will not necessarily result in the other also failing.  A backup located in another room is not as good as a backup located in a different building, and implementing both is better.

  5. Timely – Capacity to recover data that avoids damaging downtime.

For a business, downtime while you recover files can be costly.  Assess how long your system requires to restore files and systems and reduce that time where unacceptable.

  6. Cost Effective – Seek balance between cost and benefit.

Aim to find a sweet spot where the cost and effort put into your backups effectively reduces risk, and then stop.  Don’t fight your way to reduce risk just a little further when it requires massive extra cost, but also don’t be cheap and stop reducing risk when the cost to do so is minimal.

  7. Secure – Control access to sensitive data.

Consider the harm you will take if backed up data gets into the wrong hands.  Where the harm is significant, consider encryption and other security techniques.  Do not apply security without due consideration as increasing security techniques can, and usually will, increase the chance of your backup system failing.

 

Understand Concepts, Techniques, and set Objectives before you begin

Once you are comfortable with risk management, and the attributes you want to incorporate into a backup system, it is time to set objectives for your Backup System and how to achieve those objectives.

To develop a plan, you will need a grasp of:

  • Your data and its characteristics: size, location, live or closed files, live services etc
    • Include files and systems. Eg an accounting data file might be critical, but the installed accounting package might also be worthwhile to backup.
  • Importance/acceptable risk level related to identified data.
  • Related risks such as downtime and stolen data.
  • Storage devices available/desirable and capacity: external HDDs, NAS, cloud, etc
  • Backup tools available/desirable: Image creation tools, command line tools, VSS, etc
  • Techniques possible: file mirror, images, full/incremental/differential/continuous, scheduled tasks, verification, encryption, cleanup, etc
  • Contingency Plan – what can go wrong with backups and how can those risks be reduced.
  • Available budget

Finally, start designing your system.

This article has covered some of the high level concepts relating to backups such as risk and desirable attributes.  It has not covered the types of backups possible, storage devices, or techniques.  Follow up articles will cover these areas and provide walk through examples of backup systems for home and business.