Guide to Personal and Small Business Backups – Storage and Tools

This article will examine options for backup storage and tools, provide advice on how to choose between them, explain how they can be effectively employed, and give examples of common implementation pitfalls.

Prior articles have worked through the high level conceptual framework and technical concepts that relate to backup systems.

Related Articles:

01 GUIDE TO PERSONAL AND SMALL BUSINESS BACKUPS – CONCEPTUAL FRAMEWORK

02 GUIDE TO PERSONAL AND SMALL BUSINESS BACKUPS – TECHNICAL CONCEPTS

Backup Storage Infrastructure

Your backup system will copy all your important electronic Stuff from one or more storage locations to some other storage location.  It is all about storage, so it naturally follows that the choice of the storage used for backups has a major impact on the effectiveness of your system.

To help select the type of storage that best suits, you might review the desirable attributes of a backup system that I outlined in our first backup article and consider how selecting storage types will influence attributes of the final backup system.  As a reminder, the attributes were: simple, visible, automated, independent, timely, cost effective, and secure.

Hard Disk Drive (HDD) Storage

HDDs have been the mainstay of IT storage for decades.  The technology is slowly being replaced by SSDs, but HDDs are still the most common primary storage and are one of the best options for backups.

HDDs allow random access, are fast, reliable, cheap, and generally have much going for them.

Internal Hard Disk Drives

Internal HDDs refer to the HDDs built into PCs and devices.  There are a number of options to incorporate internal drives into your backup system, though for the most part they play an incomplete role.

Where your PC has a single internal HDD, it will be of limited use as a backup drive.  In general, you don’t want to set a backup to the same physical device as the source files as it fails the test of independence – if the drive dies you lose source and backup files at the same time.

There are minor exceptions to this rule.  You could maintain copies of older and deleted files on the drive to offer limited protection against accidentally deleting or overwriting files.  You could direct an image backup to the source drive itself in circumstances where external storage may not always be available but you want frequent and regular automated snapshots (remember to exclude the image location or recursion will run you out of space!).  In that case you would move the files to external storage when it becomes available.

Some PCs have more than one internal storage device.  For example, you might use one fast HDD or an SSD for Windows and program files (for the speed) and a cheaper, perhaps slower mechanical HDD with large capacity for other files or as a dedicated backup drive.  A second drive adds options and, given the minimal cost, I suggest adding a second internal HDD to a PC specifically for use in backups.

With a second internal HDD you could create a system image backup of the primary HDD to the second HDD and if the primary HDD fails, your backup will (hopefully) still be available on the second drive.  That design lets you schedule and run the image backup with certainty that the destination will be available (reliable automation), and provides some independence between source files and the backup, but it is still not great.

I have seen clients use this technique alone as their backup system only to lose their data when a power surge, virus, theft, or other event destroys the data on both drives.  The design is vulnerable to these significant risks because it still largely fails our test for independence, where the backup destination should be as far removed from the source files as possible.

To improve the independence of the destination drive, you might use an internal HDD in a computer on the same network rather than in the same machine by setting up a network share.  That’s a little better in terms of independence, but again it is not ideal as certain events can still destroy the data on both drives.

RAID stands for redundant array of independent disks and is another way to use multiple internal drives to reduce your risk of data loss.  One of the simple types of RAID is called a mirror.  A mirror uses two drives in an array with the system automatically mirroring any writes to the disk onto both disks in realtime.  In terms of operation, it looks like you are working with a single disk, but any time you save a file, it will be available on either disk.  If one disk fails, you won’t lose any data and in fact the PC will keep working like normal.

There are many other types of RAID, some allowing for protection against one or more drives failing, but also some where if any drive fails, all your data will be lost.  You can set up a RAID array on a PC but I rarely suggest that as a good option, as its cost/benefit tends to be marginal against other options.  The technology is most commonly used for server systems and storage arrays that use the hardware best suited to supporting RAID arrays, and in those environments I consider RAID to be essential.

Never confuse RAID with a complete backup solution.  I have come across some spruikers who convince people that RAID is some magic technology that fully protects your data, and that is never true.

The most common way to achieve a high level of independence for a backup system in a home or SME environment is to use multiple external HDDs.

External Hard Disk Drives

External HDDs are the bread and butter of home and SME backups.  They are awesome, and you should buy some!

There are two basic types.  The physically larger drives, often called desktop HDDs, are 3.5” in size and need a separate power supply (until USB-C drives become common).  The physically smaller drives, often called portable HDDs, are 2.5” in size and can be powered from USB ports.  For value for money and large capacity the 3.5” drives are better, while the smaller drives are easier to cart around.  Either is fine for backups.

External drives come with various connectors.  The most common is USB.  Be aware that USB 2 drives are limited to about 35MB/s transfer rates, due to the limits of USB2.  Practically all current drives are USB 3.1 which allows for faster transfer, limited by the physical speed of the disk.  You will typically get 100+MB/s with USB3 drives so backups take much less time.  In terms of our preferred characteristics, “timely” means go with USB 3, though using an old USB 2 drive is fine as long as backups can still finish in a timely fashion.
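
To get a feel for what “timely” means in practice, here is a rough sketch in Python.  The function name and the 500GB backup size are my own illustrative assumptions, and the rates are the ballpark figures quoted above rather than measurements of any particular drive.

```python
def transfer_hours(data_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move data_gb gigabytes at a sustained rate (decimal units)."""
    seconds = (data_gb * 1000) / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    backup_size_gb = 500  # hypothetical backup size, adjust to suit
    for label, rate in (("USB 2 (about 35MB/s)", 35), ("USB 3 (about 100MB/s)", 100)):
        print(f"{label}: {transfer_hours(backup_size_gb, rate):.1f} hours")
```

On those assumptions a 500GB backup takes roughly four hours over USB 2 against well under two hours over USB 3.  Real world speeds vary with the drive and the mix of file sizes, so treat the result as a planning estimate only.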

You can get away with backing up to a single external drive, but your risk of losing data will be much higher than using two or more drives.  If you leave a single external HDD attached so it can take backups at any time, it may as well be an internal drive with the same vulnerabilities.  A single drive that you plug in only when backing up means you need to plug it in manually every time you want to back up.  If you get lazy and leave the single drive plugged in, you will find a virus, power spike etc will kill your backup and source files at the same time, and you are stuffed.  If you don’t get around to plugging it in for a long time, your PC HDD may die with all recent files lost.

Allocate at least 2 x external drives to your backup system and preferably three or more.  One can stay attached so scheduled backups work without thinking about it, and every now and then you should swap the attached drive with one stored elsewhere.  If you can afford three or more drives, don’t swap them in a strict sequential cycle.  Leave one drive that you swap in much less frequently, so you keep some backups for a longer period and reduce the risk that damaged files are overwritten across all backups.

USB Pen Drives

Small, light, reliable, and increasingly large and fast, USB pen drives can be used as an alternative to external HDDs for backups.  At time of writing, they tend to be slower and smaller at a given price compared to a HDD, but where your backup needs are modest, a pen drive may do the job nicely.  Use them the same way you would an external HDD.

Solid State Disk Drives (SSD)

SSDs are slowly replacing mechanical HDDs in computing devices for their speed and (potentially) reliability advantages.  At time of writing they are still expensive for bulk storage and not generally recommended for backup solutions.  There are rare exceptions where their raw speed shortens the backup window enough to make them worthwhile, but for home and SME users, don’t buy SSDs for backups unless you have some special reason.

Network Attached Storage (NAS)

A NAS box is essentially a mini PC dedicated to file storage.  Most run a Linux OS with a web based GUI for setup and management and other features in the form of “apps”.  Their file storage can be accessed across your network, and even from outside your network.

You will normally buy a NAS without HDDs, and then populate the unit with drives of the size and brand you need.  It is important to match the unit with drives listed on the manufacturer’s compatibility list to ensure no glitches with the operation of the unit.  Drive manufacturers now make HDDs specifically for NAS units, like the WD Red range, and drives designed for NAS devices are normally your best choice over cheaper options.

RAID is standard on NAS units, normally set up so that all files are stored redundantly across at least two physical disks.  With this protection, if a drive fails, you won’t lose any data.  Remember that when you add HDDs to a NAS with RAID redundancy enabled you will lose some capacity to allow the data to be replicated.

For home use, you might store some less important bulky files on the NAS given you have some protection with the RAID only (eg movies for media streaming), and additionally use the NAS as your primary backup device for image backups and/or file mirrors of your critical data.

If you buy a two bay NAS and add 2 x 4TB HDDs, you will only have total space of about 4TB available (a mirror).  With three 4TB drives you would have about 8TB, and similarly with 4 x 4TB about 12TB.  Also remember that drive manufacturers use a generous way to calculate capacity, so the NAS will report a little lower capacity than you might expect.
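
The arithmetic behind those figures can be sketched in a few lines of Python.  This is a simplified model that assumes one drive’s worth of capacity is given up to redundancy (a mirror with two drives, single parity with more) and that the “generous” calculation is the gap between decimal terabytes and binary units.  The function names are my own, and a real NAS will also reserve some space for its own use.

```python
def usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable space when one drive's worth of capacity goes to redundancy."""
    if drive_count < 2:
        raise ValueError("redundancy needs at least two drives")
    return (drive_count - 1) * drive_tb


def reported_tib(marketed_tb: float) -> float:
    """Convert marketed terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return marketed_tb * 10**12 / 2**40


if __name__ == "__main__":
    for drives in (2, 3, 4):
        space = usable_tb(drives, 4)
        print(f"{drives} x 4TB: about {space:.0f}TB usable, "
              f"shown as roughly {reported_tib(space):.2f} in binary units")
```

Under these assumptions a 4TB mirror shows up as a bit over 3.6 in binary units, which is why the reported figure looks smaller than the number on the box.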

Some brands, such as Netgear, allow you to add drives as you need and have the available capacity automatically increased without need to wipe and recreate the array.  You can start with a 4 bay unit with 2 HDDs, then add a third and fourth as needed.

NAS units are attached to your network with a standard network cable and can be located in another room, or building, from your main devices.  They can be powered up and down remotely using wake on LAN commands.  They are excellent for automated backups and can act as a central backup location for all your devices.  For NAS units containing critical data, adding a small UPS and/or a surge protector is a good idea.

The main drawback of using a NAS as your only backup is while it is not physically attached to your devices, it is still prone to some of the events that could destroy its data and that of the originating device at the same time.  Power surges, theft, and some viruses are common risks.  One way around that issue is to rotate external HDDs attached to the NAS to take data from the NAS offsite.  You can also reduce the risks by using certain techniques, such as network passwords to prevent a virus that has access to your other PCs from accessing the device.

There are various limits and risks to using a NAS in your backup system, but it can be a useful element in any backup system and I recommend one for most designs.

Cloud Storage

Let us all pause for a moment, and be thankful that our government vastly accelerated the rollout of massive bandwidth services by building an awesome NBN so we now lead the world in connectivity.  We can now easily work from home, backup everything into the cloud with a click, and offer our professional skills to a world market.

Oh, wait, sorry, delusion setting in again.  Happens when you spend too much time in this industry.  This is, after all, the Australian Government.  Let’s instead spend billions on roads so we can allow more people to move from A to B while producing nothing except pollution.  That’s productivity for you, Australia style.

Back to reality.  Cloud Storage refers to storage capacity you can access through the internet, normally third party storage but sometimes your own.  It’s a big deal nowadays as industry behemoths fight to get you on their cloud.  In theory, it’s a great way to back up your stuff.  Unfortunately, there is a big gotcha, the bottleneck that is your internet access.

You most likely have a low speed ADSL connection with upload speeds of under 100KB/s (uploads are much slower than downloads with ADSL).  That means it takes at least a few hours to upload a single gigabyte of data, while clogging up your internet connection so it’s barely usable for anything else.  Cloud backups are viable with slow connections, but limited and must be managed carefully.
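
As a quick sanity check on those numbers, the sketch below works out how long an upload takes and how much you can push through in an overnight window.  The function names and the eight hour window are illustrative assumptions, and the 100KB/s rate is simply the figure quoted above.

```python
def upload_hours(size_gb: float, rate_kb_per_s: float) -> float:
    """Hours to upload size_gb gigabytes at a sustained rate in KB/s (decimal units)."""
    return size_gb * 1_000_000 / rate_kb_per_s / 3600


def gb_per_window(window_hours: float, rate_kb_per_s: float) -> float:
    """Gigabytes that fit through the connection in a window of the given length."""
    return window_hours * 3600 * rate_kb_per_s / 1_000_000


if __name__ == "__main__":
    print(f"1GB at 100KB/s: {upload_hours(1, 100):.1f} hours")
    print(f"8 hours at 100KB/s: {gb_per_window(8, 100):.2f}GB")
```

At 100KB/s a single gigabyte takes close to three hours, and a full eight hour night moves less than 3GB, which is why a slow connection forces you to be selective about what goes to the cloud.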

So what is a cloud backup?  Nothing fancy, it just means that instead of using local storage, like an external HDD, you can use Cloud Storage to save your stuff.  It’s a great idea because the instant you have completed the backup, those files are offsite, and depending on service used, protected across multiple sites managed by professionals who are probably less likely to lose your Stuff than you are!

If you are one of the lucky people who enjoy a 100Mb/s or more upload service, great, then you are probably able to backup everything to the cloud.  For the rest of us with a low bandwidth internet connection, cloud backups are best used in a targeted way.  In other words, back up your small and important files rather than everything and use more traditional means alongside cloud backups.

“The Cloud” is a relatively new phenomenon and service providers are still working out viable business models.  New services appear, and disappear on a monthly basis.  For the most part, I suggest looking at services provided by the big guys such as Microsoft, Amazon, EMC, Google, and similar.  I expect most of the small players to be absorbed or disappear.

All we need to send backups to the cloud is available capacity.  It is not essential to sign up to a service that is specifically targeted at backups (though there are advantages with some designs).  The most common service available, and one you may already have access to without realising, is OneDrive, Microsoft’s cloud storage service.  If you have an Office 365 subscription, you will have access to a practically unlimited storage capacity on Microsoft’s servers that you can use to move files around, share stuff, and backup stuff.  OneDrive is not designed as a backup solution, but it can be used as part of a backup system where it sets up a folder on your PC and all files saved there are automatically uploaded to your cloud service.  Great for documents, not so viable for large files such as video or image snapshots.

Cloud storage services specifically developed for backups are also available and are more appropriate in a business environment.   Some, like Mozy (EMC) have been around a while, and most recently the other majors are aggressively moving into this market with Azure (Microsoft) and AWS offering various solutions.

Cloud backup probably should form part of your backup system, and in some cases can form the core of your design.

Other Storage Options

Tape drives were, for many years, the go to backup option for business.  Tapes were cheap and relatively reliable but needed to be written to in a linear way.  I won’t go into the details of tape drives other than to say: don’t use them.  On a small scale, tape drives are messy and unreliable compared to other options.

SAN arrays are like NAS units but further up the food chain.  For medium and larger business, a SAN in your backup system makes sense, often including replication to offsite SANs at a datacentre or a site dedicated to disaster recovery.  If you need this sort of system, you probably have your own IT people who can set up and manage it, and SANs are a bit beyond the scope of this article.

Others?  Yes there are even more options, but I think that about covers the most common options.

Backup and Archiving Longevity

I once found a decade old stack of floppy disks, my primary backup store during my Uni days.  I went through all of them to make copies of old documents and photos and was surprised to find almost half of them still had retrievable data.  At that age I expected them to all be dead (Verbatim, quality FDDs!).  There was nothing critical on them, but it’s an interesting lesson: you can’t afford to set and forget any data.

Remember when writable CDs emerged?  The media were reporting how this awesome optical technology would allow data to be archived for at least 100 years.  Only a few years later we had clients bringing disks in to us after failing to retrieve “archived” data, with the disks physically falling apart.

Will your data be there when you need it?  The failure rate of modern storage hardware is low, but physical stuff never lasts forever and a realistic lifespan can be difficult to predict.  It is likely that the external HDD you have had sitting in the cupboard for the last five years will power up when plugged in, but the longer you leave it, the greater the chance that the device or the data on it will be gone.

Keep any data you may need on newish devices, and replicated on multiple devices.  When that old external HDD is just too small to fit all your backups, perhaps keep it with that old set of data on it and chuck it in a cupboard but copy at least the critical files to a new, larger device as well.  Cloud based storage may be an option for long term storage, but trusting others to look after your stuff also introduces risk, so ensure you manage that risk.  Hint: free is bad and companies (especially start-ups) and the data they hold can disappear with little notice.

If you produce too much data to cost effectively maintain all the data on new devices, give careful thought on how best to store “archived” data and weigh the risks of data loss against cost of storage.

Backup (Software) Tools

There are a large number of software tools that you can use to build a backup system.  Do not fall into the trap of assuming that throwing money at a product will lead to a desirable result, though at the same time don’t rule out a high cost commercial option where it’s a good fit.

Google is your friend.  Look around online and check what the professionals use.  Making use of unpopular, emerging, or niche products is sometimes OK, but only adopt such tools where you see substantive advantage in your environment.  In general, go with what everyone else uses to get a particular job done.  This will reduce your risk.

Consider the attributes of a backup system that I outlined in our first backup article and relate them to outcomes possible with the various tools: simple, visible, automated, independent, timely, cost effective, and secure.

Block Level (Image) Backup Tools

A block level backup tool is able to copy all data on a storage device, including open and system files, so you can be sure to get all your files stored on a partition or disk.

Windows has a basic imaging tool built in, though I’m not a fan of its interface or limited features.  There are some better free tools available, such as AOMEI Backupper, and a wide range of paid tools such as Acronis and ShadowProtect.  The free tools such as Backupper are adequate in many situations, though their features tend to be more limited and you may need to use supplementary tools when handling related functions such as retention and cleanup.

With any block level tool you intend to use, look for features including:

  • Support for Full and incremental backups (and differential if you need it, but you probably don’t)
  • Automated scheduled backups.
  • Options to encrypt and/or compress backups.
  • Process to verify condition of backup archive (test if files are damaged)
  • Fast mounting of image files.
  • Replication (copy images to additional locations)
  • Retention (automatically clean up older backups to manage space based on age and/or size basis)
  • Ability to exclude specific files or folders. This is very handy, and not offered with all image tools so pay particular attention to this one.
  • Bare metal restore to different hardware.
  • Support for “continuous” backups and related consolidation and retention (advanced feature where frequent incrementals are merged into the archive and older files stripped out to manage space – excellent when uploading images offsite via the Internet)
  • Deduplication (useful for larger sites – eg if you back up a dozen windows desktops, but only store one of each system file instead of 12 to save a lot of space)
  • Central Management (manage backups across multiple devices from a single interface. Important for large sites)
  • Ability to mount and run image of backup files in a VM.

You probably don’t need all of these features, and some can be implemented outside the program.  For example, you could use robocopy and windows task scheduler for replication.  Don’t just tick off features, go with a product that does what you need reliably.

There are many implementation tricks that may not be obvious.  A common possible issue is when you create an image on an attached HDD, then swap the drive, you will at best end up with a different set of backups on each drive, maybe acceptable but not ideal.  Instead it is often better to create the archive in a location that’s always available, such as a NAS or internal HDD, then replicate the entire set to the external drives.  Think through what you need to achieve and make sure the tool you select can support those outcomes.
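
If you replicate an archive from an always available location to external drives, it is worth confirming the copy matches the original.  Here is a minimal sketch in Python that compares SHA-256 checksums between two folders.  The function names are hypothetical, and many backup tools include their own verify function, which you should prefer where it exists.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Checksum a file in chunks so large image files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(primary: Path, replica: Path) -> list[str]:
    """Return relative paths that are missing from the replica or differ in content."""
    problems = []
    for source_file in sorted(p for p in primary.rglob("*") if p.is_file()):
        relative = source_file.relative_to(primary)
        copy = replica / relative
        if not copy.is_file() or sha256_of(copy) != sha256_of(source_file):
            problems.append(relative.as_posix())
    return problems
```

An empty list back from find_mismatches means every file in the primary location has an identical copy in the replica.  It reads every byte on both sides, so expect it to take about as long as the copy itself.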

A block level backup tool should be used in nearly all backup systems.

File Level Backup Tools

A file level backup tool can be any software that lets you copy files.  The Windows File Explorer could be considered a file backup tool.  To be more useful however, we need to look at additional features such as

  • automation,
  • compression,
  • encryption,
  • incremental backups,
  • retention,
  • and others depending on your needs.

File level backups can be very simple, quite transparent, and very reliable.  This type of backup process is excellent to backup discrete files where you are not concerned about backing up locked files or keeping old versions and deleted files.  They can also be used as a replication tool to copy image backups.

My favourite file level backup tool is one you probably already have, a program called Robocopy that is built into Windows and accessed from the command line.  It’s quite a powerful utility that can be automated with a batch file and the task scheduler.  If you are not comfortable with the command line or creating *.bat files, a better option may be one of the many free graphical interface based utilities, or a GUI shell for Robocopy.  Rather than list the many options, I suggest using Google to find recommendations from reputable sources (try searching Google for “Robocopy GUI”).  There are many other similar tools; FastCopy is another we occasionally find useful.

File level tools may be adequate for a very basic backup system, where you don’t care about backing up windows, applications, or locked files, but for the most part they should be used alongside block level image backup tools.

Batch Files and the Task Scheduler

A batch file is a simple text file you can create in notepad saved with the .bat extension in place of .txt.  If you double click the file, windows will read the file one line at a time and try to run the command listed on each line in order.

A batch file can be used to automate file level backups or replication when you set it to run on a schedule with the Windows task scheduler.  For example, if you typed a line something like robocopy d:\MyFiles f:\MyFiles /e /purge and ran it within a batch file, you could mirror your files to a different drive.  Be aware that /purge deletes files from the destination that no longer exist in the source.
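
For readers who would rather script in something other than batch, the same mirroring idea can be sketched in Python.  This is my own simplified illustration, not a description of how Robocopy works internally: it decides a file has changed by comparing size and whole-second modified time, and it ignores awkward cases such as locked files, permissions, and symbolic links.  Like any mirror it deletes from the destination, so try it on test folders first.

```python
import shutil
from pathlib import Path


def _differs(src: Path, dst: Path) -> bool:
    """Treat files as different when size or whole-second modified time differ."""
    a, b = src.stat(), dst.stat()
    return a.st_size != b.st_size or int(a.st_mtime) != int(b.st_mtime)


def mirror(source: Path, destination: Path) -> None:
    """Make destination match source: remove extras, then copy new or changed files."""
    destination.mkdir(parents=True, exist_ok=True)
    wanted = {p.relative_to(source) for p in source.rglob("*")}

    # Purge step: anything in the destination that is no longer in the source.
    extras = [p for p in destination.rglob("*")
              if p.relative_to(destination) not in wanted]
    # Deepest paths first, so a folder is already empty when we reach it.
    for extra in sorted(extras, key=lambda p: len(p.parts), reverse=True):
        if extra.is_dir():
            extra.rmdir()
        else:
            extra.unlink()

    # Copy step: create folders and copy files that are missing or changed.
    for relative in sorted(wanted):
        src, dst = source / relative, destination / relative
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
        elif not dst.exists() or _differs(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also carries over the modified time
```

A call such as mirror(Path(r"d:\MyFiles"), Path(r"f:\MyFiles")) would aim for the same end result as the batch line.  For real backups I would still lean on Robocopy itself, which is far better tested.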

If you get a bit more creative you can use the technique for many useful functions, including backup systems that retain older and deleted files, and managing the file retention of image backups.  You can also look at PowerShell and other scripting options to implement more advanced backup designs.
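
As one example of managing retention with a script, the Python sketch below keeps only the most recent few backup files in a folder.  The function name and the *.img pattern are assumptions for illustration, so match the pattern to whatever your imaging tool produces.  Take care with incremental chains: deleting an older file that later incrementals depend on can make the whole set unusable, so this simple approach suits sets of independent full backups.

```python
from pathlib import Path


def prune_old_backups(folder: Path, keep: int, pattern: str = "*.img") -> list[Path]:
    """Delete all but the `keep` most recently modified matching files.

    Returns the list of files that were removed.
    """
    if keep < 1:
        raise ValueError("keep at least one backup")
    backups = sorted(folder.glob(pattern),
                     key=lambda p: p.stat().st_mtime, reverse=True)
    removed = []
    for stale in backups[keep:]:
        stale.unlink()
        removed.append(stale)
    return removed
```

Run on a schedule after each backup, a call such as prune_old_backups(Path(r"f:\Images"), keep=4) would leave the four newest images in that (hypothetical) folder.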

Emergency Measures

Designing a backup system is all well and good, but if it’s too late or your backup system has failed, is there anything you can do?

Sometimes.

Deleted files on a mechanical hard disk can often be retrieved by file recovery tools such as Recuva.  On an SSD you may be out of luck, as modern SSDs actively scrub old files shortly after they are deleted.

Copies of files may be located in places you would not expect, such as cached files or online services.

A failed mechanical HDD will usually contain data that can be retrieved.  Data recovery experts may be able to help, however costs are often in the $1000s.

If you look to have lost important files, leave the device powered down and ring us.

Bringing it all Together

This third part of our Guide to Personal and Small Business Backups outlined the storage and tools commonly used by backup systems.  Prior articles have covered the high level conceptual framework around which you can build an efficacious backup system, and many of the technical concepts you need to develop and assess an appropriate backup design.

Our final article in this series will get to the nitty gritty by presenting and explaining solutions in detail as they relate to common home and small business environments.



Guide to Personal and Small Business Backups – Technical Concepts

This article builds on the high level conceptual framework introduced in our previous backup article.

I will explain technical concepts and related terminology to help you design a Backup System for use in business or home.

Related Articles:

01 GUIDE TO PERSONAL AND SMALL BUSINESS BACKUPS – CONCEPTUAL FRAMEWORK

 

Backup Methods Classified by Viewpoint

We can categorise backup processes into:

  • Backups that target files and folders (file level backups)
  • Backups that target full disks or partitions (block level backups)

For the most part, these types of backups are distinct in technology used, limitations, risks, and most importantly, outcomes.  I will try to clarify the differences and why they should be taken into account when developing a backup system.

File level backups aim to make a copy of discrete files, such as your photos or documents.  This type of backup focuses on each file as a discrete unit of data.  When you copy your documents across to an external HDD, you are implementing a file level backup.

Block level backups are a little different and often misunderstood, resulting in backup designs that fail to protect your data. The aim of a block level backup is to copy all the blocks of data contained on a partition.  It is important to gain an understanding of how files are stored, and what we mean by a disk, partition, and a block in order to make appropriate decisions regarding your block level backup design, so let’s cover the basics.

[Image: blocks]

A disk stores data in small chunks, which can be referred to as blocks.  When you save a file, it will be cut up into small pieces that will fit in the blocks available on the disk.  The locations of blocks used by a file are recorded in the index along with the name of the file.  When a file needs to be opened, the index is used to find out which blocks need to be read and stitched back together.  In the above image, you might consider that a big file has been split across all the blue blocks with other squares representing other blocks on the disk.

A “block”, in the context of a block level backup, refers to one of those same sized portions of your disk.  A block may, or may not, contain a piece of a file.  It may in fact be blank or contain old data from a file you have deleted (this is why deleted files can sometimes be retrieved).
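
To make the idea of blocks and an index concrete, here is a toy model in Python.  It is purely illustrative: the class name and the four byte block size are made up, and real file systems are far more sophisticated.  It does show a file being cut into pieces, the index recording which blocks hold them, and old data lingering in blocks after a delete.

```python
BLOCK_SIZE = 4  # bytes per block; tiny so the example is easy to follow


class ToyDisk:
    """A pretend disk: a list of fixed-size blocks plus an index of files."""

    def __init__(self, block_count: int) -> None:
        self.blocks = [None] * block_count  # None means never written
        self.index = {}                     # file name -> list of block numbers

    def save(self, name: str, data: bytes) -> None:
        pieces = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        used = {n for numbers in self.index.values() for n in numbers}
        free = [n for n in range(len(self.blocks)) if n not in used]
        if len(pieces) > len(free):
            raise OSError("disk full")
        chosen = free[:len(pieces)]
        for number, piece in zip(chosen, pieces):
            self.blocks[number] = piece
        self.index[name] = chosen

    def read(self, name: str) -> bytes:
        # Look up the block numbers in the index and stitch the pieces together.
        return b"".join(self.blocks[n] for n in self.index[name])

    def delete(self, name: str) -> None:
        # Only the index entry goes; the blocks keep their old contents.
        del self.index[name]
```

After saving and then deleting a file on a ToyDisk, the index no longer lists it, yet the blocks still hold the old bytes until something else is written over them.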

You will encounter the term “partition” or “disk partition” when setting up a block level backup.  A partition is the name given to the part of a physical disk that is set up with its own index and set of blocks to contain files.  It is possible to set up many partitions on a single physical disk, but often each disk will only have one visible partition and so people tend to use the terms disk and partition interchangeably.  C: for example, is a partition but it also might be called a drive or disk.

The below image shows two physical disks and the partitions located on each disk.  Note the disk with C: also has two hidden partitions, the first of which helps with the boot process for the Windows system located on C:.  The second disk has just the one partition, represented by D:.  The critical information is in the C: and D: partitions, but it is normally best to backup the lot to make system recovery easier.

[Image: Backup_Partitions]

Block level backups don’t worry about the content of the blocks, they just copy all the blocks of data and in doing so just happen to capture all files on the backed up partition.  Most often the backup will skip blocks that the index says have no current files in them, to save time and space.  While this method sounds more complex, it is pretty simple from the user’s perspective and is comprehensive, whereas file level backups are more prone to miss important data.
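
The “copy the blocks, skip the unused ones” idea can be sketched as below.  It is a toy model under my own assumptions: the partition is just a Python list of blocks plus an index dictionary, and the function names are hypothetical.  Real imaging tools work against the actual file system structures.

```python
def block_level_backup(blocks: list, index: dict) -> dict:
    """Copy the index plus only those blocks the index says are in use."""
    used = sorted({n for numbers in index.values() for n in numbers})
    return {
        "index": {name: list(numbers) for name, numbers in index.items()},
        "blocks": {n: blocks[n] for n in used},
    }


def restore(image: dict, block_count: int):
    """Rebuild a blank partition of block_count blocks from a backup image."""
    blocks = [None] * block_count
    for number, data in image["blocks"].items():
        blocks[number] = data
    return blocks, {name: list(numbers) for name, numbers in image["index"].items()}
```

Notice the backup never looks at file contents.  It captures every file simply because it captures every block the index points to, while blocks holding only leftovers from deleted files are skipped.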

Block level backups introduce some issues that can result in backup failures, but also advantages, such as the ability to backup open files using another layer of technology.  That leads us into a funky new(ish) Microsoft technology called VSS.

How to Backup Open Files including System State – VSS

Have you ever tried copying a file that was in use?  In most cases, windows won’t let you copy an open file; it is “locked” while open to prevent potential damage to the file.  When you start windows or any program, many files will be opened and cannot be copied safely using basic file level methods.  For this reason, most file level backups will fail to backup open files including programs and windows files.

Block level backups support a technology that allows blocks to be copied while their related files are in use.  This means they can backup your operating system and program files and allow a complete restore of your system state and personal files.

The technology used to backup open files is implemented on Windows systems by the Volume Shadow Copy Service (VSS, also called Volume Snapshot Service).  A snapshot refers to the state of the disk at a moment in time, as the technology attempts to maintain access to the data at that moment.  Once the snapshot is made, usually taking only moments, the system can continue to read and write files, so you can keep using the computer.  The system will preserve any data that would otherwise be overwritten after the snapshot, so it is accessible to the continuing backup process, and new data is tracked and not backed up, preventing any inconsistency.  This is not a trivial process and things can go wrong.
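
The “preserve data that would otherwise be overwritten” idea is often called copy-on-write, and a toy version fits in a few lines of Python.  To be clear, this is my own simplified model for illustration and not how VSS is actually implemented.  The class and method names are invented.

```python
class SnapshotVolume:
    """Toy volume that can present its blocks as they were at snapshot time."""

    def __init__(self, blocks) -> None:
        self.blocks = list(blocks)
        self.preserved = None  # block number -> contents at snapshot time

    def take_snapshot(self) -> None:
        # Taking the snapshot is instant: nothing is copied yet.
        self.preserved = {}

    def write(self, number: int, data: bytes) -> None:
        # Before the first overwrite of a block, set aside its old contents.
        if self.preserved is not None and number not in self.preserved:
            self.preserved[number] = self.blocks[number]
        self.blocks[number] = data

    def read_snapshot(self, number: int):
        """What the backup process reads: the block as it was at snapshot time."""
        if self.preserved is None:
            raise RuntimeError("no snapshot has been taken")
        if number in self.preserved:
            return self.preserved[number]
        return self.blocks[number]

    def release_snapshot(self) -> None:
        self.preserved = None
```

The backup reads through read_snapshot and sees a frozen, consistent view, while normal writes carry on against the live blocks.  That is why you can keep working while the backup runs.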

VSS incorporates a number of parts, including “writers” specific to various programs, designed to ensure their disk writes are complete and consistent at the point the snapshot is taken.  For example, a database writer would ensure all transactions that might be just partly written are complete before the snapshot, removing the risk of a corrupt database on restoring the backup.  Certain types of programs need a specific writer to support them, and if that writer fails, a “successful” backup can contain damaged files.  Sometimes, part of the VSS service itself can fail.

VSS is mature and works well in most cases, but you will still find that a VSS backup is more likely to fail than a simple file copy backup.

Some backup tools that focus on backing up particular files and folders can also use VSS.  This blurs my definitions, since these use similar technology to block level backup but with the focus of a file level backup.  This hybrid approach is worthwhile in cases where you want the advantages of a block level backup but want to exclude some files, such as large videos you don’t care about.  You should still keep in mind that VSS adds complexity, but it’s OK to use VSS-only backups where you have the technology available and you are careful about verifying your backups.

 

Backup Archive Types

Now that we have an idea of the technology behind block level backups, I will go over the rudiments of backup archive types.  These concepts can apply to file or block level backups, but they tend to be more related to block level processes.

When you set up a new backup process, the first backup you typically perform is a “full” backup, including all data present on the source.  Subsequent backups can vary.  You can back up all your files each time, or copy just those files that have changed or are new.  There are more options than you might realise.  I will address the terminology that refers to these methods and outline typical use.

 

Backup Set/Archive:  A set of files that, considered as a whole, includes a complete set of backed up files over a period of time.

A backup set created by a dedicated backup program will often generate one file per backup, containing all files or data captured during the backup.  A backup set will then normally contain a number of files over time, but they won’t look like the original files.  It is important that you check these sets and ensure they actually contain your files.  Don’t just assume all your stuff is in there.  Size is a good hint: if the backup files are far smaller than your source files, something is wrong.
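The size check can be automated in a rough way.  The sketch below is my own illustration, not a feature of any particular backup program; the function names and the 0.3 threshold are assumptions, and you would tune the threshold to how well your data compresses.

```python
import os

def total_size(path):
    """Total size in bytes of a file, or of every file under a folder."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def looks_too_small(source_bytes, backup_bytes, min_ratio=0.3):
    """Flag a backup that is far smaller than its source.

    min_ratio is an assumed threshold: compressed backups are
    legitimately smaller, so this is a hint and not proof of a problem.
    """
    if source_bytes == 0:
        return False
    return backup_bytes / source_bytes < min_ratio

print(looks_too_small(1000, 100))  # backup is 10% of the source size
print(looks_too_small(1000, 600))  # backup is 60% of the source size
```

A warning here only tells you to look closer; the real test is still opening the backup and finding your files in it.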

If you have set up a simple file level backup, the archive set might be included in dedicated container files, such as a zipped file, or might be a simple mirror of the original files.  I like methods that result in a simple mirror of your files for home backups, as it lets you quickly check what is in your backup set and is less prone to mistakes.
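One advantage of a mirror is that checking it is easy to script.  The following is a minimal sketch, assuming the mirror uses the same folder layout as the source; verify_mirror and file_hash are names I have made up, and the temporary folders simply stand in for your real source and backup locations.

```python
import hashlib
import os
import tempfile

def file_hash(path):
    """SHA-256 of a file, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_mirror(source_dir, mirror_dir):
    """Return (relative path, problem) for files missing or different in the mirror."""
    problems = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(mirror_dir, rel)
            if not os.path.isfile(dst):
                problems.append((rel, "missing"))
            elif file_hash(src) != file_hash(dst):
                problems.append((rel, "differs"))
    return sorted(problems)

# Small demonstration using throwaway folders.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name, text in [("a.txt", "same"), ("b.txt", "new"), ("c.txt", "only in source")]:
        with open(os.path.join(src, name), "w") as f:
            f.write(text)
    with open(os.path.join(dst, "a.txt"), "w") as f:
        f.write("same")
    with open(os.path.join(dst, "b.txt"), "w") as f:
        f.write("old")
    report = verify_mirror(src, dst)
print(report)
```

A file that differs is not always an error, since it may simply have changed since the last backup ran, but a missing file deserves a look.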

Full Backup:   Takes a complete, new copy of all files or data on your source and adds it to your destination storage.

A word of caution here: a “Full Backup” does not mean you have backed up all your data.  It depends on what you have selected for the process to back up.  Make sure you know where your data is before setting up your backup system, and do not trust programs that try to select your data for you, as they may miss important files.  This comes back to concepts in my first article: make sure you have visibility concerning your backups.

 

Incremental Backup: Backs up new and changed files since the last backup.

An incremental backup will normally be much smaller than the full backup, and commensurately faster to complete.  Using incremental backups is recommended where you add or change relatively few files over time.
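As a rough sketch of how a file level incremental might pick its files, the example below selects anything modified after the previous backup.  Using the modified time is my assumption for illustration; real backup programs may rely on other signals, and changed_since is a made-up name.

```python
import os
import tempfile

def changed_since(source_dir, last_backup_time):
    """List files under source_dir modified after last_backup_time (epoch seconds)."""
    changed = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) > last_backup_time:
                changed.append(path)
    return sorted(changed)

# Demonstration: one file older than the last backup, one newer.
LAST_BACKUP = 1_000_000_100
with tempfile.TemporaryDirectory() as folder:
    for name, mtime in [("old.txt", 1_000_000_000), ("new.txt", 1_000_000_200)]:
        path = os.path.join(folder, name)
        with open(path, "w") as f:
            f.write(name)
        os.utime(path, (mtime, mtime))  # force a known modified time
    picked = [os.path.basename(p) for p in changed_since(folder, LAST_BACKUP)]
print(picked)
```

Only the newer file is selected, which is why incrementals are small when few files change.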

When you make an incremental backup, it is dependent on any prior incremental backups as well as the original full backup, so if any of the files in the chain are damaged or lost, you will lose data.  In theory you could take one full backup and then nothing but incrementals for years.  Don’t.  Create a new full backup every now and then.

As a safety precaution, if a backup program tries to create a new incremental backup and can’t find the full backup it depends on, it will normally try to create a new full backup.

Differential Backup:  Like an incremental backup, but backs up all new and changed files since the last full backup.

Differential backups are less commonly used than incrementals.  They play a role where your incremental backups are relatively large: because each differential depends only on the last full backup, they let you delete some older incremental backups to manage space without needing a new full backup.
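The dependency rules for full, incremental, and differential backups can be sketched in a few lines of Python.  This is an illustration only: restore_chain is a made-up function, and I am assuming that an incremental taken after a differential builds on that differential, which varies between backup programs.

```python
def restore_chain(backups, target):
    """Return the indices of the backups needed to restore backups[target].

    backups is a list of "full", "incr" or "diff" in the order taken.
    """
    chain = []
    i = target
    while i >= 0:
        kind = backups[i]
        chain.append(i)
        if kind == "full":
            return chain[::-1]  # oldest first, the order you would restore in
        if kind == "diff":
            # A differential holds everything since the last full backup,
            # so skip straight back to that full backup.
            i -= 1
            while i >= 0 and backups[i] != "full":
                i -= 1
        else:
            # An incremental needs the backup taken just before it.
            i -= 1
    raise ValueError("no full backup to restore from")

history = ["full", "incr", "incr", "diff", "incr"]
print(restore_chain(history, 2))  # full plus both incrementals
print(restore_chain(history, 3))  # full plus the differential only
print(restore_chain(history, 4))  # full, differential, then one incremental
```

Every index in the returned chain must be intact for the restore to work, which is why long incremental chains are risky and why a differential lets you drop the incrementals before it.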

 

Continuous Backup: A misleading term that normally refers to a chain of frequent incremental backups that are later stitched into a single backup archive.

Continuous backups are a more advanced function only available on business grade backup solutions.  Incremental backups are incorporated into the original full backup by a cleanup process, and the oldest data may be stripped out to keep the size under control.

Continuous backups add complexity with the consolidation and cleanup process, but they have significant advantages.  They avoid the load placed on systems by running full backups, and they are ideal for offsite backups, where small incremental backups can be transferred via the internet to give you an immediate offsite copy.

Advanced systems can go one step further and use the backup image to mount an instance of backed up systems running as a virtual machine in the cloud.  Great for continuity and disaster recovery.

Common Backup Settings and Options

Once you have decided on the type of backup archive, or more likely a combination of archive types, you need to determine how the process will operate, and when.

 

Backup Schedule: Set an automatic schedule for your backups

A backup schedule usually involves a combination of archive types, each set to an appropriate frequency.  It is important to schedule backups at times when your backup destination will be available and when the computer will be on.  If you miss an automated backup, you can always trigger a manual one as needed to cover it.

There are many different interfaces used in backup programs and it is usually worth looking at the advanced or custom options to ensure your schedule is set correctly, rather than going with default settings.

A common schedule would be a daily incremental backup, with a new full backup about every month or three.
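The logic behind such a schedule is simple enough to sketch.  The function below is hypothetical and not taken from any backup program; the 30 day default reflects the “about every month” suggestion above.

```python
from datetime import date

def backup_type_for(day, last_full, full_every_days=30):
    """Choose "full" or "incremental" for a daily backup run."""
    if last_full is None or (day - last_full).days >= full_every_days:
        return "full"
    return "incremental"

print(backup_type_for(date(2024, 1, 1), None))               # no full backup exists yet
print(backup_type_for(date(2024, 1, 15), date(2024, 1, 1)))  # 14 days since the last full
print(backup_type_for(date(2024, 1, 31), date(2024, 1, 1)))  # 30 days since the last full
```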

 

Retention and Cleanup:  Manage your backup archives to remove old backups in order to maintain space for new backups.

It is very important to consider how long you need access to backed up files.  For example, if you delete a file today, or perhaps a file is damaged and you may not notice, how many days or months do you want to keep the old version in the backup archive?  Forever is great, except you need to consider how much space that requires!

You should also consider possible problems or damage that might impact your backups.  When operating with full backups, it’s best to keep the old backup set until after a new one has been created, just in case the new one fails and you are left with no backups.  You can bet that’s when your HDD will die.

Given that a typical backup system might involve an infrequent and very large full backup and a more frequent and smaller incremental backup, carefully considering your retention plan can save a lot of space.  Below is an example of how you might set retention (I suggest more time between full backups than in this example; a month is probably reasonable, but it depends on your situation).

[Figure: Retention_02]

In the above example, a full backup is run Mondays with all other days set to incremental backups.  Disk space is limited on the backup hard disk, and let’s assume that it can’t fit more than two full backups and some incrementals.  With the retention period set to six days, backups will sometimes be kept for more than six days where backups within the six day period are dependent on older incremental or full backups.

In the above example we have 12 days of backups stored and two full backups.  If the system deleted all backups before Sunday, then the Sunday backup would be useless.  The system will be smart enough not to (hopefully!).  At this point, the backup disk will be near full with inadequate space available to create a new full backup, but consider what happens just before the third full backup is due.

[Figure: Retention_03]

The idea above is that once the older incremental backup is no longer within the retention period, we can clean up (delete) all of the oldest backup set in one go.

In this way the old set is kept as long as possible, but is deleted before the next full is due, so the backup program does not run out of space on the following Monday.

See any possible issues with this retention? Any mistakes?

There are a number you should consider.  Setting a tight schedule like this may not work as expected.  How does the program interpret 6 day retention?  Is it inclusive or exclusive when it counts back?  What happens if you set it to 5 or 7 days?  What happens if the cleanup task runs before, or after, the backup task on a given day?  (That’s particularly important and a common mistake.)

You must check that the system works as planned by manually checking that backups clean up the way you plan on the days you plan.  Failure to verify your system will inevitably result in a flaw you may fail to notice and leave you vulnerable.
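You can also reason through a retention setting with a small simulation before trusting it.  The sketch below models the Monday-full example under my own assumptions: days are numbered from 1 (a Monday), a backup counts as inside the retention window when today minus its day is less than the retention period, and a whole set is kept while any member is inside the window.  Your backup program may count differently, which is exactly the point of checking.

```python
def cleanup(backups, today, retention_days):
    """Return the backups that survive cleanup.

    backups is a list of (day_number, kind), oldest first, kind "full" or "incr".
    """
    # Group into sets, each starting at a full backup.
    sets = []
    for day, kind in backups:
        if kind == "full" or not sets:
            sets.append([])
        sets[-1].append((day, kind))
    kept = []
    for backup_set in sets:
        # Keep the whole set while any member is inside the window,
        # because newer incrementals depend on the older files in the set.
        if any(today - day < retention_days for day, _kind in backup_set):
            kept.extend(backup_set)
    return kept

def kind_for(day):
    return "full" if day % 7 == 1 else "incr"  # days 1, 8, 15 are Mondays

history = [(d, kind_for(d)) for d in range(1, 13)]
kept_day12 = cleanup(history, today=12, retention_days=6)
history.append((13, kind_for(13)))
kept_day13 = cleanup(history, today=13, retention_days=6)
print(len(kept_day12), len(kept_day13))
```

On day 12 all twelve backups survive because the first Sunday incremental is still inside the window; on day 13 the whole first set is dropped at once.  Changing the comparison from “less than” to “less than or equal” shifts the deletion by a day, and that is the sort of detail to confirm against the real program.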

 

Compression: A mathematical process to reduce the space used by your backups.

When setting up the most basic file level backup, you probably won’t use compression, but every other backup will typically compress your files to save space.  This is a good idea and you normally want to go with default settings.

Most photos and videos are compressed as part of their file standard, and additional compression won’t help.  For some files that are inefficiently stored where their information content is much less than their file size, various compression schemes can save a tremendous amount of space.
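A quick experiment shows the difference.  This sketch uses Python’s standard zlib module; the repetitive sample stands in for inefficiently stored files, and random bytes stand in for already-compressed photos or video, which is an approximation on my part.

```python
import os
import zlib

repetitive = b"backup " * 10000             # highly redundant data
random_like = os.urandom(len(repetitive))   # stand-in for already-compressed media

ratio_repetitive = len(zlib.compress(repetitive)) / len(repetitive)
ratio_random = len(zlib.compress(random_like)) / len(random_like)

print(f"repetitive data:  {ratio_repetitive:.1%} of original size")
print(f"random-like data: {ratio_random:.1%} of original size")
```

The repetitive data shrinks to a tiny fraction of its size, while the random-like data does not shrink at all.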

Encryption:  A mathematical process based around a password that scrambles the file so its information is not available unless the password is used as the key to unscramble the file.

Modern encryption cannot practically be broken as long as a secure and appropriate algorithm and password are used.  Passwords like “abc123” can easily be guessed or “brute forced”, but passwords like “Iam@aPass66^49IHate!!TypingsTuff” are not going to be broken unless the attacker can find the password through other means.
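Some back-of-envelope arithmetic shows why length matters so much.  The sketch below assumes randomly chosen characters and an attacker making a trillion guesses per second; both figures are my assumptions, and a password people commonly choose, like “abc123”, is far weaker than the numbers suggest because it will be guessed from a list long before a full search.

```python
import math

def keyspace_bits(length, alphabet_size):
    """Bits of entropy for a randomly chosen password."""
    return length * math.log2(alphabet_size)

def years_to_search(bits, guesses_per_second=1e12):
    """Time to try every possible password at an assumed guessing rate."""
    seconds = 2 ** bits / guesses_per_second
    return seconds / (365 * 24 * 3600)

short_pw = keyspace_bits(6, 36)   # 6 characters from a-z and 0-9
long_pw = keyspace_bits(32, 94)   # 32 characters from printable ASCII

print(f"short: {short_pw:.0f} bits, {years_to_search(short_pw):.2e} years")
print(f"long:  {long_pw:.0f} bits, {years_to_search(long_pw):.2e} years")
```

The short password falls in a fraction of a second, while the long one is out of reach of any brute force search.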

Encryption is dangerous!  If you lose the password, your backups are useless.  If part of the file is damaged, you probably won’t be able to get back any of it.  It adds another layer of things that can go wrong.  These risks are relatively small, so encryption is a good idea where your backups contain sensitive information, but if it’s just a backup of the family photo album, I suggest you don’t encrypt.

 

VSS:  A technology that is related to block level backups and allows files to be backed up while open and in use

You should normally enable VSS; however, if you find errors with VSS causing backups to fail, it is OK to turn it off in some situations.  Make sure you understand what you lose if you turn off VSS, e.g. database backups may fail.

 

Intelligent Sector Backup:  You may see this idea under a number of terms for partition and full disk backups.  The option prevents blocks containing deleted files or blank space from being backed up, and so saves a lot of space.  You normally want this on.

 

Archive splitting:  Many backup programs can split up backup archives into smaller files.

This was traditionally used where backups were split across limited disk media such as CD-ROMs, and is not usually relevant where we are backing up to external HDDs, NAS boxes, or other storage with plenty of space.
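For the curious, splitting is mechanically simple.  This is a minimal sketch with made-up function names, working on bytes in memory and not on real archive files.

```python
def split_archive(data, part_size):
    """Split archive bytes into consecutive parts of at most part_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

def join_archive(parts):
    """Rebuild the original archive; every part is needed, in order."""
    return b"".join(parts)

archive = bytes(range(256)) * 10       # 2,560 bytes standing in for a backup file
parts = split_archive(archive, 1000)   # pretend each disc holds 1,000 bytes
print([len(p) for p in parts])
print(join_archive(parts) == archive)
```

Note the catch: lose any one part and the archive can no longer be rebuilt whole.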

 

Notifications:  Most backup programs will send you an email on success or failure of a backup process.

It is best to have the program send you a message on each backup, but you will find they are annoying, and you just delete them.  That’s OK, at least you will notice after a while if the messages stop.  Understand that a message that the backup failed is handy and you are more likely to notice, but the program can always fail in various ways so you never get that message.

Do not rely on fail messages or assume their lack means your backups are running.  Manually verify backups from time to time.
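An independent check that does not rely on the backup program’s own messages is to look at how old the newest file in your backup location is.  The sketch below is one hypothetical way to do that; the function names and the two day limit are my assumptions.

```python
import os
import tempfile

def newest_backup_age_days(backup_dir, now):
    """Age in days of the most recently modified file, or None if there are no files."""
    newest = None
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            mtime = os.path.getmtime(os.path.join(root, name))
            if newest is None or mtime > newest:
                newest = mtime
    if newest is None:
        return None
    return (now - newest) / 86400

def is_stale(age_days, max_age_days=2):
    """True when there is no backup at all or the newest one is too old."""
    return age_days is None or age_days > max_age_days

# Demonstration with a throwaway folder and a fixed "now".
with tempfile.TemporaryDirectory() as folder:
    empty_age = newest_backup_age_days(folder, now=1_000_000_000)
    path = os.path.join(folder, "backup.bin")
    with open(path, "wb") as f:
        f.write(b"data")
    os.utime(path, (1_000_000_000, 1_000_000_000))  # force a known modified time
    age = newest_backup_age_days(folder, now=1_000_000_000 + 3 * 86400)
print(empty_age, age, is_stale(age))
```

In real use you would pass the current time (for example from time.time()) and your actual backup folder.  This only tells you that something was written recently, so it supplements manual verification and does not replace it.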

 

So, when do we get to the nitty gritty?

Sorry.  We are getting there!

In the next article I will outline the hardware and common tools that may form part of your backup system, then in the final article I will go through the nitty gritty and some examples of home and small business backup.

Guide to Personal and Small Business Backups – Conceptual Framework

Too often I see our techs consoling a despondent customer, in tears, having irretrievably lost precious files.  Family photos.  Business records.  Blog articles (!). All gone.  Yet some of those people have been “Backing up”.

A simple definition of “Backing Up” is a process that makes a copy of data onto a second device that can be used to restore that data if your primary copy is deleted or damaged.  A broader definition is any process that reduces your risk of losing data (files) or your system state (Windows, settings).  I prefer to use a more global term, Backup System: a collection of backup processes or other elements working together to reduce risk of data loss and related harm.

You might reasonably believe that backing up is a simple process.  Before you run this process, your files are at risk of being lost, and afterwards, they are safe.  Run a backup, and it’s all good.  This type of binary thinking is prevalent even among IT professionals – Black and White, True and False, Risky and Safe.  Unfortunately, applying a binary worldview to backups will only get you into trouble by giving you a false sense of security.  Backups are not Black and White, they are Grey.

This article will disabuse you of false assumptions relating to backups, and introduce a conceptual framework you can use to design a Backup System and to protect your precious data.

Developing a Backup System is easy and effective if you use the right approach.  Clicking a button that says “backup” and hoping for the best is only good for gamblers!

Backup Systems are about Risk Management

The key concept here is risk.  Most people have a decent, if subconscious, understanding of risk.  The subconscious mind has a habit of simplifying complex concepts and can mislead if you don’t consciously interrogate the concept.  So let’s consider what we mean when we refer to risk.  Risk relates to:

  • the Harm you will take if you lose some or all files or system state, and
  • the Probability of losing some or all files or system state.

In a business context, you might add other “harm” that can relate to backups, such as downtime, or files finding their way to unauthorised people.

So Risk = Harm * Probability.  That seems simple.

But how do you quantify Harm?  Say you look at a tender you are working on, perhaps you know it will cost $500 to rewrite it, so you can assign a cost of losing the file with some accuracy.  What about the family photo album?  Hard to assign a $ amount to that.  You can probably make some rough estimate, but it is not possible to assign an exact value.  Priceless, perhaps.

What about the second element in the equation, the Probability (chance) of loss?  Probability can be very difficult to quantify.  What is the chance of your HDD failing, being infected by a virus that wipes your drive, throwing the whole thing out the nearest window when it’s misbehaving, and tougher still, what about disasters you have not even thought of?  Again, you can only apply a ballpark figure on the likelihood of data loss.

The difficulty of determining the Risk Level that you are exposed to leads to another concept that is implicit with backups, but not often addressed explicitly.  Uncertainty.  Uncertainty, inherent in assessing risk, means that you can’t quantify your level of risk with accuracy, so you need a fudge factor, some safety margin to make sure you are not taking on too much risk.

Risk Level and Uncertainty lead us to our final concept, Acceptable Risk.

No backup system can reduce your risk of losing data to zero.  No such system is possible in our world.  Beware of anyone who tells you that their system is 100%!  Instead of aiming for zero risk, you should consider what your level of Acceptable Risk is, and weigh that against the cost to reduce your actual Risk Level.
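As a worked example of weighing risk against cost, the sketch below uses the $500 tender from earlier.  The probabilities, the yearly cost, and the safety margin of 2 are invented ballpark figures for illustration, and the function names are mine.

```python
def risk(harm_dollars, probability):
    """Expected loss: Risk = Harm * Probability."""
    return harm_dollars * probability

def worth_doing(harm_dollars, p_before, p_after, yearly_cost, safety_margin=2.0):
    """True if the risk removed, padded for uncertainty, exceeds the yearly cost."""
    reduction = risk(harm_dollars, p_before) - risk(harm_dollars, p_after)
    return reduction * safety_margin > yearly_cost

# A $500 file with an assumed 5% yearly chance of loss, cut to 0.5% by a backup.
print(risk(500, 0.05))                      # expected yearly loss without the backup
print(worth_doing(500, 0.05, 0.005, 20))    # backup costing $20 a year
print(worth_doing(500, 0.05, 0.005, 100))   # backup costing $100 a year
```

The safety margin is the fudge factor in action: because the estimates are rough, the risk is deliberately treated as larger than calculated before comparing it with the cost.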

Finally to the good news.  It is usually possible, with a little thought and attention, to vastly reduce your Risk Level inexpensively.  Developing an effective Backup System for a home or SME environment is about using available tools intelligently rather than spending a fortune.

Before we go into the How, we need to cover more abstract concepts that you can use to assess the backup methods you choose.  Again, without applying these concepts to critique your Backup System, it’s likely you will run into trouble and find your backups are not doing their job, inevitably when it is too late.

 

Develop your Backup System with Desirable Attributes

Certain attributes of a backup system tend to increase the likelihood that it will perform as desired.   When developing or assessing the quality of a backup system, you may want to consider the following attributes.

To make life that little bit more difficult (this is about computers, after all), some of these characteristics contradict one another, so you must apply some common sense where a trade-off is necessary.

  1. Simple – Never add complexity for marginal benefit.

Convoluted backup systems fail more often than simple systems, because, by their nature, there is more to go wrong, with less visibility in how the system works.  Simplicity leads to our second attribute.

  2. Visible – Know where your stuff is and how the backup system works.

The first step is knowing where your important files are.  The second is knowing what process is used to back up those files.  The third step is being able to locate your files at your backup locations and verify that they are complete and viable.

  3. Automated – Make it work without human intervention.

Most data loss I encounter where there are no backups is followed by the line “I used to do it, just have not got around to it recently”.  The best systems should work even if you neglect them, but a word of warning: automated does not mean you can skip manually verifying that the system works.

  4. Independent – Multiple backup processes and data locations should be unrelated.

Processes that are less dependent on the same factors are less likely to fail on you at the same time.  You might use an image backup and a simple file copy backup on the same data, since a failure with one method will not necessarily result in the other also failing.  A backup located in another room is not as good as a backup located in a different building, and implementing both is better.

  5. Timely – Capacity to recover data that avoids damaging downtime.

For a business, downtime while you recover files can be costly.  Assess how long your system requires to restore files and systems, and reduce that time where it is unacceptable.

  6. Cost Effective – Seek balance between cost and benefit.

Aim to find a sweet spot where the cost and effort put into your backups effectively reduces risk, and then stop.  Don’t fight your way to reduce risk just a little further when it requires massive extra cost, but also don’t be cheap and stop reducing risk when the cost to do so is minimal.

  7. Secure – Control access to sensitive data.

Consider the harm you will take if backed up data gets into the wrong hands.  Where the harm is significant, consider encryption and other security techniques.  Do not apply security without due consideration as increasing security techniques can, and usually will, increase the chance of your backup system failing.

 

Understand Concepts, Techniques, and set Objectives before you begin

Once you are comfortable with risk management, and the attributes you want to incorporate into a backup system, it is time to set objectives for your Backup System and how to achieve those objectives.

To develop a plan, you will need a grasp of:

  • Your data and its characteristics: size, location, live or closed files, live services etc
    • Include files and systems, e.g. an accounting data file might be critical, but the installed accounting package might also be worthwhile to back up.
  • Importance/acceptable risk level related to identified data.
  • Related risks such as downtime and stolen data.
  • Storage devices available/desirable and capacity: external HDDs, NAS, cloud, etc
  • Backup tools available/desirable: Image creation tools, command line tools, VSS, etc
  • Techniques possible: file mirror, images, full/incremental/differential/continuous, scheduled tasks, verification, encryption, cleanup, etc
  • Contingency Plan – what can go wrong with backups and how can those risks be reduced.
  • Available budget

Finally, start designing your system.

This article has covered some of the high level concepts relating to backups such as risk and desirable attributes.  It has not covered the types of backups possible, storage devices, or techniques.  Follow up articles will cover these areas and provide walk through examples of backup systems for home and business.