Guide to Personal and Small Business Backups – Storage and Tools
Adam Oswald
This article will examine options for backup storage and tools, provide advice on how to choose between them, explain how they can be effectively employed, and give examples of common implementation pitfalls.
Prior articles have worked through the high level conceptual framework and technical concepts that relate to backup systems.
Backup Storage Infrastructure
Your backup system will copy all your important electronic Stuff from one or more storage locations to some other storage location. It is all about storage, so it naturally follows that the choice of the storage used for backups has a major impact on the effectiveness of your system.
To help select the type of storage that best suits, you might review the desirable attributes of a backup system that I outlined in our first backup article and consider how selecting storage types will influence attributes of the final backup system. As a reminder, the attributes were: simple, visible, automated, independent, timely, cost effective, and secure.
Hard Disk Drive (HDD) Storage
HDDs allow random access, are fast, reliable, cheap, and generally have much going for them.
Internal Hard Disk Drives
Internal HDDs refer to the HDDs built into PCs and devices. There are a number of options to incorporate internal drives into your backup system, though for the most part they play an incomplete role.
Where your PC has a single internal HDD, it will be of limited use as a backup drive. In general, you don’t want to set a backup to the same physical device as the source files as it fails the test of independence – if the drive dies you lose source and backup files at the same time.
There are minor exceptions to this rule. You could maintain copies of older and deleted files on the drive to offer limited protection against accidentally deleting or overwriting files. You could direct an image backup to the source drive itself in circumstances where external storage may not always be available but you want frequent and regular automated snapshots (remember to exclude the image location or recursion will run you out of space!). In that case you would move the files to external storage when availability allows.
Some PCs have more than one internal storage device. For example, you might use a fast HDD or an SSD for Windows and program files (for the speed) and a cheap, perhaps slower mechanical HDD with large capacity for other files or as a dedicated backup drive (for the low cost per gigabyte). A second drive adds options and, given the minimal cost, I suggest adding a second internal HDD to a PC specifically for use in backups.
With a second internal HDD you could create a system image backup of the primary HDD on the second HDD, and if the primary HDD fails, your backup will (hopefully) still be available on the second drive. That design lets you schedule and run the image backup with certainty that the destination will be available (reliable automation), and provides some independence between source files and the backup, but it is still not great.
I have seen clients use this technique alone as their backup system only to lose their data when a power surge, virus, theft, or other event destroys the data on both drives. The design is vulnerable to these significant risks because it still largely fails our test for independence, where the backup destination should be as far removed from the source files as possible.
To improve the independence of the destination drive, you might use an internal HDD in a computer on the same network rather than in the same machine by setting up a network share. That’s a little better in terms of independence, but again it is not ideal as certain events can still destroy the data on both drives.
RAID stands for redundant array of independent disks and is another way to use multiple internal drives to reduce your risk of data loss. One of the simplest types of RAID is called a mirror. A mirror uses two drives in an array, with the system automatically writing everything to both disks in real time. In terms of operation, it looks like you are working with a single disk, but any time you save a file, it will be available on either disk. If one disk fails, you won’t lose any data and in fact the PC will keep working like normal.
There are many other types of RAID, some allowing for protection against one or more drives failing, but also some where if any drive fails, all your data will be lost. You can set up a RAID array on a PC, but I rarely suggest it as its cost/benefit tends to be marginal against other options. The technology is most commonly used in servers and storage arrays, which use hardware best suited to supporting RAID, and in those environments I consider RAID to be essential.
Never confuse RAID with a complete backup solution. I have come across some spruikers who convince people that RAID is some magic technology that fully protects your data. That is never true.
The most common way to achieve a high level of independence for a backup system in a home or SME environment is to use multiple external HDDs.
External Hard Disk Drives
External HDDs are the bread and butter of home and SME backups. They are awesome, and you should buy some!
There are two basic types. The physically larger drives, often called desktop HDDs, are 3.5” in size and need a separate power supply (until USB-C drives become common). The physically smaller drives, often called portable HDDs, are 2.5” in size and can be powered from a USB port. The 3.5” drives offer better value for money and larger capacities, while the smaller drives are easier to cart around. Either type is fine for backups.
External drives come with various connectors. The most common is USB. Be aware that USB 2 drives are limited to transfer rates of about 35MB/s by the USB 2 interface. Practically all current drives are USB 3.1, which allows for faster transfers that are limited instead by the physical speed of the disk. You will typically get 100+MB/s with USB 3 drives, so backups take much less time. In terms of our preferred characteristics, “timely” means go with USB 3, though using an old USB 2 drive is fine as long as backups can still finish in a timely fashion.
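To put those transfer rates in perspective, here is a rough sketch of the arithmetic in Python. The 500GB backup size is a hypothetical figure chosen for illustration, the function name is made up for this example, and real drives rarely sustain their best rate for a whole backup, so treat the results as optimistic.

```python
def backup_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to copy size_gb gigabytes at a sustained rate in MB/s."""
    seconds = (size_gb * 1000) / rate_mb_per_s  # 1 GB = 1000 MB (decimal units)
    return seconds / 3600


# 500 GB is a hypothetical backup size; 35 and 100 MB/s are the rough
# USB 2 and USB 3 rates quoted in the text.
for label, rate in (("USB 2", 35), ("USB 3", 100)):
    print(f"{label}: {backup_hours(500, rate):.1f} hours")
```

Plug in the size of your own backup set to see whether an old USB 2 drive can still finish overnight.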
You can get away with backing up to a single external drive, but your risk of losing data will be much higher than using two or more drives. If you leave a single external HDD attached so it can take backups at any time, it may as well be an internal drive with the same vulnerabilities. A single drive that you plug in only when backing up means you need to plug it in manually every time you want to back up. If you get lazy and leave the single drive plugged in, you will find a virus, power spike, etc. will kill your backup and source files at the same time, and you are stuffed. If you don’t get around to plugging it in to back up for a long time, your PC’s HDD will die and all recent files will be lost.
Allocate at least two external drives to your backup system, and preferably three or more. One can stay attached so scheduled backups work without you thinking about it, and every now and then you should swap the attached drive with one stored elsewhere. If you can afford three or more drives, don’t swap them in a simple sequential cycle. Instead, swap one drive in much less frequently than the others so that it keeps some backups for a longer period, which reduces the risk that damaged files are overwritten across all of your backups.
USB Pen Drives
Small, light, reliable, and increasingly large and fast, USB pen drives can be used as an alternative to external HDDs for backups. At time of writing, they tend to be slower and smaller at a given price compared to an HDD, but where your backup needs are modest, a pen drive may do the job nicely. Use them the same way you would an external HDD.
Solid State Disk Drives (SSD)
SSDs are slowly replacing mechanical HDDs in computing devices for their speed and (potentially) reliability advantages. At time of writing they are still expensive for bulk storage and not generally recommended for backup solutions. There are rare exceptions where their raw speed shortens the time needed to run a backup enough to make them worthwhile, but for home and SME users, don’t buy SSDs for backups unless you have some special reason.
Network Attached Storage (NAS)
A NAS box is essentially a mini PC dedicated to file storage. Most run a Linux OS with a web based GUI for setup and management and other features in the form of “apps”. Their file storage can be accessed across your network, and even from outside your network.
You will normally buy a NAS without HDDs, and then populate the unit with drives of the size and brand you need. It is important to match the unit with drives listed on the manufacturer’s compatibility list to avoid glitches in the operation of the unit. Drive manufacturers now make HDDs specifically for NAS units, like the WD Red range, and drives designed for NAS devices are normally a better choice than cheaper options.
RAID is a standard feature of NAS units. In a redundant configuration your data is spread across at least two physical disks so that if a drive fails, you won’t lose any data. Remember that when you add HDDs to a NAS with RAID redundancy enabled, you will lose some capacity to allow for that redundancy.
For home use, you might store some less important bulky files on the NAS with only the RAID for protection (eg movies for media streaming), and additionally use the NAS as your primary backup device for image backups and/or file mirrors of your critical data.
If you buy a two bay NAS and add 2 x 4TB HDDs, you will only have total space of about 4TB available (a mirror). In a larger unit, three 4TB drives would give you about 8TB, and 4 x 4TB about 12TB. Also remember that drive manufacturers use a generous way to calculate capacity, so the NAS will report a little lower capacity than you might expect.
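The capacity figures above can be sketched in a few lines of Python. This assumes the NAS gives up exactly one drive's worth of space to redundancy (a mirror with two drives, a single-parity layout with more), which matches the numbers quoted but is not the only way a NAS can be configured. It also ignores the small amount of space the NAS keeps for itself, and the function names are made up for this example.

```python
def usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable space when one drive's worth of capacity goes to redundancy."""
    if drive_count < 2:
        raise ValueError("redundancy needs at least two drives")
    return (drive_count - 1) * drive_tb


def reported_tib(decimal_tb: float) -> float:
    """Convert marketing terabytes (10**12 bytes) to binary TiB (2**40 bytes)."""
    return decimal_tb * 10**12 / 2**40


for count in (2, 3, 4):
    usable = usable_tb(count, 4)
    print(f"{count} x 4TB: about {usable:.0f}TB usable, "
          f"shown as roughly {reported_tib(usable):.1f}TiB")
```

The second function is the "generous" calculation at work: a drive sold as 4TB holds 4 x 10^12 bytes, which a device counting in binary units displays as a bit over 3.6.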
Some brands, such as Netgear, allow you to add drives as you need and have the available capacity automatically increased without need to wipe and recreate the array. You can start with a 4 bay unit with 2 HDDs, then add a third and fourth as needed.
NAS units are attached to your network with a standard network cable and can be located in another room, or building, from your main devices. They can be powered up and down remotely using wake on LAN commands. They are excellent for automated backups and can act as a central backup location for all your devices. For NAS units containing critical data, adding a small UPS and/or a surge protector is a good idea.
The main drawback of using a NAS as your only backup is while it is not physically attached to your devices, it is still prone to some of the events that could destroy its data and that of the originating device at the same time. Power surges, theft, and some viruses are common risks. One way around that issue is to rotate external HDDs attached to the NAS to take data from the NAS offsite. You can also reduce the risks by using certain techniques, such as network passwords to prevent a virus that has access to your other PCs from accessing the device.
There are various limits and risks to using a NAS in your backup system, but a NAS can be a useful element in any backup system and I recommend one for most designs.
Cloud Storage
Let us all pause for a moment, and be thankful that our government vastly accelerated the rollout of massive bandwidth services by building an awesome NBN so we now lead the world in connectivity. We can now easily work from home, backup everything into the cloud with a click, and offer our professional skills to a world market.
Oh, wait, sorry, delusion setting in again. Happens when you spend too much time in this industry. This is, after all, the Australian Government. Let’s instead spend billions on roads so we can allow more people to move from A to B while producing nothing except pollution. That’s productivity for you, Australia style.
Back to reality. Cloud Storage refers to storage capacity you can access through the internet, normally third party storage but sometimes your own. It’s a big deal nowadays as industry behemoths fight to get you on their cloud. In theory, it’s a great way to back up your stuff. Unfortunately, there is a big gotcha, the bottleneck that is your internet access.
You most likely have a low speed ADSL connection with upload speeds of under 100KB/s (uploads are much slower than downloads with ADSL). That means it takes at least a few hours to upload a single gigabyte of data, while clogging up your internet connection so it’s barely usable for anything else. Cloud backups are viable with slow connections, but limited and must be managed carefully.
So what is a cloud backup? Nothing fancy, it just means that instead of using local storage, like an external HDD, you can use Cloud Storage to save your stuff. It’s a great idea because the instant you have completed the backup, those files are offsite, and depending on service used, protected across multiple sites managed by professionals who are probably less likely to lose your Stuff than you are!
If you are one of the lucky people who enjoy an upload service of 100Mb/s or more, great, you are probably able to back up everything to the cloud. For the rest of us with a low bandwidth internet connection, cloud backups are best used in a targeted way. In other words, back up your small and important files rather than everything, and use more traditional means alongside cloud backups.
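As a rough illustration of what "targeted" might mean on a slow connection, the Python sketch below works out how much data fits into an overnight upload window and then picks files until that budget is used up. The eight hour window, the file names and sizes, and the smallest-first selection rule are all assumptions made for this example. In practice you would choose by importance as well as size.

```python
def nightly_budget_bytes(upload_kb_per_s: float, window_hours: float) -> int:
    """Bytes that can be uploaded in the window at a sustained rate in KB/s."""
    return int(upload_kb_per_s * 1000 * window_hours * 3600)


def pick_files_for_cloud(files: dict[str, int], budget_bytes: int) -> list[str]:
    """Choose files smallest-first until the upload budget is used up."""
    chosen, used = [], 0
    for name, size in sorted(files.items(), key=lambda item: item[1]):
        if used + size > budget_bytes:
            break  # everything after this is at least as large
        chosen.append(name)
        used += size
    return chosen


# Hypothetical files and sizes in bytes.
candidates = {
    "accounts.xlsx": 2_000_000,
    "photos.zip": 1_500_000_000,
    "system-image.img": 60_000_000_000,
}
budget = nightly_budget_bytes(upload_kb_per_s=100, window_hours=8)
print(f"Overnight budget: {budget / 1e9:.2f} GB")
print("Upload tonight:", pick_files_for_cloud(candidates, budget))
```

At 100KB/s an eight hour window moves a little under 3GB, so documents and a modest photo archive fit while a system image clearly does not.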
“The Cloud” is a relatively new phenomenon and service providers are still working out viable business models. New services appear and disappear on a monthly basis. For the most part, I suggest looking at services provided by the big guys such as Microsoft, Amazon, EMC, Google, and similar. I expect most of the small players to be absorbed or disappear.
All we need to send backups to the cloud is available capacity. It is not essential to sign up to a service that is specifically targeted at backups (though there are advantages with some designs). The most common service available, and one you may already have access to without realising, is OneDrive, Microsoft’s cloud storage service. If you have an Office 365 subscription, you will have access to practically unlimited storage capacity on Microsoft’s servers that you can use to move files around, share stuff, and back up stuff. OneDrive is not designed as a backup solution, but it can be used as part of a backup system: it sets up a folder on your PC and all files saved there are automatically uploaded to your cloud service. Great for documents, not so viable for large files such as video or image snapshots.
Cloud storage services specifically developed for backups are also available and are more appropriate in a business environment. Some, like Mozy (EMC), have been around a while, and more recently the other majors have been aggressively moving into this market, with Azure (Microsoft) and AWS offering various solutions.
Cloud backup probably should form part of your backup system, and in some cases can form the core of your design.
Other Storage Options
Tape drives were, for many years, the go-to backup option for business. Tapes were cheap and relatively reliable but needed to be written to in a linear way. I won’t go further into the details of tape drives, other than to simply say, don’t use them. On a small scale, tape drives are messy and unreliable compared to other options.
SAN arrays are like NAS units but further up the food chain. For medium and larger businesses, a SAN in your backup system makes sense, often including replication to offsite SANs at a datacentre or a site dedicated to disaster recovery. If you need this sort of system, you probably have your own IT people who can set up and manage it, and SANs are a bit beyond the scope of this article.
Others? Yes there are even more options, but I think that about covers the most common options.
Backup and Archiving Longevity
I once found a decade old stack of floppy disks, my primary backup store during my Uni days. I went through all of them to make copies of old documents and photos and was surprised to find almost half of them still had retrievable data. At that age I expected them to all be dead (Verbatim, quality FDDs!). There was nothing critical on them, but it’s an interesting lesson, you can’t afford to set and forget any data.
Remember when writable CDs emerged? The media were reporting how this awesome optical technology would allow data to be archived for at least 100 years. Only a few years later we had clients bringing disks in to us after failing to retrieve “archived” data, with the disks physically falling apart.
Will your data be there when you need it? The failure rate of modern storage hardware is low, but physical stuff never lasts forever and a realistic lifespan can be difficult to predict. It is likely that the external HDD you have had sitting in the cupboard for the last five years will power up when plugged in, but the longer you leave it, the greater the chance that the device or the data on it will be gone.
Keep any data you may need on newish devices, and replicated on multiple devices. When that old external HDD is just too small to fit all your backups, perhaps keep it with that old set of data on it and chuck it in a cupboard but copy at least the critical files to a new, larger device as well. Cloud based storage may be an option for long term storage, but trusting others to look after your stuff also introduces risk, so ensure you manage that risk. Hint: free is bad and companies (especially start-ups) and the data they hold can disappear with little notice.
If you produce too much data to cost effectively maintain all the data on new devices, give careful thought on how best to store “archived” data and weigh the risks of data loss against cost of storage.
Backup (Software) Tools
There are a large number of software tools that you can use to build a backup system. Do not fall into the trap of assuming that throwing money at a product will lead to a desirable result, though at the same time don’t rule out a high cost commercial option where it’s a good fit.
Google is your friend. Look around online and check what the professionals use. Making use of unpopular, emerging, or niche products is sometimes OK, but only adopt such tools where you see a substantive advantage in your environment. In general, go with what everyone else uses to get a particular job done. This will reduce your risk.
Consider the attributes of a backup system that I outlined in our first backup article and relate them to outcomes possible with the various tools: simple, visible, automated, independent, timely, cost effective, and secure.
Block Level (Image) Backup Tools
A block level backup tool is able to copy all data on a storage device, including open and system files, so you can be sure to get all your files stored on a partition or disk.
Windows has a basic imaging tool built in, though I’m not a fan of its interface or its limited features. There are some better free tools available, such as AOMEI Backupper, and a wide range of paid tools such as Acronis and ShadowProtect. The free tools such as Backupper are adequate in many situations, though their features tend to be more limited and you may need to use supplementary tools when handling related functions such as retention and cleanup.
With any block level tool you intend to use, look for features including:
- Support for Full and incremental backups (and differential if you need it, but you probably don’t)
- Automated, scheduled backups.
- Options to encrypt and/or compress backups.
- Process to verify condition of backup archive (test if files are damaged)
- Fast mounting of image files.
- Replication (copy images to additional locations)
- Retention (automatically clean up older backups to manage space based on age and/or size basis)
- Ability to exclude specific files or folders. This is very handy, and not offered with all image tools so pay particular attention to this one.
- Bare metal restore to different hardware.
- Support for “continuous” backups and related consolidation and retention (advanced feature where frequent incrementals are merged into the archive and older files stripped out to manage space – excellent when uploading images offsite via the Internet)
- Deduplication (useful for larger sites – eg if you back up a dozen Windows desktops, you store only one copy of each system file instead of 12, saving a lot of space)
- Central Management (manage backups across multiple devices from a single interface. Important for large sites)
- Ability to mount and run image of backup files in a VM.
You probably don’t need all of these features, and some can be implemented outside the program. For example, you could use Robocopy and the Windows Task Scheduler for replication. Don’t just tick off features; go with a product that does what you need reliably.
There are many implementation tricks that may not be obvious. A common issue arises when you create an image on an attached HDD and then swap the drive: at best you will end up with a different set of backups on each drive, which may be acceptable but is not ideal. Instead, it is often better to create the archive in a location that’s always available, such as a NAS or internal HDD, then replicate the entire set to the external drives. Think through what you need to achieve and make sure the tool you select can support those outcomes.
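If you replicate an archive to external drives, it is worth checking now and then that the copies really match the primary set. The Python sketch below is one possible way to do that by comparing SHA-256 checksums. It is not how any particular backup product verifies its archives, the function names are made up for this example, and it only shows that two copies are identical, not that the image itself is restorable.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in 1 MB chunks so large images do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unverified_copies(primary: Path, replica: Path) -> list[str]:
    """Relative paths in primary that are missing from, or differ in, replica."""
    problems = []
    for source in sorted(primary.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(primary)
        copy = replica / relative
        if not copy.is_file() or sha256_of(source) != sha256_of(copy):
            problems.append(relative.as_posix())
    return problems


# Example call with hypothetical locations:
# unverified_copies(Path(r"D:\Images"), Path(r"F:\Images"))
```

An empty list means every file in the primary location has an identical copy on the replica. Reading every byte of both sets is slow, so this suits an occasional check rather than every backup run.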
A block level backup tool should be used in nearly all backup systems.
File Level Backup Tools
A file level backup tool can be any software that lets you copy files. The Windows File Explorer could be considered a file backup tool. To be more useful, however, we need to look at additional features such as
- incremental backups,
- and others depending on your needs.
File level backups can be very simple, quite transparent, and very reliable. This type of backup process is excellent for backing up discrete files where you are not concerned about backing up locked files or keeping old versions and deleted files. File level tools can also be used as a replication tool to copy image backups.
My favourite file level backup tool is one you probably already have, a program called Robocopy that is built into Windows and accessed from the command line. It’s quite a powerful utility that can be automated with a batch file and the Task Scheduler. If you are not comfortable with the command line or creating *.bat files, a better option may be one of the many free graphical utilities, or a GUI shell for Robocopy. Rather than list the many options, I suggest using Google to find recommendations from reputable sources (try searching Google for “Robocopy GUI”). There are many other similar tools; Fastcopy is another we occasionally find useful.
File level tools may be adequate for a very basic backup system, where you don’t care about backing up Windows, applications, or locked files, but for the most part they should be used alongside block level image backup tools.
Batch Files and the Task Scheduler
A batch file is a simple text file you can create in Notepad and save with the .bat extension in place of .txt. If you double click the file, Windows will read the file one line at a time and try to run the command listed on each line in order.
A batch file can be used to automate file level backups or replication when you set it to run on a schedule with the Windows Task Scheduler. For example, if you typed a line something like robocopy d:\MyFiles f:\MyFiles /e /purge and ran it within a batch file, you could mirror your files to a different drive.
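To show what a mirror like that actually does, here is a minimal sketch of the same idea in Python: copy anything new or changed, then delete anything in the destination that no longer exists in the source. It is an illustration only and not a replacement for Robocopy. It decides that a file has changed by comparing size and modified time, it ignores edge cases such as a file being replaced by a folder of the same name, and the function names are made up for this example.

```python
import shutil
from pathlib import Path


def _differs(a: Path, b: Path) -> bool:
    """Treat files as changed if size or modified time (whole seconds) differ."""
    sa, sb = a.stat(), b.stat()
    return sa.st_size != sb.st_size or int(sa.st_mtime) != int(sb.st_mtime)


def mirror(source: Path, destination: Path) -> None:
    """Make destination match source: copy new or changed files, delete extras."""
    destination.mkdir(parents=True, exist_ok=True)

    # Copy pass: recreate folders and copy files that are missing or changed.
    for item in source.rglob("*"):
        target = destination / item.relative_to(source)
        if item.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif not target.exists() or _differs(item, target):
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, target)  # copy2 keeps the modified time

    # Purge pass: remove anything that no longer exists in source.
    # Deepest paths go first so children are handled before their folders.
    leftovers = sorted(destination.rglob("*"),
                       key=lambda p: len(p.parts), reverse=True)
    for item in leftovers:
        if not (source / item.relative_to(destination)).exists():
            if item.is_dir():
                shutil.rmtree(item)
            else:
                item.unlink()


# Example call with the folders from the text:
# mirror(Path(r"d:\MyFiles"), Path(r"f:\MyFiles"))
```

The purge pass is the dangerous half. Point a mirror at the wrong destination and it will happily delete whatever is there, so try it on throwaway folders first.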
If you get a bit more creative you can use the technique for many useful functions, including backup systems that retain older and deleted files, and managing the retention of image backups. You can also look at PowerShell and other scripting options to implement more advanced backup designs.
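As one example of managing retention with a script, the Python sketch below finds image files older than a set number of days while always keeping the newest few, whatever their age. The *.img pattern, the default of three files to keep, and the function name are assumptions made for this example, so adjust them to suit whatever your imaging tool produces. It defaults to a dry run that only reports what it would delete.

```python
import time
from pathlib import Path


def prune_old_backups(folder: Path, keep_days: int, keep_minimum: int = 3,
                      pattern: str = "*.img", dry_run: bool = True) -> list[Path]:
    """Return (and, unless dry_run, delete) backups older than keep_days.

    The keep_minimum newest files are never touched, so a backup job that
    stopped running weeks ago cannot cause every remaining copy to be removed.
    """
    backups = sorted(folder.glob(pattern),
                     key=lambda p: p.stat().st_mtime, reverse=True)
    cutoff = time.time() - keep_days * 86400  # 86400 seconds in a day
    expired = [p for p in backups[keep_minimum:] if p.stat().st_mtime < cutoff]
    if not dry_run:
        for path in expired:
            path.unlink()
    return expired


# Example call with a hypothetical folder:
# prune_old_backups(Path(r"F:\Images"), keep_days=30)
```

Be careful applying a rule this simple to incremental images, because deleting an older file that later incrementals depend on will break the chain. It is safest with self-contained full images.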
Designing a backup system is all well and good, but if it’s too late or your backup system has failed, is there anything you can do?
Deleted files on a mechanical hard disk can often be retrieved by file recovery tools such as Recuva. On an SSD you may be out of luck, as with modern SSDs the old files are actively scrubbed shortly after being deleted.
Copies of files may be located in places you would not expect, such as cached files or online services.
A failed mechanical HDD will usually contain data that can be retrieved. Data recovery experts may be able to help, however costs are often in the $1000s.
If you look to have lost important files, leave the device powered down and ring us.
Bringing it all Together
This third part of our Guide to Personal and Small Business Backups outlined the storage and tools commonly used by backup systems. Prior articles have covered the high level conceptual framework around which you can build an efficacious backup system, and many of the technical concepts you need to develop and assess an appropriate backup design.
Our final article in this series will get to the nitty gritty by presenting and explaining solutions in detail as they relate to common home and small business environments.