The purpose of a backup is to allow you to recover from a disaster with reasonable cost and effort. If you delete a file you shouldn't have, or make changes that you shouldn't have, backups are meant to save you from having to re-create the file, or undo a large amount of steps.

Speaking very broadly, any copy of your live data is a backup, but this is a uselessly broad definition. For example, if you use an automatic synchronisation system such as Dropbox or git-annex, to keep your live data in sync between two computers, you could pretend they're backups of each other. However, unless the synchronisation also allows you to keep a history of file versions, it's not a very good backup. If you delete your precious file on one computer, and it gets then deleted on the other computer as well, automatically, perhaps in seconds, then the backup is not of much use.

Another common assumption is that a RAID array works as a backup. RAID is an excellent technology that allows you to combine several hard disks so that they protect you against loss of data in case of disk failure. If one disk fails, the others have enough data to re-create the data on the failed disk, using either full copies (RAID-1) or error correction codes (RAID-5, RAID-6). This is not a backup. It doesn't protect you against accidental file deletions. There is also no backup history.

A version control system is very much like a backup. It stores copies of many of the versions of your project. However, in most version control systems it's fairly easy to make changes that lose history. Ask anyone who has used git reset to change the tip of the master branch to undo a wrong commit or merge, and then accidentally force-pushed that to the server. This is arguably a normal, if uncommon use of the version control system. A good backup system will protect you from you own mistakes, when you do the kinds of things you're expected to do. Version control systems also rarely capture all your data.

When you were five, and made some stuff on the family computer, and saved it on a floppy, and then drew a cute little picture of yourself on the floppy to make it clear to everyone it was your floppy, and not anyone else's, certainly not your bully of your brother's, and your mother kept the floppy for decades because of the cute picture, then that is also not a backup. You didn't even know your Mom had kept it.

A reasonable backup is one from which you can restore a working copy of your data, when you need to, without too much effort or expense, compared to the disaster you're experiencing. If the disaster is that you deleted a one-page draft outline of the book you want to write someday, the disaster is not very severe. The cost of restoring should be low.

If the disaster is that your plans to become the supreme emperor of the world, and make all people your slaves, are in a spreadsheet on your laptop, and your minions accidentally drove a car over your laptop, and you had accidentally not used a Thinkpad as your laptop, the disaster is quite severe. Unless you recover the spreadsheet, you'll never be able to tell apart the buttons to launch the Moon rocket, to self-destruct your HQ, and to switch channels on your TV, and all your work will be in vain, and you'll never, ever, ever convince the pretty girl with red hair living in the house opposite that she should be interested in you. Also, you'll never be able to move away from your parent's house. So, quite severe. It will be acceptable to go to quite some effort and expense to recover that spreadsheet. It's better if you don't need to, but you will, if you have to.

Your backup should also be reasonably up to date. Backing up every Christmas is a fine family tradition, but if you don't make a backup also on Easter, Midsummer, and Aunt Agatha's birthday sometime in September was it, or maybe October, you'll risk losing a whole year's worth of work. A year is a long time, and you might never be able to re-do all the work.

Personally, I backup my personal laptop every day to a file server at home, and less often to an online backup server. My work laptop gets backed up once an hour to the company file server, which gets backed up to two backup servers about once a day.

You need to balance the risk of losing data and work, and the expense and effort to back up your data. How much is a day's work worth to you, or your employer? How much does a backup system cost?

In the next episode, I'll ponder on how many backups are enough.

Sorry for being a bit pedantic here, but a git reset (to move your branch head to an older comit) doesn't lose history, it just changes what is the official history, the old history is still there, just a little harder to find, git reflog will help you if you just accidentally did such a thing (by showing the history of references you have been), if you want to change that a long time after, it will be harder to recover, but i believe possible, if garbage collector didn't run in the mean time that is (this one will lose history, history you probably don't care about).

Great post anyway, debunking misconceptions about backup is great, just as debunking misconceptions about git is :P.

Comment by Gabriel Mon Jun 17 23:12:43 2013
When you complete this series of articles, are you going to release a new version of obnam for wheezy-backports? :)
Comment by Regis Tue Jun 18 20:44:55 2013

I haven't done Obnam backports to squeeze, and I don't plan on doing them for wheezy. I'll maintain my own repository at code.liw.fi, though.

Comment by Lars Wirzenius Tue Jun 18 21:32:39 2013