I’ve decided to resurrect development of my backup program, Obnam. This time I thought I’d babble about it in public as I develop it, rather than try to present the world with a finished product.

I have not been happy with any backup solution I’ve tried. I have some fairly specific requirements:

  • Backups must be stored either on a local hard disk, or online. I don’t care at all about tapes, optical media, or anything else that requires repetitive manual work.
  • Server end must be under my control as well. No Amazon S3 for me.
  • Both push backups (the client sends data to the server) and pull backups (the server fetches data from the client).
  • Backups must be encrypted at client end.
  • Backups must be incremental, but each generation must look like a full snapshot.
  • Backups must use checkpoints: network connections break, and if they do, the next backup must continue from the most recent checkpoint.
  • Setup must be easy. Backups are important, but if they’re any kind of pain at all, I, like most people, will keep postponing them, and one day it will be too late.
  • Fast. If I do some e-mail and write some code while drinking a smoothie in a net cafe, by the time I finish the drink and put away the laptop the backup must be finished.
  • Deals sensibly with both slow and fast networks. An incremental backup should not download any data from the server, and should upload only the delta from the previous backup, plus minimal overhead.
  • Reliable. Backups should not require attention. I should just be allowed to assume they work. This also requires unobtrusive feedback that they’re OK, and proper error reporting when something is wrong and does require my attention.
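Two of these requirements pull against each other: backups must be incremental, yet every generation must look like a full snapshot, and an incremental run must not download anything from the server. One common way to reconcile them is content-addressed chunking. This is a minimal sketch under that assumption, not a description of how Obnam works; `ChunkStore`, `backup`, `restore` and the tiny `CHUNK_SIZE` are all hypothetical. Each generation is a complete manifest (path to chunk ids), but only chunks the client hasn't seen before are uploaded, and the client tracks known chunk ids locally instead of asking the server.

```python
import hashlib

CHUNK_SIZE = 4  # Hypothetical and tiny, for illustration; real tools use far larger chunks.


class ChunkStore:
    """Hypothetical server-side store mapping chunk id -> chunk bytes."""

    def __init__(self):
        self.chunks = {}
        self.uploaded_bytes = 0  # Counts how much data actually crossed the "network".

    def put(self, chunk_id, data):
        self.chunks[chunk_id] = data
        self.uploaded_bytes += len(data)


def split(data, size=CHUNK_SIZE):
    """Cut data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(store, files, known_ids):
    """Make a new generation: a full manifest mapping path -> list of chunk ids.

    known_ids is a client-side cache of chunk ids already on the server,
    so deciding what to upload requires no download from the server.
    """
    generation = {}
    for path, data in files.items():
        ids = []
        for chunk in split(data):
            chunk_id = hashlib.sha256(chunk).hexdigest()
            if chunk_id not in known_ids:  # Upload only chunks the server lacks.
                store.put(chunk_id, chunk)
                known_ids.add(chunk_id)
            ids.append(chunk_id)
        generation[path] = ids
    return generation


def restore(store, generation):
    """Rebuild every file in a generation, as if it were a full snapshot."""
    return {path: b"".join(store.chunks[cid] for cid in ids)
            for path, ids in generation.items()}


if __name__ == "__main__":
    store, known = ChunkStore(), set()
    gen1 = backup(store, {"a.txt": b"hello world!"}, known)
    gen2 = backup(store, {"a.txt": b"hello world!", "b.txt": b"hello"}, known)
    print(store.uploaded_bytes)  # 12 bytes for gen1, then only 1 new byte for gen2
    print(restore(store, gen2))
```

The second generation re-lists `a.txt` in full, yet uploads only the single chunk `b"o"`, because `b"hell"` was already stored. Checkpointing is left out here; a real design would also need to persist partial generations.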

It’s been a while since I did a proper survey, so things may have changed, but so far I’ve never found a system that I like. If you know of one, please don’t tell me. I am now deep into thinking about the technical problems I will need to solve, and no longer that interested in finding an existing solution.

If “hubris” was spelled with an i, it would be my middle name.

I have some code sketched out, but nothing that does anything useful yet. I’ve been playing with the internal architecture, and with the interface and abstraction I will want for the “storage subsystem” that stores the backed-up data. I have not decided yet how to implement the storage subsystem, but btrfs B-trees interest me a lot.
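To make the idea of a storage abstraction concrete, here is a minimal sketch of what such an interface could look like, assuming the subsystem only needs to map opaque keys to byte blobs. The names `Storage`, `LocalDirStorage`, `put`, `get` and `has` are hypothetical, not Obnam’s actual design. Backup logic would talk only to `Storage`, so a local-disk target and an online target become interchangeable backends, and a B-tree-based implementation could later sit behind the same interface.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class Storage(ABC):
    """Hypothetical storage-subsystem interface: opaque string keys -> bytes."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def has(self, key: str) -> bool: ...


class LocalDirStorage(Storage):
    """Backend for a local hard disk: one file per object under a root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Assumption: keys are flat names; reject anything that could escape root.
        if not key or "/" in key or key in (".", ".."):
            raise ValueError(f"invalid key: {key!r}")
        return self.root / key

    def put(self, key: str, data: bytes) -> None:
        target = self._path(key)
        # Write to a temporary file, then atomically rename it into place, so an
        # interrupted backup never leaves a half-written object under a real key.
        fd, tmp = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, target)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def has(self, key: str) -> bool:
        return self._path(key).exists()
```

The write-then-rename step is one way such a backend could support the checkpoint requirement: anything visible under a key is complete, so a restarted backup only has to redo objects that were never fully written.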