Happy World Backup Day! May all your restores always be successful.

Our family home file server currently has about 8.3 TiB of data in just under a million files, mostly photographs and video footage by my wife, who is a documentary film maker. We have a backup server in a co-located data center in a different country. I also have a hobby project to develop a backup program, Obnam, but it’s not yet used for the home file server backups. What do I need from Obnam to do that? That’s an interesting question that guides my development work on Obnam. Here’s what I’m thinking.

Note that these are my personal criteria. They might not be your criteria. They may also change over time. If you have specific needs that you’d like Obnam to fulfill, you should open an issue in the Obnam issue tracker.

The general shape of Obnam is:

  • A server component stores data that has been chopped into chunks by the client. The server provides a simple HTTP API for storing and retrieving chunks. The server doesn’t inspect the chunks in any way: they’re just opaque blobs. However, the server does associate a small amount of metadata with each chunk, primarily a client-assigned label.
  • The client divides the contents of each file into chunks, encrypts each chunk, and uploads it to the server. It assigns a label that is derived from the cleartext contents of the chunk: basically, a SHA256 checksum.
  • The list of files in a backup, the metadata about each file, and the list of chunks for each file are stored in an SQLite database, which is just a file. The database is uploaded to the server as chunks.
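
To make the client side concrete, here is a minimal sketch in Python of splitting data into chunks and labeling each chunk with the SHA256 checksum of its cleartext. It is not Obnam’s actual code: the chunk size, the fixed-size splitting, and the function names are all assumptions for illustration, and encryption is left out entirely. It also assumes that chunks with the same label only need to be uploaded once.

```python
import hashlib

# Assumed chunk size for illustration only; not Obnam's actual setting.
CHUNK_SIZE = 1024 * 1024


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield consecutive fixed-size chunks; the last one may be shorter."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def label_for(chunk: bytes) -> str:
    """Derive a label from the cleartext chunk: a hex SHA256 digest."""
    return hashlib.sha256(chunk).hexdigest()


def plan_upload(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return the ordered list of labels for a file, plus the unique
    chunks (keyed by label) that would need to be uploaded."""
    labels = []
    unique = {}
    for chunk in split_into_chunks(data, chunk_size):
        label = label_for(chunk)
        labels.append(label)
        # Identical chunks get identical labels, so keep only one copy.
        unique.setdefault(label, chunk)
    return labels, unique
```

The ordered list of labels is what a file’s entry in the database would need in order to reassemble the file at restore time.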

This is a fairly simple, easy architecture. All the really complicated data structures are handled by SQLite, which has a long track record of doing it well, and not breaking backwards compatibility. It seems well suited for backups. It seems fast enough: I can insert about a million rows per second.
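
As a rough illustration of how little is needed on top of SQLite, here is a sketch using Python’s standard sqlite3 module. The table and column names are hypothetical, not Obnam’s real schema; the point is only that a file list and a per-file chunk list fit naturally into two tables.

```python
import sqlite3


def create_backup_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a database with a hypothetical two-table backup schema."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE files (
            file_id INTEGER PRIMARY KEY,
            path    TEXT NOT NULL,
            size    INTEGER NOT NULL,
            mtime   INTEGER NOT NULL
        );
        CREATE TABLE file_chunks (
            file_id INTEGER NOT NULL REFERENCES files(file_id),
            seq     INTEGER NOT NULL,  -- position of the chunk in the file
            label   TEXT NOT NULL,     -- client-assigned chunk label
            PRIMARY KEY (file_id, seq)
        );
        """
    )
    return conn


def add_file(conn, path, size, mtime, labels):
    """Record one file and its ordered chunk labels in one transaction."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "INSERT INTO files (path, size, mtime) VALUES (?, ?, ?)",
            (path, size, mtime),
        )
        file_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO file_chunks (file_id, seq, label) VALUES (?, ?, ?)",
            [(file_id, seq, label) for seq, label in enumerate(labels)],
        )
    return file_id


def chunks_for(conn, path):
    """Return the chunk labels for a path, in file order."""
    rows = conn.execute(
        "SELECT c.label FROM file_chunks AS c "
        "JOIN files AS f ON f.file_id = c.file_id "
        "WHERE f.path = ? ORDER BY c.seq",
        (path,),
    )
    return [label for (label,) in rows]
```

Insert speed in SQLite depends heavily on how many rows go into each transaction, so a real client would probably batch many files per transaction rather than committing after every file as this sketch does.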

While the architecture is simple, there are a lot of details to get right, for correctness, security, and performance. Obnam is at this stage very much a toy, barely even a prototype, and not usable for real backups of real data that someone cares about.

What do I need to start using Obnam for our backups for real, then? Here is my current list:

  • The client needs a good way to authenticate itself to the server.
    • At the moment, there is no authentication. This is clearly not acceptable, but I believe in iterative development, and strong authentication was one obstacle I chose to defer in the early stages.
  • I need to be able to delete specific backups, to avoid storage filling up on the backup server.
    • I don’t have to have an automatic schedule for deleting old backups. I can do it manually, to start with.
  • I need to be able to verify that my backups can be restored, and to verify the data on the server is OK.
    • Preferably without actually having to restore everything.
  • The initial backup needs to not be excessively slow. Incremental backups must be reasonably fast.
    • The initial backup should be able to use at least half of our home Internet up-link speed: the whole initial backup should take no more than ten days.
    • An incremental backup, when nothing much has changed, should take less than ten minutes.
  • All data the client sends to the server should be encrypted, and the encryption should protect the data at rest, not just in transit, using a key only the client knows.
    • This ensures the server can’t store any client data in cleartext, since the server won’t ever see it in cleartext.
    • The server chooses random identifiers for the chunks: those will be transmitted in cleartext (over an encrypted channel), when the client retrieves a chunk.
    • The chunk labels shall not be cleartext checksums, either.
  • Obnam should be sufficiently future-proof that it’s unlikely I’ll need to do a complete backup to the server again.
    • Ten days is a long time to watch a progress bar.
    • If (when!) Obnam makes changes, it shall have a way to migrate a backup server to a new version without forcing a complete re-transfer of the data to the server.
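
The ten-day limit for the initial backup translates into a sustained upload rate. Here is that arithmetic as a small Python sketch, using the 8.3 TiB figure from the start of this post. It ignores protocol overhead and any savings from duplicate chunks, so treat the result as a rough estimate. The function names are made up for this example.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def required_rate_bytes_per_second(total_bytes: float, days: float) -> float:
    """Sustained rate needed to move total_bytes within the given days."""
    seconds = days * 24 * 60 * 60
    return total_bytes / seconds


def to_megabits_per_second(bytes_per_second: float) -> float:
    """Convert bytes per second to megabits per second (10**6 bits)."""
    return bytes_per_second * 8 / 1_000_000


rate = required_rate_bytes_per_second(8.3 * TIB, days=10)
print(f"{rate / 1_000_000:.1f} MB/s, {to_megabits_per_second(rate):.1f} Mbit/s")
```

That comes to roughly 10.6 MB/s, or about 84.5 Mbit/s, sustained around the clock for the full ten days.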

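One possible way to satisfy the requirement that chunk labels not be cleartext checksums is to use a keyed hash, such as HMAC-SHA256, with a secret key that only the client knows. This is an assumption on my part for the sake of illustration, not a decision about Obnam’s design, and the function names are hypothetical. Identical chunks still get identical labels, but someone without the key can’t compute a label for a known file and check whether it exists on the server.

```python
import hashlib
import hmac
import secrets


def new_label_key() -> bytes:
    """Generate a random 32-byte key that stays on the client."""
    return secrets.token_bytes(32)


def keyed_label(key: bytes, chunk: bytes) -> str:
    """Label a cleartext chunk with HMAC-SHA256 instead of plain SHA256."""
    return hmac.new(key, chunk, hashlib.sha256).hexdigest()
```

The catch is that the label key becomes one more secret that must survive the loss of the client machine, alongside the encryption key.
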
That’s a fair bit of work, and as I only work on Obnam in my free time, it will take a while. If Obnam sounds interesting to you, maybe you’d like to help? I could use a lot of help, for all kinds of things: from applied cryptography to updating documentation to translating the software and the documentation to other languages to making a website that doesn’t make babies cry.