This may be the stupidest thing I will ever have done, but I intend to have fun while doing it.

I’m writing another implementation of a backup system. It is called Obnam (“obligatory name”), just like the previous one that I retired three years ago.

The shape of the new system is roughly as follows:

  • Client/server, with HTTPS (not SFTP like Obnam1). A smart server stores chunks of data but doesn’t look into them; the client has all the interesting logic (encryption, compression, deduplication, etc.).
  • Written in Rust (not Python like Obnam1).

Long term I’m aiming at something like this:

  • Easy to install: available as a Debian package in an APT repository. (I’d appreciate help with other forms of packages.)
  • Easy to configure: only need to configure things that are inherently specific to a client, when sensible defaults are impossible.
  • Easy to run: making a backup is a single command line that’s always the same.
  • Detects corruption: if a file in the repository is modified or deleted, the software notices it automatically.
  • Repository is encrypted: all data stored in the repository is encrypted with a key known only to the client.
  • Fast backups and restores: when a client and server both have sufficient CPU, RAM, and disk bandwidth, the software makes a backup or restores a backup over gigabit Ethernet using at least 50% of the network bandwidth.
  • Snapshots: Each backup is an independent snapshot: it can be deleted without affecting any other snapshot.
  • Deduplication: Identical chunks of data are stored only once in the backup repository.
  • Compressed: Data stored in the backup repository is compressed.
  • Large numbers of live data files: The system must handle at least ten million files of live data. (Preferably much more, but I want some concrete number to start with.)
  • Live data in the terabyte range: The system must handle a terabyte of live data. (Again, preferably more.)
  • Many clients: The system must handle a thousand total clients and one hundred clients using the server concurrently, on one physical server.
  • Shared repository: The system should allow people who don’t trust each other to share a repository without fearing that their own data leaks, or even its existence leaks, to anyone.
  • Shared backups: People who do trust each other should be able to share backed up data in the repository.

I am primarily writing this for myself, in my free time, but it’d be nice if it were useful to others, or if others wanted to contribute.

I’ve written a simplistic prototype. The backup program reads data from stdin, breaks it into chunks, and uploads each chunk to the server unless it’s already there. The corresponding restore program downloads the chunks and writes them to stdout.
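To make the idea concrete, here is a minimal sketch of that chunk-and-deduplicate flow. It is not the prototype's actual code. Everything in it is an assumption made for illustration: the in-memory `ChunkStore` stands in for the HTTPS server, the tiny fixed `CHUNK_SIZE` is arbitrary, and chunk identifiers come from the standard library's `DefaultHasher`, which is not a cryptographic hash and would not be suitable for a real backup system.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical chunk size for this sketch, kept tiny so the demo is readable.
const CHUNK_SIZE: usize = 4;

/// In-memory stand-in for the server: maps a chunk id to the chunk's bytes.
#[derive(Default)]
struct ChunkStore {
    chunks: HashMap<u64, Vec<u8>>,
    /// Number of chunks actually uploaded, to show the effect of deduplication.
    uploads: usize,
}

impl ChunkStore {
    fn has(&self, id: u64) -> bool {
        self.chunks.contains_key(&id)
    }

    fn put(&mut self, id: u64, data: &[u8]) {
        self.chunks.insert(id, data.to_vec());
        self.uploads += 1;
    }

    fn get(&self, id: u64) -> Option<&[u8]> {
        self.chunks.get(&id).map(|v| v.as_slice())
    }
}

/// Assumed id scheme: a non-cryptographic hash of the chunk contents.
fn chunk_id(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Split the data into chunks and upload only those the store lacks.
/// Returns the ordered list of chunk ids needed to restore the data.
fn backup(store: &mut ChunkStore, data: &[u8]) -> Vec<u64> {
    let mut ids = Vec::new();
    for chunk in data.chunks(CHUNK_SIZE) {
        let id = chunk_id(chunk);
        if !store.has(id) {
            store.put(id, chunk);
        }
        ids.push(id);
    }
    ids
}

/// Fetch the chunks in order and concatenate them.
/// Returns None if any chunk is missing from the store.
fn restore(store: &ChunkStore, ids: &[u64]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for id in ids {
        out.extend_from_slice(store.get(*id)?);
    }
    Some(out)
}

fn main() {
    let mut store = ChunkStore::default();
    let data = b"abcdabcdabcdxyz";
    let ids = backup(&mut store, data);
    let restored = restore(&store, &ids).expect("all chunks present");
    assert_eq!(restored.as_slice(), &data[..]);
    println!("{} chunks referenced, {} uploaded", ids.len(), store.uploads);
}
```

In this toy input the chunk `abcd` occurs three times but is uploaded only once, while the restore still reproduces the original bytes from the ordered list of chunk ids.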

What little code there is, is on gitlab.com.

If you’re interested in helping with, or using, the new Obnam, please get in touch. Email is OK, although GitLab issues or merge requests are preferred. However, please be patient: this is a side project, and I may take a while to respond.