Backup systems could do with a common backup interchange format.

Many version control systems support the git “fast-export” format. It’s a simple format for exporting the contents of a version control repository so that it can be imported into another. It’s a fundamental basis for converting from one system to another.

Backup systems could have something similar. It would be good for everyone, as it frees people to change backup systems without having their backup history locked into their old system. There are several scenarios in which this could be useful:

  • Migrating to a new backup system that can’t read backups made with the old backup system.

  • Migrating from one backup repository to another, when a straight copy of the repository files is not possible, for whatever reason. For example, if there is no file-level access to the backup repository, only a constrained API.

  • Migrating to a new configuration that can’t read backups made with the old configuration. For example, changing encryption secrets in a system that only allows one per repository.

My initial requirements for such a format:

  • It’s as simple as possible while still working. It doesn’t need to be efficient, only efficient enough to be practical to use.

  • It’s possible to stream the format: something like oldbackup export | newbackup import should be possible.

  • It’s as independent of the backup systems as possible, and doesn’t embed unnecessary assumptions of the design or implementation of the system.

My first sketch of a backup interchange format:

  • A sequence of backup generations in the order they were made.
  • Each generation has some metadata, and a list of files and hard links.
  • Each file has an identifier unique to the generation, metadata (inode) and file content.
  • File content is a sequence of (offset, blob) pairs, to support sparse files.
  • Hard links are represented as (pathname, inode) pairs, where the first pathname is the name of the hard link, the
  • No attempt at de-duplication or re-using files from a previous generation.

As YAML, this would look something like this:

YAML used here as an example, actual format may be something else.
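
To make the sketch more concrete, here is a rough Python model of the same structure, with a streaming encoding that writes one JSON record per line. Everything in it is an illustrative assumption rather than a proposal: the class names, the field names, the record types, the use of base64 for blobs, and the choice to make the second element of a hard link pair refer to a file identifier within the same generation.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Iterable, Iterator, TextIO


@dataclass
class FileEntry:
    file_id: int      # unique within its generation
    metadata: dict    # inode-style metadata, e.g. path, mode, mtime
    content: list = field(default_factory=list)    # (offset, blob) pairs


@dataclass
class Generation:
    metadata: dict
    files: list = field(default_factory=list)
    hardlinks: list = field(default_factory=list)  # (pathname, file_id) pairs; assumed


def _emit(out: TextIO, record: dict) -> None:
    out.write(json.dumps(record) + "\n")


def export_stream(generations: Iterable[Generation], out: TextIO) -> None:
    """Write one JSON record per line, in generation order, with no de-duplication."""
    for gen in generations:
        _emit(out, {"type": "generation", "metadata": gen.metadata})
        for f in gen.files:
            _emit(out, {"type": "file", "id": f.file_id, "metadata": f.metadata})
            for offset, blob in f.content:
                # Blobs are base64-encoded so they fit in a text record.
                _emit(out, {"type": "data", "id": f.file_id, "offset": offset,
                            "blob": base64.b64encode(blob).decode("ascii")})
        for pathname, file_id in gen.hardlinks:
            _emit(out, {"type": "hardlink", "pathname": pathname, "id": file_id})


def import_stream(inp: TextIO) -> Iterator[Generation]:
    """Rebuild generations one at a time from the record stream."""
    current = None
    files = {}
    for line in inp:
        rec = json.loads(line)
        kind = rec["type"]
        if kind == "generation":
            if current is not None:
                yield current          # previous generation is complete
            current = Generation(metadata=rec["metadata"])
            files = {}
        elif kind == "file":
            entry = FileEntry(rec["id"], rec["metadata"])
            files[rec["id"]] = entry
            current.files.append(entry)
        elif kind == "data":
            blob = base64.b64decode(rec["blob"])
            files[rec["id"]].content.append((rec["offset"], blob))
        elif kind == "hardlink":
            current.hardlinks.append((rec["pathname"], rec["id"]))
    if current is not None:
        yield current
```

Because each record is a single line, an exporter could write to standard output and an importer could read from standard input, which fits the pipeline requirement above. This version still holds one whole generation in memory while importing; a real importer would probably act on each record as it arrives.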