This post is part of a series on backup software implementation. See the backup-impl tag for a list of all posts in the series.

This is another grab bag of random topics.

Snapshots vs deltas

It is common to talk about "full backups", which are complete, self-standing copies of the data, versus "incremental backups", which only contain the changes since the previous backup. As someone who has implemented backup software, I prefer to talk about "snapshot" versus "delta" backups.

In a backup system based on snapshots, each backup looks like a complete, self-standing backup, even if it's implemented in a way where data common to several backups is only stored once. One way to implement this is to store each unique chunk of data only once, with each backup consisting of references to the chunks that make up its files.
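
To make that concrete, here is a minimal sketch of such a chunk store in shell. Everything in it is made up for illustration: the chunks and manifests directories, the fixed 1 MiB chunk size (real implementations usually pick chunk boundaries based on content), the one-file-per-backup simplification, and $file naming the file being backed up.

# Sketch only: back up one file as deduplicated chunks plus a manifest.
mkdir -p chunks manifests tmp
split -b 1M "$file" tmp/chunk.
for c in tmp/chunk.*; do
    h=$(sha256sum "$c" | cut -d' ' -f1)
    [ -e "chunks/$h" ] || mv "$c" "chunks/$h"   # store each unique chunk once
    echo "$h"                                   # the manifest lists the chunks in order
done > "manifests/$(date -I).manifest"
rm -f tmp/chunk.*

Restoring is then just concatenating the chunks listed in the manifest:

$ while read -r h; do cat "chunks/$h"; done < manifests/2024-12-24.manifest > restored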

In one based on deltas, each incremental backup is a "delta" against a previous backup. Delta is used here in the mathematical sense of a difference: the new backup might store a new file completely, but only the changed parts of a changed file.
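
As an illustration (not a recommendation), the rdiff tool from librsync computes exactly this kind of per-file delta; the file names here are hypothetical:

$ rdiff signature old/file.dat file.sig          # summary of the old version
$ rdiff delta file.sig new/file.dat file.delta   # only the changed parts of file.dat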

The big difference, from my point of view, is that restoring a backup from a snapshot is straightforward, whereas restoring from deltas means starting from a full backup and then applying all the deltas needed to get to the state you want. Applying deltas can be slower, and is often trickier to implement. "Tricky" is a technical term in software engineering that means "more likely to be wrong".
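
A sketch of such a restore, again using rdiff and hypothetical paths: start from the full backup and replay every delta in order.

# Sketch: replay a delta chain to reconstruct the current version of a file.
cp full/file.dat restored.dat
for d in deltas/*/file.dat.delta; do             # globs sort lexicographically,
    rdiff patch restored.dat "$d" restored.new   # so date-named dirs apply in order
    mv restored.new restored.dat
done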

In my opinion, deltas made a lot of sense for tape-based backups: you have to at least seek past all the previous backups on the same tape anyway, so you may as well apply the deltas on the way. However, for backups stored on random access storage, such as hard drives, snapshots make a lot more sense.

Snapshots are even more important if you want to remove specific backups to recover space. This is very tricky with deltas, but can be quite straightforward with snapshots. (I say this as someone who has implemented it.)
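
With the chunk store sketched earlier, removing a backup amounts to deleting its manifest and then garbage-collecting the chunks that no remaining manifest refers to. A slow but simple sketch (a real implementation would use something faster than one grep per chunk):

# Sketch: remove one backup, then delete the now-unreferenced chunks.
rm manifests/2024-12-24.manifest
cat manifests/*.manifest | sort -u > live-chunks
for c in chunks/*; do
    grep -qxF "$(basename "$c")" live-chunks || rm "$c"
done
rm live-chunks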

For myself, I would only consider snapshots. This is influenced by my strong dislike of tape drives.

If you like tapes, by all means use them. If you want me to implement backup software that uses tapes for storage, the price is going to be higher.

File system deltas

File systems such as ZFS and btrfs support file system deltas. The file system itself constructs the delta, which can be exported as a regular file. The delta can be applied to another file system of the same type.
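
For example, with ZFS the flow looks roughly like this, where pool/home and backuppool/home are hypothetical dataset names and the receiving side must already have the earlier snapshot:

$ zfs snapshot pool/home@monday
$ zfs snapshot pool/home@tuesday
$ zfs send -i pool/home@monday pool/home@tuesday > tuesday.delta
$ zfs receive backuppool/home < tuesday.delta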

This can work really well, and it can be quite efficient.

However, I am personally not interested in requiring the same file system type to be used when restoring. I entirely reject this approach for any backup system I may or may not implement in the future.

Again, this is my personal choice. If you're happy with file system deltas, use them. My preference doesn't matter in that case.

Using rsync and directory trees of hard links

I have used, successfully and for years, directory trees of hard linked files. This means that each backup is a directory (e.g., 2024-12-24, 2024-12-25, and so on). Every file (anything except directories) that is unchanged from the previous backup is stored as a hard link to the same file in the previous backup.

The core of this is approximately:

$ old=/backups/2024-12-24   # hypothetical: the previous backup
$ new=/backups/2024-12-25   # hypothetical: the backup being made
$ rm -rf "$new"             # clear any partial earlier attempt
$ cp -al "$old" "$new"      # hard-link copy of the previous backup
$ rsync -a --del "$HOME/." "$new/."   # changed files replace their hard links

This can work OK. The hard linking saves a ton of space compared to storing each backup in full. Browsing old backups is just looking at ordinary files on disk.

It's also very easy to set up. The shell snippet above is almost everything you need. There are plenty of variants of this online, if you don't want to make your own.

However, even though it's my go-to approach for backups that don't rely on complex backup software, it's not something I particularly like. The main problem is that I have millions of precious files, and if each backup has all of them (even if hard linked), it becomes cumbersome to move backups to new storage, or even to remove old backups.

It turns out that dealing with very large numbers of files is not fun. Even when tools can cope, they are often slow. For example, I've not managed to use rsync to transfer a few hundred daily backup directories from one server to another: preserving the hard links (rsync -H) requires rsync to keep track of every linked file in memory, and it always runs out of memory.

Even deleting a few hundred million hard links is slow.

I'd prefer a backup implementation that didn't store each precious file as a separate file, but on the other hand, that is not going to be as simple as cp -al and rsync -a --del.

Feedback

I'm not looking for suggestions on what backup software to use. Please don't suggest solutions.

I would be happy to hear other people's thoughts about backup software implementation, or about the needs and wants they have for backup solutions.

If you have any feedback on this post, please share it in the fediverse thread for this post.