Welcome to my web log. See the first post for an introduction. See the archive page for all posts, and comments for a feed of comments only. (There is an English-language feed if you don't want to see Finnish.)

All content outside of comments is copyrighted by Lars Wirzenius, and licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. Comments are copyrighted by their authors.

A question I've been asked repeatedly of late is why I chose not to use an existing library for serialising data structures in Obnam. This blog post is the answer.

Obnam is a backup program, and it needs to store various data about files. This includes stat(2) information about each file in the live data, as well as data Obnam needs to keep track of everything. At run-time, Obnam keeps this data in in-memory data structures, such as Python dicts. For storage, these data structures need to be converted (serialised) to and from streams of bytes.

There are a variety of libraries for doing this, designed for different purposes and with their own constraints and pitfalls. Python's standard library comes with the cPickle library, for example, but its serialisation format is not guaranteed to be compatible with any other version of Python.

For Obnam, I need something that will last a long time. I do not want to have to deal with a library changing its serialisation format, as that would mean either that Obnam can't handle old backups, or that I need to start maintaining the old version of the library.

A way to look at this is that any dependency your software has comes with a cost, and that cost should be smaller than the benefit you get from the dependency.

For example, Obnam depends on the paramiko library to implement the SSH protocol. This library has some cost, and I've run into one or two bugs in it that have been rather unfortunate. However, the benefit it brings is huge: I don't have to implement SSH myself. I'm happy to have Obnam depend on paramiko.

For the serialisation thing, I wrote my own library, after a small amount of research into existing ones. Research time is a cost, too.

Mine is somewhat Obnam-specific, in that it can make some assumptions about the data to be serialised, and this allows a simpler library. A generic library would have to handle a number of special cases that mine can ignore.
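
To give a flavour of how such assumptions simplify things, here is a minimal sketch in Python 3. It is not Obnam's actual code or format: the function names and the byte layout are invented for this illustration. It assumes every key is a short string and every value is a non-negative integer that fits in 64 bits, which is roughly what stat(2) fields look like.

```python
import struct

# Hypothetical layout, invented for this sketch:
#   item count (u32), then per item:
#   key length (u16), key bytes (UTF-8), value (u64).
# All integers are big-endian, so the bytes do not depend on the host CPU.


def serialise(d):
    parts = [struct.pack("!I", len(d))]
    for key in sorted(d):  # sorted, so equal dicts give equal bytes
        raw = key.encode("utf-8")
        parts.append(struct.pack("!H", len(raw)))
        parts.append(raw)
        parts.append(struct.pack("!Q", d[key]))
    return b"".join(parts)


def deserialise(blob):
    (count,) = struct.unpack_from("!I", blob, 0)
    pos = 4
    d = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from("!H", blob, pos)
        pos += 2
        key = blob[pos:pos + key_len].decode("utf-8")
        pos += key_len
        (value,) = struct.unpack_from("!Q", blob, pos)
        pos += 8
        d[key] = value
    return d


# Made-up example values, shaped like a few stat(2) fields.
stat_info = {"st_mode": 0o100644, "st_size": 4096, "st_mtime": 1439804556}
blob = serialise(stat_info)
assert deserialise(blob) == stat_info
```

Because the sketch refuses to deal with nested structures, floats, or arbitrary objects, there is very little that can change underneath it, which is the property I care about for long-lived backups.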

It took me less than an hour to write this twice. I first wrote a quick prototype and a little microbenchmark to see if my approach would be feasible. Then I deleted that code, and started from scratch, TDD style, to make sure the code was reliable. The cost of writing my own serialisation code was less than the cost of finding, let alone evaluating, existing libraries.

It may be that my own library turns out to be inadequate. Then, and only then, will I start researching other libraries. Until then, I'll avoid the cost of research to find a suitable library, the cost of learning the chosen one, the cost of integrating it into Obnam, the cost to porters of Obnam of dealing with a new dependency, and the risk of the library changing in ways that are unsuitable for Obnam.

Obviously, writing your own code has costs, too. Designing and implementing a library is a cost, as is maintaining it (debugging, changes in requirements, etc).

Write your own or use existing code? It's a cost/benefit analysis. There's no one clear answer that's always correct.

Posted Mon Aug 17 09:42:36 2015 Tags:

I have just released version 1.14 of Obnam, my backup program. See the website at http://obnam.org for details on what the program does. The new version is available from git (see http://git.liw.fi) and as Debian packages from http://code.liw.fi/debian, and uploaded to Debian, and soon in unstable.

The NEWS file extract below gives the highlights of what's new in this version.

Version 1.14, released 2015-08-14

Bug fixes:

  • Since 1.9, Obnam has had trouble with sftp URLs for backup roots, particularly for URLs specifying the server's root directory. Dennis Jacobfeuerborn found the reason: the backup plugin was treating URLs as filenames. This should now be fixed.
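
As a rough illustration of why treating a URL as a filename goes wrong (this is not the actual plugin code, and the host name is made up), compare what Python's path functions and URL functions make of the same string:

```python
import posixpath
from urllib.parse import urlsplit

# Made-up host; the backup root is the server's root directory.
url = "sftp://backup.example.com/"

# Treated as a filename, the double slash is collapsed and the host name
# becomes just another directory component.
as_filename = posixpath.normpath(url)

# Treated as a URL, the host and the path are kept apart.
parts = urlsplit(url)

print(as_filename)   # sftp:/backup.example.com
print(parts.netloc)  # backup.example.com
print(parts.path)    # /
```
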
Posted Sat Aug 15 11:48:10 2015 Tags:

I have just released version 1.13 of Obnam, my backup program. See the website at http://obnam.org for details on what it does. The new version is available from git (see http://git.liw.fi) and as Debian packages from http://code.liw.fi/debian, and uploaded to Debian, and soon in unstable.

The NEWS file extract below gives the highlights of what's new in this version.

Version 1.13, released 2015-08-01

Bug fixes:

  • Lukáš Poláček found and fixed a repository corruption problem: if obnam forget was interrupted at the wrong moment, it might remove a chunk, but not the reference to it. This would cause a future run of obnam forget to crash due to a missing chunk (error code R43272X). obnam forget will now ignore such a missing chunk, since it would've deleted it anyway.

    Lars Wirzenius then changed things so that chunk files are only removed once references to the chunks have been committed.


  • obnam forget now commits changes after each generation it has removed. This means that if the operation is interrupted, less work is lost. Suggested by Lukáš Poláček, re-implemented by Lars Wirzenius.
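
The ordering described in these fixes can be sketched with a toy, in-memory model. None of this is Obnam's real code: the class, its attributes, and the interrupt_after parameter are invented for the illustration, and "committing" is reduced to updating a dict and bumping a counter.

```python
class ToyRepo:
    """In-memory stand-in for a repository; not Obnam's real classes."""

    def __init__(self, generations, chunks):
        self.generations = dict(generations)  # committed: gen id -> chunk ids
        self.chunks = set(chunks)             # chunk files "on disk"
        self.commits = 0

    def forget(self, gen_ids, interrupt_after=None):
        for n, gen_id in enumerate(gen_ids):
            if interrupt_after is not None and n == interrupt_after:
                return  # simulate the user interrupting the run
            doomed = set(self.generations[gen_id])
            # 1. Drop the references and commit that change first.
            del self.generations[gen_id]
            self.commits += 1
            # 2. Only then remove chunks that nothing refers to any more.
            still_used = set()
            for ids in self.generations.values():
                still_used.update(ids)
            for chunk_id in doomed - still_used:
                # discard() quietly ignores a chunk that is already gone.
                self.chunks.discard(chunk_id)


repo = ToyRepo({1: ["a", "b"], 2: ["b", "c"], 3: ["c"]}, ["a", "b", "c"])
repo.forget([1, 2], interrupt_after=1)

# Generation 1 was removed and committed before the interruption,
# so that work is not lost; generation 2 is untouched.
assert repo.generations == {2: ["b", "c"], 3: ["c"]}
assert repo.chunks == {"b", "c"}
assert repo.commits == 1
```
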
Posted Sat Aug 1 17:06:47 2015 Tags:

I've used Reddit for many years. I used it for many years without an account, but eventually I made one. The site has always had its share of unpleasantness, people who're more interested in tearing down than in building. In recent years, it's gotten worse, and is getting out of hand.

During the fairly short reign of Ellen Pao as CEO, I found things to be getting better. The site was starting to make it clear that harassment, for example, was unacceptable. Unsurprisingly, this made some of the nastier people quite upset.

Pao has now resigned, and a new CEO has started. He had an "Ask Me Anything" session yesterday, and made it clear that he's changing things. From my point of view, it's changing for the worse. He made it clear that as long as Reddit itself does not get into legal trouble, and harassment isn't too overt or particularly public, it's OK now.

I've closed my Reddit account.

Posted Sun Jul 12 06:17:47 2015 Tags:

I have just released version 1.12 of Obnam, my backup program. See the website at http://obnam.org for details on what it does. The new version is available from git (see http://git.liw.fi) and as Debian packages from http://code.liw.fi/debian, and uploaded to Debian, and soon in unstable.

The NEWS file extract below gives the highlights of what's new in this version. It includes the changes for version 1.11, which was a bug fix for 1.10 and not announced separately.

Version 1.12, released 2015-07-08

Bug fixes:

  • Steven Monai reported that using --one-file-system would crash, and it turned out to be a missing import.

  • Jan Niggemann reported that --exclude-caches no longer worked. This was due to a bug introduced when the option was moved to its own plugin (for cleaner code). The bug was masked by another bug, in the Yarn test suite. Both bugs have now been fixed.


  • Jan Niggemann translated the Obnam manpage to German. Due to cliapp not supporting other languages than English yet, the manual page lacks option descriptions.

Version 1.11, released 2015-07-02

  • The 1.10 release failed to correctly include the Green Albatross code, due to a missing line in setup.py. This has been fixed.
Posted Wed Jul 8 14:46:14 2015 Tags:

I have just released version 1.10 of Obnam, my backup program. See the website at http://obnam.org for details on what it does. The new version is available from git (see http://git.liw.fi) and as Debian packages from http://code.liw.fi/debian, and uploaded to Debian, hopefully soon in unstable.

The NEWS file extract below gives the highlights of what's new in this version.

Version 1.10, released 2015-07-01

Major bug fixes:

  • Lars Wirzenius fixed the obnam backup command to lock the whole repository, the same way as obnam forget does, when it removes checkpoint generations. This means that during checkpoint removal, no other client can make a backup, which is unfortunate. To avoid that, set leave-checkpoints = yes in the configuration. That will prevent obnam backup from removing checkpoints.

Minor new features:

  • Lars Wirzenius added the obnam list-formats command to list all repository formats.

  • The default value for the upload-queue-size setting is now 1024, chosen based on some benchmarking made by Lars Wirzenius to balance speed and memory use.

  • An EXPERIMENTAL new repository format, green-albatross, has been introduced. It is not ready for actual use, and is only added so that its code doesn't diverge far from the main line of development.

  • Teemu Hukkanen reported that the Synology NAS device returns EACCES instead of ENOENT when a user tries to remove a non-existent file. Obnam now copes with either error code.
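
The general idea of coping with either error code can be sketched like this. It is a minimal illustration in Python 3, not the code in Obnam: the function name is made up, and the remove parameter exists only so the sketch can be tried out with a fake.

```python
import errno
import os


def remove_if_present(path, remove=os.remove):
    """Remove path, treating "already gone" as success.

    The remove parameter is a hypothetical hook for trying this out
    with a fake; it is not part of any Obnam API.
    """
    try:
        remove(path)
    except OSError as e:
        # ENOENT is the usual answer for a missing file. The Synology
        # device reportedly answers EACCES instead, so accept both.
        if e.errno not in (errno.ENOENT, errno.EACCES):
            raise


def fake_synology_remove(path):
    # Pretend to be the NAS: a missing file gives EACCES.
    raise OSError(errno.EACCES, "Permission denied", path)


remove_if_present("no-such-file", remove=fake_synology_remove)  # no exception
```

The obvious downside of a check like this is that a genuine permission problem looks the same as a missing file, so it only makes sense where a file that cannot be removed is harmless.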

Minor fixes:

  • python setup.py build no longer formats the manual page into plain text. This is now done in python setup.py docs instead. The latter is an optional build step, and probably only works on Debian.

  • obnam restore --to=DIR now requires that the directory DIR either doesn't exist, or is empty when the restore starts. This is to prevent users from restoring on top of a running system.
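
A check of this kind is short. The sketch below is an illustration in Python 3 with a made-up function name, not the code Obnam actually uses:

```python
import os
import tempfile


def restore_target_is_acceptable(dirname):
    """Return True if dirname does not exist or is an empty directory."""
    if not os.path.lexists(dirname):
        return True
    return os.path.isdir(dirname) and not os.listdir(dirname)


with tempfile.TemporaryDirectory() as tmp:
    assert restore_target_is_acceptable(tmp)                       # empty
    assert restore_target_is_acceptable(os.path.join(tmp, "new"))  # missing
    open(os.path.join(tmp, "file"), "w").close()
    assert not restore_target_is_acceptable(tmp)                   # not empty
```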

Posted Thu Jul 2 05:10:32 2015 Tags:

Acceptable estimations for software development:

  • Almost certainly doable in less than a day.
  • Probably doable in less than a day, almost certainly not going to take more than three days.
  • Probably doable in less than a week, but who knows?
  • Certainly going to take longer than a week, and nobody can say how long, but if you press me, the estimate is between two weeks and four months.

Reality prevents better accuracy.

Posted Tue May 19 19:50:38 2015

There will be a gathering of Debian people to celebrate the release of jessie this Saturday in Helsinki. For details, see the wiki page. Welcome, everyone.

Posted Wed Apr 22 16:48:53 2015 Tags:

The Debian Project Leader elections are going on. This is the yearly election for the leader, where members of the project vote for a new leader for a year. The debate this year seemed to me to be quite quiet, and voting activity seems to not be very high, either. Pity. Many years ago, the election period used to be quite energetic, bringing up some quite good viewpoints.

There also seems not to have been the usual repeat of the voting announcement; I'm not sure what's going on there. There's time until next Tuesday midnight (in the UTC time zone) to vote. Below are links to the vote page (with instructions for voting) and the (corrected) initial announcement.

I voted for Neil as my top candidate. I think he's got the best background and personality for being the leader of this project of ours.

Posted Thu Apr 9 15:51:40 2015 Tags:

It is with great pleasure and satisfaction that I release version 4.1 of Obnam, my backup program. This version includes radically innovative approaches to data compression and de-duplication, as well as some other changes and bug fixes.

Major user-visible changes:

  • Obnam now recognises most common image types, and de-duplicates them by substituting a standard picture of a cat or a baby. Statistical research has shown that almost all pictures are of either cats or babies, and most people can't tell cats and babies apart. If you have other kinds of pictures, use the --naughty-pictures option to disable this new feature.

  • Obnam now compresses data by finding a sequence in the value of pi (3.14159...) that matches the data, and stores the offset into pi and the length of the data. This means almost all data can be stored using two BIGNUM integers, plus some computation time to compute the value of pi with necessary precision. The extreme compression level is deemed worth the somewhat slower speed. To disable this new feature, use the --i-like-big-bits-and-i-cannot-lie option.

  • Obnam now uses one-time pad encryption in the repository. It is a form of encryption that is guaranteed to be unbreakable. Given the large amounts of data Obnam users have, the infinitely long value of the mathematical constant e is used as the encryption pad, since it would be bad security practice to use a pad that's shorter than the data being encrypted. To disable this new feature and use the old style encryption using GnuPG, use --i-read-schneier.

Minor user-visible changes:

  • There is a new subcommand obnam resize-disk, which resizes the filesystem on which the backup repository resides. In this version, it works on LVM logical volumes and RAID-0, RAID-5, and RAID-6 drive arrays using mdadm. The subcommand optionally arranges more space for backups by deleting live data files and reducing the corresponding LV sizes. If live data is deleted, the backup generations containing the data are tagged as unremovable so it's not lost. In the future, the subcommand may get support for purchasing more disk space from popular online storage providers.

  • To reduce unnecessary bloat, the obnam restore subcommand has been removed. It was considered unnecessary, since nobody ever reported any problems with it.

  • Obnam now has a new repository option, --swap-in-repository, which starts a daemon process that holds all backup data in memory. Once the process grows enough, this will result in most of the data being written to the swap partition. This makes excellent use of the excessively large swap partitions on many Linux systems. This feature does not work on Windows.

Bug fixes:

  • The obnam donate command to send the Obnam developers some money now again works with Bitcoin. There was a bug that stopped Obnam's built-in Bitcoin mining software from working.

  • The obnam help command again speaks the user's preferred language (LC_MESSAGES locale setting), rather than Finnish, despite pressure from the Finnish government's office for language export.

Posted Wed Apr 1 04:01:41 2015 Tags:

For more, see the archive.