Welcome to my web log. See the first post for an introduction. See the archive page for all posts, and comments for a feed of comments only. (There is an English-language feed if you don't want to see Finnish.)
All content outside of comments is copyrighted by Lars Wirzenius, and licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. Comments are copyrighted by their authors.
Each software tool exists to solve some problem. For each problem, there are many possible solutions. Even when different programs basically do the same thing, they can have quite different shapes.
As an example, this morning I was wondering if it would be possible
for me to use
notmuch to index my entire mail archive. For that, I
needed to convert a number of mbox folders to Maildir format. That's a
reasonably easy problem, given access to suitable programming
libraries, but there's an existing tool for that, called mb2md.
Unfortunately, it has the wrong shape for my needs.
mb2md doesn't just convert one mbox to one maildir. It's designed
for a mail admin converting all server-side mbox folders for a user
into a corresponding structure of Maildir folders. This seems to be
necessary when switching IMAP servers. That's a fairly specialised
problem, and the program has been written to make it easy for a mail
admin to do that.
What I need is part of the problem solved by
mb2md and indeed it can
do just that part. However, the overall shape of
mb2md is such that
my part is hard to do. The incantation is quite unintuitive and
requires careful reading of the documentation.
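For comparison, here is a minimal sketch of the single-folder conversion using Python's standard mailbox module. It is not how mb2md works; the function name mbox_to_maildir is made up for this illustration, and the sketch ignores locking and error recovery.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from one mbox file into one Maildir.

    Returns the number of messages copied.
    """
    # create=False: fail instead of silently creating an empty mbox.
    source = mailbox.mbox(mbox_path, create=False)
    # create=True: make the Maildir (with tmp, new, cur) if it is missing.
    target = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for message in source:
            # add() converts the mbox message to Maildir form on the way in.
            target.add(message)
            count += 1
    finally:
        source.close()
        target.close()
    return count
```

Calling mbox_to_maildir("old-folder.mbox", "new-folder") (both paths hypothetical) converts exactly one folder, which is the simple shape asked for here.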
The shape of a solution matters.
mb2md could easily have been
written in a way that provides a simple tool for the single folder
conversion, and then a more complex tool for the mail admin's more
complicated problem. This would have resulted in a much more general
tool, and that would make it easier for more people to use it without
Mail folder format conversions are a fairly esoteric thing to do. However, the lack of generality is a frequent issue with how programs are designed. It is easy to fall into the trap of writing a highly specialised tool, instead of taking a step back and making a more general purpose tool. The specialised tool will help a small number of people. The general tool will help many people.
Examples of this are fairly common. Debian has a set of tools for making Debian live CDs; they are not quite able to make a bootable hard disk image as well (thus, vmdebootstrap). There are programs for computing cyclomatic complexity, which produce HTML reports, rather than something that can be processed by other programs without too much effort. There are tools for managing address books that are limited to specific cultures, e.g., by hardcoding assumptions of what a person's name looks like (thus, clab).
One of my favourite examples is
xargs, which by default does the
wrong thing by assuming its input is whitespace delimited. Any
whitespace, not just newlines. Any sensible use requires adding the
-0 option, which makes
xargs that much more tedious to use.
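To make the difference concrete, here is a small Python model of the two ways of splitting input. It is only an approximation: real xargs also interprets quotes and backslashes in its default mode, which this sketch ignores, and the function names and file names are invented for the example.

```python
def split_default(data):
    """Roughly the xargs default: split on any run of whitespace.

    Quote and backslash handling in real xargs is ignored here.
    """
    return data.split()


def split_nul(data):
    """Roughly xargs -0: items are terminated by NUL characters only."""
    items = data.split("\0")
    # Drop the empty string left after the final terminator.
    if items and items[-1] == "":
        items.pop()
    return items


filenames = ["holiday photos.jpg", "notes.txt"]

newline_input = "\n".join(filenames) + "\n"  # like the output of plain find
nul_input = "\0".join(filenames) + "\0"      # like the output of find -print0

print(split_default(newline_input))  # ['holiday', 'photos.jpg', 'notes.txt']
print(split_nul(nul_input))          # ['holiday photos.jpg', 'notes.txt']
```

With whitespace splitting, the one file with a space in its name turns into two arguments, neither of which exists.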
Furthermore, I've often found that the more general tool is simpler. Its functional specification is simpler; its implementation is simpler, and has fewer special cases; its user experience is simpler. That's not always true, but often it is.
Sometimes the general solution shape is not worth it. But it's always worth considering whether it might be.
One of the parts of the Unix culture I really like is the preference for general tools that are easy to combine together.
It Will Never Work in Theory is a web site that blogs, though slowly, about important research and findings about software development. It's one of the most interesting sites I've found recently, possibly for a long time.
I disagree with the term "software engineering" to describe the software development that happens today. I don't think it's accurate, and indeed I think the concept's too much of a fantasy for the term to be used seriously about what practicing developers do. For software development to be an engineering discipline, it needs a strong foundation based on actual research. In short, we need to know what works, what doesn't work, and preferably why in both cases. We don't have much of that.
This website is one example of how that's now changing, and that's good. As a practicing software developer, I want to know, for example, whether code review actually helps improve software quality, the speed of software development, and the total cost of a software project, and also what the limits of code review are, how it should be done well, and what kind of review doesn't work. Once I know that, I can decide whether and how to do reviews in my development teams.
The software development field is full of anecdotal evidence about these things. It's also full of people who've done something once, and then want to sell books, seminars, and lectures about it. That's not been working too well: it makes research be mostly about fads, and that's no way to build a strong foundation.
Now I just need the time to read everything, and the brain to understand big words.
Meet Alfred. Alfred is a Debian user. He has a laptop with Debian and a desktop environment running on it. Alfred does a lot of important things on his computer: his hobby is to photograph his cat, and also he works for a non-governmental organisation that investigates and reports on human rights violations. His job involves a lot of travel to many parts of the world, and he needs to handle a lot of very sensitive information. His laptop uses full-disk encryption, and it's generally speaking very well secured against the various security threats that are due to his job.
He is worried about losing important data. He's not too worried that the sensitive information he has will leak if his laptop is stolen, but it might be impossible to re-create the data if the laptop is gone. If he interviews a whistleblower for a slave-trading corporation, and his laptop is stolen after that, it might be impossible to ever meet with the whistleblower again.
Alfred wants backups of his data. He gets a USB thumb drive, and plugs it in. The laptop has never seen the drive before, so it asks Alfred if the drive should be used for backups. Alfred says yes.
The laptop formats the thumb drive, again with full-disk encryption, and then runs a backup. The backup automatically picks up all the files from Alfred's home directory, and some system configuration files that may be necessary as well. (Read: /home and /etc.) Files that are usually not very precious, such as web browser caches, are automatically excluded.
Later, when Alfred wants to update the backup, he plugs in the same drive again. The system recognises the drive, and runs the backup. While the backup is running, Alfred has an indicator in his desktop status bar. If Alfred leaves the drive plugged in, and changes anything in his home directory, that gets immediately backed up to the backup drive. Until the changes have been backed up, the indicator stays on Alfred's status bar.
This isn't good enough, however. Alfred needs to carry the USB drive with him, and if he's mugged, he might lose both the laptop and the backup drive. Therefore, the system administrator at Alfred's NGO, Janet, sets up an account on an online backup server, and e-mails Alfred a configuration file, which Alfred drops into the backup system's configuration tool.
From then on, whenever Alfred's laptop is online, and can see the backup server (identified by an SSH host key), any changes Alfred makes are backed up as soon as possible. For the next interview, as soon as the interview is finished and Alfred closes the laptop lid to suspend it, the backup has already finished, both to the online server and the USB thumb drive.
Alfred is now happy, and no longer fears for the safety of his data.
Janet, however, is still a little worried, because the online backup server is an attractive target for attacks. She asks Alfred to configure the backup service on the laptop to encrypt and digitally sign the backups, and sends the master backup public key with the request. Janet keeps the corresponding private key in a secure location.
Alfred goes into the configuration dialog, ticks the right box, and drops in the server public key. The backup software generates a new public key for the laptop to use for encrypting the backups, and Alfred e-mails that to Janet, using PGP encrypted and signed e-mail. He also puts the laptop backup encryption keys on a couple of USB thumb drives, which he stores in safe places (in his sock drawer and coffee jar, but don't tell anyone that).
Alfred's online backups are now encrypted with public keys so that both Alfred and Janet can decrypt them, but only they can do that. The backups are digitally signed so that if the server is hacked, the backups can't be altered without it being detectable.
Some time passes.
Alfred needs to go to speak to the general assembly of the Cat Conference, about how awesome his cat is. This requires him to travel to the US, and he's worried that the US authorities will confiscate his laptop and try to get at his work files that way. He deletes all his work files, ssh keys, and other files that aren't necessary to show his cat pictures at the conference.
The conference goes fine, and when Alfred comes back home, he gets the USB thumb drive that contains his backup encryption key. He plugs it in, and tells the backup configuration software to import it. Alfred can then open his backups on the online backup server in his file browser, and can restore his files by copying them with drag and drop.
However, the next day Alfred's cat, upset at how much he travels, pees on the laptop. It is ruined. Everything is lost.
Alfred gets a new laptop from Janet, and installs Debian on it. During installation, Alfred gives the installer the USB backup drive, and the installer restores all of Alfred's own files, and also restores system configuration. After a little while, Alfred has a newly installed laptop with all his usual software and all of his files.
This is a summary of a vision for backups being a service in a default Debian install in the future. It is currently just a vision, and nobody is currently working on making it reality. Would you like to work on this for the release after jessie?
(No cats were harmed in the production of this vision.)
Kudos to Matthew for taking a stance. It has, not surprisingly, provoked a lot of comments and feedback, most of it unpleasant.
If I did anything that was directly related to Intel, I'd join him, but I do very, very little architecture dependent stuff anymore.
I will, however, say this: Even if the "gamergate" were actually about good journalism and ethics (and it's clear it isn't), if your reaction to a differing opinion is abuse, harassment, and other kinds of psychological violence, you're not making anything better, you're making it all worse.
Reasonable people can handle disagreement without any kind of violence.
45 today. I should stop being childish, but I don't wanna.
I don't really like any of the ticketing systems I've ever needed to use, whether they've been used as bug tracking systems, user support issue management systems, or something else. Some are not too bad. I currently rely most on debbugs and ikiwiki.
debbugs is the Debian bug tracking system. See https://www.debian.org/Bugs/ for an entry point. It's mostly mail based, with a read-only web interface. You report a bug by sending an e-mail to the submission address, and (preferably) include a few magic "pseudo-headers" at the top of your message body to identify the package and version. There are tools to make this easier, but mostly it's just about sending an e-mail. All replies are via e-mail as well. Effectively, each bug becomes its own little dedicated mailing list.
This is important. A ticket, whether it is a bug report or a support request, is all about the discussion. "Hey I have this problem..." followed by "Have you tried..." and so forth. Anything that makes that discussion easier and faster to have is better.
It is my very strong opinion, and long experience, that the best way to have such a discussion is over e-mail. A lot of modern ticketing systems are web based. They might have an e-mail mode, perhaps read-only, but that's mostly an afterthought. It's a thing bolted onto the side of the system because people like me whinge otherwise.
I like e-mail for this for several reasons.
E-mail is push, not pull. I don't need to go look at a web page to be notified that something's happened.
E-mail requires no extra usernames and passwords to manage. I don't need to create a new account every time I encounter a new ticketing system instance.
E-mail makes it very easy to respond. I can just reply to a message. I don't need to go to a web site, log in, and find a reply button.
I already have archives of my e-mail, so referring to old messages (or finding them) is easy and quick. (Mutt, offlineimap, and notmuch are my particular set of choices. But I'm not locked to them, and you can use whatever you like, too.)
E-mail is a very rich format. Discussions are inherently threaded, and various character sets, languages, attachments, and other such things just work.
For these reasons, I strongly prefer ticketing systems in which e-mails are the primary form of discussions, and e-mail is a first class citizen. I don't mind if there's other ways to participate in the discussion, but if I have to use something else than e-mail, I tend not to be happy.
I use ikiwiki to provide a distributed, shared notebook on bugs. It's a bit cumbersome, and doesn't work well for discussions.
I think we can improve on the way debbugs works, however. I've been thinking about ticketing systems for Obnam (my backup program), since it is gaining enough users that it's getting hard to keep track of discussions with just an e-mail client.
Here's what I want:
Obnam users do not need to care about there being a ticketing system. They report a problem by e-mailing the support mailing list, and they keep the list in cc when conducting the discussion. This is very similar to debbugs, with the distinction that there are no ticket numbers that must be kept in the replies.
The support staff (that's me, but hopefully others as well) have access to the ticketing system, which automatically sorts incoming messages into tickets. Tickets have sufficient metadata that it's possible to track which ones have been dealt with, or still need work, and perhaps other things. Each ticket contains a Maildir with all the e-mails belonging to that ticket.
The ticketing system is distributed. I need to be able to work on tickets offline, and to synchronise instances between different computers. Just like git. It's not enough to have an offline mode (e.g., queuing e-mails on my laptop for sending to debbugs when I'm back online).
There is a reasonably powerful search engine that can quickly find the relevant tickets, and messages, based on various criteria.
I will eventually have this. I'm not saying I'm working on this, since I don't have enough free time to do that, but there's a git repository, and some code, and it imports e-mails automatically now.
Some day there may even be a web interface.
(This has been a teaser.)
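One way the automatic sorting of messages into tickets could work without ticket numbers is to rely on the standard threading headers that mail clients already set. The sketch below is a guess at the idea, not the code in the git repository mentioned above; sort_into_tickets and referenced_ids are hypothetical names, and it assumes replies arrive after the messages they answer.

```python
from email import message_from_string


def referenced_ids(msg):
    """Return the Message-IDs this message refers to."""
    ids = []
    for header in ("References", "In-Reply-To"):
        ids.extend(msg.get(header, "").split())
    return ids


def sort_into_tickets(messages):
    """Group messages into tickets using only threading headers.

    A message joins the ticket of the first message it references that
    is already known; otherwise it opens a new ticket. The ticket key is
    the Message-ID of the message that opened it.
    """
    ticket_of = {}  # Message-ID -> ticket key
    tickets = {}    # ticket key -> list of messages
    for msg in messages:
        msg_id = msg.get("Message-ID", "").strip()
        key = next(
            (ticket_of[ref] for ref in referenced_ids(msg) if ref in ticket_of),
            msg_id,
        )
        ticket_of[msg_id] = key
        tickets.setdefault(key, []).append(msg)
    return tickets


raw = [
    "Message-ID: <1@example.com>\nSubject: restore fails\n\nHelp!\n",
    "Message-ID: <2@example.com>\nIn-Reply-To: <1@example.com>\n"
    "Subject: Re: restore fails\n\nHave you tried...\n",
    "Message-ID: <3@example.com>\nSubject: feature request\n\nPlease add...\n",
]
tickets = sort_into_tickets([message_from_string(text) for text in raw])
print({key: len(msgs) for key, msgs in tickets.items()})
# {'<1@example.com>': 2, '<3@example.com>': 1}
```

Each list of messages could then be written to that ticket's Maildir. Messages with broken or missing threading headers would need some fallback, which this sketch does not attempt.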
I have just tagged Obnam (my backup program) 1.8 in git, and built and uploaded Debian packages to code.liw.fi and Debian unstable. NEWS snippet below.
Version 1.8, released 2014-05-13
- The error message has been improved for when setting metadata (owner, permission, and similar) of a restored file fails.
- obnam force-lock now works even when the client running it is not in the client list.
- Joey Hess found a problem in obnam restore: restored files would be created with quite liberal default permissions, which would be set to the backed-up permissions later. This could allow a snooper to read files they shouldn't be able to. This has been fixed now by using restrictive default permissions. A workaround for older versions is to create a directory, set its permissions to 0700, and restore to a subdirectory of that directory.
- --help output no longer shows the default value of any options. It was shown only for a few options anyway. The proper way to see the current settings is with the --dump-config option. The underlying fix is that the generated manual page no longer contains values that are specific to the machine doing the generation, such as the hostname as the default value for --client-name. Reported by SanskritFritz.
- When a file was backed up, and later excluded with --exclude, Obnam wouldn't remove it from the new backups. Now it does. Bug fixed by Anssi Hannula, though his patch got changed because it no longer applied.
- When restoring extended attributes that are not in the user namespace (the namespace whose attributes are named like user.foo), Obnam now ignores them, instead of trying to set them and crashing.
- When restoring from a directory that is not a repository, the error message is now clearer.
- Obnam would previously allow the backup root to be a symbolic link pointing at a directory. However, this only worked for backups. Other operations would not work, since they would only see the symbolic link, not the directory it pointed at. Obnam now gives an error message even for the backup.
- Obnam no longer excludes files named none, if the setting
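The restore permissions fix in the notes above follows a general pattern: create the file so that only its owner can touch it, write the data, and only then apply the backed-up mode. The sketch below illustrates that pattern; it is not Obnam's actual code, and restore_file is a hypothetical name.

```python
import os


def restore_file(path, data, final_mode):
    """Write a restored file without a window of liberal permissions."""
    # 0o600: only the owner may read or write while the data is written.
    # O_EXCL: refuse to reuse a file that somebody else created first.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    # Only now apply the permissions recorded in the backup.
    os.chmod(path, final_mode)
```

The workaround for older versions achieves the same effect from the outside: a parent directory with mode 0700 keeps other users away from everything below it, whatever the temporary permissions of the restored files are.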
Thirty years ago I started to learn programming. To celebrate this, I'm doing a bit of programming as a sort of performance art. I will write a new program, from scratch, until it is ready for me to start using it for real. The program won't be finished, but it will be ready for my own production use. It'll be something I have wanted to have for a while, but I'm not saying beforehand what it will be. For me, the end result is interesting; for you, the interesting part is watching me be stupid and make funny mistakes.
The performance starts Friday, 18 April 2014, at 09:00 UTC. I apologise if this is an awkward time for you. No time is good for everyone, so I picked a time that is good for me.
Run the following command to see what the local time will be for you.
date --date '2014-04-18 09:00:00 UTC'
While I write this program, I will broadcast my terminal to the Internet for anyone to see. For instructions, see the http://liw.fi/distix/performance-art/ page.
There will be an IRC channel as well: #distix on the OFTC network
(irc.oftc.net). Feel free to join there if you want to provide real
time feedback (the laugh track).
When you have a big goal, do at least a little of it every day. Cory Doctorow writes books and stuff, and writes for at least twenty minutes every day. I write computer software, primarily Obnam, my backup program, and recently wrote the first rough draft of a manual for it, by writing at least a little every day. In about two months I got from nothing to something that is already useful to people.
I am now applying this to coding as well. Software development is famously an occupation that happens mostly in one's brain and where being in hack mode is crucial. Getting into hack mode takes time and a suitable, distraction-free environment.
I have found, however, that there are a lot of small, quick tasks that do not require a lot of concentration. Fixing wordings of error messages, making small, mechanical refactorings, confirming bugs by reproducing them and writing test cases to reproduce them, etc. I have found that if I've prepared for and planned such tasks properly, in the GTD planning phase, I can do such tasks even on trains and in train stations.
This is important. I commute to work and if I can spend the time I wait for a train, or on the train, productively, I can make significant, real progress. But to achieve this I really do have to do the preparation beforehand. The 9:46 train to work is much too noisy to do any real thinking in.
For more, see the archive.