Welcome to my web log. See the first post for an introduction. See the archive page for all posts. (There is an English-language feed if you don't want to see Finnish.)


Me on Mastodon, for anything that is too small to warrant a blog post.

All content outside of comments is copyrighted by Lars Wirzenius, and licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. Comments are copyrighted by their authors. (No new comments are allowed.)


sopass 0.6.0: a command line password manager

I've just released version 0.6.0 of sopass, my command line password manager that I use instead of pass.

Version 0.6.0, released 2025-10-31

If I were of the American persuasion, this would be a spooky release. But I'm not, so it's a comfy release that doesn't scare anyone.

  • The sopass value generate command generates a new random value for a name.

  • There have also been other changes. A deb package is built and published by CI for every merge into the main branch. The documentation of acceptance criteria is published at https://doc.liw.fi/sopass/sopass.html. Lars has decided to not work on cross-device syncing, as it's not something he needs, even though it's an interesting technical problem.

Obnam 3 status: chunks, credentials, help?

I've spent most Sundays for the past half a year implementing Obnam 3, the third generation of my backup program. I've posted a blog post about each three-hour session on the Obnam blog. It's way too much detail for anyone not passionately interested in this project. Here is a summary of what I've done. There is also an appeal for help.

I've implemented the lowermost layer of backup storage: the chunk. A chunk is a piece of data, either a small file or a part of a longer file. The chunk is encrypted in a way that also allows verifying that the chunk hasn't been modified while in backup storage.

Each chunk is encrypted with a random, computer-generated symmetric key, which the user never sees. There can be any number of such keys, for different groups of chunks, although the implementation doesn't yet make it convenient to choose the key to use when encrypting a chunk. The chunk keys are stored in a client chunk, which itself is encrypted with another random, computer-generated key, the client key.

The client key is encrypted in various ways, and the result of each of those encryption operations is stored in a credential chunk. I've implemented credential encryption methods using OpenPGP software keys, and OpenPGP cards.
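
To keep the key hierarchy straight, here is a rough sketch of it as Rust types. This is only an illustration of the structure described above: the names and fields are invented for this post, not Obnam's actual code.

// A sketch of the key hierarchy described above. The names and shapes are
// invented for illustration; they are not Obnam's actual types.

/// An encrypted chunk as stored in the repository: opaque ciphertext, plus
/// whatever is needed to verify it hasn't been modified in storage (for
/// example, an AEAD authentication tag).
struct EncryptedChunk {
    ciphertext: Vec<u8>,
    tag: Vec<u8>,
}

/// A random, computer-generated symmetric key used to encrypt data chunks.
/// The user never sees these.
struct ChunkKey([u8; 32]);

/// The client chunk: the collection of chunk keys, itself stored encrypted
/// with the client key.
struct ClientChunk {
    chunk_keys: Vec<ChunkKey>,
}

/// A credential chunk: the client key encrypted with one particular method.
/// There is one credential chunk per encryption of the client key.
enum CredentialChunk {
    OpenPgpSoftwareKey { encrypted_client_key: Vec<u8> },
    OpenPgpCard { encrypted_client_key: Vec<u8> },
}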

This part works, and although it needs polish, I'm pretty happy with it.

There is also a rudimentary backup repository, which stores chunks in a local directory and allows searching for chunks by id or label. Chunk labels are short strings cryptographically attached to the chunk to give the type of a chunk, or the encrypted checksum of the plaintext data in a chunk, for de-duplication.
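
A rough shape for such a repository might look like the trait below. Again, this is only a sketch of the interface implied above, with invented names, not the actual code.

// Illustrative only: a possible shape for a chunk repository that stores
// chunks in a local directory and can find them by id or by label. The
// names are invented; this is not Obnam's actual interface.

type ChunkId = String;

trait ChunkRepository {
    type Error;

    /// Store a chunk with a label, returning its id.
    fn put(&mut self, chunk: &[u8], label: &str) -> Result<ChunkId, Self::Error>;

    /// Fetch a chunk by its id.
    fn get(&self, id: &ChunkId) -> Result<Option<Vec<u8>>, Self::Error>;

    /// Find the ids of all chunks with a given label, for example the
    /// encrypted checksum of the plaintext, as used for de-duplication.
    fn find_by_label(&self, label: &str) -> Result<Vec<ChunkId>, Self::Error>;
}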

I've intentionally limited myself to a single Sunday session per week, at most three hours per session. This has been quite enjoyable: I am not in a hurry, and I can try to be careful and deliberate. In my profession that is not as common as I would like. Three hours a week has been enough to make progress, even if slowly. But fast enough for a hobby project.

I'm not yet sure what I will do next, but supporting remote backup repositories seems like a sensible choice. I will need to do some research for that: I will need to learn about the S3 API, and look at the Rust iroh library for NAT hole punching.

Obnam is a large project, more than I can do by myself. Obnam needs, for example, documentation, even if at this stage for developers, not yet end users. There are code changes needed, too: more credential methods (password, TPM2 chip, ...), and all the code to actually make backups. Someone will need to research and implement ways of splitting different kinds of files into chunks. It would be good to have a better idea of what's needed: use cases, acceptance criteria. There is no shortage of things to do.

What part of building backup software interests you? How would you like to help?

clingwrap, a Rust library for command line applications

Every program I write is in some sense a command line program, even if it transmogrifies itself into a server process or an interactive terminal user interface. It starts by doing the kinds of things a Unix command line program does: it parses the command line, maybe loads some configuration files, maybe runs some other programs.

I've made a crate, clingwrap, which makes a couple of the common things easier. I've done my best to implement them well and put them in a library. This means I don't keep copying the code from project to project, inevitably resulting in differences, and bugs fixed in one place but not the others.

It's a small library, and may never grow big. There's a module for handling configuration files, and one to run other programs. Note that it's a library, not a framework. You call clingwrap, it doesn't call you.

I use clap for command line parsing, and don't feel a need to wrap or reinvent that.

Example

The code below parses configuration files for a "hello, world" program. It also validates the result of merging many configuration files. The result of the validation is meant to not require checking at run time: if the configuration files can be loaded, the configuration is valid.

use clingwrap::config::*;

use serde::{Deserialize, Serialize};

#[derive(Debug)]
struct Simple {
    greeting: String,
    whom: String,
}

#[derive(Debug, Clone, Default, Serialize, Deserialize, Eq, PartialEq)]
struct SimpleFile {
    greeting: Option<String>,
    whom: Option<String>,
}

impl<'a> ConfigFile<'a> for SimpleFile {
    type Error = SimpleError;

    fn merge(&mut self, config_file: SimpleFile) -> Result<(), Self::Error> {
        if let Some(value) = &config_file.greeting {
            self.greeting = Some(value.to_string());
        }
        if let Some(value) = &config_file.whom {
            self.whom = Some(value.to_string());
        }
        Ok(())
    }
}

#[derive(Default)]
struct SimpleValidator {}

impl ConfigValidator for SimpleValidator {
    type File = SimpleFile;
    type Valid = Simple;
    type Error = SimpleError;

    fn validate(&self, runtime: &Self::File) -> Result<Self::Valid, Self::Error> {
        Ok(Simple {
            greeting: runtime.greeting.clone().ok_or(SimpleError::Missing)?,
            whom: runtime.whom.clone().ok_or(SimpleError::Missing)?,
        })
    }
}

#[derive(Debug, thiserror::Error)]
enum SimpleError {
    #[error("required field has not been set")]
    Missing,
}
sopass and cross-device syncing in a password manager

Last year I wrote a command line password manager, after deciding I didn't like pass any more, and didn't like anything else I found, either. It's called sopass. I've switched over to sopass entirely. I'm happy with it, for my simple needs.

I've been thinking a lot about cross-device and group use. pass supports storing the encrypted secrets in Git and syncing them across computers, even between people. This usually works quite well, because each secret is in a separate file. Thus merge conflicts are unusual, unless the same secret is updated at the same time on two different hosts. That doesn't work with sopass, which puts all secrets in one file. That was one of the reasons I wrote the software.

If I were to support cross-device syncing in sopass, I'd want to do better than pass. I would want to entirely avoid merge conflicts.

The idea I have for implementing this is to use a CRDT, a conflict-free replicated data type. Basically, a sopass database would be a Git repository and each atomic change would be a separate commit: set key to value, rename key, remove key. The CRDT would merge the changes in a way that guarantees there is never a conflict. This might require arbitrarily, but deterministically, choosing one change from a small set of changes that can't be ordered otherwise. That might result in occasional surprised users (what joy!), but no data is lost: it's still there in the Git history. The UI could expose this in some way.
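
To make the idea a little more concrete, here is a sketch of what the atomic changes and a deterministic tie-break might look like. Nothing like this exists in sopass; the types and the tie-break rule are invented only to illustrate the idea.

// Purely illustrative; nothing like this exists in sopass. Each atomic
// change would be recorded as its own Git commit, and concurrent changes
// that can't be ordered otherwise are resolved with a deterministic rule.

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Change {
    Set { key: String, value: String },
    Rename { old: String, new: String },
    Remove { key: String },
}

/// Deterministically pick one change from a set of concurrent, conflicting
/// changes: here, simply the lexicographically largest one. The exact rule
/// doesn't matter, as long as every replica applies the same one.
fn pick_winner(mut conflicting: Vec<Change>) -> Option<Change> {
    conflicting.sort();
    conflicting.pop()
}

fn main() {
    // Two hosts changed the same key concurrently; every replica picks the
    // same winner, so there is never a merge conflict. The losing change is
    // still in the Git history.
    let concurrent = vec![
        Change::Set {
            key: "example.com".into(),
            value: "hunter2".into(),
        },
        Change::Remove {
            key: "example.com".into(),
        },
    ];
    println!("{:?}", pick_winner(concurrent));
}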

This would actually be an interesting technical challenge to implement, but given that I have a wealth of such challenges, a drought of free time, and no current need for this, I'm going to pass on this. But I thought I'd write up the thought in case it inspires someone else.

I don't accept donations

It is common to suggest to open source projects that they should ask for donations to fund development. My understanding is that this almost never works: very, very few people donate. I have other reasons to not do that.

I live in Finland. We have a law that requires prior permission from the police to appeal to the public for donations. That's why I don't ask for donations.

I also work full time, and I'm well compensated. I live comfortably, and have no significant unmet needs. I have a home, food, and healthcare, and so I'm lucky to not need donations. That's why I don't accept donations. I'd rather you donate to someone who needs it more than I do.

  • The law covers appealing for donations, not accepting donations.
  • This is why the Wikimedia Foundation doesn't fund raise in Finland.
  • The interpretation of the law by the police, the prosecutors, and the courts is sufficiently inconsistent and unpredictable that I don't want to try my luck. I'd rather avoid gray areas.
  • Don't ask me to explain why the law exists.
  • Don't ask me to interpret the law.
  • Don't ask me to defend the law.
  • Do ask me for my availability for your open source development needs.
  • I'm happy to sell my time to develop software. I have a company that I can use to invoice that work. Contact me privately if you're interested.

I asked a Finnish law firm to write up an expert opinion about funding open source projects in Finland. It's in Finnish, sorry.

(I've written and published this blog post so I have something to point people at, when the topic comes up in discussion.)

Future of vmdb2: need help

Summary: I'd like help maintaining vmdb2, my software for creating virtual machine images with Debian installed.

In 2011 I needed to create six similar Debian virtual machines, differing in Debian release and computer architecture. This was tedious, and so it needed to be automated. I wrote vmdebootstrap, which worked OK for a few years, but was not very flexible. It had a fixed sequence of operations that could only be slightly varied using options. When it worked, it was fine, but increasingly it didn't work. I was facing an ever-growing set of options, some of which would be mutually incompatible. With N options, you need to test N² combinations. That did not appeal.

In 2017 I got tired of the growing complexity and wrote vmdb2, which doesn't have a fixed sequence of operations. Instead, it reads an input file that lists the operations to do, and their order. This was much more flexible. Combinatorial explosion averted.

I still maintain vmdb2, but for many years now it has been in a "selfish maintainership" mode, where I only really fix or change anything if it affects me, or I have some other such reason to do something. I've done this to protect my free time and my sanity.

Despite this, there are a few people using it, and I think it's time to make sure vmdb2 has a better future.

The problem, from my point of view, with maintaining vmdb2 is that many people use it to build images for systems that are wildly different from what I originally built vmdebootstrap for: Intel architecture virtual machines. Indeed, I do that myself: I built a Debian installer on top of vmdb2 for bare metal PC hardware (https://v-i.liw.fi/).

I am not any kind of deep expert in boot loaders, UEFI, or hardware support, or the layers close to these in a Linux operating system. Debugging problems with these is tedious and frustrating. Reviewing changes related to them is tedious as well.

I also can't spend a ton more time on vmdb2, as I have an embarrassing plethora of other hobby projects.

Therefore, I'd like help maintaining vmdb2. If you use it, or this area of system software interests you, and you'd like to help, please let me know. If I can do something to make it easier for you to help, let me know.

My contact information is public. Email is preferred.

callisto: free, experimental CI server, initially for Rust projects

I develop CI systems as a hobby and for work. I want to gain experience in running what I've built, by running a service for others. I've set up a Radicle CI instance with my Ambient engine to run CI for open source Rust projects that have a Radicle repository. See callisto.liw.fi.

The offer:

My server runs CI for your project for free. You get feedback on whether your project builds, and its test suite runs successfully. If you can and want to, you tell me what you think of Ambient and Radicle CI. I find out if my CI system works for other people's projects and learn about missing features and other problems.

The idea is that you do me a favor and I do you a favor. In the best case we both benefit. In the worst case you waste a small amount of time and effort to try a new system.

I can't promise much, but I intend to keep this running for at least until the end of the year.

Some constraints:

  • For ideological reasons, this offer is only open to open source projects.
  • For technical reasons, your project must be in a Radicle repository and must be a Rust program. Radicle is how Ambient is notified that something has changed and that CI needs to run. Rust is required because Ambient downloads dependencies, and that is so far only implemented for Rust.
  • You get pass/fail status and a log for each run.
  • You don't get build artifacts. There is no delivery or deployment available. For now, I don't want to provide a service that publishes arbitrary files or that can access other servers. My server contains no secrets and has no access to anywhere else.

Some caveats:

  • Ambient is not mature software. It is not polished at all. It's a hobby project. User visible behavior in Ambient may change without warning. I try to avoid breaking anything, of course.
  • When I update software on the server, CI runs in progress may be terminated. Sorry. You can trigger a new run.
  • Counter caveat: I've been using Radicle with Ambient as my only CI system for most of this year so it's probably not entirely useless, maybe, possibly, I hope, but this experiment is to find out.
  • The CI server is configured so it will run when the default branch of the Radicle repository changes or when a Radicle "patch" is created or modified. A patch corresponds to a PR or MR.
  • CI runs in a virtual machine with no network access. The operating system is Debian 12 (bookworm), using CPU architecture amd64, with several Rust versions installed, with 2 virtual CPUs, 12 GiB RAM, a few tens of GB of disk space, about 30 GB of cache, and a maximum run time of 20 minutes. If these limits aren't enough, I may be able to accommodate special requests, but I'm trying to have little variation between CI projects for now.
  • Rust crate dependencies are downloaded before the VM starts and provided in the VM in /workspace/deps. If you need other dependencies that aren't in the VM, I'm going to say "sorry" for now.
  • The lack of network access is part of the security design of Ambient.
  • The server and service may go away at any time. This offer is an experiment and if I deem the experiment not worth continuing, I will terminate the service. Possibly without notice.
  • I may need to remove data and projects from the server at any time, because hardware resources are limited. This might happen without warning.
  • I may need to wipe the server and re-install it from scratch, to recover from bad mistakes on my part. This too may happen without warning.
  • The above has a lot of warnings, sorry. I'm trying to manage expectations.

Selection process:

  • If you'd like your open source Rust project to use my server, post a message on the fediverse mentioning me (@liw@toot.liw.fi) with a short explanation of your project, and a link to its public Git repository. You can also email me (liw@liw.fi). If the project is already in a Radicle repository, tell me the repository ID. You can create a Radicle repository after I tell you I'd like to select your project, if you prefer to wait.
  • I select some number of projects using nebulous and selfish criteria, add your repository to my CI server node, and you can watch https://callisto.liw.fi/ for run information, including run logs. I'm likely to select all projects that seem benign, while the server has spare capacity.

Communication:

  • You follow me on the fediverse to get updates, or follow my blog. You can send me a direct message or email, if you prefer, while the experiment is running.
  • The Radicle Zulip chat system is also available, if you're willing to create an account. See the #radicle-ci channel there.


Human vs JSON output formatting: avoid mixing concerns

Two of the original ideas about Unix are that each program should do one thing and that programs should be combinable so they can consume each others' output. This led to the convention and tradition that Unix command line programs produce output that's relatively easy for other programs to parse.

In practice, this meant that output was line based, one record per line, and columns on a line were separated by white space or other characters that were easy to match on, such as colons. In simple cases this is very easy, and so it's common, but as the world gets more complicated, simple cases are sometimes not enough.

Today, it's a common request that a Unix command line program should optionally format output in a structured format, such as JSON.

Luckily, this is easy enough to do, in most languages. In the Rust language, the powerful serde set of libraries makes this particularly easy.

However, adding JSON output support to an existing program can be tedious. A very common implementation approach is to mix the logic for figuring out what to output and the logic for how to format the output. If there's only one output format, mixing these concerns is often the simplest path forward. In very resource constrained environments it can be the only way, if there isn't enough memory to store all of the data to be formatted to output at once.

When multiple output formats need to be supported, and it's possible to store all of the output data in memory at once, I prefer to separate the concerns. First I collect all the data to be output, then I produce output in the desired output format.

As an example in the Rust language:

#![allow(dead_code)]

use serde::Serialize;

fn main() {
    let output = Output {
        name: "example".into(),
        values: vec![
            OutputValue {
                key: "foo".into(),
                value: "bar".into(),
            },
            OutputValue {
                key: "yo".into(),
                value: "yoyo".into(),
            },
        ],
    };

    println!("========================================");
    println!("humane output:");
    println!();
    println!("name: {}", output.name);
    println!("values:");
    for v in output.values.iter() {
        println!("  {}={}", v.key, v.value);
    }

    println!("========================================");
    println!("debug output:");
    println!();
    println!("{output:#?}");

    println!("========================================");
    println!("JSON output:");
    println!();
    println!("{}", serde_json::to_string_pretty(&output).unwrap());
}

#[derive(Debug, Serialize)]
struct Output {
    name: String,
    values: Vec<OutputValue>,
}

#[derive(Debug, Serialize)]
struct OutputValue {
    key: String,
    value: String,
}

This is a very simplistic example, of course, but shows how the two concerns can be separated.

I've converted a few programs to this style over the years. The hard part is always teasing apart the data collection and the output formatting. It needs to be done carefully to avoid breaking anything that depends on the existing output format.

In any new programs I write, I separate the concerns from the beginning to be kind to my future self.

Why I use Debian

I've been using the Debian Linux distribution since the mid-1990s. I still use it. I had a brief exploration of Linux distributions early on. It was brief partly because there were so few of them.

  • My first Linux installation was by Linus, to develop and test the installation method. He'd never installed Linux, because it grew on his PC on top of an existing Minix installation. He used my PC to figure out how to install Linux. This was in 1991.

  • I then tried subsequent boot+root floppy images, by Linus or others, and the MCC Interim Linux distribution, and SLS. Possibly one or two others.

  • Then in 1993, Debian was announced. I think I tried it first in 1994, and was hooked. Debian was interesting in particular because it was a community project: I could join and help. So I did. I became a Debian developer in 1996.

I've since used Ubuntu (while working for Canonical), and Baserock (while working for Codethink), and I've looked at several others, but I always return to Debian.

I like Debian for several reasons:

  • I know it very well.
  • I know the community of Debian developers.
  • I trust the community of Debian developers.
  • I trust the Debian project to follow its Social Contract.
  • I trust the Debian project to vet software freedom of what they package.
  • I trust the Debian project to update its packages for security fixes.
  • I trust the Debian project to keep the privacy of its users in mind.

The key word here is trust. Over thirty years, I've built very strong trust in Debian doing the right thing, from my point of view. That's a pretty high bar for any other distribution to clear.

riki page names and internal links

I'm building riki, my partial ikiwiki clone, from the bottom up. My previous two attempts have been more top down. I'm now thinking that, for this project at least, it makes sense to first build the fundamental building blocks that I know I'll be needing, and do the higher level logic on top of that later.

The first building block is a way to represent page names, and to resolve references from one page to another, within a site. I'm trying to mimic what ikiwiki does, since I'm aiming to be compatible with it.

  • A page is the unit of a site. A site consists of one or more pages.
    • note that I consider "blobs" such as images to each be pages, but they're not parsed as wiki text, and merely get copied to the output directory as-is
  • A "page path name" is the file system pathname to the source file of a page, relative to root of the site source directory.
  • A "page name" is the path from the root of site to a page. It refers to the logial page, not the source file or the generated file.
  • A "link" is a path from one logical page to another.

Examples:

  • a file index.mdwn (page path name) at the root of the source tree becomes page / and output file index.html
  • file foo/bar.mdwn becomes page foo/bar and output file foo/bar/index.html
  • if file foo/bar.mdwn links to (refers to) page /bar, it refers to page bar at the root of the site, which corresponds to file bar.mdwn in the source and bar/index.html in the output tree

I like the ikiwiki default of using a directory for each page in the output. In other words, source file foo.mdwn becomes foo/index.html in the output, to represent page foo.

I'm not fond of the .mdwn suffix for markdown files, which ikiwiki has been using for a very long time. I will make riki support it, but will later also support the .md suffix. Initially, I'll stick with the longer suffix only.

I've implemented a Rust type PageName to represent a page name, and RelativePath to represent the path from one page to another. I like to use Rust types to help me keep track of what's what and to help me avoid silly mistakes. These two types are basically slightly camouflaged strings.

More interestingly, I've also implemented a type Pages that represents the complete set of pages in a site. This exists only to allow me to implement the method resolve:

fn resolve(&self, source: &PageName, link: &str) -> Result<PageName, PageNameError>

This returns the name of the page that the link refers to. ikiwiki has a somewhat intricate set of linking rules which this method implements. This will be used in many places: everywhere a page refers to another page on the site. Thus, this is truly a fundamental building block that has to be correct.
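
As a hypothetical example of how I imagine using it, based on the /bar example above: only the resolve signature is from the actual code; the PageName::new constructor and the equality check are invented for illustration.

// Hypothetical usage only: PageName::new and the equality comparison are
// invented for this example; only resolve() above is from the actual code.
fn example(pages: &Pages) -> Result<(), PageNameError> {
    // Page foo/bar links to "/bar". The leading slash makes the link
    // absolute, so it resolves to the page bar at the root of the site.
    let source = PageName::new("foo/bar");
    let target = pages.resolve(&source, "/bar")?;
    assert_eq!(target, PageName::new("bar"));
    Ok(())
}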

The source code module implementing all of the above is in Git if you want all the dirty details. I expect it to change, but I wanted to at least get the logic for linking rules done, and that was easier if it's all in one module.