Welcome to my web log. See the first post for an introduction. See the archive page for all posts. (There is an English-language feed if you don't want to see Finnish.)


Me on Mastodon, for anything that is too small to warrant a blog post.

All content outside of comments is copyrighted by Lars Wirzenius, and licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. Comments are copyrighted by their authors. (No new comments are allowed.)


sopass 0.6.0: a command line password manager

I've just released version 0.6.0 of sopass, my command line password manager that I use instead of pass.

Version 0.6.0, released 2025-10-31

If I were of the American persuasion, this would be a spooky release. But I'm not, so it's a comfy release that doesn't scare anyone.

  • The sopass value generate command generates a new random value for a name.

  • There have also been other changes. A deb package is built and published by CI for every merge into the main branch. The documentation of acceptance criteria is published at https://doc.liw.fi/sopass/sopass.html. Lars has decided to not work on cross-device syncing, as it's not something he needs, even though it's an interesting technical problem.

Obnam 3 status: chunks, credentials, help?

I've spent most Sundays for the past half a year implementing Obnam 3, the third generation of my backup program. I've posted a blog post about each three-hour session on the Obnam blog. It's way too much detail for anyone not passionately interested in this project. Here is a summary of what I've done. There is also an appeal for help.

I've implemented the lowermost layer of backup storage: the chunk. A chunk is a piece of data, either a small file or a part of a longer file. The chunk is encrypted in a way that also allows verifying that the chunk hasn't been modified while in backup storage.

Each chunk is encrypted with a random, computer-generated symmetric key, which the user never sees. There can be any number of such keys, for different groups of chunks, although the implementation doesn't yet make it convenient to choose the key to use when encrypting a chunk. The chunk keys are stored in a client chunk, which itself is encrypted with another random, computer-generated key, the client key.

The client key is encrypted in various ways, and the result of each of those encryption operations is stored in a credential chunk. I've implemented credential encryption methods using OpenPGP software keys, and OpenPGP cards.
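
To keep the key hierarchy straight, here is a rough sketch of it as Rust types. This is only an illustration of the structure described above: the names and fields are invented for this post, not Obnam's actual code.

// A sketch of the key hierarchy described above. The names and shapes are
// invented for illustration; they are not Obnam's actual types.

/// An encrypted chunk as stored in the repository: opaque ciphertext, plus
/// whatever is needed to verify it hasn't been modified in storage (for
/// example, an AEAD authentication tag).
struct EncryptedChunk {
    ciphertext: Vec<u8>,
    tag: Vec<u8>,
}

/// A random, computer-generated symmetric key used to encrypt data chunks.
/// The user never sees these.
struct ChunkKey([u8; 32]);

/// The client chunk: the collection of chunk keys, itself stored encrypted
/// with the client key.
struct ClientChunk {
    chunk_keys: Vec<ChunkKey>,
}

/// A credential chunk: the client key encrypted with one particular method.
/// There is one credential chunk per encryption of the client key.
enum CredentialChunk {
    OpenPgpSoftwareKey { encrypted_client_key: Vec<u8> },
    OpenPgpCard { encrypted_client_key: Vec<u8> },
}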

This part works, and although it needs polish, I'm pretty happy with it.

There is also a rudimentary backup repository, which stores chunks in a local directory and allows searching for chunks by id or label. Chunk labels are short strings cryptographically attached to the chunk to give the type of a chunk, or the encrypted checksum of the plaintext data in a chunk, for de-duplication.
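
A rough shape for such a repository might look like the trait below. Again, this is only a sketch of the interface implied above, with invented names, not the actual code.

// Illustrative only: a possible shape for a chunk repository that stores
// chunks in a local directory and can find them by id or by label. The
// names are invented; this is not Obnam's actual interface.

type ChunkId = String;

trait ChunkRepository {
    type Error;

    /// Store a chunk with a label, returning its id.
    fn put(&mut self, chunk: &[u8], label: &str) -> Result<ChunkId, Self::Error>;

    /// Fetch a chunk by its id.
    fn get(&self, id: &ChunkId) -> Result<Option<Vec<u8>>, Self::Error>;

    /// Find the ids of all chunks with a given label, for example the
    /// encrypted checksum of the plaintext, as used for de-duplication.
    fn find_by_label(&self, label: &str) -> Result<Vec<ChunkId>, Self::Error>;
}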

I've intentionally limited myself to a single Sunday session per week, at most three hours per session. This has been quite enjoyable: I am not in a hurry, and I can try to be careful and deliberate. In my profession that is not as common as I would like. Three hours a week has been enough to make progress, even if slowly. But fast enough for a hobby project.

I'm not yet sure what I will do next, but supporting remote backup repositories seems like a sensible choice. I will need to do some research for that: I will need to learn about the S3 API, and look at the Rust iroh library for NAT hole punching.

Obnam is a large project, more than I can do by myself. Obnam needs, for example, documentation, even if at this stage for developers, not yet end users. There are code changes needed, too: more credential methods (password, TPM2 chip, ...), and all the code to actually make backups. Someone will need to research and implement ways of splitting different kinds of files into chunks. It would be good to have a better idea of what's needed: use cases, acceptance criteria. There is no shortage of things to do.

What part of building backup software interests you? How would you like to help?

clingwrap, a Rust library for command line applications

Every program I write is in some sense a command line program, even if it transmogrifies itself into a server process or an interactive terminal user interface. It starts by doing the kinds of things a Unix command line program does: it parses the command line, maybe loads some configuration files, maybe runs some other programs.

I've made a crate, clingwrap, which makes a couple of the common things easier. I've done my best to implement them well and put them in a library. This means I don't keep copying the code from project to project, inevitably resulting in differences, and bugs fixed in one place but not the others.

It's a small library, and may never grow big. There's a module for handling configuration files, and one to run other programs. Note that it's a library, not a framework. You call clingwrap, it doesn't call you.

I use clap for command line parsing, and don't feel a need to wrap or reinvent that.

Example

The code below parses configuration files for a "hello, world" program. It also validates the result of merging many configuration files. The result of the validation is meant to not require checking at run time: if the configuration files can be loaded, the configuration is valid.

use clingwrap::config::*;

use serde::{Deserialize, Serialize};

#[derive(Debug)]
struct Simple {
    greeting: String,
    whom: String,
}

#[derive(Debug, Clone, Default, Serialize, Deserialize, Eq, PartialEq)]
struct SimpleFile {
    greeting: Option<String>,
    whom: Option<String>,
}

impl<'a> ConfigFile<'a> for SimpleFile {
    type Error = SimpleError;

    fn merge(&mut self, config_file: SimpleFile) -> Result<(), Self::Error> {
        if let Some(value) = &config_file.greeting {
            self.greeting = Some(value.to_string());
        }
        if let Some(value) = &config_file.whom {
            self.whom = Some(value.to_string());
        }
        Ok(())
    }
}

#[derive(Default)]
struct SimpleValidator {}

impl ConfigValidator for SimpleValidator {
    type File = SimpleFile;
    type Valid = Simple;
    type Error = SimpleError;

    fn validate(&self, runtime: &Self::File) -> Result<Self::Valid, Self::Error> {
        Ok(Simple {
            greeting: runtime.greeting.clone().ok_or(SimpleError::Missing)?,
            whom: runtime.whom.clone().ok_or(SimpleError::Missing)?,
        })
    }
}

#[derive(Debug, thiserror::Error)]
enum SimpleError {
    #[error("required field has not been set")]
    Missing,
}
sopass and cross-device syncing in a password manager

Last year I wrote a command line password manager, after deciding I didn't like pass any more, and didn't like anything else I found, either. It's called sopass. I've switched over to sopass entirely. I'm happy with it, for my simple needs.

I've been thinking a lot about cross-device and group use. pass supports storing the encrypted secrets in Git and syncing them across computers, even between people. This usually works quite well, because each secret is in a separate file. Thus merge conflicts are unusual, unless the same secret is updated at the same time on two different hosts. That doesn't work with sopass, which puts all secrets in one file. That was one of the reasons I wrote the software.

If I were to support cross-device syncing in sopass, I'd want to do better than pass. I would want to entirely avoid merge conflicts.

The idea I have for implementing this is to use a CRDT, a conflict-free replicated data type. Basically, a sopass database would be a Git repository and each atomic change would be a separate commit: set key to value, rename key, remove key. The CRDT would merge the changes in a way that guarantees there is never a conflict. This might require arbitrarily, but deterministically, choosing one change from a small set of changes that can't be ordered otherwise. That might result in occasional surprised users (what joy!), but no data is lost: it's still there in the Git history. The UI could expose this in some way.
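
To make the idea a little more concrete, here is a sketch of what the atomic changes and a deterministic tie-break might look like. Nothing like this exists in sopass; the types and the tie-break rule are invented only to illustrate the idea.

// Purely illustrative; nothing like this exists in sopass. Each atomic
// change would be recorded as its own Git commit, and concurrent changes
// that can't be ordered otherwise are resolved with a deterministic rule.

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Change {
    Set { key: String, value: String },
    Rename { old: String, new: String },
    Remove { key: String },
}

/// Deterministically pick one change from a set of concurrent, conflicting
/// changes: here, simply the lexicographically largest one. The exact rule
/// doesn't matter, as long as every replica applies the same one.
fn pick_winner(mut conflicting: Vec<Change>) -> Option<Change> {
    conflicting.sort();
    conflicting.pop()
}

fn main() {
    // Two hosts changed the same key concurrently; every replica picks the
    // same winner, so there is never a merge conflict. The losing change is
    // still in the Git history.
    let concurrent = vec![
        Change::Set {
            key: "example.com".into(),
            value: "hunter2".into(),
        },
        Change::Remove {
            key: "example.com".into(),
        },
    ];
    println!("{:?}", pick_winner(concurrent));
}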

This would actually be an interesting technical challenge to implement, but given that I have a wealth of such challenges, a drought of free time, and no current need for this, I'm going to pass on this. But I thought I'd write up the thought in case it inspires someone else.

I don't accept donations

It is common to suggest to open source projects that they should ask for donations to fund development. My understanding is that this almost never works: very, very few people donate. I have other reasons to not do that.

I live in Finland. We have a law that requires prior permission from the police to appeal to the public for donations. That's why I don't ask for donations.

I also work full time, and I'm well compensated. I live comfortably, and have no significant unmet needs. I have a home, food, and healthcare, and so I'm lucky to not need donations. That's why I don't accept donations. I'd rather you donate to someone who needs it more than I do.

  • The law covers appealing for donations, not accepting donations.
  • This is why the Wikimedia Foundation doesn't fund raise in Finland.
  • The interpretation of the law by the police, the prosecutors, and the courts is sufficiently inconsistent and unpredictable that I don't want to try my luck. I'd rather avoid gray areas.
  • Don't ask me to explain why the law exists.
  • Don't ask me to interpret the law.
  • Don't ask me to defend the law.
  • Do ask me for my availability for your open source development needs.
  • I'm happy to sell my time to develop software. I have a company that I can use to invoice that work. Contact me privately if you're interested.

I asked a Finnish law firm to write up an expert opinion about funding open source projects in Finland. It's in Finnish, sorry.

(I've written and published this blog post so I have something to point people at, when the topic comes up in discussion.)

Future of vmdb2: need help

Summary: I'd like help maintaining vmdb2, my software for creating virtual machine images with Debian installed.

In 2011 I needed to create six similar Debian virtual machines, differing in Debian release and computer architecture. This was tedious, and so it needed to be automated. I wrote vmdebootstrap, which worked OK for a few years, but was not very flexible. It had a fixed sequence of operations that could only be slightly varied using options. When it worked, it was fine, but increasingly it didn't work. I was facing an ever-growing set of options, some of which would be mutually incompatible. With N options, you need to test N² combinations. That did not appeal.

In 2017 I got tired of the growing complexity and wrote vmdb2, which doesn't have a fixed sequence of operations. Instead, it reads an input file that lists the operations to do, and their order. This was much more flexible. Combinatorial explosion averted.

I still maintain vmdb2, but for many years now it has been in a "selfish maintainership" mode, where I only really fix or change anything if it affects me, or I have some other such reason to do something. I've done this to protect my free time and my sanity.

Despite this, there are a few people using it, and I think it's time to make sure vmdb2 has a better future.

The problem, from my point of view, with maintaining vmdb2 is that many people use it to build images for systems that are wildly different from what I originally built vmdebootstrap for: Intel architecture virtual machines. Indeed, I do that myself: I built a Debian installer on top of vmdb2 for bare metal PC hardware (https://v-i.liw.fi/).

I am not any kind of deep expert in boot loaders, UEFI, or hardware support, or the layers close to these in a Linux operating system. Debugging problems with these is tedious and frustrating. Reviewing changes related to them is tedious as well.

I also can't spend a ton more time on vmdb2, as I have an embarrassing plethora of other hobby projects.

Therefore, I'd like help maintaining vmdb2. If you use it, or this area of system software interests you, and you'd like to help, please let me know. If I can do something to make it easier for you to help, let me know.

My contact information is public. Email is preferred.

callisto: free, experimental CI server, initially for Rust projects

I develop CI systems as a hobby and for work. I want to gain experience in running what I've built, by running a service for others. I've set up a Radicle CI instance with my Ambient engine to run CI for open source Rust projects that have a Radicle repository. See callisto.liw.fi.

The offer:

My server runs CI for your project for free. You get feedback on whether your project builds, and its test suite runs successfully. If you can and want to, you tell me what you think of Ambient and Radicle CI. I find out if my CI system works for other people's projects and learn about missing features and other problems.

The idea is that you do me a favor and I do you a favor. In the best case we both benefit. In the worst case you waste a small amount of time and effort to try a new system.

I can't promise much, but I intend to keep this running for at least until the end of the year.

Some constraints:

  • For ideological reasons, this offer is only open to open source projects.
  • For technical reasons, your project must be in a Radicle repository and must be a Rust program. Radicle is how Ambient is notified that something has changed and that CI needs to run. Rust is required because Ambient downloads dependencies, and that is so far only implemented for Rust.
  • You get pass/fail status and a log for each run.
  • You don't get build artifacts. There is no delivery or deployment available. For now, I don't want to provide a service that publishes arbitrary files or that can access other servers. My server contains no secrets and has no access to anywhere else.

Some caveats:

  • Ambient is not mature software. It is not polished at all. It's a hobby project. User visible behavior in Ambient may change without warning. I try to avoid breaking anything, of course.
  • When I update software on the server, CI runs in progress may be terminated. Sorry. You can trigger a new run.
  • Counter caveat: I've been using Radicle with Ambient as my only CI system for most of this year so it's probably not entirely useless, maybe, possibly, I hope, but this experiment is to find out.
  • The CI server is configured so it will run when the default branch of the Radicle repository changes or when a Radicle "patch" is created or modified. A patch corresponds to a PR or MR.
  • CI runs in a virtual machine with no network access. The operating system is Debian 12 (bookworm), using CPU architecture amd64, with several Rust versions installed, with 2 virtual CPUs, 12 GiB RAM, a few tens of GB of disk space, about 30 GB of cache, and a maximum run time of 20 minutes. If these limits aren't enough, I may be able to accommodate special requests, but I'm trying to have little variation between CI projects for now.
  • Rust crate dependencies are downloaded before the VM starts and provided in the VM in /workspace/deps. If you need other dependencies that aren't in the VM, I'm going to say "sorry" for now.
  • The lack of network access is part of the security design of Ambient.
  • The server and service may go away at any time. This offer is an experiment and if I deem the experiment not worth continuing, I will terminate the service. Possibly without notice.
  • I may need to remove data and projects from the server at any time, because hardware resources are limited. This might happen without warning.
  • I may need to wipe the server and re-install it from scratch, to recover from bad mistakes on my part. This too may happen without warning.
  • The above has a lot of warnings, sorry. I'm trying to manage expectations.

Selection process:

  • If you'd like your open source Rust project to use my server, post a message on the fediverse mentioning me (@liw@toot.liw.fi) with a short explanation of your project, and a link to its public Git repository. You can also email me (liw@liw.fi). If the project is already in a Radicle repository, tell me the repository ID. You can create a Radicle repository after I tell you I'd like to select your project, if you prefer to wait.
  • I select some number of projects using nebulous and selfish criteria, add your repository to my CI server node, and you can watch https://callisto.liw.fi/ for run information, including run logs. I'm likely to select all projects that seem benign, while the server has spare capacity.

Communication:

  • You follow me on the fediverse to get updates, or follow my blog. You can send me a direct message or email, if you prefer, while the experiment is running.
  • The Radicle Zulip chat system is also available, if you're willing to create an account. See the #radicle-ci channel there.


Human vs JSON output formatting: avoid mixing concerns

Two of the original ideas about Unix are that each program should do one thing and that programs should be combinable so they can consume each others' output. This led to the convention and tradition that Unix command line programs produce output that's relatively easy for other programs to parse.

In practice, this meant that output was line based, one record per line, and columns on a line were separated by white space or other characters that were easy to match on, such as colons. In simple cases this is very easy, and so it's common, but as the world gets more complicated, simple cases are sometimes not enough.

Today, it's a common request that a Unix command line program should optionally format output in a structured format, such as JSON.

Luckily, this is easy enough to do, in most languages. In the Rust language, the powerful serde set of libraries makes this particularly easy.

However, adding JSON output support to an existing program can be tedious. A very common implementation approach is to mix the logic for figuring out what to output and the logic for how to format the output. If there's only one output format, mixing these concerns is often the simplest path forward. In very resource constrained environments it can be the only way, if there isn't enough memory to store all of the data to be formatted to output at once.

When multiple output formats need to be supported, and it's possible to store all of the output data in memory at once, I prefer to separate the concerns. First I collect all the data to be output, then I produce output in the desired output format.

As an example in the Rust language:

#![allow(dead_code)]

use serde::Serialize;

fn main() {
    let output = Output {
        name: "example".into(),
        values: vec![
            OutputValue {
                key: "foo".into(),
                value: "bar".into(),
            },
            OutputValue {
                key: "yo".into(),
                value: "yoyo".into(),
            },
        ],
    };

    println!("========================================");
    println!("humane output:");
    println!();
    println!("name: {}", output.name);
    println!("values:");
    for v in output.values.iter() {
        println!("  {}={}", v.key, v.value);
    }

    println!("========================================");
    println!("debug output:");
    println!();
    println!("{output:#?}");

    println!("========================================");
    println!("JSON output:");
    println!();
    println!("{}", serde_json::to_string_pretty(&output).unwrap());
}

#[derive(Debug, Serialize)]
struct Output {
    name: String,
    values: Vec<OutputValue>,
}

#[derive(Debug, Serialize)]
struct OutputValue {
    key: String,
    value: String,
}

This is a very simplistic example, of course, but shows how the two concerns can be separated.

I've converted a few programs to this style over the years. The hard part is always teasing apart the data collection and the output formatting. It needs to be done carefully to avoid breaking anything that depends on the existing output format.

In any new programs I write, I separate the concerns from the beginning to be kind to my future self.

Why I use Debian

I've been using the Debian Linux distribution since the mid-1990s. I still use it. I had a brief exploration of Linux distributions early on. It was brief partly because there were so few of them.

  • My first Linux installation was by Linus, to develop and test the installation method. He'd never installed Linux, because it grew on his PC on top of an existing Minix installation. He used my PC to figure out how to install Linux. This was in 1991.

  • I then tried subsequent boot+root floppy images, by Linus or others, and the MCC Interim Linux distribution, and SLS. Possibly one or two others.

  • Then in 1993, Debian was announced. I think I tried it first in 1994, and was hooked. Debian was interesting in particular because it was a community project: I could join and help. So I did. I became a Debian developer in 1996.

I've since used Ubuntu (while working for Canonical), and Baserock (while working for Codethink), and I've looked at several others, but I always return to Debian.

I like Debian for several reasons:

  • I know it very well.
  • I know the community of Debian developers.
  • I trust the community of Debian developers.
  • I trust the Debian project to follow its Social Contract.
  • I trust the Debian project to vet software freedom of what they package.
  • I trust the Debian project to update its packages for security fixes.
  • I trust the Debian project to keep the privacy of its users in mind.

The key word here is trust. Over thirty years, I've built very strong trust in Debian doing the right thing, from my point of view. That's a pretty high bar for any other distribution to clear.

riki page names and internal links

I'm building riki, my partial ikiwiki clone, from the bottom up. My previous two attempts have been more top down. I'm now thinking that, for this project at least, it makes sense to first build the fundamental building blocks that I know I'll be needing, and do the higher level logic on top of that later.

The first building block is a way to represent page names, and to resolve references from one page to another, within a site. I'm trying to mimic what ikiwiki does, since I'm aiming to be compatible with it.

  • A page is the unit of a site. A site consists of one or more pages.
    • note that I consider "blobs" such as images to each be pages, but they're not parsed as wiki text, and merely get copied to the output directory as-is
  • A "page path name" is the file system pathname to the source file of a page, relative to root of the site source directory.
  • A "page name" is the path from the root of site to a page. It refers to the logial page, not the source file or the generated file.
  • A "link" is a path from one logical page to another.

Examples:

  • a file index.mdwn (page path name) at the root of the source tree becomes page / and output file index.html
  • file foo/bar.mdwn becomes page foo/bar and output file foo/bar/index.html
  • if file foo/bar.mdwn links to (refers to) page /bar, it refers to page bar at the root of the site, which corresponds to file bar.mdwn in the source and bar/index.html in the output tree

I like the ikiwiki default of using a directory for each page in the output. In other words, source file foo.mdwn becomes foo/index.html in the output, to represent page foo.

I'm not fond of the .mdwn suffix for markdown files, which ikiwiki has been using for a very long time. I will make riki support it, but will later also support the .md suffix. Initially, I'll stick with the longer suffix only.

I've implemented a Rust type PageName to represent a page name, and RelativePath to represent the path from one page to another. I like to use Rust types to help me keep track of what's what and to help me avoid silly mistakes. These two types are basically slightly camouflaged strings.

More interestingly, I've also implemented a type Pages that represents the complete set of pages in a site. This exists only to allow me to implement the method resolve:

fn resolve(&self, source: &PageName, link: &str) -> Result<PageName, PageNameError>

This returns the name of the page that the link refers to. ikiwiki has a somewhat intricate set of linking rules which this method implements. This will be used in many places: everywhere a page refers to another page on the site. Thus, this is truly a fundamental building block that has to be correct.
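
As a hypothetical example of how I imagine using it, based on the /bar example above: only the resolve signature is from the actual code; the PageName::new constructor and the equality check are invented for illustration.

// Hypothetical usage only: PageName::new and the equality comparison are
// invented for this example; only resolve() above is from the actual code.
fn example(pages: &Pages) -> Result<(), PageNameError> {
    // Page foo/bar links to "/bar". The leading slash makes the link
    // absolute, so it resolves to the page bar at the root of the site.
    let source = PageName::new("foo/bar");
    let target = pages.resolve(&source, "/bar")?;
    assert_eq!(target, PageName::new("bar"));
    Ok(())
}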

The source code module implementing all of the above is in Git if you want all the dirty details. I expect it to change, but I wanted to at least get the logic for linking rules done, and that was easier if it's all in one module.