Welcome to my web log. See the first post for an introduction. See the archive page for all posts. (There is an English-language feed if you don't want to see Finnish.)


Me on Mastodon, for anything that is too small to warrant a blog post.

All content outside of comments is copyrighted by Lars Wirzenius, and licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Comments are copyrighted by their authors. (No new comments are allowed.)


Mike Godwin in an essay on slate.com:

That’s the biggest thing I learned at the Wikimedia Foundation: When ordinary people are empowered to come together and work on a common, humanity-benefiting project like Wikipedia, unexpectedly great and positive things can happen. Wikipedia is not the anomaly my journalist friend thinks it is. Instead, it’s a promise of the good works that ordinary people freed by the internet can create. I no longer argue primarily that the explosion of freedom of expression and diverse voices, facilitated by the internet, is simply a burden we dutifully have to bear. Now, more than I ever did 30 years ago, I argue that it’s the solution.

I thought that was well said.

Posted Mon Mar 9 09:35:00 2020

I asked a couple of weeks ago what people like or hate about email. Here's a summary of the responses. I admit the summary may be tainted by my current thinking about re-inventing email.

Like

  • It's not real time. Sender and recipient do not need to participate in the communication at the same time. The sender can take their time to craft their message; the recipient can take their time to ponder the message and how to respond.

  • It's established, ubiquitous.

  • It's de-centralized.

  • It's built on top of well-known data formats and protocols; data can be stored locally, under user control, and is highly portable. There is a variety of client software to choose from.

  • Separate discussions are kept separate.

  • Formatting, attachments, and length are all flexible.

  • Mailing lists can be archived publicly.

  • One can have many accounts, and people comprehend this.

  • Subject lines.

  • Email providers are neutral, commodity entities. Choosing one doesn't imply membership in a community.

Dislike

  • Unreliable for communication, often due to bad anti-spam.

  • People sending one-line replies that don't add actual value or that miss the point entirely.

  • Encryption, security, privacy, rich media content, formatted messages, etc., are all built on top of older protocols, often with unfortunate consequences.

  • Top quoting.

  • De facto oligopoly.

  • Spam.

  • Abuse.

  • Configuring and administering email servers is complex.

  • Filtering and organising email is often difficult; the tools provided are not always well suited to the task.

  • Threading is unreliable.

  • Email addresses are too tightly tied to your identity.

  • Searching is often inadequate.

Posted Sun Mar 8 09:33:00 2020

A friend expressed interest in how I keep my journal, so I set up a demo site. In short:

Posted Sat Mar 7 11:32:00 2020

I wrote an alternative Debian installer as a toy, called v-i. One of the following two bullet points is correct:

  • v-i can install a very rudimentary Debian onto exactly one computer in the world: my very own spare Thinkpad x220 laptop. It might not work on your x220. v-i almost certainly won't work on any other kind of computer. If you try, it will probably delete all your data. Make sure your backups work.

  • v-i is perfect in every way. There are not even any typos in the manual. There are no bugs, and all features are fully implemented. Every possible use case is supported. Not only is there no danger to your data, v-i will prevent it from ever disappearing. Even your hardware will never break again. Your laptop will have infinite battery life, and your screen resolution will require 64-bit integers to express.

The v-i installer is based on the vmdb2 tool, which I also wrote. It has nothing to do with debian-installer, the official Debian installer, also known as d-i. I use d-i, but there are a couple of things I wanted to change:

  • I'd like something I can easily modify. d-i requires building special udeb packages for any software that's to be part of the installer. v-i is happy with normal debs.

  • Debian in general uses preseeding for automating an installation. Preseeding means providing answers, in a file, to questions packages may ask during their installation. This is fine, if a little cumbersome, but it only helps when the packages ask the right questions. v-i gives you the full power of Ansible during the initial installation, which is much more flexible.

On the other hand, d-i is mature software, tested by a very large number of people on a very wide variety of hardware. v-i is not. v-i might, at best, be the beginning of something useful for a small number of people.

I can now install Debian onto my x220 with v-i. It's a very basic install, without LVM2, full-disk encryption, or a graphical desktop, but it does have sshd and I can configure the laptop further with Ansible from another host. I've installed the GNOME desktop that way, after rebooting into a v-i installed system. (In theory, I could install GNOME directly from v-i. In practice, there are bugs in packages and/or how vmdb2 runs Ansible.)

The installed system is also heavily tailored to my needs and preferences. It uses Finnish locales, and requires my SSH key to log in. The root account has no password. All of this could be made better with a bit of work.

The code is at https://gitlab.com/larswirzenius/v-i. Check the README for more instructions if you're curious. If you do give it a try, I'd love to hear from you, unless you just lost all your data. Please don't lose all your data.

If you'd like to help build a more viable installer from v-i, please talk to me. I dream of a future where I can install a bare metal machine as easily as I can create and configure a VM.

PS. A 128 GB USB3 flash drive can be had for as little as 20 euros, and that has enough disk space for v-i and a Debian mirror.

If you want to respond to this blog post, please email me (liw@liw.fi) or respond to this fediverse post.

Posted Sat Feb 29 20:12:00 2020

A continuous integration (CI) engine takes the source code for a software project and ensures it works. In less abstract terms, it builds the software and runs any automated tests it may have. The exact steps depend heavily on the CI engine and the project, but they can be thought of as follows (with concrete examples of possible commands; a sketch of the steps as a script follows the list):

  • retrieve the desired revision of the source code (git clone, git checkout)
  • install build dependencies (dpkg-checkbuilddeps, apt install)
  • build (./configure, make)
  • test (make check)
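
Here is what such a script could look like. This is a minimal sketch, assuming a Debian-ish project with an autotools-style build; the repository URL, revision, and commands are illustrative only, not any particular CI engine's behaviour:

    # A minimal sketch of the CI steps listed above; all names and
    # commands are illustrative assumptions.
    import subprocess

    def run(argv, cwd=None):
        # Stop the whole build on the first failing step.
        subprocess.run(argv, cwd=cwd, check=True)

    def build(repo_url, revision, workdir):
        run(["git", "clone", repo_url, workdir])          # retrieve source code
        run(["git", "checkout", revision], cwd=workdir)   # desired revision
        run(["apt", "install", "-y", "build-essential"])  # build dependencies
        run(["./configure"], cwd=workdir)                 # build
        run(["make"], cwd=workdir)
        run(["make", "check"], cwd=workdir)               # test

    build("https://git.example.com/project.git", "main", "/tmp/build")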

This is dangerous stuff. In the specific case of an open, hosted CI service, it's especially dangerous: anyone can submit any build, and that build can do anything, including attacking computers anywhere on the Internet. However, even a CI engine that only builds projects for in-house developers is risky: a large share of attacks on IT systems come from insiders.

Apart from actual attacks, building software is also dangerous due to accidents: a mistake in the way software is built, or automatically tested, can result in something that looks and behaves like an attack. An infinite loop can use excessive amounts of CPU, or block other projects from getting built.

I've been thinking about ways to deal with this, in the context of developing a CI engine, and here's a list of specific threats I've come up with:

  • excessive use of build host resources
    • e.g., CPU, GPU, RAM, disk, etc.
    • mitigation: use quotas or other hard limits that can't be exceeded (e.g., a dedicated file system for the build, or a virtual machine with a virtual memory limit); a sketch of this follows the list
    • mitigation: monitor use and stop the build if it goes over a limit, where a quota is infeasible (e.g., CPU time)
  • excessive use of network bandwidth
    • mitigation: monitor use, stop build if it goes over a limit
  • attack on a networked target via a denial of service attack
    • e.g., the build joins a DDoS swarm, or sends fabricated SYN packets to prevent the target from working
    • mitigation: prevent direct network access for build, force all outgoing connections to go via a proxy that validates requests and stops build if anything looks suspicious
  • attack on build host, or other host, via network intrusion
    • e.g., port scanning, probing for known vulnerabilities
    • mitigation: prevent direct network access for build, force all outgoing connections to go via a proxy that validates requests and stops build if anything looks suspicious
  • attack the build host directly, without using the network
    • e.g., by breaching security isolation using build host kernel or hardware vulnerabilities, or CI engine vulnerabilities
    • this includes eavesdropping on the host, and stealing secrets
    • mitigation: keep build host up to date on security updates
    • mitigation: run build inside a VM controlled by CI engine (on the assumption that a VM provides better security isolation than a Linux container)
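
As an example of the quota mitigation, kernel-enforced resource limits can be placed on the build process. This is a minimal sketch, assuming the build runs as a child process on a POSIX build host; the limit values and the build command are illustrative:

    # Hard resource limits for a build process; values are illustrative.
    import resource
    import subprocess

    def limit_resources():
        # Runs in the child just before exec; the kernel then enforces
        # these limits, so the build cannot exceed them.
        resource.setrlimit(resource.RLIMIT_CPU, (3600, 3600))           # CPU seconds
        resource.setrlimit(resource.RLIMIT_AS, (2 * 2**30, 2 * 2**30))  # 2 GiB memory
        resource.setrlimit(resource.RLIMIT_FSIZE, (10 * 2**30, 10 * 2**30))  # 10 GiB per file

    subprocess.run(["./run-build"], preexec_fn=limit_resources, check=True)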

I'm sure this is not an exhaustive list. If you can think of additional risks, do tell me.

My current plan for mitigating all of the above looks as follows:

  • there are two nested virtual machines
  • the outer VM is the manager, the inner VM is the builder
  • the manager creates, controls, monitors, and destroys the builder
  • the outer VM is probably Debian Linux, since that's what I know best, using libvirt with Qemu and KVM to manage the inner VM
  • the inner VM can be any operating system, as long as it can run as a Qemu/KVM guest and provides ssh access from the outer VM
  • the manager runs commands on the builder over ssh, or possibly via a serial console (ssh would be simpler, though); a sketch of this follows the list
  • both VMs have restricted amounts of CPU, RAM, and disk space
  • the manager monitors the builder's use of CPU time and network bandwidth
  • the manager proxies and firewalls all outgoing network access, to prevent any access that isn't explicitly allowed
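
The manager side of running commands could start out as something like the following. This is a minimal sketch, assuming the builder VM is reachable over ssh as a host called builder; the host name and the timeout are illustrative:

    # A minimal sketch of the manager running one build step on the
    # builder VM over ssh; "builder" and the timeout are assumptions.
    import subprocess

    def run_in_builder(command, timeout=3600):
        # Kill the step if it exceeds the wall-clock limit; treat a
        # non-zero exit code as a failed build.
        result = subprocess.run(
            ["ssh", "builder", command],
            capture_output=True, text=True, timeout=timeout,
        )
        if result.returncode != 0:
            raise RuntimeError(f"build step failed: {result.stderr}")
        return result.stdout

    print(run_in_builder("make check"))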

The build steps from the top of this article would then work something like this:

  • retrieve the desired revision of the source code: the builder does this, but proxied via the manager, which checks that connections go only to servers listed as allowed for this project
  • install build dependencies: the builder downloads the build dependencies, but proxied via the manager, which checks that downloads come only from servers listed as allowed for this project (a sketch of such a check follows the list)
  • build: runs inside the builder
  • test: runs inside the builder
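
The core of the proxy's policy could be a simple per-project allow list. This is a minimal sketch; the project name, host names, and data structure are illustrative assumptions:

    # Allow-list check a filtering proxy might apply to each outgoing
    # request; the project and host names are illustrative.
    from urllib.parse import urlparse

    ALLOWED_HOSTS = {
        "example-project": {"git.example.com", "deb.debian.org"},
    }

    def request_allowed(project, url):
        # Permit requests only to hosts explicitly allowed for this
        # project; anything else should stop the build.
        host = urlparse(url).hostname
        return host in ALLOWED_HOSTS.get(project, set())

    assert request_allowed("example-project", "https://deb.debian.org/debian/")
    assert not request_allowed("example-project", "https://evil.example.com/")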

It would be awesome if the manager could cut off the builder's network access entirely after the build dependencies have been installed. This would be feasible if the build recipe is structured in a way that lets the manager know which part is doing what. (If I'm designing the CI engine, then I can probably achieve that.)

It would be even more awesome if the manager could do all the downloading, but since the guest may need to use tools specific to its operating system, which might not be available on the manager's operating system, this might not be feasible. A filtering HTTP or HTTPS proxy may have to be enough.

What threats am I missing? Are my mitigations acceptable?

If you want to comment on this blog post, please send me email (liw@liw.fi), or respond on the fediverse on this thread. Thank you!

Posted Fri Feb 28 09:28:00 2020

Would you be willing to try Subplot for acceptance testing for one of your real projects, and give us feedback? We're looking for two volunteers.

given a project
when it uses Subplot
then it is successful

Subplot is a tool for capturing and automatically verifying the acceptance criteria for a software project or a system, in a way that's understood by all stakeholders.

In a software project there is always more than one stakeholder. Even in a project one writes for oneself, there are two stakeholders: oneself, and that malicious cretin, oneself-in-the-future. More importantly, though, there are typically stakeholders such as end users, sysadmins, clients, software architects, developers, and testers. They all need to understand what the software should do, and when it's in an acceptable state to be put into use: in other words, what the acceptance criteria are.

Crucially, all stakeholders should understand the acceptance criteria the same way, and also how to verify they are met. In an ideal situation, all verification is automated, and happens very frequently.

There are various tools for this, from generic documentation tooling (word processors, text editors, markup languages, etc.) to test automation (Cucumber, Selenium, etc.). On the one hand, documenting acceptance criteria in a way that all stakeholders understand is crucial: otherwise the end users risk getting something that isn't useful to them, and the project is a waste of everyone's time and money. On the other hand, automating the verification that acceptance criteria are met is also crucial: otherwise it's done manually, which is slow, costly, and error prone, and that increases the risk of project failure.

Subplot aims to solve this with an approach that combines documentation tooling and automated verification.

  • The stakeholders in a project jointly produce a document that captures all relevant acceptance criteria and also describes how they can be verified automatically, using scenarios. The document is written using Markdown.

  • The developer stakeholders produce code to implement the steps in the scenarios. The Subplot approach allows the step implementations to be written in a highly cohesive, de-coupled manner, which usually keeps such code quite simple. (Test code should be your best code.) A sketch of what step implementations might look like follows this list.

  • Subplot's "docgen" program produces a typeset version as PDF or HTML. This is meant to be easily comprehensible by all stakeholders.

  • Subplot's "codegen" program produces a test program in the language used by the developer stakeholders. This test program can be run to verify that acceptance criteria are met.

Subplot started in late 2018, and was initially called Fable. It is based on the yarn tool, built for the same purpose in 2013. Yarn has been in active use all its life, though it never became popular outside a small circle. Subplot improves on yarn with better document generation, markup, and decoupling of concerns. Subplot is not compatible with yarn.

Subplot is developed by Lars Wirzenius and Daniel Silverstone as a hobby project. It is free software, implemented in Rust, developed on Debian, and uses Pandoc and LaTeX for typesetting. The code is hosted on gitlab.com. Subplot verifies its own acceptance criteria. It is alpha level software.

We're looking for one or two volunteers to try Subplot on real projects of their own, and to give us feedback. We want to make Subplot good for its purpose, for people other than ourselves as well. If you'd be willing to give it a try, start with the Subplot website, then tell us you're using Subplot. We're happy to respond to questions from the first two volunteers, and from others, time permitting. (The reality of life and time constraints is that we can't commit to supporting more people at this time.)

We'd love your feedback, whether you use Subplot or not.

Posted Sat Feb 15 18:48:00 2020

I retired from Debian as a developer a year ago. I said then that it was because Debian wasn't fun anymore, but I didn't unpack that much. It's been long enough that I feel I can do that. I should've done it back then, but I wasn't strong enough.

A big part of Debian not being fun is that there's so much hatred in the project. There are people attacking others for who they are, be they women, trans, or non-binary. There are people standing up to defend the attackers. Debian is just now going through another bout of that. It's sad and it's disgusting. And it reaffirms that I made the right decision in getting out.

People denying other people their humanity, their very right to exist, is something Debian should not tolerate. I think Debian should exclude people who do that from the project. Likewise, defending the right to deny others their humanity should be equally unacceptable.

De-humanizing rhetoric isn't the only reason Debian stopped being fun. Everything else seems irrelevant, though. If people don't want others to even exist, there's no point in discussing minor points like improving a consensus building culture, paying off at least a noticeable part of the technical debt Debian carries from the past quarter century, or smoothing away some of the worst sources of friction in the development process of the project.

Stop the hatred. The good will follow.

Posted Sat Dec 21 10:13:00 2019

I made a poll on the fediverse yesterday.

In an international context (e.g., company that works around the globe, or a free software project with participants from several continents), what's the right date format?

The options and results:

  • 80% — 2019-12-13
  • 11% — 19 December 2019
  • 9% — 13/12/2019
  • 0% — 12/13/2019

The one with the name of the month is a different date than the others. That was a typo; mea culpa. Nobody commented on that, though, and I doubt it affected the results.

Here's my commentary. It was a bit of a trick question. Sorry. The first two options are both unambiguous as to which part is day, month, and year. The last two are entirely ambiguous, and require contextual information to interpret correctly. Thus, even though the third option is closest to what I'm used to from my own culture, I think it's utterly unsuitable in an international context.

My own preference is to express the month as a word, or abbreviation, but in many cases being all numeric is easier.

The most important bit is to be clear and unambiguous. Sometimes that means getting used to an unfamiliar notation.
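
For what it's worth, the unambiguous all-numeric form is the ISO 8601 date format, which most programming languages can produce directly. A small Python illustration:

    # ISO 8601 dates are unambiguous, and sort correctly as plain strings.
    import datetime

    d = datetime.date(2019, 12, 13)
    print(d.isoformat())           # 2019-12-13
    print(d.strftime("%d %B %Y"))  # 13 December 2019 (month as a word)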

Posted Fri Dec 13 10:38:00 2019

I want to develop free software with people who lift up each other, and aren't arseholes.

A year ago I left Debian. The process is called retiring in Debian, and it's not final: if you do it in an orderly manner, you can come back and be re-instated as a Debian developer via a faster, more lightweight process than the one used for entirely new developers. This was the third time I retired. The reasons were different from the previous times.

The first two times I retired because I was pursuing other passions, and did not feel I could give Debian even the minimal attention and effort required to keep a few minor, leaf packages maintained. This time, I retired because Debian was not fun; it was in fact becoming awful, and I didn't want to participate anymore.

Debian had stopped being fun for me in several ways. One was that the tools, file formats, and workflows Debian uses are getting a little archaic, and are generally sub-optimal. Even mundane, everyday tasks involved much more friction than they should. Another was that making any large change in Debian is too much of an effort these days, partly because of inertia, partly because it involves so many people.

All of that could have been tolerable, if not for the people. Some of the nicest, most competent people I know work on Debian. It has been a privilege and a joy to work with them.

A few of the other people in Debian I don't want to be associated with in any way, any more.

Debian has some vocal people who treat other people in ways that I don't want to accept. I don't want to go into specifics, or names, because that's not going to help me move forward.

This is of course not a new thing. Debian has had problems with people behaving badly for years. I may have contributed to that, passively if not actively. However, as I get older, the friction from dealing with abrasive people is sanding off what thick skin I may have had when younger.

As I get older, I am also learning that some of the things I thought were OK when I was younger, are in fact harmful to other people. I don't want to harm other people, and I don't want to participate in a project where some of the people insist on what I think is harmful behaviour, because they feel it's their right.

Long after I left Debian, RMS managed to collapse the reality distortion field that had been surrounding and protecting him for many years. The triggering event was comments he made in a context involving Jeffrey Epstein. The comments caused a public uproar, and as a result RMS resigned from the role of president of the Free Software Foundation, which he founded. He is currently still the leader of the GNU project. A lot of people are religiously defending RMS and attacking his detractors. I find this problematic.

LWN has an excellent article on the topic. RMS has been behaving in problematic ways for a long time. He's not been publicly confronted about it before, at the scale he has been now.

RMS has done some awesome things that he should be honoured for. He started the GNU project and gave it, and the world, a vision of being able to use computers whose entire software stack is free; he invented copyleft, a legal tool to protect software freedom; and he wrote large amounts of the initial code in the GNU project. He has worked hard and long to help drive the vision of freedom into reality. For this, he shall always be remembered and revered.

That doesn't excuse bad behaviour, such as insisting on abortion jokes, making women feel unwelcome in the GNU project, or various other things. I'm not going to make a list of his shortcomings, because this isn't a critique of RMS specifically. The problem I want to discuss isn't RMS or his personal behaviour.

The problem I do want to discuss is that almost everywhere in the free and open source development communities there's a lot of harmful behaviour, and tolerance of it.

Harmful behaviour comes in many forms. Some people, for example, say outright that they don't want women involved in free software development. Others attack gay, lesbian, trans, queer, black, old, young, Christian, Muslim, atheist, or any other group of people identified by whatever attribute the attacker happens to dislike. Yet others are more subtle: not attacking directly, but not giving people in the group they dislike the same chance to participate, learn, grow, and generally be the best person they can be in the context of free software development.

This doesn't just harm the groups of people being targeted. It harms others, who see it happen and think they might be targeted too, later, maybe for some other reason. It harms the vision of software freedom, because it shoves large parts of humanity outside the software freedom movement, robbing the movement of many voices and much effort. This makes the vision harder to achieve.

Excluding people from the movement for irrelevant reasons also harms humanity in general. It propagates the hate, hurt, and harm that is emblematic of life and politics around the world. While the software freedom movement can't solve all of those problems, we can and should at least not make it worse.

What should we in the software freedom movement do about all this? I've come to a few conclusions so far, though my process to think about this is ongoing.

  • Most importantly, we need to stop being tolerant of intolerance and bad behaviour. It's time for all projects, groups, and organisations in the movement to have, and enforce, at least a minimal level of civil behaviour. We are a movement consisting of many communities, and each community may want or need its own norms, and that's OK. Some norms may even be in conflict. That's also OK, if unfortunate.

    Some people react to this kind of suggestion with hyperbolic claims and conspiracy theories. I don't want to debate them. It's possible to discuss community norms in a civil and constructive way. I know this, because I've seen it happen many times. However, it requires all participants to at least agree that there's behaviour that's unwelcome, and not reject the notion of community norms outright.

  • I am by nature averse to conflicts. I will try to confront bad behaviour in the future, rather than slinking away and going elsewhere. I will at least speak out and say I think something is unacceptable, when I see it.

  • I think the era for dictatorial models of governance for large free software projects is over. For small projects, it's unavoidable, because there's only one person doing any development, but when a project grows, it doesn't work to have one person, or a small anointed group, making all decisions. It's time to have more democratic governance.

    There are technical decisions that probably can't be done well by letting everyone vote on them. However, every substantial project will have other decisions that the whole community around the project should have a say in. Voting on how to fix a bug may not be workable, but voting on the minimum criteria for determining if a bug fix is acceptable is. Should the project require adding a regression test for any bug found in production? Should any such test and bug fix be subjected to code review? Should such bugs and their fixes be documented, even announced, publicly or kept secret? How should security problems be handled?

    In Debian, one of the guiding principles is that those who do, decide. It seems time to involve those who use in the decision making process as well.

I can't force all communities in the software freedom movement to agree with me. Obviously not. I won't even try. I will, however, in the future be wary of joining dictatorial projects where bad behaviour is tolerated. I'm hoping this will have at least some effect.

What about you?

(My blog does not have comments, but you can respond to this fediverse thread: https://toot.liw.fi/@liw/103080486083100970.)

Posted Sun Nov 3 09:43:00 2019

Measuring test coverage by measuring which parts of the code are executed by tests is not useless, but it usually misses the point.

Tests are not meant to show the absence of bugs, but to show what aspects of a program or system work. (See my previous rant.) If your automated tests execute 90% of your code lines, is that good enough? It doesn't really tell you what is tested and, crucially, what isn't. Those using the software don't care about code lines. They care about being able to do the things they want to do. In other words, they care about use cases and acceptance criteria.

The 10% of code not covered by your tests? If users never exercise that code, it's dead code, and should probably be removed. If those lines keep crashing the program, producing wrong results, causing data loss, security problems, privacy leaks, or otherwise cause dissatisfaction, then that's a problem. How do you know?

Test coverage should be measuring use cases and acceptance criteria. These are often not explicit, or written down, or even known. In most projects there are a lot of acceptance criteria and use cases that are implicit, and only become explicit when things don't work the way users want them to.

A realistic test coverage would be how many of the explicit, known, recorded use cases and acceptance criteria are tested by automated tests.
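
As an illustration, if the criteria were recorded with identifiers, and each automated test declared which criteria it verifies, that kind of coverage would be straightforward to compute. This is a minimal sketch; the identifiers and test names are made up:

    # Coverage of recorded acceptance criteria, rather than code lines;
    # all identifiers and test names are illustrative.
    CRITERIA = {"AC-1", "AC-2", "AC-3", "AC-4"}

    TESTS = {
        "test_login": {"AC-1"},
        "test_logout": {"AC-1", "AC-2"},
        "test_export": {"AC-3"},
    }

    covered = set().union(*TESTS.values())
    coverage = len(covered & CRITERIA) / len(CRITERIA)
    print(f"acceptance criteria coverage: {coverage:.0%}")  # 75%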

Use cases are not always acceptance criteria. Acceptance criteria are not always use cases. Both need to be captured during the development process, and ideally recorded as automated test cases.

Code coverage is most useful when the main user of a piece of code is a developer, usually the developer who writes or maintains the code. In this case, coverage helps ensure all interesting parts of the code are unit tested. The unit tests capture the use cases and acceptance criteria, in very fine-grained detail.

Posted Sat Oct 12 16:20:00 2019