Welcome to my web log. See the first post for an introduction. See the archive page for all posts. (There is an English language feed if you don’t want to see Finnish.)


Me on Mastodon, for anything that is too small to warrant a blog post.

All content outside of comments is copyrighted by Lars Wirzenius, and licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. Comments are copyrighted by their authors. (No new comments are allowed.)


Keyboardio Model100 keyboard: review at one month
Keyboardio Model100 with screwdriver, key cap puller, spare key switches, and other included parts.

I bought a new keyboard and have been using it for about a month now as my daily driver. It’s wonderful.

In 2019 I was given a Keyboardio Model01 keyboard as a gift. I had been curious about it for a while, so this gave me a chance to try one out without spending a big chunk of my own money. It’s a split ortholinear keyboard with custom-designed key caps and a wooden enclosure in the shape of butterfly wings. It took me a day or two to start getting used to typing on it, and about three months to be comfortably fast. I’m a software developer, and most of my day consists of typing, so typing comfort and speed are important to me.

The switch to the new keyboard also involved switching to the US keyboard layout from my native Finnish one. I’d been touch typing with the Finnish keyboard since about 1983, so the layout change was also a big change. I switched so that typing program code would be easier. The Finnish layout hides some common characters (especially braces, brackets, and backslashes) behind modifier keys in a way that’s sometimes a little uncomfortable. The Model01 would’ve allowed me to keep using the Finnish layout almost without problems; I just chose not to. (The Swedish letter a-with-ring, or å, was a little difficult on the Model01, but I could’ve lived with it had I wanted to.)

The Model01 quickly became my favorite keyboard. Every time I used my laptop keyboard I found it quite uncomfortable, both mentally (“where is my Enter?!”) and physically (“oh my poor wrists!”). Part of the discomfort was due to the US/FI layout switch, but mostly it was how typing on a laptop keyboard feels, and how the keys are physically laid out, and how my hands and arms have to twist. There’s just no comparison with the Model01. For a while I even carried the Model01 to cafes and on overseas trips, but stopped doing that because it is quite a large extra thing to lug around. It’s easier to type with two fingers.

I’ve recently upgraded to the Model100, buying it from the Kickstarter campaign. It’s nearly identical to the Model01, but the key switches are different. I chose switches that are silent, but tactile, and it’s quite a quiet keyboard. My spouse appreciates the lack of noise. I appreciate that the typing feel is awesome.

The Model100 is without doubt the best keyboard I’ve ever used.

It’s not without issues. The wood enclosure will move with the seasons, like wood does. I had some issues with that with the Model01, so it’s not a new problem. I will cope, but I wish I could get an enclosure without this problem. Possibly one made out of plywood? Or metal?

I also wish the keyboard didn’t taunt me with all the possibilities that come from being able to, and encouraged to, change its firmware. I really want to, but I know it can become an endless time sink for a tinkering geek like myself, so I’m trying to resist.

Rust training for FOSS devs: how did it go?

For the past three Saturdays, I’ve been training half a dozen free and open source software developers in the basics of the Rust programming language. It’s gone well. The students have learned at least some Rust and should be able to continue learning on their own. I’ve gotten practice doing the training and have made the course clearer, tighter, and generally better.

The structure of the course was:

  • Session 1: quick start
    • what kind of language is Rust?
    • the cargo tool
    • using Rust libraries
    • error handling in Rust
    • evolution of an enterprise “hello, world” application
  • Session 2: getting things to work
    • memory management in Rust
    • the borrow checker
    • concurrency with threads
    • hands-on practice: compute sha256 checksums concurrently (see the sketch after this list)
  • Session 3: getting deeper
    • mob programming to implement some simple Unix tools in Rust
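
To give a flavor of the session 2 exercise, here is a minimal sketch of one way to compute SHA-256 checksums concurrently, with one thread per file. This is not the course material, just an illustration for this post, and it assumes the sha2 crate is listed as a dependency in Cargo.toml.

// A sketch of the session 2 exercise: compute SHA-256 checksums of the
// files named on the command line, one thread per file. Assumes the sha2
// crate as a dependency.
use sha2::{Digest, Sha256};
use std::{env, fs, thread};

fn main() {
    let mut handles = Vec::new();
    for filename in env::args().skip(1) {
        // Each file gets its own thread; the closure takes ownership of
        // the filename.
        handles.push(thread::spawn(move || {
            let data = fs::read(&filename).expect("failed to read file");
            let digest = Sha256::digest(&data);
            let hex: String = digest.iter().map(|b| format!("{b:02x}")).collect();
            (filename, hex)
        }));
    }
    for handle in handles {
        let (filename, hex) = handle.join().expect("thread panicked");
        println!("{hex}  {filename}");
    }
}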

I am going to run the course again. I’ll give people on the waiting list first refusal, but if my proposed times don’t fit them, I’ll ask for more volunteers. Watch my blog to learn about that. If you want to get on the waiting list, follow instructions on the course page.

If you’d like me to teach you and others Rust, ask your employer to pay for it. I can do training online or on-site. See my paid course page.

Linus has recently merged initial support for Rust into the Linux kernel. If you’re a professional Linux developer and would like to learn Rust, please ask your employer to fund a course for you and your colleagues.

(I’m afraid I don’t publish my materials: they’re not useful on their own, and there’s a lot of really good Rust learning materials out there already.)

Rust training for FOSS programmers

Do you write code for free and open source projects? Would you like to learn the basics of the Rust programming language? I’m offering to teach the basics of Rust to free and open source software programmers, for free.

After the course, you will be able to:

  • understand what kind of language Rust is
  • make informed decisions about using Rust in a project
  • read code written in Rust
  • write simple command line programs in Rust
  • understand memory management in Rust
  • study Rust on your own

To be clear: this course will not make you an expert in Rust. The goal is to get you started. To become an expert takes a long time and much effort.

For more information, including on when it happens and how to sign up, see my Rust training for FOSS programmers page.

(Disclaimer: I’m doing this partly as advertising for my paid training: if you’d like your employer to pay me to run the course for their staff, point them at my training courses page.)

Not breaking things is hard

Building things that work for a long time requires a shift in thinking and in attitude and a lot of ongoing effort.

When software changes, it has repercussions for those using the software directly, or using software that builds on top of the software that changes. Sometimes, the repercussions are very minor and require little or no effort from those affected. More often, the people developing, operating, or using software are exposed to a torrential rain storm of changes. This thing changes, and that thing changes, and both of those mean that those things need to change. Living in a computerized world can feel like treading water: it’s exhausting, but you have to continue doing it, because if you stop, you drown.

The Linux kernel has a rule that changes to the kernel must never break software running on top of the kernel. A large part of the world’s computing depends on the Linux kernel: there are billions of devices, on all continents, in orbit, and also on Mars. All of those devices need to continue working. The Linux kernel developers are by no means perfect in this, but overall, upgrading to a new version is nearly a non-event: it requires installing the new version and rebooting to start using it, but everything usually just works. (Except when it doesn’t.)

Linux is not unique in this, of course. I use it as an example because it’s what I know.

Achieving that kind of stability takes a lot of care, and a lot of effort. This is not without cost. Sometimes it prevents fixing previous mistakes. Sometimes it turns a small change into a multi-year project to prevent breakage. For a project used as widely as Linux, the cost is worth it.

Most other software changes less carefully. For example, there is software that implements web sites and web applications: search engines, email, maps, shops, company marketing brochures, personal home pages, blogs, etc. A small number of web sites are used by such large numbers of people that they have a big impact: the people behind these sites take care when making changes. Most sites have little impact: if, say, https://liw.fi/training/rust-basics/ is down or renders badly for some people, it affects mostly just me. That means I can make changes more easily and with less care than, say, Amazon can change its web shop.

Much of the world’s software is code libraries. Applications, and web sites, build on top of those libraries to reduce development and maintenance effort. If a library already provides code to, for example, scale a photo to a smaller size, an application developer can use the library and not have to learn how to write that code themselves. This also raises quality: someone whose main focus is resizing photos can spend much more effort on how to do it well than someone whose main focus is making an email program that just happens to show thumbnails of attached images.

A well-made library that does something commonly useful might be used by thousands, even millions, of applications and web sites.

However, if the photo resizing code library changes often, and changes in ways that break the applications using it, all those thousands or millions of applications have to adapt. If they don’t adapt, they won’t benefit from other changes that they do want: say, a new way to resize that results in smaller image files with better clarity. They will also miss out on fixes for security issues.
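
To make this concrete, here is a small, entirely hypothetical illustration in Rust. All the names are invented for this post: imagine a photo library whose resize function used to take a width and a height, and whose new version takes an options struct and returns a Result instead. Every caller written against the old version stops compiling until its author adapts.

// Hypothetical library code, version 2. Version 1 had
//     pub fn resize(image: &Image, width: u32, height: u32) -> Image
// and the change below breaks every existing caller.
pub struct Image {
    pub width: u32,
    pub height: u32,
}

pub struct ResizeOptions {
    pub width: u32,
    pub height: u32,
}

#[derive(Debug)]
pub struct ResizeError;

pub fn resize(image: &Image, opts: &ResizeOptions) -> Result<Image, ResizeError> {
    let _ = image; // a real library would actually scale the pixels
    if opts.width == 0 || opts.height == 0 {
        return Err(ResizeError);
    }
    Ok(Image { width: opts.width, height: opts.height })
}

// Hypothetical application code: the call site has to be rewritten to use
// the new options struct and to handle the error.
fn make_thumbnail(image: &Image) -> Result<Image, ResizeError> {
    resize(image, &ResizeOptions { width: 128, height: 128 })
}

fn main() {
    let photo = Image { width: 4000, height: 3000 };
    let thumb = make_thumbnail(&photo).expect("resizing failed");
    println!("thumbnail is {}x{}", thumb.width, thumb.height);
}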

There is an ongoing discussion among software developers about stability versus making changes more easily. Some people get really frustrated by how hard it can be to get new versions of their software to people who want it. For those people, and their users, the cost of stability is too high. They want something that takes less effort and goes faster, and are willing to instead pay the cost of things changing frequently and occasionally breaking.

Other people get really frustrated by everything breaking all the time. For those people, the cost of stability is worth it.

It’s a cost/benefit calculation that everyone needs to do for themselves. There is no one answer that serves everyone equally well. Telling other people that they’re wrong here is the only poor choice.

v-i version 0.2: non-interactive Debian installer for bare metal machines

I’ve just released version 0.2 of v-i, my non-interactive, fairly fast, unofficial, alternative installer of Debian for physical computers (“bare metal”). It’s what I use to install Debian on my PCs now. I blogged about it previously.

Below is a transcript of me installing Debian to my Thinkpad T480 laptop, from my desktop machine.

$ scp exolobe1-spec.yaml root@v-i:
exolobe1-spec.yaml                             100%  228   190.6KB/s   00:00
$ ssh root@v-i
Linux v-i 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun Aug  7 13:10:02 2022 from 10.1.1.54
root@v-i:~# time ./v-i exolobe1-spec.yaml
OK, done

real    1m11.003s
user    0m30.671s
sys 0m8.645s
root@v-i:~# reboot
Connection to v-i closed by remote host.
Connection to v-i closed.
$

The exolobe1-spec.yaml file contains:

hostname: exolobe1
drive: /dev/sda
extra_lvs:
  - name: home
    size: 300G
    mounted: /home
ansible_vars:
  user_pub: |
   ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPQe6lsTapAxiwhhEeE/ixuK+5N8esCsMWoekQqjtxjP liw personal systems

To summarize what happened above:

  • I wrote the installer image to a USB drive, and booted the laptop with that drive
  • I logged in over SSH to the installer running on the laptop, and ran it from there
  • the installer asks no questions: I give it a file with all the information it needs to install Debian
  • the installation command took 1 minute 11 seconds
    • Disclaimer: this is the second install with the same USB drive. The first install takes about four minutes longer, because it runs debootstrap. Later installs use a cache of its output.
  • that time doesn’t include the time to write the USB drive, or to boot the target PC to the installer, or to boot the installed system
  • the installed system is fairly minimal, and does not have, say, a desktop system, a software stack to run web applications, or anything much else that’s actually useful
  • the installed system is, however, accessible over SSH, and can be provisioned with a configuration management system such as Ansible

See https://files.liw.fi/v-i/0.2/ for the installer image I used above, and some other bits and bobs. See the git repository to open issues or to contribute.

On home Internet routers

Recently, the power brick for my home Internet router PC failed. It had worked flawlessly for six years. To get something working as soon as possible, I bought a cheap consumer router from a local store. I’d managed to forget how awful they are.

I have a number of computers at home, both physical hardware and virtual machines. They act as servers, and I need to access them by name. My undead router PC runs dnsmasq, which provides both a DNS and a DHCP server, and populates DNS using host names from DHCP requests. This is quite comfortable.

I’ve never had a consumer router do that. I’ve never understood why. They all seem to require manually maintaining that kind of MAC/IP/name mapping, if they provide it at all. Some of them then forget the mappings after a week or two.

I know it’s possible to install something like OpenWRT on some consumer routers, and that’s great, when it’s possible. But I don’t like OpenWRT either. For me, it makes too many compromises to fit into minuscule hardware resources.

There are many router software distributions out there. pfSense is perhaps the best known one. For myself, I tend to just use the Debian Linux distribution, which I know much better than the FreeBSD that pfSense uses. I administer it via Ansible, which is how I like it.

After a week of swearing at the consumer router, I replaced it with an old laptop and a USB Ethernet adapter. Installing my Debian based router distribution (Puomi) took a few minutes: I’ve made my own installer that’s fully automated and fast. I then configured the installed system with Ansible to have the exact same setup as on my normal router PC. Keeping configurations in version control and automating installations and deployments feels like a super power.

I’ve demoted the cheap router to a wifi access point and placed it in the part of our home where wifi is most useful. We’d meant to do that anyway.

Now I just need to get a replacement power brick for my six-year-old little fanless PC. It’s surprisingly difficult, even from a store that claims to have over 200 of them in stock.

Obnam 0.8.0 - encrypting backup program

I’ve just pushed out version 0.8.0 of Obnam, an encrypting backup program. Below are the release notes.

Version 0.8.0, released 2022-07-24

Breaking changes

Breaking changes are ones that mean existing backups can’t be restored, or new backups can’t be created.

  • The list of backups is stored in a special “root chunk”. This means backups are explicitly ordered. It also paves the way for a future feature to delete backups: only the root chunk will need to be updated. Without a root chunk, the backups formed a linked list, and deleting from the middle of the list would require updating the whole list.

  • The server chunk metadata field sha256 is now called label. Labels include a type prefix, to allow for other chunk checksum types in the future.

  • The server API is now explicitly versioned, to allow future changes to cause less breakage.

New features

  • Users can now choose the backup schema version for new backups. A repository can have backups with different schemas, and any existing backup can be restored. The schema version only applies to new backups.

  • New command obnam inspect shows metadata about a backup. Currently only the schema version is shown.

  • New command obnam list-backup-versions shows all the backup schema versions that this version of Obnam supports.

  • Obnam now logs some basic performance measurements for each run: how many live files were found in total, how many were backed up, how many chunks were uploaded, how many existing chunks were reused, and how long various parts of the process took.

Other changes

  • The obnam show-generation command now outputs data in the JSON format. The output now includes data about the generation’s SQLite database size.

Thank you

Several people have helped with this release, with changes or feedback.

  • Alexander Batischev
  • Lars Wirzenius

Rust training

I offer a training course in the Rust programming language: Basics of Rust. It’s aimed at corporations, and the goal is to get participants to a point where they understand the basics and can learn more on their own. See the training page for details. Contact me if you or your employer is interested.

Unix command line conventions over time

ETA, 2022-05-19: I’m happy this blog post has gathered a fair bit of interest. However, this post is as much effort as I’m prepared to put into the topic. I think it would be a good idea to write an essay, article, or even a book, on how syntax of the Unix command line has varied over the years, and in different subcultures. Something semi-scholarly with cited sources for claims, and everything. I’d be happy to see this post be used as a basis: the CC license makes that easy. However, such a project would be quite a bit of work that I’m not interested in doing, I’m afraid.

This blog post documents my understanding of how the conventions for Unix command line syntax have evolved over time. It’s not properly sourced, and may well be quite wrong. I didn’t start using Unix until 1989, so I wasn’t there for the early years. Maybe someone has written a proper essay on this, with citations. I’m too lazy to dig them up.

Early 1970s

In the beginning, in the first year or so of Unix, an ideal was formed for what a Unix program would be like: it would be given some number of filenames as command line arguments, and it would read those. If no filenames were given, it would read the standard input. It would write its output to the standard output. There might be a small number of other, fixed, command line arguments. Options didn’t exist. This allowed programs to be easily combined: one program’s output could be the input of another.

There were, of course, variations. The echo command didn’t read anything. The cp, mv, and rm commands didn’t output anything. However, the “filter” was the ideal.

$ cat *.txt | wc

In the example above, the cat program reads all files with names with a .txt suffix, writes them to its standard output, which is then piped to the wc program, which reads its standard input (it wasn’t given any filenames) to count words. In short, the pipeline above counts words in all text files.

This was quite powerful. It was also very simple.

Options

Fairly quickly, the developers of Unix found that many programs would be more useful if the user could choose between minor variations of function. For example, the sort program could provide an option to order input lines without regard to upper and lower case.

The command line option was added. This seems to have resulted in a bit of a philosophical discussion among the developers. Some were adamant against options, fearing the complexity they would bring; others really liked them for the convenience. The side favoring options won.

To make command line parsing easy to implement, options always started with a single dash, and consisted of a single character. Multiple options could be packed after one dash, so that foo -a -b -c could be shortened to foo -abc.

If not immediately, then soon after, an additional twist was added: some options required a value. For example, the sort program could be given the -kN option, where N is an integer specifying which word in a line would be used for sorting. The syntax for values was a little complicated: the value could follow the option letter as part of the same command line argument, or be the next argument. The following two commands thus mean the same thing:

$ sort -k1
$ sort -k 1

At this point, command line parsing became more than just iterating over the command line arguments. The dominant language for Unix was C, and a lot of programs implemented the command line parsing themselves. This was unfortunate, but at this stage the parsing was still sufficiently simple that most of them did it in sufficiently similar ways that it didn’t cause any serious problems. However, it was now the case that one often needed to check the manual, or experiment, to find out how a specific program was to be used.
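
To make it concrete how much work even this early style of parsing is, here is a sketch of a hand-rolled parser of the kind described above, written in Rust rather than the C of the era. The options are invented for the example: -f is a plain flag that can be clumped, and -k takes a value that may be attached or given as the next argument.

// A sketch of hand-rolled short-option parsing: clumped flags, and a value
// that may be attached ("-k1") or be the next argument ("-k 1"). The
// options themselves are invented for this example.
use std::env;
use std::process;

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    let mut fold_case = false;
    let mut key: Option<String> = None;
    let mut files: Vec<String> = Vec::new();

    let mut i = 0;
    while i < args.len() {
        let arg = &args[i];
        if let Some(flags) = arg.strip_prefix('-') {
            let mut chars = flags.chars();
            while let Some(c) = chars.next() {
                match c {
                    // -f is a simple flag and can be clumped: "-fk1" works.
                    'f' => fold_case = true,
                    // -k needs a value: the rest of this argument, or the
                    // next argument if there is no rest.
                    'k' => {
                        let rest = chars.as_str();
                        key = Some(if rest.is_empty() {
                            i += 1;
                            args.get(i).expect("-k needs a value").clone()
                        } else {
                            rest.to_string()
                        });
                        break;
                    }
                    _ => {
                        eprintln!("unknown option -{c}");
                        process::exit(1);
                    }
                }
            }
        } else {
            files.push(arg.clone());
        }
        i += 1;
    }
    println!("fold_case={fold_case} key={key:?} files={files:?}");
}

Invoked as prog -fk1 a.txt b.txt it reports the same settings as prog -f -k 1 a.txt b.txt. Small differences between programs’ hand-rolled parsers are part of what getopt, discussed next, later smoothed over.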

Later on, Wikipedia says 1980, the C library function getopt was written. It became part of the Unix C standard library. It implemented the command line parsing described above. It was written in C, which at that time was quite a primitive programming language, and this resulted in a simplistic API. Part of that API is that if the user used an unknown option on the command line, the getopt function would return a question mark (?) as its value. Some programs would respond by writing out a short usage blurb. This led to -? being sometimes used to tell a program to show a help text.

Long options

In the late 1970s Unix spread from its birthplace, Bell Labs, to other places, mostly universities. Much experimentation followed. During the 1980s some changes to command line syntax happened. The biggest change here was long options: options whose name wasn’t just a single character. For example, in the new X window system, the -display option would be used to select which display to use for a GUI program.

Note the single dash. This clashed with the “clumping together” of single-character options. Does -display mean which display to use, or the options -d -i -s -p -l -a -y clumped together? That depended on the program and how it decided to parse the options.

A further complication to parsing the command line was that single-dash long options that took values couldn’t allow the value to be part of the same command line argument. Thus, -display :0 (two words) was correct, but it could not be written as -display:0, because a simple C command line parser would have difficulty figuring out what was the option name and what was the option’s value. Thus, what previously might have been written as a single argument -d:0 now became two arguments.

The world did not end, but a little more complexity had landed in the world of Unix command line syntax.

The GNU project

The GNU project was first announced in 1983. It was to be an operating system similar to Unix. One of the changes it made was to command line syntax. GNU introduced another long option syntax, I believe to avoid the ambiguity between single-dash long options and clumped single-character options.

Initially, GNU used the plus (+) to indicate a long option, but quickly changed to a double dash (--). This made it unambiguous whether a long option or clumped short options were being used.

I believe it was also GNU that introduced using the equals sign (=) to optionally add a value to a long option. Values to options could be optional: --color could mean the same as --color=auto, but you could also say --color=never if you didn’t like the default value.
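
Here is a sketch, with invented behavior, of how a program might handle such an optional value: --color alone falls back to a default, and --color=VALUE overrides it.

// A sketch of a GNU-style long option with an optional value. The
// behavior is invented for this example.
use std::env;

fn main() {
    let mut color = String::from("auto");
    let mut files = Vec::new();

    for arg in env::args().skip(1) {
        if arg == "--color" {
            // No value given: use the default.
            color = String::from("auto");
        } else if let Some(value) = arg.strip_prefix("--color=") {
            // Value attached with an equals sign.
            color = value.to_string();
        } else {
            files.push(arg);
        }
    }
    println!("color={color} files={files:?}");
}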

GNU further allowed options to occur anywhere on the command line, not just at the beginning. This made things more convenient to the user.

GNU also wrote a C function, getopt_long, to unify command line parsing across the software produced by the project. I believe it supported the single-dash long options from the start. Some GNU programs, such as the C compiler, used those.

Thus, the following was acceptable:

$ grep -xi *.txt --regexp=foo --regexp bar

The example above clumps the short options -x and -i into one argument, and provides grep with two regular expression patterns, one with an equals sign and one without.

The GNU changes have largely been adopted by other Unix variants. I’m sure those have had their own changes, but I’ve not followed them enough to know.

GNU also added standard options: almost every GNU program supports the options --help, --version, and --email=ADDR.[1]

Double dash

Edited to add: Apparently the double dash was already supported around 1980, in the first version of getopt in Unix System III. Thank you to Chris Siebenmann.

Around this time, a further convention was added: an argument of two dashes only (--) as a way to say that no further options to the command being invoked would follow. I believe this was another GNU change, but I have no evidence.

This is useful to, say, be able to remove a file with name that starts with a dash:

$ rm -- -f

For rm, it was always possible to provide a fully qualified path, starting from the root directory, or to prefix the filename with a directory—rm ./-f—and so this convention is not necessary for removing files. However, given all GNU programs use the same function for command line parsing, rm gets it for free. Other Unix variants may not have that support, though, so users need to be careful.

The double dash is more useful for other situations, such as when invoking a program that invokes another program. An example is the cargo tool for the Rust language. To build and run a program and tell it to report its version, you would use the following command:

$ cargo run -- --version

Without the double dash, you would be telling cargo to report its version.
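
The convention itself is simple to implement. Here is a sketch (mine, not any particular program’s code) of the usual treatment: arguments before -- that start with a dash are options, and everything after -- is an operand, no matter what it looks like.

// A sketch of the usual "--" handling.
use std::env;

fn main() {
    let mut options = Vec::new();
    let mut operands = Vec::new();
    let mut no_more_options = false;

    for arg in env::args().skip(1) {
        if no_more_options {
            operands.push(arg);
        } else if arg == "--" {
            no_more_options = true;
        } else if arg.starts_with('-') && arg != "-" {
            options.push(arg);
        } else {
            // A lone "-" conventionally means the standard input, so treat
            // it as an operand.
            operands.push(arg);
        }
    }
    println!("options:  {options:?}");
    println!("operands: {operands:?}");
}

Given the arguments -x -- -f file, this reports -x as an option and -f and file as operands, which is exactly what rm -- -f relies on.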

Subcommands

I think around the late 1980s, subcommands were added to the Unix command line syntax conventions. Subcommands were a response to many Unix programs gaining a large number of “options” that were in fact not optional at all, and were really commands. Thus a program might have “options” --decrypt and --encrypt, and the user was required to use one of them, but not both. This turned out to be a little hard for many people to deal with, and subcommands were a simplification: instead of using option syntax for commands, just require commands.

I believe the oldest program that uses subcommands is the version control system SCCS, from 1972, but I haven’t been able to find out which version added them. Another version control system, CVS, from 1990, seems to have had them from the beginning. CVS was built on top of yet another version control system, RCS, which had separate programs such as ci for “check in” and co for “check out”. CVS had a single program, with subcommands:

$ cvs ci ...
$ cvs co ...

Later version control systems, such as Subversion, Arch, and Git, follow the subcommand pattern. Version control systems seem to inherently require the user to do a number of distinct operations, which fits the subcommand style well, and also avoids adding large numbers of individual programs (commands) to the shell, reducing name collisions.

Subcommands add further complications to command line syntax, though, when inevitably combined with options. The main command may have options (often called “global options”), but so can subcommands. When options can occur anywhere on the command line, is --version a global option, or specific to a subcommand? Worse, how does a program parse such a command line? If an option is specific to a subcommand, the parser needs to know which subcommand, if only so it knows whether the option requires a value or not.

To solve this, some programs require global options to be before the subcommand, which is easy to implement. Others allow them anywhere. Everything seems to require per-subcommand options to come after the subcommand.
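
Here is a sketch of the easy variant, using a toy program invented for this post, vaguely shaped like the --encrypt/--decrypt example above: global options must come before the subcommand, and everything after the subcommand is left for per-subcommand parsing.

// A sketch of the "global options come before the subcommand" rule.
use std::env;
use std::process;

fn main() {
    let mut verbose = false;
    let mut subcommand = None;
    let mut args = env::args().skip(1);

    // Parse global options until the first non-option argument, which is
    // taken as the subcommand.
    for arg in args.by_ref() {
        if arg == "--verbose" {
            verbose = true;
        } else if arg.starts_with('-') {
            eprintln!("unknown global option {arg}");
            process::exit(1);
        } else {
            subcommand = Some(arg);
            break;
        }
    }

    let subcommand = match subcommand {
        Some(s) => s,
        None => {
            eprintln!("expected a subcommand");
            process::exit(1)
        }
    };
    // Whatever is left belongs to the subcommand and would be parsed by
    // per-subcommand code.
    let rest: Vec<String> = args.collect();

    if verbose {
        eprintln!("running {subcommand} with arguments {rest:?}");
    }
    match subcommand.as_str() {
        "encrypt" => println!("would encrypt {rest:?}"),
        "decrypt" => println!("would decrypt {rest:?}"),
        _ => {
            eprintln!("unknown subcommand {subcommand}");
            process::exit(1);
        }
    }
}

Allowing global options after the subcommand as well means the parser has to know, up front, which options belong to which subcommand, which is the complication described above.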

Summary

The early Unix developers who feared complexity were right, but also wrong. It would be intolerable to have a separate program for every combination of a program and its options. To be fair, I don’t think that’s what they would’ve advocated: instead, I think, they would’ve advocated tools that can be combined, and simplifying things so that fewer tools are needed.

That’s not what happened, alas, and we live in a world with a bit more complexity than is strictly speaking needed. If we were re-designing Unix from scratch, and didn’t need to be backwards compatible, we could introduce a completely new syntax that is systematic, easy to remember, easy to use, and easy to implement. Alas.

None of this explains dd.


  1. The --email bit is a joke.