
I first started using Python in 1993. It's been my main programming language since about 2000. I've written a ton of code in Python, both for work and in my free time. For several years now, I've been growing dissatisfied with it. Partly it's because I'd like more help from my programming tools, such as static type checking, better handling of abstractions and code modules, and in general aiding me in writing larger, more complex software. Partly it's because I'm writing more challenging software, and trying to get more out of the hardware I have available. Partly it's because I'm not getting the feeling that the Python community is going in a direction I want to follow. Instead I get the feeling that the Python community is happy to cut corners and compromise on things that I'm not willing to. Which is fine, if it makes their lives better, but leaves me wanting something else.

I wrote Obnam, my backup application, in Python, over a period of about fourteen years, until I retired it a year ago. During Obnam's life, Python 3 happened. Python 3 is actually a good thing, I think, but the transition was painful, and Obnam never made it. Obnam had other issues, which made it less fun to work on; Python 3 wasn't what killed it.

Obnam and other large programs I've written in Python gave me the strong feeling that it's a nice language up to a certain size and complexity of program. Obnam is about 15000 lines of Python code. That turned out to be too much for me in Python: too often there were bugs that a static, strong type system would have caught. I could perhaps have been more diligent in how I used Python, and more careful in how I structured my code, but that's my point: a language like Python requires so much self-discipline that at some point it gets too much.

So over the last few months I've been learning Rust and Go, on and off, in short gaps of free time between other duties. Both have static type systems that can be argued to be strong. Both seem to have decent module systems. Both seem to support concurrency well. Either should be a good replacement for Python for non-small software I write. But I expect to be using Rust for any non-work programming and Go only when work needs me to.

Rust is developed by a community, and was started by Mozilla. Go development seems to be de facto controlled by Google, who originated the language. I'd rather bet my non-work future on a language that isn't controlled by a huge corporation, especially one of the main players in today's surveillance economy. I write code in my free time because it's fun, and I release it as free software because that's the ethical thing to do. I feel quite strongly that software freedom is one of the cornerstones of long-term happiness for humanity.

Anyway.

Ignoring ethical concerns, Rust seems like the better language of the two, so far. It has the better type system, the better compiler, the better tooling in general, and seems to me to have learnt better the lessons of programming languages and tools of the past third of a century. Rust exhibits better taste: things are designed the way they are for good reasons. It's not always as convenient or familiar as Go, but it doesn't seem to make compromises for short-term convenience the way Go does.

Note that I've not written any significant code in either language, so I'm just writing based on what I've learnt by reading. My opinions may change in the future, as I get more into both languages.

Posted Sun Mar 24 10:15:00 2019 Tags:

A while ago I lamented on Mastodon about wanting a place to discuss technical topics online.

I'm missing an online place to have deep, constructive discussions about technical topics. Parts of Usenet and parts of Debian used to have that for me in the 1990s, and some blogs in the early 2000s, but now everywhere seems to become filled with trolls or people who seem to enjoy shooting down every new idea.

It's possible that my recollections of online discussions from those times are coloured by time, and that they were never really as good as I now think they were. It's possible that if I stepped into my time machine and went back to, say, 1995, and participated in Debian development discussions, I'd be horrified at how bad they were.

That doesn't matter. I still want to have interesting discussions now. Preferably with a sufficiently diverse set of people. I have some of that, with a few close friends, but I'm hoping to widen the experience to more participants. A more diverse pool of opinions and experience would teach me more.

There were many responses to my toot. You can read most of them in detail via the link above, although some were private, and if you can read those, I'm going to have to have a look at my laptop's security. My thoughts are:

  • I'm not alone in wanting this. This is not a surprise. It's likely that not everyone who wants this is interested in the same topics, and they won't all be satisfied with the same solutions. That's OK. Let's see what we can do together.

  • Dr. Edward Morbius (a pseudonym) has many good points about online discussions. The most fundamental point is to start producing the kind of content one wants more of.

  • Any discussion medium is going to require some shared values among its participants, and these will need to be enforced in some manner. When it's two friends chatting privately, this is not problematic. When it's a whole bunch of random strangers, this gets trickier.

  • An interesting point is whether discussions should be public, and open for anyone to participate in. A public/open discussion forum that is successful is likely to grow too big to be manageable and will require much effort to moderate. A private/closed forum is easier, but less likely to gather a diverse group of participants, and can thus suffer from a lack of new thoughts and ideas. Possibly some other mix than those two is better? I don't know.

  • It seems clear to me that relying on advertising-supported platforms to host discussions is unlikely to be a good long-term strategy. At the same time, anything that requires significant financial or time investment to host is unlikely to survive long-term.

  • It's entirely possible that what I want isn't realistic.

For more concreteness, examples of what I'd like to have:

  • practical and theoretical software architecture and implementation topics in general

  • design, architecture, and implementation of specific software solutions to problems I'm interested in, currently CI/CD, authentication, storage of large binary files, and small structured data; obviously, others are likely to be interested in other specific topics, and that's OK

  • a culture of moving the discussion forward, and being constructive, rather than "winning" on points; confrontational and combative attitudes tend to ruin things for me, although they do work for some

  • not having to re-debate things that are fundamental for me, such as software freedom, human rights, and specific technical solutions; I don't mind disagreements, but I don't want to have the same discussion all over again every week or have it be inserted into every discussion on any topic

As far as technical solutions go, there are lots of existing options: IRC, Matrix, private email, mailing lists, Usenet/NNTP, blogs, web forums, the lobste.rs code base, the fediverse, Secure Scuttlebutt, etc. On the whole, I don't care about the solution, except I'd like it to not require real-time participation, and to support an "inbox" style instead of an infinite stream of messages. But that's me.

Posted Sun Mar 17 10:41:00 2019 Tags:

I'm starting a new side project: Yuck.

Yuck is an identity provider that allows end users to securely authenticate themselves to web sites and applications. Yuck also allows users to authorize applications to act on their behalf. Yuck supports the OAuth2 and OpenID Connect protocols, and has an API to allow storing and managing data about end users, applications, and other entities related to authentication.

A preliminary architecture document is at https://files.liw.fi/yuck-arch/ and feedback is welcome.
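
To make the protocol side a little more concrete, here's a minimal sketch of what an OAuth2 client credentials request to an identity provider like Yuck might look like, in Python. The token endpoint URL, client ID, secret, and scope are made-up placeholders, not part of Yuck's actual API.

    # Hypothetical OAuth2 client credentials request; the token endpoint
    # URL, client ID, secret, and scope below are placeholders, not
    # Yuck's actual API.
    import requests

    TOKEN_URL = "https://yuck.example.com/token"  # made-up endpoint

    response = requests.post(
        TOKEN_URL,
        data={"grant_type": "client_credentials", "scope": "example-scope"},
        auth=("example-client-id", "example-client-secret"),
    )
    response.raise_for_status()
    token = response.json()["access_token"]

    # The application then presents the bearer token when calling
    # protected APIs.
    print(token)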

Posted Sat Jan 12 13:43:00 2019 Tags:

It seems Python is now deprecating the imp standard library module, in favour of importlib. Eventually imp will go away. My Python unit test runner uses imp, so I will have to update it to use importlib. All my Python projects use my unit test runner. I suspect I won't be able to drop all my old Python projects (in favour of replacing them with other projects, possibly written in Rust) in time to avoid converting things to importlib. Bummer. I dislike the extra work, but at least Python tends to do standard library transitions like this slowly, with clear deprecation warnings, and rarely, so I won't complain too much.
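
For what it's worth, the kind of change involved looks roughly like this. This is a generic sketch of loading a module from a file path, not the actual code in my test runner, and the module name and path are made up.

    # Old way, using the deprecated imp module:
    #
    #     import imp
    #     module = imp.load_source("mymodule", "path/to/mymodule.py")
    #
    # Roughly equivalent code using importlib:
    import importlib.util

    spec = importlib.util.spec_from_file_location("mymodule", "path/to/mymodule.py")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)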

CoverageTestRunner runs Python unittest test suites under coverage.py, and measures test coverage for a module while running that module's unit tests, but ignoring accidental and incidental coverage from that unit getting called from other units. It fails the suite unless all statements are covered by unit tests, or the statements are explicitly marked as being excluded from coverage. I wrote it and use it because it's an easy way to get my projects into a state where I have high confidence that the code will work if it passes unit tests.
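
The idea, reduced to a minimal sketch, looks something like the following. This is not CoverageTestRunner's actual implementation, and the module names are made up.

    # Run a module's unit tests under coverage.py and fail unless every
    # statement in the module under test was executed. Not the real
    # CoverageTestRunner code, just the underlying idea.
    import sys
    import unittest

    import coverage  # provided by the coverage.py package

    cov = coverage.Coverage(include=["mymodule.py"])  # made-up module name
    cov.start()
    suite = unittest.defaultTestLoader.discover(".", pattern="mymodule_tests.py")
    result = unittest.TextTestRunner().run(suite)
    cov.stop()

    percent = cov.report(show_missing=True)  # returns total coverage as a float
    if not result.wasSuccessful() or percent < 100.0:
        sys.exit(1)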

I don't really care about reaching "100% test coverage", but I do like being confident, and this approach means I don't accidentally leave something untested. Python's dynamic typing and general "scriptiness" mean that the rigour imposed by CoverageTestRunner has made me much more confident about changing my own code, and has saved me from numerous bugs.

A strongly, statically typed language with a good compiler, like Rust, or even C, gets much of the same benefit from the type system and the compiler, so there's much less need to aim for high unit test coverage to become confident about the code.

(If you wish to comment on this blog post, please do so on Mastodon as a response to my toot. You can join an instance to do so.)

Posted Tue Jan 1 12:18:00 2019 Tags:

For the holidays, you could say thank you to some of the people who write free software you use, especially software that isn't hugely popular.

Those of us who write little-known software may go for months without hearing from a user, and it can be a little de-motivating.

Hearing from someone who actually uses one's software gives an energising jolt that can carry one through several weeks of darkness and cold and wet.

On Debian-based systems, you can check the /usr/share/doc/foo/copyright file for copyright information for package foo. This roughly corresponds with authorship, especially for smaller projects. The file also usually has a link to the home page of the project or person who produced the software, with contact information.

(See also this same message on Mastodon.)

Posted Sun Dec 16 15:04:00 2018 Tags:

For some time now, I've been using a sort of dependency graph to visualise roadmaps. I strongly dislike the more traditional type of roadmap, where one makes empty promises and sets arbitrary deadlines. My roadmaps have no time elements at all. Instead they aim to show the approximate roads one needs to take to reach one's goal.

Here's an example: I want to have a hosted version of Ick, a CI system. To have that, I need to make a few changes to Ick. Some of those changes require other changes. Some of the changes are independent of each other. I visualise this as follows:

Roadmap to hosted Ick as a dependency graph

The pink diamond-shaped goal is at the bottom. The grey oval is a task that is finished, done: it's kept in the roadmap to show progress. The white ovals are changes I could make now, if I chose to: they do not depend on any other changes. The green oval is the change I've chosen to do next. I develop things in iterations, and for each iteration I choose one change. The pink rectangles are blocked: they can't be done until some other change is done first.

Note that there are many roads to the destination. The map metaphor breaks down here: when travelling in real life, any road that leads to Rome is enough. When doing a project, all roads need to be taken to get Rome built.

I update the roadmap for each iteration. I plan those parts of the roadmap that I expect to do soon in more detail, and leave later parts for later. There's no point in breaking down later changes into small details: things might change enough that the change becomes unnecessary, even if it now seems inevitable, and planning in detail things that get discarded is a waste of effort. Also, too much detail in a roadmap makes it hard to follow.
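
If it helps make the idea concrete, the essence of such a roadmap can be captured in a few lines of code. The task names here are made up; they're not my actual Ick roadmap.

    # A roadmap as a dependency graph: each task lists the tasks it
    # depends on. A task is blocked until all its dependencies are done;
    # the unblocked, unfinished tasks are the candidates for the next
    # iteration. Task names are made up.
    roadmap = {
        "hosted service": {"depends": ["web ui", "user accounts"], "done": False},
        "web ui": {"depends": ["api cleanup"], "done": False},
        "user accounts": {"depends": [], "done": False},
        "api cleanup": {"depends": [], "done": True},
    }

    def candidates(roadmap):
        """Return tasks that aren't done and whose dependencies are all done."""
        return [
            name
            for name, task in roadmap.items()
            if not task["done"]
            and all(roadmap[dep]["done"] for dep in task["depends"])
        ]

    print(candidates(roadmap))  # ['web ui', 'user accounts']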

I don't know if this dependency graph is a known approach, perhaps with a fancy name, but I doubt I'm the first to think of this.

What do you think?

(For Ick, I then plan each iteration in some detail, and have a planning meeting, where each task is described and estimated, and has acceptance criteria. See minutes of Ick meetings for examples.)

Posted Thu Dec 13 11:37:00 2018 Tags:

I've started the process of retiring from Debian. Again. This will be my third time. It'll take a little while, as I take care of things to do this cleanly: uploading packages to set Maintainer to QA, removing myself from Planet Debian, sending the retirement email to -private, etc.

I've had a rough year, and Debian has also stopped being fun for me. There's a number of Debian people saying and doing things that I find disagreeable, and the process of developing Debian is not nearly as nice as it could be. There's way too much friction pretty much everywhere.

For example, when a package maintainer uploads a package, the package goes into an upload queue. The upload queue gets processed every few minutes, and the packages get moved into an incoming queue. The incoming queue gets processed every fifteen minutes, and packages get imported into the master archive. Changes to the master archive get pushed to main mirrors every six hours. Websites like lintian.debian.org, the package tracker, and the Ultimate Debian Database get updated at some point after that. (Or their updates get triggered, but it might take longer for the update to actually happen. Who knows. There's almost no transparency.)

The developer gets notified, by email, when the upload queue gets processed, and when the incoming queue gets processed. If they want to see current status on the websites (to see if the upload fixed a problem, for example), they may have to wait for many more hours, possibly even a couple of days.

This was fine in the 1990s. It's not fine anymore.

That's not why I'm retiring. I'm just tired. I'm tired of dragging myself through high-friction Debian processes to do anything. I'm tired of people who should know better tearing open old wounds. I'm tired of all the unconstructive and aggressive whinging, from Debian contributors and users alike. I'm tired of trying to make things better and running into walls of negativity. (I realise I'm not being the most constructive with this blog post and with my retirement. I'm tired.)

I wish everyone else a good time making Debian better, however. Or whatever else they may be doing. I'll probably be back. I always have been, when I've retired before.

Posted Sun Nov 18 18:32:00 2018 Tags:

This is an idea. I don't have the time to work on it myself, but I thought I'd throw it out in case someone else finds it interesting.

When you install a Debian package, it pulls in its dependencies and recommended packages, and those pull in theirs. For simple cases, this is all fine, but sometimes there are surprises. Installing mutt to a base system pulls in libgpgme, which pulls in gnupg, which pulls in a pinentry package, which can pull in all of GNOME. Or at least people claim that.

It strikes me that it'd be cool for someone to implement a QA service for Debian that measures, for each package, how much installing it adds to the system. It should probably do this in various scenarios:

  • A base system, i.e., the output of debootstrap.
  • A build system, with build-essential installed.
  • A base GNOME system, with gnome-core installed.
  • A full GNOME system, with gnome installed.
  • Similarly for KDE and each other desktop environment in Debian.

The service would do the installs regularly (daily?), and produce reports. It would also do alerts, such as notifying the maintainers when the installed size grows too large compared to installing the package in stable, or to a previous run in unstable. For example, if installing mutt suddenly installs 100 gigabytes more than yesterday, it's probably a good idea to alert interested parties.

Implementing this should be fairly easy, since the actual test is just running debootstrap, and possibly apt-get install. Some experimentation with configuration, caching, and eatmydata may be useful to gain speed. Possibly actual package installation can be skipped, and the whole thing could be implemented just by analysing package metadata.
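
As a rough sketch of the metadata-only variant, something like the following could estimate how much a package adds on top of the current system, using python3-apt. It's a simplification (it uses the running system's package state rather than a clean debootstrap chroot), and the package name is just an example.

    # Rough sketch: ask apt which packages installing "mutt" would add,
    # and sum their reported installed sizes. Uses the host's package
    # state, not a clean debootstrap chroot, so it's only an
    # approximation of the proposed QA service.
    import apt  # from the python3-apt package

    def added_install_size(package_name):
        """Return the total installed size apt reports for the packages
        that would be newly installed."""
        cache = apt.Cache()
        cache[package_name].mark_install()
        return sum(
            pkg.candidate.installed_size
            for pkg in cache.get_changes()
            if pkg.marked_install
        )

    print(added_install_size("mutt"))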

Maybe it even exists, and I just don't know about it. That'd be cool, too.

Posted Wed Oct 24 10:42:00 2018 Tags:

I've been learning Rust lately. As part of that, I rewrote my summain program from Python to Rust (see summainrs). It's not quite a 1:1 rewrite: the Python version outputs RFC822-style records, the Rust one uses YAML. The Rust version is my first attempt at using multithreading, something I never added to the Python version.

Results:

  • Input is a directory tree with 8.9 gigabytes of data in 9650 files and directories.
  • Each file gets stat'd, and regular files get SHA256 computed.
  • Run on a Thinkpad X220 laptop with a rotating hard disk. Two CPU cores, 4 hyperthreads. Mostly idle, but desktop-y things running in the background. (Not a very systematic benchmark.)
  • Python version: 123 seconds wall clock time, 54 seconds user time, 6 seconds system time.
  • Rust version: 61 seconds wall clock time (about 50% of the Python version's), 56 seconds user time (104%), and 4 seconds system time (67%).

A nice speed improvement, I think. Especially since the difference between the single-threaded and multithreaded versions of the Rust program is four characters (par_iter instead of iter in the process_chunk function).
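
The Rust code isn't shown here, but for reference, the core of what the program does looks roughly like this in Python terms (not the actual summain code): walk the tree, stat every entry, and hash regular files.

    # Sketch of the core of what summain does, not its actual code:
    # walk a directory tree, stat each entry, and compute SHA256 for
    # regular files.
    import hashlib
    import os

    def manifest(root):
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                entry = {"path": path, "mode": st.st_mode, "size": st.st_size}
                if os.path.isfile(path) and not os.path.islink(path):
                    h = hashlib.sha256()
                    with open(path, "rb") as f:
                        for chunk in iter(lambda: f.read(1 << 20), b""):
                            h.update(chunk)
                    entry["sha256"] = h.hexdigest()
                yield entry

    for entry in manifest("."):
        print(entry)

In the Rust version, it's the per-chunk iteration over files that gets parallelised, which is why switching to rayon's par_iter is such a small change.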

Posted Mon Oct 15 10:59:00 2018 Tags:

I don't think any of Flatpak, Snappy, traditional Linux distros, non-traditional Linux distros, containers, online services, or other forms of software distribution are a good solution for all users. They all fail in some way, and each of them requires continued, ongoing effort to be acceptable even within their limitations.

This week, there's been some discussion about Flatpak, a software distribution approach that's (mostly) independent of traditional Linux distributions. There's also Snappy, which is Canonical's similar thing.

The discussion started with the launch of a new website attacking Flatpak as a technology. I'm not going to link to it, since it's an anonymous attack and rant, and not constructive. I'd rather have a constructive discussion. I'm also not going to link to rebuttals, and will just present my own view, which I hope is different enough to be interesting.

The website raises the issue that Flatpak's sandboxing is not as good as it should be. This seems to be true. Some of Flatpak's defenders respond that it's an evolving technology, which seems fair. It's not necessary to be perfect; it's important to be better than what came before, and to constantly improve.

The website also raises the point that a number of flatpaks themselves contain unfixed security problems. I find this to be more worrying than an imperfect sandbox. A security problem inside a perfect sandbox can still be catastrophic: it can leak sensitive data, join a distributed denial of service attack, use excessive CPU and power, and otherwise cause mayhem. The sandbox may help in containing the problem somewhat, but to be useful for valid use, the sandbox needs to allow things that can be used maliciously.

As a user, I want software that's...

  • easy to install and update
  • secure to install (what I install is what the developers delivered)
  • always up to date with security fixes, including for any dependencies (embedded in the software or otherwise)
  • reasonably up to date with other bug fixes
  • sufficiently up to date with features I want (but I don't care about newer features that I don't have a use for)
  • protective of my freedoms and privacy and other human rights, which includes (but is not restricted to) being able to self-host services and work offline

As a software developer, I additionally want my own software to be...

  • effortless to build
  • automatically tested in a way that gives me confidence it works for my users
  • easy to deliver to my users
  • easy to debug
  • not broken by changes to build and runtime dependencies, or at least to have such changes be extremely obvious, meaning they result in a build error or at least an error during automated tests

These are requirements that are hard to satisfy. They require a lot of manual effort, and discipline, and I fear the current state of software development isn't quite there yet. As an example, the Linux kernel development takes great care to never break userland, but that requires a lot of care when making changes, a lot of review, and a lot of testing, and a willingness to go to extremes to achieve that. As a result, upgrading to a newer kernel version tends to be a low-risk operation. The glibc C library, used by most Linux distributions, has a similar track record.

But Linux and glibc are system software. Flatpak is about desktop software. Consider instead LibreOffice, the office suite. There's no reason why it couldn't be delivered to users as a Flatpak (and indeed it is). It's a huge piece of software, and it needs a very large number of libraries and other dependencies to work. These need to be provided inside the LibreOffice Flatpak, or by one or more of the Flatpak "runtimes", which are bundles of common dependencies. Making sure all of the dependencies are up to date can be partly automated, but not fully: someone, somewhere, needs to make the decision that a newer version is worth upgrading to right now, even if it requires changes in LibreOffice for the newer version to work.

For example, imagine LO uses a library to generate PDFs. A new version of the library reduces CPU consumption by 10%, but requires changes, because the library's API (programming interface) has changed radically. The API changes are necessary to allow the speedup. Should LibreOffice upgrade to the new version or not? If 10% isn't enough of a speedup to warrant the effort to make the LO changes, is 90%? An automated system could upgrade the library, but that would then break the LO build, resulting in something that doesn't work anymore.

Security updates are easier, since they usually don't involve API changes. An automated system could upgrade dependencies for security updates, and then trigger an automated build, test, and publish of a new Flatpak. However, this is made difficult by the fact that there is often no way to automatically and reliably find out that a security fix has been released. Again, manual work is required to find the security problem, to fix it, to communicate that there is a fix, and to upgrade the dependency. Some projects have partial solutions for that, but there seems to be nothing universal.

I'm sure most of this can be solved, some day, in some manner. It's definitely an interesting problem area. I don't have a solution, but I do think it's much too simplistic to say "Flatpaks will solve everything", or "the distro approach is best", or "just use the cloud".

Posted Thu Oct 11 10:12:00 2018 Tags: