Welcome to my web log. See the first post for an introduction. See the archive page for all posts, and comments for a feed of comments only. (There is an English-language feed if you don't want to see Finnish.)
All content outside of comments is copyrighted by Lars Wirzenius, and licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. Comments are copyrighted by their authors.
Today was my last day at Suomen Tilaajavastuu, where I worked on Qvarn. Tomorrow is my first day at my new job. The new job is for a new company, tentatively named QvarnLabs (registration is in process), to further develop and support Qvarn. The new company starts operation tomorrow, so you'll have to excuse me that there isn't a website yet.
Qvarn provides a secure, RESTful JSON HTTP API for storing and retrieving data, with detailed access control (and I can provide more buzzwords if necessary). If you operate in the EU, and store information about people, you might want to read up about the General Data Protection Regulation, and Qvarn may be a possible part of a solution you want to look into, once we have the website up.
In January and February of 2016 I ran an Obnam user survey. I'm not a statistician, but here is my analysis of the results.
Executive summary: Obnam is slow, buggy, and the name is bad. But users would like to buy stickers and t-shirts.
I wrote up a long list of questions about things I felt were of interest to me. I used Google Forms to collect responses, and exported them as a CSV file, and analysed based on that.
I used Google Forms, even though it is not free software, as it was the easiest service I got to work that also seemed like it'd be nice for people to use. I could have run the survey using Ikiwiki, but it wouldn't have been nearly as nice. I could have found and hosted some free software for this, but that would have been much more work.
Most questions had free form text responses, and this was both good and bad. It was good, because many of the responses included things I could never have expected. It was bad, because it took me a lot more time and effort to process those. I think next time I'll keep the number of free text responses down.
For some of the questions, I hand-processed the responses to a more or less systematic form, in order to count things with a bit of code. For others, I did not, and show the full list of responses (I'm lazy, we don't need a survey to determine that).
See http://code.liw.fi/obnam/survey-2016.html for the responses, after hand-processing.
For the questions for which it makes sense, a script has tabulated the various responses and calculated percentages. I haven't produced graphs, as I don't know how to do that easily. (Maybe next time I'll enlist the help of statisticians.)
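As a rough illustration of that kind of tabulation, here is a minimal Python sketch. It is not the script actually used for the survey; the function name `tabulate`, the column names, and the sample rows are all made up for the example.

```python
import csv
import io
from collections import Counter


def tabulate(csv_text, column):
    """Count each distinct answer in one column and compute percentages.

    Empty answers are skipped, since every survey question was optional.
    Returns a list of (answer, count, percent), most common answer first.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        answer = (row.get(column) or "").strip()
        if answer:
            counts[answer] += 1
    total = sum(counts.values())
    return [
        (answer, n, round(100.0 * n / total, 1))
        for answer, n in counts.most_common()
    ]


# Made-up sample data, NOT actual survey responses.
SAMPLE = """where,os
personal,Debian
work,Ubuntu
personal,Debian
personal,
"""

if __name__ == "__main__":
    for answer, n, pct in tabulate(SAMPLE, "where"):
        print(f"{answer}: {n} ({pct}%)")
```

Percentages here are relative to the number of people who answered the question, not to the total number of respondents, which is one of several reasonable choices.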
There were 263 responses in total. I have no way of knowing whether the total number of Obnam users is about that, but the number correlates fairly well with the Debian popcon numbers, so I'm assuming Obnam has on the order of a few hundred users total.
A larger number might be more impressive, but it'd also mean that I would be responsible for much more data loss if I make a horrible mistake. That said, it is probably time to start spending some effort on growing the developer base of Obnam.
People seem to hear about Obnam primarily from my blog posts, or by searching the web for backup software. Also, from the Arch Linux or Gentoo wikis, or Joey Hess.
People use Obnam mostly for personal machines, but also at work.
Those who have tried Obnam, but don't use it, rejected it primarily for speed or because it's unstable or buggy. I hope that the bad bugs have mostly been fixed, and I'm working on improving the speed.
People seem to use either the latest version, or the version included in the release of their operating system (e.g., Debian jessie). Other versions are relatively rare.
Most people started using Obnam in the past two years.
People use Obnam on a variety of Linux-based operating systems, but also others. Obnam users are especially skewed towards Debian and Ubuntu, which is not surprising, as I'm involved in Debian, have been publicising Obnam there, and provide packages for Debian myself.
About half the people have at least hundreds of thousands of files, containing hundreds of gigabytes of data. All extremes (very few or very many files, very little or very much data) are represented, though. A couple of people have at least a hundred million files, or at least ten terabytes of data.
Most people don't have a backup strategy, or at least not a documented one, and if they do, it's not regularly tested.
This isn't a good thing.
Most people had backed up within the past week as of the time of filling in the survey. This hopefully indicates that they back up frequently. Only one respondent said they'd never backed up.
Rather more people hadn't tested their backups, however, with about a fifth of the people having never tested their backup. This is also not good.
Most people only back up one machine to each repository, or at most a few. A total of 17 respondents reported that they don't have a backup, and do not fear clowns.
About half the people back up to a local drive, and nearly two thirds to an SFTP server.
People ask for more remote storage options, such as support for services like Amazon S3.
The things people like most about Obnam are on its list of core features: de-duplication, encryption, and ease of use / simplicity. FUSE is also well-liked, as are snapshot backups.
I didn't tabulate the reasons why people don't like Obnam, but performance and stability seem to be the most common reasons. My favourite response to this question is "the name obnam, does not sounds like a backup program".
Speed is also the pet bug people seem to have.
People seem to generally find Obnam documentation adequate. There's room for improvement, of course.
Nearly everyone finds it easy to get help if they have a problem with Obnam, but almost no-one uses the Obnam support mailing list or IRC channel.
Some people read the NEWS file, others do not. Few have sent patches, but some would like to. There's a bunch of suggestions for new features.
None of this is surprising to me, except perhaps that so many Obnam users actually do read the NEWS file, as it's been my experience in other projects that that's rare.
About half the people have heard of the green albatross. It's the name of the new way in which Obnam will be storing data on disk, which is a big factor in how fast or slow Obnam is. When the green albatross soars, Obnam will fly faster.
People use other backup software as well, which is sensible: no point in having all one's eggs in one basket. The top choices are rsync, duplicity, attic, and rsnapshot, but the list seems to mention most free backup software.
There's some interest in helping Obnam development, either by direct contributions, donations, paying for support or development, or by buying merchandise. Nearly no-one wants a printed version of the manual, but stickers and t-shirts might sell well enough.
A lot of people don't really want to, or are not able to, contribute, especially not by doing things, and that's OK. (They did contribute, however, by filling in the survey.)
When given an opportunity to say whatever they want to Obnam developers, most people say "thank you" in some form or another. This was very heartwarming.
After some serious thinking, I've decided not to nominate myself in the Debian project leader elections for 2016. While I was doing that, I wrote the beginnings of a platform, below. I'm publishing it to have a record of what I was thinking, in case I change my mind in the future, and perhaps it can inspire other people to do something I would like to happen.
Why not run? I don't think I want to deal with the stress. I already have more than enough stress in my life, from work. I enjoy my obscurity in Debian. It allows me to go away for long periods of time, and to ignore any discussions, topics, and people that annoy or frustrate me, if I don't happen to want to tackle them at any one time. I couldn't do that if I was DPL.
NOT a platform for Debian project leader election, 2016
Apart from what the Debian constitution formally specifies, I find that the important duties of the Debian project leader are:
- Inspire, motivate, and enable the Debian community to make Debian better: to be the grease that makes the machinery run smoother. It is not important that the DPL do things, except to make sure other people can.
- Delegate what work can be delegated. The DPL is only one person, and should not be a bottleneck. Anything that can reasonably be delegated should be delegated.
- Deal with mundane management tasks. This includes spending Debian money when it makes things easier, such as by funding sprints.
- Represent the project in public, or find people to do that in specific cases.
- Help resolve conflicts within the project.
I do not feel it is the job of the DPL to set goals for the project, technical or otherwise, any more than any other member of the project. Such goals tend to best come from enthusiastic individual developers who want something and are willing to work on it. The DPL should enable such developers, and make sure they have what they need to do the work.
My plan, if elected
Keep Debian running. Debian can run for a long time effectively on autopilot, even if the DPL vanishes, but not indefinitely. At minimum, the DPL should delegate the secretary and technical committee members, and decide on how money should be spent. I will make sure this minimum level is achieved.
While I have no technical goals to set for the project, I have an organisational one. I believe it is time for the project to form a social committee whose mandate is to step in and help resolve conflicts in their early stages, before they grow big enough that the DPL, the tech-ctte, listmasters, or the DAM needs to be involved. See below for more details on this. If I am elected, I will do my best to get a social committee started, and I will assume that any vote for me is also a vote for a social committee.
(Note: It's been suggested that this is a silly name, but I haven't had time to come up with anything better. I already rejected "nanny patrol".)
We are a big project now. Despite our reputation, we are a remarkably calm project, but there are still occasional conflicts, and some of them spill out into our big mailing lists. We are not very good, as a project, in handling such situations.
It is not a new idea, but I think its time has come, and I propose that we form a new committee, a social committee, whose job is to help de-escalate conflict situations while they are still small conflicts, to avoid them growing into big problems, and to help resolve big conflicts if they still happen.
This is something the DPL has always been doing. People write to the DPL to ask for mediation, or other help, when they can't resolve a situation by themselves. We also have the technical committee, listmasters, GRs, and the expulsion process defined by the DAM. These are mostly heavy-weight tools and by the time it's time to consider their use, it's already too late to find a good solution.
Having the DPL do this alone puts too much pressure on one person. We've learnt that important tasks should generally be handled by teams rather than just one person.
Thus, I would like us to have a social committee that:
- is reasonably small, like the technical committee
- attempts to resolve conflicts at an early stage
- is delegated by the DPL to have authority to do this
- doesn't necessarily have direct authority to remove people from mailing lists, IRC channels, or expel them from the project, but has authority to suggest such measures to the appropriate team
- may do other things, such as educate people on how to resolve conflicts constructively themselves, or deal with chronic conflicts in addition to acute flare-ups
I've been a Debian developer since 1996. I've been retired twice, while I spent large amounts of time on other things. I haven't been a member of any important team in Debian, but I've been around long enough that I know many people, and have a reasonable understanding of how the project works.
At Debconf15 I gave a talk on the topic of having backups be a default service on Debian machines. In that talk, I proposed that we create infrastructure to be included in a default Debian install to manage backups.
I still think this is a good idea, but over the past several months, I've had nearly no time at all to actually do this.
I'm afraid I have to say now that I won't be able to work on this in time for the stretch release. I would be very happy for others to do that, however.
The Debian wiki page https://wiki.debian.org/Backup acts as a central point of information for this. If you're interested in working on this, you can just do it.
I have just released version 1.19.1 of Obnam, the backup program. See the website at http://obnam.org for details on what the program does. The new version is available from git (see http://git.liw.fi) and as Debian packages from http://code.liw.fi/debian, and uploaded to Debian, and soon in unstable.
The NEWS file extract below gives the highlights of what's new in this version. Basically, it fixes a bug.
NOTE: Obnam has an EXPERIMENTAL repository format under green-albatross. It is NOT meant for real use. It is likely to change in incompatible ways without warning. Do not use it unless you're willing to lose your backup.
Version 1.19.1, released 2016-01-30
- The check for paramiko version turned out not to work with versions 1.7.8 through 1.10.4, due to the paramiko.__version_info__ variable being missing. It's there in earlier and later versions. Lars Wirzenius added code to make the check work if the paramiko.__version__ variable is there. Jan Niggemann provided research and testing.
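The general shape of such a fallback can be sketched in Python as below. This is not Obnam's actual code. The helper names `get_version_info` and `is_at_least` are hypothetical, and the example uses stand-in objects instead of importing paramiko, so that it runs anywhere. The minimum version passed to `is_at_least` is likewise just an example value.

```python
import itertools
from types import SimpleNamespace


def get_version_info(module):
    """Return the module's version as a tuple of ints.

    Prefers __version_info__; falls back to parsing __version__ when the
    tuple is missing. Returns None if neither attribute is present.
    """
    info = getattr(module, "__version_info__", None)
    if info is not None:
        return tuple(info)
    text = getattr(module, "__version__", None)
    if text is None:
        return None
    parts = []
    for piece in text.split("."):
        # Keep only the leading digits, so "4rc1" is read as 4.
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(module, minimum):
    """True if the module's version is known and >= minimum (a tuple)."""
    info = get_version_info(module)
    return info is not None and info >= minimum


if __name__ == "__main__":
    # Stand-ins for a module with and without __version_info__.
    with_tuple = SimpleNamespace(__version_info__=(1, 16, 0), __version__="1.16.0")
    string_only = SimpleNamespace(__version__="1.10.4")
    print(get_version_info(with_tuple), get_version_info(string_only))
    print(is_at_least(string_only, (1, 7, 8)))
```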
Survey URL: http://goo.gl/forms/hdoQZKjs80
I am doing an Obnam survey. The goal of this survey is to collect feedback from those who use Obnam, or have tried it, to guide the project in the future.
The survey will run until February 29, 2016.
- Get a feel for the number of people using Obnam, and how they are using it.
- Find out why those who've tried Obnam have chosen to not use it.
- Get input on roadmap planning: what things are wanted most, or least. What is important for Obnam users?
- Get feedback on what's good or bad about Obnam in general.
- Get feedback about the project in addition to the software.
- Get a feel for whether it's worth pursuing business opportunities around Obnam.
All questions in this survey are optional. I do not collect personal information at all. The survey is implemented using Google Forms, and so Google probably collects some information; sorry. You don't need to log in to Google to fill in the survey, though, and I encourage you to use all the privacy protection tools you have.
I hope as many Obnam users as possible fill in the survey.
I have just released version 1.19 of Obnam, the backup program. See the website at http://obnam.org for details on what the program does. The new version is available from git (see http://git.liw.fi) and as Debian packages from http://code.liw.fi/debian, and uploaded to Debian, and soon in unstable.
The NEWS file extract below gives the highlights of what's new in this version.
NOTE: Obnam has an EXPERIMENTAL repository format under green-albatross. It is NOT meant for real use. It is likely to change in incompatible ways without warning. Do not use it unless you're willing to lose your backup.
Version 1.19, released 2016-01-15
Backup no longer ignores a closed SSH connection. This means it won't keep trying to use it, forever. Instead, it crashes and terminates the backup.
The Paramiko SSH implementation, which Obnam uses, changed the interface to the prefetch method in its 1.16 version. Obnam can now deal with either variant of the method. Found and reported by Kyle Manna, who provided a patch that Lars Wirzenius rewrote to be backwards compatible to older versions of Paramiko.
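One generic way to cope with a method whose signature differs between library versions is to inspect it before calling. The Python sketch below shows that technique only; it is not the patch that went into Obnam, and the two classes are hypothetical stand-ins, not Paramiko's real classes or signatures.

```python
import inspect


class OldStyleFile:
    """Hypothetical stand-in: prefetch takes no arguments."""

    def prefetch(self):
        return "old"


class NewStyleFile:
    """Hypothetical stand-in: prefetch takes one argument."""

    def prefetch(self, file_size):
        return f"new:{file_size}"


def call_prefetch(fileobj, file_size):
    """Call fileobj.prefetch with or without file_size, as it accepts.

    Inspecting the signature avoids wrapping the call in a broad
    `except TypeError`, which could hide errors raised inside the method.
    """
    params = inspect.signature(fileobj.prefetch).parameters
    if params:
        return fileobj.prefetch(file_size)
    return fileobj.prefetch()


if __name__ == "__main__":
    print(call_prefetch(OldStyleFile(), 4096))
    print(call_prefetch(NewStyleFile(), 4096))
```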
Improvements to the manual:
The manual now has an appendix listing all Obnam errors, with codes and explanations. This will need to be updated manually from time to time.
The manual now has sections on turning on full debug logging and reporting problems.
Improvements to functionality:
- The output of obnam generations now shows the time zone. Lars Wirzenius implemented this based on a suggestion by Limdi.
It was a shock to be back in Finland after five years. The country had gone bad.
I was born in Finland, and lived here all my life, until January, 2010. I then moved abroad for five years, with my then-girlfriend, now wife. We lived in New Zealand, Scotland, and England. In October 2014 we came back, because my next job happened to be in Finland.
When we came back, I found that the country had changed, or I had lost my rose-coloured glasses. The Finland I left was a place where equality and solidarity were assumed, though not universal. The country I came back to turned out to be racist, scared, and quickly abandoning everything that used to be examples of why the country is a good place to live. Some of the change has, possibly, only happened during 2015 and the refugee crisis in Europe, but the change started earlier.
- Racist behaviour and violence is commonplace. Those who do not look like the natives are insulted, threatened with physical violence, and sometimes experience it.
- Political violence is becoming commonplace. Refugee centers are burnt down, sometimes with people inside. Government ministers are attacked, albeit by throwing a drink on them.
- Politicians get away with saying and doing clearly racist things, up to and including celebrating neo-nazism.
- The government spends a lot of effort trying to break any power the labour unions have, so that the employers get to do what they want. The whole stabilising system of negotiating between government, employer, and labour organisations is being torn down, and labour is being stripped of any power.
- Prominent cabinet ministers talk condescendingly about science and universities, and cut educational funding. Universities should concentrate on being R&D factories for business, and forget about advancing civilisation, or the humanities, or the arts. Unless it's the type of arts that becomes best-selling games.
Now, it is clear that Finland was never the kind of idyllic utopia that I thought it was, back when I was young. The older I get, the clearer it is that I was naive.
The change over five years is still quite remarkable. I can't ever go back home. Now, I am sometimes ashamed of being Finnish.
I just donated to Software Freedom Conservancy as a supporter. They do great, important work in GPL enforcement, and they need some money to continue it.
You can help, too: https://sfconservancy.org/supporter/.
In September, The FUUG Foundation gave me a grant to buy some hardware for Obnam development. I used this money to buy a new desktop-ish machine, see below for details. It's sat in a corner, and I use it as a server: it's not normally connected to a monitor or keyboard. It runs Obnam benchmarks. Before this, I ran Obnam benchmarks and experiments on my laptop, or on BigV virtual servers donated by Bytemark.
- CPU: Intel Core i7 4790K 4.0 GHz (4 cores, total of 8 hyperthreads)
- RAM: Kingston HyperX Beast 32 GB
- Mainboard: 46400 Asus H97M-Plus Intel H97 LGA 1150 micro-ATX
- SSD: Samsung 850 EVO SSD 120 GB
- HDD: 4 x WD Red 4 TB
- PSU: Corsair CX750M ATX
- Case: BitFenix Phenom micro-ATX
Not to brag, but it's a nice machine. Much more power than my 2012 era laptop.
The SSD is the system drive, the HDDs are for running Obnam benchmarks on. The HDDs are not RAIDed. Each drive is a PV for LVM2. All the data on those drives is scratch data: it's not valuable, and I do not care if it is lost. In fact, most of the data gets created and deleted during a benchmark run, and usually the disks are empty. The SSD contains the host operating system, and the virtual disks for all the virtual machines.
I assembled the machine myself, with the help of a friend, and installed Debian jessie on it. The Debian installation is pretty bare bones, just enough to run and manage a bunch of virtual machines using libvirt and ansible. All the actual work, including benchmarks, is run in virtual machines.
|date||many files||one big file|
In a bit over two months, I've made some significant progress, I think.
The two benchmarks that I currently run are:
- A million tiny files, containing a single, random byte.
- A single, 10 GB file.
These are two extreme cases of what a backup needs to deal with: either the file metadata, or its content. They incur different costs for a backup program. Thus, two benchmarks.
In both cases, the benchmark consists of an initial backup, a restore, and a second backup, without changes to the live data. The second backup is an extreme case of what backups usually do: most data usually doesn't change, so keeping that in mind for optimisation is important.
The above benchmarks are synthetic: they use data that's generated by a program (genbackupdata), so that they can be reproduced. Synthetic benchmarks are useful, especially for looking at particular aspects of a program's operation for optimisation. However, they do not necessarily reflect how a program behaves in actual use.
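To give an idea of what such synthetic data looks like, here is a small Python sketch that generates scaled-down versions of both data sets. It is not genbackupdata and does not try to match its behaviour. The function names, the directory layout, and the sizes used in the example run are assumptions chosen to keep the example quick.

```python
import os
import tempfile


def make_tiny_files(root, count, per_dir=1000):
    """Create `count` files of one random byte each under `root`.

    Files are spread over subdirectories of at most `per_dir` files,
    so that no single directory becomes huge.
    """
    for i in range(count):
        subdir = os.path.join(root, f"dir{i // per_dir:06d}")
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, f"file{i:09d}"), "wb") as f:
            f.write(os.urandom(1))


def make_big_file(path, size, chunk=1024 * 1024):
    """Create one file of `size` random bytes, written chunk by chunk."""
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n


if __name__ == "__main__":
    # Scaled down: 2500 tiny files and a 5 MiB file instead of
    # a million files and 10 GB.
    with tempfile.TemporaryDirectory() as tmp:
        make_tiny_files(os.path.join(tmp, "many"), count=2500)
        make_big_file(os.path.join(tmp, "big.bin"), size=5 * 1024 * 1024)
        print(sum(len(files) for _, _, files in os.walk(tmp)))
```

Unlike this sketch, which uses `os.urandom`, a benchmark that must be reproducible would need data that comes out the same on every run.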
I also run, by hand, experiments with real data. I have a snapshot of our home file server and my laptop on the benchmark machine. The snapshots are static, and do not get updated. I experiment by running Obnam backups manually, the initial full backup and a no-change incremental one. In early October, I couldn't finish the initial full backups. They took too long, more than a week. Now I can finish them in about a day. This remarkable change is not evident from the synthetic benchmarks.
In numbers: 572986 files in the live data, containing 4.5 TiB. Initial backup, about 18.5 hours. Incremental backup, 4m13s. This is from a local disk to a local disk.
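For a sense of scale, those numbers can be turned into average rates with a few lines of Python. This is only arithmetic on the figures quoted above. It assumes the 18.5 hours and 4m13s are wall-clock times, and it says nothing about where the time is actually spent.

```python
FILES = 572986
DATA_BYTES = 4.5 * 1024**4         # 4.5 TiB
INITIAL_SECONDS = 18.5 * 3600      # about 18.5 hours
INCREMENTAL_SECONDS = 4 * 60 + 13  # 4m13s


def rate(amount, seconds):
    """Average amount processed per second."""
    return amount / seconds


initial_mib_per_s = rate(DATA_BYTES / 1024**2, INITIAL_SECONDS)
initial_files_per_s = rate(FILES, INITIAL_SECONDS)
incremental_files_per_s = rate(FILES, INCREMENTAL_SECONDS)

if __name__ == "__main__":
    print(f"initial backup: {initial_mib_per_s:.1f} MiB/s, "
          f"{initial_files_per_s:.1f} files/s")
    print(f"incremental backup: {incremental_files_per_s:.0f} files/s")
```

That works out to roughly 71 MiB/s and 9 files/s on average for the initial backup, and about 2265 files/s for the no-change incremental one.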
In addition to these, I've run numerous experiments on the new machine. These would have been much less easy to run on my laptop, and so I probably wouldn't have. Running benchmarks was always painful on my laptop, since it does not have the necessary disk space, and I'd really rather use it for other things.
Thanks to the benchmarks and experiments I've been able to take the in-development version of Obnam from being quite impractical for real use to being in experimental use for real data. I now use the new version as my primary backup of my laptop, with two secondary backups (with the old Obnam version, and rsync) in parallel. This would not have happened this year without the extra hardware.
In addition to the Obnam work, I've used the new machine to develop a test suite for vmdebootstrap.
My actual development still happens on my laptop, except for things that are heavy enough to be slow on the laptop. I've made sure I can do most development purely on my laptop, while offline, including running a CI system, and testing things on two architectures and three releases of Debian. I do not want my development to be dependent on incidental things such as network access, unless I'm doing things that by their nature depend on the network, such as publishing changes or releases.
For more, see the archive.