Disclaimer: There is a huge discussion going on right now in the debian-devel mailing list, about different ways to rearrange how Debian development happens, and maybe to provide Debian users with an additional release. This blog entry is not a comment on that discussion, which I am ignoring. The discussion did, however, prompt a dent about "test driven development". That prompted a short exchange of thoughts with Tom Marble, and this blog post is a cleaned up version of my part in that discussion.

Currently Debian development happens roughly like this: all packages are uploaded to a part of the Debian archive called unstable. Once they've been in use for a while without serious problems, they get automatically copied into another part called testing. Every year or two, testing is frozen and any remaining problems are fixed, after which all of testing gets copied into yet another part of the archive called stable, and that's the actual release.

There is a little bit of automatic testing, but only a little. Almost all testing is done by people using unstable or testing on their usual computers. If they have problems, they report them.

Contrast this with development using Test Driven Development, or TDD, and other modern development methodologies. Here's a rough summary of what I do when I write (much of) my software.

  • first write one or more automatic tests (unit or functional ones)
  • write the actual code, enough to make all tests pass
  • add more tests
  • write more code, or change existing code
  • repeat this until tests describe all the behavior desired of the code
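
To make the loop concrete, here is a minimal, hypothetical sketch in Python; the function and test names are mine and come from no real project. The tests are written first and fail, then just enough code is added to make them pass:

    import unittest

    # Step 1: the tests exist before the code they describe.
    class RemoveDuplicatesTests(unittest.TestCase):

        def test_returns_empty_list_for_empty_input(self):
            self.assertEqual(remove_duplicates([]), [])

        def test_drops_repeated_items_but_keeps_order(self):
            self.assertEqual(remove_duplicates([1, 2, 1, 3, 2]), [1, 2, 3])

    # Step 2: just enough implementation to make the tests above pass.
    def remove_duplicates(items):
        seen = set()
        result = []
        for item in items:
            if item not in seen:
                seen.add(item)
                result.append(item)
        return result

    if __name__ == '__main__':
        unittest.main()

Running the file directly behaves like a tiny "make check": if it is green, the behaviour the tests describe is known to work.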

In addition, I measure coverage to make sure all parts of the code get tested. I usually aim for 100% coverage, except for those parts that are very hard or quite pointless to test. (That's easier to achieve than you'd think, but that's the topic of another blog post.)
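
For what it's worth, the measuring itself is the easy part. Here is a rough sketch using the coverage.py API (the command-line front end, "coverage run" followed by "coverage report", does the same job); it assumes coverage.py is installed and that the tests can be discovered from the current directory:

    import unittest

    import coverage  # the coverage.py package must be installed

    cov = coverage.Coverage()
    cov.start()

    # Run whatever test suite there is; here, discover tests under the current directory.
    suite = unittest.defaultTestLoader.discover('.')
    unittest.TextTestRunner().run(suite)

    cov.stop()
    cov.save()
    cov.report(show_missing=True)  # statement coverage per file, listing lines never run

Note that this measures statement coverage only: it tells you which code was never executed by the tests, not that the code which was executed is correct.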

This all sounds like a lot of bureaucratic nonsense, but what I get out of it is this: once all tests pass, I have strong confidence that the software works. As soon as I've added all the features I want to have for a release, I can push a button to push it out. (Well, actually there's a little bit more to it.)

This does not mean I never have bugs in my software. Of course I do. However, there are a lot fewer of them. In fact, there are so few of them after the tests pass that I would almost be happy to make an automatic release after every successful run of "make check".

Another aspect of the way I do development is distributed version control. The relevant feature is powerful branching and merging. Most things I develop happen in short-lived single-purpose branches: whenever I start work on a feature or bugfix, I create a branch for it. (Unless I'm feeling particularly lazy, which happens more often than you'd think.)

When I've finished the feature, or fixed the bug, I merge the branch back into the trunk branch.

This way, the trunk is always in a releasable state. It might not have all the features I want for the next release, but the features that are there do work. The trunk probably still has bugs, but if I've written tests that are good enough, I know the software works well enough.

Or that's the idea. Sometimes things go wrong. Then I write more tests and the next time goes better. (It doesn't have to be perfect, as long as it gets better every time.)

See the contrast between automatic tests with good coverage, and the Debian style of relying on user feedback? There's no need to wait for user feedback when you have automatic tests. This speeds up development and makes releasing easier, but most importantly it takes away any reason to fear making changes, even big changes.

Automatic tests and good test coverage are easy to achieve in small projects. For a system as huge as Debian, good test coverage is quite hard to achieve.

The thing about automatic tests, though, is that even a little bit of it is helpful, and after you have a few tests, it gets easier to add more. The first test is the hardest to get done, since you need to set up all the infrastructure for it.

Debian does do a bit of automatic testing already. (See lintian, piuparts, autopkgtest, edos, etc.) I don't want to belittle that, but I think we could do better.

Here's what I would love to see:

  • we have a part of the archive that corresponds to a trunk branch, i.e., it is always in a releasable state (the "testing" area was originally meant to be that; we could make it so now)
  • releasable state is determined by two things:
    • an automatic test suite
    • user feedback, particularly in the form of bug reports (as now)
  • whenever changes are made, they happen in "branches"
    • the branch is not affected by changes elsewhere in the archive, except by manual synching ("merge from trunk")
    • individual package uploads, as well as groups of packages such as for transitions, are each in their own branch
    • this is sort of similar to a PPA, but more powerful
  • when a branch is to be merged into trunk, the automatic tests must first pass (or at least no new failures in them can be introduced; see the sketch after this list)
    • tests can also be run for any other branch, of course, so that those developing the branch know if they're ready to push their changes into trunk
  • there's a culture of writing tests for bugs (whenever that is feasible), and for new features
    • particularly release goals should be expressed as automatic test suites
  • there's a culture of sharing tests with upstreams, and with other distributions
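
To illustrate the merge gate in the wish list above, here is a hypothetical sketch in Python of the "no new failures" rule. Everything in it (the function name, the shape of the results) is made up for illustration; the real gate would be whatever the archive infrastructure grows around its test runner:

    def branch_may_merge(trunk_results, branch_results):
        """Decide whether a branch may be merged into trunk.

        Both arguments map test names to True (pass) or False (fail).
        The rule: the branch must not introduce any new failures compared
        to trunk, but failures that already exist in trunk do not block it.
        """
        new_failures = [
            name for name, passed in branch_results.items()
            if not passed and trunk_results.get(name, True)
        ]
        return len(new_failures) == 0, new_failures

    # Hypothetical usage: in reality the results would come from running the
    # test suite against installations built from trunk and from the branch.
    ok, regressions = branch_may_merge(
        trunk_results={'install': True, 'ssh-login': True, 'smtp': False},
        branch_results={'install': True, 'ssh-login': False, 'smtp': False},
    )
    print('merge allowed:', ok, 'new failures:', regressions)

The point of comparing against trunk's own results, rather than demanding a fully green suite, is that a branch is never blocked by failures it did not cause.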

Since full test coverage is going to be impossible for Debian, some subset should be targeted. Perhaps something like this would suffice to start with:

  • a version of debian-installer is generated from a branch
  • test installation of a new system from scratch
    • with a large set of tasks selected
  • test upgrade from previous stable release to the branch
    • with a large set of packages installed (as many as possible, actually)
  • test upgrade from trunk branch to branch to be merged
    • ditto large set of packages
  • test with lintian, reject on specific tests failing
  • test with piuparts
  • test the whole system for specific functionality (see the sketch after this list)
    • ssh access from outside
    • sudo access for a logged in user
    • sending mail with SMTP
    • web server
    • possibly test specific web applications
    • possibly test Samba and NFS services
    • possibly test printing services (CUPS)
    • possibly test essential desktop functionality (automatic login, at least)
  • if a package comes with package-specific tests, run those too
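
As a sketch of how the whole-system functionality checks above could be automated, here is a hypothetical Python fragment that pokes at a freshly installed or upgraded test machine from the outside. The host name is a placeholder, only a few of the listed services are covered, and a real harness would of course have to provision the machine first:

    import smtplib
    import socket
    import urllib.request

    TEST_HOST = 'test-machine.example.org'  # placeholder for the freshly installed system

    def port_is_open(host, port, timeout=10):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def check_ssh():
        # ssh access from outside: at the very least, sshd must answer on port 22
        assert port_is_open(TEST_HOST, 22), 'sshd is not reachable'

    def check_smtp():
        # sending mail with SMTP: the MTA must greet us and accept a NOOP
        with smtplib.SMTP(TEST_HOST, 25, timeout=10) as smtp:
            code, _ = smtp.noop()
            assert code == 250, 'SMTP server did not accept NOOP'

    def check_web_server():
        # web server: the default page must be served
        with urllib.request.urlopen('http://%s/' % TEST_HOST, timeout=10) as response:
            assert response.status == 200, 'web server did not return 200'

    if __name__ == '__main__':
        for check in (check_ssh, check_smtp, check_web_server):
            check()
        print('basic functionality checks passed')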

These tests would not guarantee that a set of changes would not break Debian, but they would give a high confidence that at least basic stuff would still work after the changes.

Now, obviously implementing all of what I'm dreaming of (it is just past midnight, after all) is going to be impossible. There's way too much work to do, there aren't enough tools for writing tests, it would require too much computing power to run, and so on and so forth. But it's late, and I've had a bad day, and I might as well dream of something nice.

Anyway, anyone interested in these things should perhaps help drive DEP8.

:)
Nice summary.
Comment by Bryan Alberto Wed May 4 02:32:11 2011
TDD

Would you be willing to write a planet article/tutorial on test-driven development? I would expect very basic examples but a more in-depth explanation of the process and how it is integrated into your workflow.

Alternatively, if you know of a good site with a solid explanation of this could you please link it? I have been learning about software development and am really interested in TDD but haven't had time to research a good approach.

Thanks

Comment by Carl Wed May 4 16:11:04 2011

I'm all for defining the features of your software by the tests that show it is working right. But adding tests after the fact will always be extra hard: it isn't really an itch to scratch, which is what defines open source and free software work. It also relies on first capturing the intended functionality in some kind of documentation, and second on later editing that documentation so that it matches up with test cases and makes for easy-to-build and easy-to-maintain test scaffolding. I don't know of a good book recording guidelines or patterns to follow, and it can be a huge time-sink: I have spent time in development teams where 85% of the refactoring work is massaging an ill-planned test framework to add new features. (I think that had a lot to do with highly coupled tests and little clear definition of the interfaces to the test framework.)

To summarise TDD for someone who's not done it: TDD is about the feedback loop between building a feature and knowing that the feature is doing its stuff right.

  • Describe the functionality
  • Design tests which tell you that you have the required functionality
  • Rejig the tests to work well together and to have good separation and flexible layering
  • Aim for lots of small atomic tests of units of functionality, a moderate amount of larger tests of how components integrate together and few end-to-end whole-system tests
  • Constrain the number of ways you can actually implement the functionality and still satisfy your tests
  • Pick the design choices which will make it easiest to both extend your functionality and maintain your tests
  • Finally, implement the code
  • Post-finally, add the extra test cases which match up with the funky implementation details you had to pick, whether that comes from the environment you're working in or the way you chose to solve the problem at the core of your program (and note that these are ephemeral and can be removed or ignored when you change the design of your program or the layout of your tests)

The goal of the process is to have an always-working piece of software; the tests provide near-immediate feedback and trouble-shooting pointers. Obviously, you have to maintain your test scaffolding, and it can grow into being more than 90% of your code-base, but if you don't check in changes which break tests and work as hard to design your test suite as your core program then you'll benefit from test-driven development.

(seen from Planet Debian; massive props to liw)

Take care. K3ninho

Comment by ken Wed May 4 20:21:30 2011

Carl, I think a comprehensive tutorial on TDD would be a bit more than I have time for, and writing something quick would be bad. The Wikipedia page on TDD is a reasonable place to start. You could also check the c2 wiki.

Comment by Lars Wirzenius Thu May 5 10:49:32 2011

Ken, I entirely agree that adding tests afterwards is hard. I've done it, and for small and medium sized projects it is sometimes easier to start over from scratch.

For Debian, the most relevant tests would actually be integration tests, and those are easier to do afterwards, I find. But they're still more work and less fun than doing them beforehand. It's not something we can avoid at this stage, alas.

Comment by Lars Wirzenius Thu May 5 10:51:37 2011
dkg

Daniel wrote an excellent post about TDD: https://www.debian-administration.org/users/dkg/weblog/80. Go read.

Comment by Lars Wirzenius Thu May 5 10:52:44 2011

Here's the problem. TDD assumes three things in order to even remotely make sense:

  • The intended behavior of the software is known up front, so the tests can be written. Even limiting yourself to just packaging bugs (i.e., ignoring upstream bugs), this isn't a trivial assumption. How often do FTBFS bugs get filed? If they're happening regularly, they tell you that the packager doesn't understand what's supposed to be happening during the build (or is being careless). Either way, it goes against this assumption.

  • Writing the test is easier than writing the code itself. This is frequently the case, but not always. Consider writing a complete set of tests for the C++ STL vector. You'll end up with more code in the tests than you will in the actual implementation, and some semantics are particularly difficult to test reliably. Another example: anything to do with hardware. Writing tests for grub packaging and functionality will be difficult, even assuming a VM infrastructure is in place and limiting your tests to that infrastructure.

  • The tests can be automated in a meaningful or worthwhile fashion. Again, see the grub example. Also, in general, consider software that does heavy numerical analysis: a mere change in the output doesn't ipso facto mean a bug has been introduced or the test should fail. Someone has to do the work to determine whether the change is significant or not.

I have big problems with 1 and 3 for testing software packaging. I'm not really convinced they can be done generally, or that they'll vastly improve the quality of the software. Someone needs to make the case that a large portion of the bugs found under the current regime would also be found under a more thorough automated testing regime (TDD), by actually demonstrating a methodology that would have found them before they were filed and that is applicable to the majority of packages in the system and to the packages that most frequently have bugs filed. An empirical argument is fine; there's tons of empirical data for Debian. IOW, don't tell me what the virtues of TDD and/or automated testing are.

Also, when you say 100% coverage, what sort of coverage are you talking about? If you mean function or statement coverage, then I'm afraid you're not saying anything compelling and might actually be harming your own position. Function and statement coverage isn't really interesting, and I hope we all learned that in digital design or another digital logic class.

Comment by Adam Sat May 14 14:39:49 2011

Adam, that's an interesting attempt at dismissing the whole idea of test driven development. I'm not going to try to convince you that you're wrong. I'll just point out that the fact that some software is hard to test automatically does not mean that other (most) software can't or shouldn't be.

I have a fair bit of experience applying TDD, and testing individual packages in Debian, and testing entire installations, and I know, from that experience, that doing TDD is entirely possible at the package level, and I claim also at the distro level. However, doing it at the distro level needs some further tool development to work well. Test coverage won't be complete, but that's OK.

The central point of my rather rambling blog entry is actually not TDD specifically, but doing development in isolated branches and applying automated acceptance tests before branches are merged to the mainline (i.e., "testing"). In other words, before accepting an upload (or group of uploads, such as from a transition) into "testing", automatically answer at least the question "does this break anything essential in the system that will make recovering from it hard".

As an example, such an automatic test suite would almost certainly have caught bug 626450, which broke libc6, making it impossible to run dynamically linked software anymore. Catching this bug would not even have required any tests specific to libc6, since a test consisting of "install Debian, upgrade to this package, reboot, do ssh logins still work?" would have caught that.

(I shouldn't have written so much about TDD itself. It obscured things. I'm sorry: I write badly when I'm tired.)

Comment by Lars Wirzenius Sat May 14 17:44:05 2011

It's not the difficulty in writing and performing tests that's a problem per se, though callously dismissing it is pretty arrogant: if testing something as simple as a dynamically growing array class is non-trivial (try it if you don't believe me), there is zero reason to believe testing a whole application is easy. Even when performing the test is trivial, figuring out what to test, writing the test, and maintaining the test is still very hard.

However, do you really think any of my statements are untrue? How are you supposed to write tests if you don't know what your application is supposed to do? What's the value of writing the tests, especially upfront, if the tests are more complicated than the code itself? What's the value of writing the tests in code if the code cannot be executed automatically? Surely, if these things aren't fundamental, you'll be able to answer these questions with only marginally more effort than simply handwaving them away as you've done thus far.

Ok, so you want to make sure broken packages don't get into the system; that's a laudable goal. It goes without saying that automated testing can probably help with that, though it's unclear if the value proposition exists. It's even less clear that TDD principles can help with this.

You cite as an example the recent /lib64 breakage and give an automated way to avoid that problem in the future. However, your solution is convoluted: it relies on another package, meaning there's a wide chance of false positives. What if SSH is installable but broken on its own, or what if a bug in multiarch causes a 32-bit version to be installed instead of 64-bit? Holding up something like libc because ssh is broken is quite ugly, IMO. As such, hopefully we can agree this is problematic.

The simplest solution to that particular bug is to write a test that merely tests for the /lib64 link. And therein lies the problem: that's a one line test for a one-line piece of code! If the developer forgot the line of code in the first place, why should I believe they won't forget the test (regardless of when they write it)?

Of course, that solution is problematic in the general case, since the mere presence of the link won't ensure the library actually works. So we need to test the library with something simple like /bin/true. Of course, it has to be totally private, to avoid the dependency problems with your solution. So now we have a few dozen lines of code, plus the supporting build and harness stuff. That's pretty ugly too; it seems like there is no good solution here. As I said, writing tests is hard, even for trivially simple stuff like "make sure ld.so can find libc"!

Plus, you want people to write these tests upfront (if you want TDD), but how are you going to enforce that? Who's going to make sure the tests are present and the tests are correct? As you said yourself: "Most things I develop happen in short-lived single-purpose branches: whenever I start work on a feature or bugfix, I create a branch for it. (Unless I'm feeling particularly lazy, which happens more often than you'd think.)" This is the really, really hard part of your proposed solution. For little things, developers are likely to not bother; yet little things can lead to rather large bugs!

Branching has another problem too: fewer people will test each branch, which means we should actually expect the number of bugs in the trunk to go up, not down, unless test coverage is very, very good. How are we going to assess coverage? You claim: "Then I write more tests and the next time goes better. (It doesn't have to be perfect, as long as it gets better every time.)" but that's not true in this case: if your proposed solution isn't better than the status quo, there's no reason whatsoever to abandon the status quo.

Now you see why I have my doubts about automated testing really helping. I don't believe it's impossible, I just believe it's extremely hard and that the cost/benefit argument is likely unfavorable, to say the least. I'd be less pessimistic if I actually saw, for example, a working list of tests for the 50 most common Debian bugs, and that list showed the tests were applicable to a large set of packages. I'd also be less pessimistic if I had a detailed description of how this branching system is supposed to work and actually keep broken packages out, considering the dependencies between packages. Consider your own example: if your SSH test fails, do we hold up just libc6? libc6 and ssh? How do we do that if the two are developed in multiple branches? Accept nothing from the libc6 branch unless it's using the latest and greatest of everything from the trunk? How does that work with things like GNOME, mono, and KDE? How does it work when packages are removed, renamed, and so forth?

Comment by Adam Sat May 14 19:42:07 2011
There are a number of existing pieces for testing distributions, but mostly from the embedded side. Maemo uses one (though proprietary?), MeeGo has a bunch of tools for this, and Linaro is developing tools to test various distribution builds like Ubuntu and Android. And I bet Ubuntu uses something as well, as do RedHat and Fedora. IMO, what is needed is merging these activities together. In package build systems quite a few are turning to OBS; I'd like to see Debian try it too some day. And the next step should be the test automation machinery. An article to lwn.net could kick things off pretty well...
Comment by mikko.rapeli Thu Jun 9 20:08:24 2011