Bug exchange format

A while ago, I wrote about some ideas I had for distributed bug tracking. Recently I got to talk about these ideas with Don Armstrong, the current main developer of debbugs, the bug tracker Debian uses. We cooked up a first draft of a proposal for a bug exchange format, as specified below. Comments are very welcome.

Version: 2009-07-26-A
Author: Lars Wirzenius <liw@liw.fi>
Author: Don Armstrong <don@donarmstrong.com>

Introduction

There are many bug tracking systems in the world. They are all broadly similar, but incompatible. This is a problem for Linux distributions, and in any other situation where the same bug needs to be tracked by several projects.

As a concrete example, consider a bug in Firefox. It might be reported in Launchpad against Ubuntu, but it also affects all other Linux distributions, and of course needs to be fixed by the upstream developers at Mozilla. To keep track of the bug across all the distributions, upstream, and perhaps elsewhere, the bug trackers of each project need to synchronize their information about the bug in some way.

This synchronization would be easier if there were a commonly accepted format for exchanging information about bugs. Each bug tracker could then support import and export of that one format, rather than needing special support for every other bug tracker.

In addition, such a common format might make development of distributed bug tracking systems easier.

Note that we are only discussing the exchange format here. Each bug tracker may use whatever internal representation it desires. Compare with what RFC 822 did for e-mail.

Overview

We consider a bug to conceptually consist of a discussion, with some attachments to various messages, and some metadata added to the entire discussion.

A well-known, well-supported format for representing discussions is the e-mail format. It should have all the features necessary to represent even a complicated discussion. It also supports attachments. We acknowledge that the format is crufty.
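To illustrate that e-mail already covers threading and attachments, here is a minimal sketch using only Python's standard email package. It builds a follow-up message in a bug discussion that replies to an earlier message and carries a patch as an attachment. The function name make_reply, the text/x-diff subtype and the bugs.example.org domain are illustrative assumptions, not part of the proposal.

```python
from datetime import datetime, timezone
from email.message import EmailMessage
from email.utils import format_datetime, make_msgid


def make_reply(original, sender, body, patch_name=None, patch_text=None):
    """Build a follow-up message in a bug discussion (hypothetical helper)."""
    reply = EmailMessage()
    reply["From"] = sender
    reply["Subject"] = "Re: " + original["Subject"]
    reply["Date"] = format_datetime(datetime.now(timezone.utc))
    # Assumed domain; a real tracker would use its own host name.
    reply["Message-ID"] = make_msgid(domain="bugs.example.org")
    # Standard threading headers let any mail reader rebuild the discussion tree.
    reply["In-Reply-To"] = original["Message-ID"]
    reply["References"] = original["Message-ID"]
    reply.set_content(body)
    if patch_name is not None:
        # Turns the message into multipart/mixed with the patch as an attachment.
        reply.add_attachment(patch_text, subtype="x-diff", filename=patch_name)
    return reply
```

Because the result is an ordinary MIME message, no new container format is needed for attachments; the bug tracker only has to store messages like this one.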

We propose that metadata be represented using XML, again because it is well-known and well-supported.

Additionally, we propose that all information about a bug be represented as a Maildir, with the metadata in a file inside the Maildir. Maildir is another well-known, well-supported construct, and it makes it easy to examine the discussion part with any e-mail program that reads Maildirs (e.g., "mutt -f bug-12345").
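A minimal sketch of that layout, using Python's standard mailbox module: the discussion goes into the Maildir's cur/, new/ and tmp/ directories, and the metadata goes into a separate file at the top of the same directory. The proposal does not name the metadata file, so bug.xml below is an assumption, as are the helper names export_bug and import_bug.

```python
import mailbox
import os
from email.message import EmailMessage

# Hypothetical file name: the proposal only says the metadata lives
# "in a file inside the Maildir" and does not name it.
METADATA_FILENAME = "bug.xml"


def export_bug(path, metadata_xml, messages):
    """Store a bug as a Maildir (the discussion) plus an XML metadata file."""
    box = mailbox.Maildir(path, create=True)  # creates cur/, new/ and tmp/
    for msg in messages:
        box.add(msg)
    # The metadata sits next to cur/, new/ and tmp/; mailbox.Maildir only
    # scans cur/ and new/ for messages, so it ignores this file.
    with open(os.path.join(path, METADATA_FILENAME), "w", encoding="utf-8") as f:
        f.write(metadata_xml)


def import_bug(path):
    """Read a bug back; returns (metadata_xml, list of messages)."""
    box = mailbox.Maildir(path, create=False)
    with open(os.path.join(path, METADATA_FILENAME), encoding="utf-8") as f:
        metadata_xml = f.read()
    return metadata_xml, list(box)
```

Keeping the metadata in a plain file beside the messages means the same directory serves both a bug tracker and an ordinary mail reader.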

Bug metadata

Many parts of bug metadata are shared by all bug trackers, though perhaps under different names. Additionally, some metadata may not be commonly supported. XML provides a framework for an extensible metadata format.

Here is an example showing all the common fields that all implementations must support, plus one optional field that is not part of the bug format spec:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bug>
<!-- The DTD is still to be decided. -->
<bug id="bug://bugs.liw.fi/12765">

    <title>hellopy does not speak English</title>

    <description>The hellopy implementation of the "hello, world"
    application does not speak English.</description>

    <when-submitted>2009-07-26 15:15 UTC</when-submitted>
    <when-updated>2009-07-26 15:15 UTC</when-updated>

    <affects>hellopy</affects>

    <severity project="http://liw.fi/hellopy">minor</severity>
    <severity project="http://packages.debian.org/hellopy">serious</severity>

    <status>open</status>

    <assigned-to>mailto:liw@liw.fi</assigned-to>

    <x-bounty-offered-by>mailto:verybigboss@example.com</x-bounty-offered-by>

</bug>

Some discussion about each element:

bug: This is the root element. The id attribute gives the global identifier for a bug. If the bug gets copied to another bug tracker, the id will be retained.
title: The title of the bug. Meant to be short.
description: A description or summary of the bug. Can be longer than the title. This is different from the original bug reporting message, since it can be updated later. It should represent a summary of the best current understanding of the bug, so that people don't need to read through the entire discussion to find that out.
when-submitted, when-updated: Timestamps for when the bug was originally submitted and when it was last updated. These could also be parsed from the discussion, and perhaps should be.
affects: The project or package or whatever that is affected by the bug. Can be used many times.
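severity: How bad is the bug? The project attribute says whose judgment the value is, since projects may disagree, as upstream and Debian do in the example above.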
status: What is the current status of the bug? This requires some study to see what states are shared between bug trackers.
assigned-to: Who is (or is supposed to be) working on the bug. Value is a URL.
x-bounty-offered-by: A non-standard element. Who knows what it means?
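As a sketch of what an importer might do with these elements, the following Python function reads the metadata with the standard xml.etree.ElementTree module. It collects the repeatable affects element into a list, keys severity by its project attribute, and keeps x- elements aside as extensions. Treating the x- prefix as the extension marker, and rejecting any other unknown element, are assumptions of this sketch; the proposal leaves extension handling open.

```python
import xml.etree.ElementTree as ET

# Single-valued common fields, taken from the metadata example.
SINGLE_FIELDS = {"title", "description", "when-submitted",
                 "when-updated", "status", "assigned-to"}


def parse_bug(xml_text):
    """Parse bug metadata into a dict (a sketch, not a reference importer)."""
    root = ET.fromstring(xml_text)
    if root.tag != "bug":
        raise ValueError(f"expected <bug> root element, got <{root.tag}>")
    bug = {"id": root.get("id"), "affects": [], "severity": {}, "extensions": {}}
    for elem in root:
        text = (elem.text or "").strip()
        if elem.tag == "affects":            # may appear many times
            bug["affects"].append(text)
        elif elem.tag == "severity":         # one value per project
            bug["severity"][elem.get("project")] = text
        elif elem.tag in SINGLE_FIELDS:
            bug[elem.tag] = text
        elif elem.tag.startswith("x-"):      # assumed extension convention
            bug["extensions"][elem.tag] = text
        else:
            raise ValueError(f"unknown element <{elem.tag}>")
    return bug
```

Rejecting unknown non-x- elements is a deliberately strict choice; a more tolerant importer could store them as extensions instead.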

Operation

Let's assume two bug trackers want to keep their information about a bug synchronized. One might be the upstream bug tracker, the other a Linux distribution's bug tracker.

The bug is first filed with the distro's bug tracker. The tracker assigns it an id, and informs the upstream bug tracker about the new bug.

The upstream bug tracker pulls in the information about the new bug and stores it in its own database, automatically generating a reference to the distro's bug tracker.
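One way the upstream tracker could generate that reference is to derive it from the global id, which is retained when a bug is copied. The sketch below assumes ids have the bug://HOST/NUMBER shape seen in the metadata example; the proposal does not define that shape, and the helper names origin_of and import_foreign_bug and the "origin" key are hypothetical.

```python
from urllib.parse import urlsplit


def origin_of(bug_id):
    """Split a global id like bug://bugs.liw.fi/12765 into (host, number).

    Assumes the bug://HOST/NUMBER shape from the metadata example.
    """
    parts = urlsplit(bug_id)
    if parts.scheme != "bug" or not parts.netloc:
        raise ValueError(f"not a bug:// identifier: {bug_id!r}")
    return parts.netloc, parts.path.lstrip("/")


def import_foreign_bug(bug):
    """Copy a parsed bug, keeping its global id and noting where it came from."""
    local = dict(bug)                      # leave the caller's dict untouched
    local["origin"] = origin_of(bug["id"])  # hypothetical reference field
    return local
```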

Later, the distro's developers add new information to the bug report and change its status accordingly. The upstream tracker sees this, pulls in the changes, and updates its own database to reflect them.
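The proposal does not say how a tracker "sees" a change. Assuming simple polling, a minimal sketch could compare the when-updated timestamps and then merge in any messages it has not stored yet, matching them on their Message-ID headers. The timestamp format follows the metadata example; the function names, and the assumption that every message carries a unique Message-ID, belong to this sketch only.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M UTC"  # matches "2009-07-26 15:15 UTC"


def parse_timestamp(text):
    return datetime.strptime(text, TIMESTAMP_FORMAT)


def needs_pull(local_updated, remote_updated):
    """True if the remote copy changed after our last synchronization."""
    return parse_timestamp(remote_updated) > parse_timestamp(local_updated)


def merge_discussion(local_msgs, remote_msgs):
    """Append remote messages not seen locally, matching on Message-ID.

    Assumes every message has a unique Message-ID header.
    """
    seen = {m["Message-ID"] for m in local_msgs}
    merged = list(local_msgs)
    for m in remote_msgs:
        if m["Message-ID"] not in seen:
            merged.append(m)
            seen.add(m["Message-ID"])
    return merged
```

Because messages are never edited once sent, merging the discussion only ever appends; the metadata, by contrast, has to be overwritten or reconciled, which is one reason the history of metadata changes is listed as an open issue.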

Then the upstream developer fixes the bug, and marks it as fixed in the upstream tracker. The distro tracker sees this, and marks it as fixed-upstream.
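The translation from "fixed" upstream to "fixed-upstream" in the distro amounts to a small mapping between the two trackers' status vocabularies. Since the status values are still an open issue, the names below are placeholders, and the rule that a bug already fixed in the distro stays fixed is an assumption of this sketch.

```python
# Placeholder status vocabularies; the proposal has not fixed the values yet.
UPSTREAM_TO_DISTRO = {
    "open": "open",
    "fixed": "fixed-upstream",
}


def distro_status_for(upstream_status, current_distro_status):
    """Translate an upstream status change into the distro's own status."""
    if current_distro_status == "fixed":
        # Assumed rule: once fixed in the distro, upstream changes don't undo it.
        return "fixed"
    # Unknown upstream values leave the distro status unchanged.
    return UPSTREAM_TO_DISTRO.get(upstream_status, current_distro_status)
```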

Eventually the fixed program gets uploaded to the distro, and since the changelog includes a note that the bug has been fixed, the distro's bug tracker marks the bug as fixed in the distro as well.
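The proposal does not say what the changelog note looks like. Assuming a Debian-style "Closes: #NNNN" entry, the distro tracker's side of this step could look like the sketch below, which extracts the closed bug numbers (local to the distro tracker) and marks those bugs as fixed. The regular expression is modelled on the Debian convention, and the function names are hypothetical.

```python
import re

# Modelled on Debian's "Closes: #NNNN" changelog convention (assumption).
CLOSES_RE = re.compile(r"closes:\s*(?:bug)?#\s*\d+(?:,\s*(?:bug)?#\s*\d+)*",
                       re.IGNORECASE)
BUG_NUMBER_RE = re.compile(r"#\s*(\d+)")


def bugs_closed_by(changelog_text):
    """Return the bug numbers a changelog entry says it closes, in order."""
    closed = []
    for match in CLOSES_RE.finditer(changelog_text):
        closed.extend(int(n) for n in BUG_NUMBER_RE.findall(match.group(0)))
    return closed


def apply_upload(bug_statuses, changelog_text):
    """Mark every known bug closed by the upload as fixed in the distro."""
    for number in bugs_closed_by(changelog_text):
        if number in bug_statuses:
            bug_statuses[number] = "fixed"
    return bug_statuses
```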

Everyone is now happy.

Open issues

Markup in text fields, such as description? Markdown? Full or restricted HTML? Plain text only?
The values of severity and status need discussing.
History of changes to metadata?
Should the format be exported as a bunch of files, or a tarball?
Should XML namespaces be used for handling non-standard extension fields?
Would JSON be better than XML?