Bug exchange format
A while ago, I wrote about some ideas I had for distributed bug tracking. Recently I got to talk about these ideas with Don Armstrong, the current main developer for debbugs, which Debian uses for bug tracking. We cooked up a first draft of a proposal for a bug exchange format, as specified below. Comments very welcome.
Version: 2009-07-26-A
Author: Lars Wirzenius <liw@liw.fi>
Author: Don Armstrong <don@donarmstrong.com>
Introduction
There are many bug tracking systems in the world. They are all sort of similar, but incompatible. This is a problem for Linux distributions and other situations where the same bug needs to be tracked by several projects.
As a concrete example, consider a bug in Firefox. It might be reported in Launchpad against Ubuntu, but also affects all other Linux distributions, and of course needs to be fixed by the upstream developers at Mozilla. In order to keep track of the bug across all distributions and upstream and perhaps elsewhere, the bug trackers of each project need to synchronize their information about the bug in some way.
This synchronization would be easier if there was a commonly accepted format for exchanging information about bugs. This way, each bug tracker software could support import and export of that format rather than having to support each other bug tracker specially.
In addition, such a common format might make development of distributed bug tracking systems easier.
Note that we discuss here the exchange format. Each bug tracker may use whatever internal representation they desire. Compare with what RFC822 did for e-mail.
Overview
We consider a bug to conceptually consist of a discussion, with some attachments to various messages, and some metadata added to the entire discussion.
A well-known, well-supported format for representing discussions is the e-mail format. It should have all the features necessary to represent even a complicated discussion. It also supports attachments. We acknowledge that the format is crufty.
We propose that metadata be represented using XML, again because it is well-known and well-supported.
Additionally, we propose that all information about a bug be represented as a Maildir with the metadata in a file inside the Maildir. Maildir is another well-known, well-supported construct, and allows an easy way to examine the discussion part with any e-mail program (e.g., "mutt -f bug-12345").
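As a rough illustration of the layout proposed above, the following Python sketch writes one bug as a Maildir and stores the metadata next to the messages. The file name `metadata.xml`, the helper names `export_bug` and `make_report`, and the sample addresses are assumptions made for this example; the draft does not say what the metadata file is called.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def export_bug(bug_dir, messages, metadata_xml):
    """Write one bug as a Maildir plus a metadata file (assumed layout)."""
    # mailbox.Maildir creates the tmp/, new/ and cur/ subdirectories.
    box = mailbox.Maildir(bug_dir, create=True)
    try:
        for msg in messages:
            box.add(msg)
        box.flush()
    finally:
        box.close()
    # "metadata.xml" is an assumed name; the draft does not specify one.
    path = os.path.join(bug_dir, "metadata.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(metadata_xml)


def make_report():
    """Build a sample bug report message (all values are made up)."""
    msg = EmailMessage()
    msg["From"] = "u.ser@example.com"
    msg["To"] = "bugs@example.com"
    msg["Subject"] = "hellopy does not speak English"
    msg["Message-ID"] = "<report-1@example.com>"
    msg.set_content("hellopy says 'Huplo' where 'Hello' would be expected.")
    return msg


if __name__ == "__main__":
    bug_dir = os.path.join(tempfile.mkdtemp(), "bug-12345")
    export_bug(bug_dir, [make_report()], '<bug id="bug://bugs.liw.fi/12765"/>\n')
    print(sorted(os.listdir(bug_dir)))
```

Since mail readers only look inside the `new` and `cur` subdirectories, keeping the metadata file at the top level of the Maildir should leave the discussion readable as ordinary mail.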
Bug metadata
Many parts of bug metadata are shared by all bug trackers, though perhaps with different names. Additionally, there may be fields that are not commonly supported. XML provides a framework for an extensible metadata format.
Here is an example, showing all common fields that are to be supported by all implementations, plus one optional field that is not part of the bug format spec:
<DOCTYPE whatever>
<bug id="bug://bugs.liw.fi/12765">
<title>hellopy does not speak English</title>
<description>The hellopy implementation of the "hello, world"
application does not speak English.</description>
<when-submitted>2009-07-26 15:15 UTC</when-submitted>
<when-updated>2009-07-26 15:15 UTC</when-updated>
<affects>hellopy</affects>
<severity project="http://liw.fi/hellopy">minor</severity>
<severity project="http://packages.debian.org/hellopy">serious</severity>
<status>open</status>
<assigned-to>mailto:liw@liw.fi</assigned-to>
<x-bounty-offered-by>mailto:verybigboss@example.com</x-bounty-offered-by>
</bug>
Some discussion about each element:
- bug
- This is the root element. The id attribute gives the global identifier for a bug. If the bug gets copied to another bug tracker, the id will be retained.
- title
- The title of the bug. Meant to be short.
- description
- A description or summary of the bug. Can be longer than the title. This is different from the original bug reporting message, since it can be updated later. It should represent a summary of the best current understanding of the bug, so that people don't need to read through the entire discussion to find that out.
- when-submitted, when-updated
- Timestamp for when the bug was originally submitted and when it was last updated. These could be parsed from the discussion, as well, and perhaps should be.
- affects
- The project or package or whatever that is affected by the bug. Can be used many times.
- severity
- How bad a bug is it?
- status
- What is the current status of the bug? This requires some study to see what states are shared between bug trackers.
- assigned-to
- Who is (or is supposed to be) working on the bug. Value is a URL.
- x-bounty-offered-by
- A non-standard header. Who knows what it means?
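To make the element descriptions above concrete, here is a minimal Python sketch that reads these fields with the standard `xml.etree.ElementTree` module. The function name `parse_bug` and the shape of the returned dictionary are assumptions made for illustration. The sketch leaves out the placeholder DOCTYPE line and simply treats any element whose name starts with `x-` as a non-standard extension.

```python
import xml.etree.ElementTree as ET

EXAMPLE = """\
<bug id="bug://bugs.liw.fi/12765">
  <title>hellopy does not speak English</title>
  <when-submitted>2009-07-26 15:15 UTC</when-submitted>
  <affects>hellopy</affects>
  <severity project="http://liw.fi/hellopy">minor</severity>
  <severity project="http://packages.debian.org/hellopy">serious</severity>
  <status>open</status>
  <assigned-to>mailto:liw@liw.fi</assigned-to>
  <x-bounty-offered-by>mailto:verybigboss@example.com</x-bounty-offered-by>
</bug>
"""


def parse_bug(xml_text):
    """Return the bug metadata as a plain dictionary (assumed layout)."""
    root = ET.fromstring(xml_text)
    if root.tag != "bug":
        raise ValueError("root element must be <bug>")
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "status": root.findtext("status"),
        # <affects> can be used many times, so collect all of them.
        "affects": [e.text for e in root.findall("affects")],
        # One severity per project, keyed by the project attribute.
        "severity": {e.get("project"): e.text for e in root.findall("severity")},
        "assigned_to": root.findtext("assigned-to"),
        # Unknown x-* elements are kept rather than rejected.
        "extensions": {e.tag: e.text for e in root if e.tag.startswith("x-")},
    }


if __name__ == "__main__":
    for key, value in parse_bug(EXAMPLE).items():
        print(key, "=", value)
```

Keeping unknown `x-` elements in a separate dictionary is one way for an importer to pass through extensions it does not understand.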
Operation
Let's assume two bug trackers want to keep their information about a bug synchronized. One might be the upstream bug tracker, the other a Linux distribution's bug tracker.
The bug is first filed with the distro's bug tracker. The tracker assigns it an id, and informs the upstream bug tracker about the new bug.
The upstream bug tracker pulls in the information about the new bug, and puts it in its own bug tracker, automatically generating a reference to the distro's bug tracker.
Later, the distro's developers add new info to the bug report, and change its state accordingly. The upstream tracker sees this, and pulls in the changes, and updates its own database to reflect the changes.
Then the upstream developer fixes the bug, and marks it as fixed in the upstream tracker. The distro tracker sees this, and marks it as fixed-upstream.
Eventually the fixed program gets uploaded to the distro, and since the changelog includes a note that the bug has been fixed, the distro's bug tracker marks the bug as fixed in the distro as well.
Everyone is now happy.
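The sequence above can be sketched as a small in-memory model in Python. Everything in it is an assumption made for illustration: the `Bug` and `Tracker` classes, the `pull` method, the `references` field, the status names other than `fixed-upstream`, and the rule that an upstream `fixed` status shows up in the distro tracker as `fixed-upstream`. A real implementation would exchange the Maildir and metadata file rather than Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Bug:
    id: str
    title: str
    status: str = "open"
    messages: list = field(default_factory=list)
    references: set = field(default_factory=set)


class Tracker:
    def __init__(self, name):
        self.name = name
        self.bugs = {}

    def file(self, bug_id, title, report):
        """File a new bug with its first message."""
        self.bugs[bug_id] = Bug(bug_id, title, messages=[report])
        return self.bugs[bug_id]

    def pull(self, other, bug_id, other_is_upstream=False):
        """Copy new information about bug_id from `other` into this tracker."""
        theirs = other.bugs[bug_id]
        mine = self.bugs.get(bug_id)
        if mine is None:
            # The global id is retained when the bug is copied.
            mine = self.bugs[bug_id] = Bug(theirs.id, theirs.title, theirs.status)
            mine.references.add(other.name)
        for msg in theirs.messages:
            if msg not in mine.messages:
                mine.messages.append(msg)
        if other_is_upstream:
            # An upstream fix is not yet a fix in the distro.
            if theirs.status == "fixed" and mine.status != "fixed":
                mine.status = "fixed-upstream"
        else:
            mine.status = theirs.status
        return mine


if __name__ == "__main__":
    distro, upstream = Tracker("distro"), Tracker("upstream")
    bug_id = "bug://bugs.example.org/1"
    distro.file(bug_id, "hellopy does not speak English", "original report")
    upstream.pull(distro, bug_id)
    upstream.bugs[bug_id].status = "fixed"
    distro.pull(upstream, bug_id, other_is_upstream=True)
    print(distro.bugs[bug_id].status)
    distro.bugs[bug_id].status = "fixed"  # the fixed package is uploaded
```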
Open issues
- Markup in text fields, such as description? Markdown? Full or restricted HTML? Plain text only?
- Values of severity, status need discussing.
- History of changes to metadata?
- Should the format be exported as a bunch of files, or a tarball?
- Should XML namespaces be used for handling non-standard extension fields?
- Would JSON be better than XML?
https://help.launchpad.net/Bugs/ImportFormat
It's not used for distributed bug tracking, rather for importing and exporting bugs to and from Launchpad, as part of a manual process. Someone can produce this format and we can load it into Launchpad to create new bugs.
The Trac to Launchpad Migrator (https://launchpad.net/trac-launchpad-migrator) produces it, for example.
We don't use it to update/sync bugs, but there's nothing stopping it from being extended to support that.
I think the protocol used for synchronization of the bugtrackers should be specified. A problem with maildir is that there is no obvious mapping from it to a network transfer.
I see several options here:
<message id=... url=... date=... />). The pulling bugtracker will read the metadata and then read each message it does not have yet from the URL provided. The advantage is that the message is not re-downloaded on further synchronizations; the disadvantage is that it takes many requests.
I would consider the second option actually easiest to implement. The exporting tracker will already have the bug data parsed in some structure, so serializing to XML should be easy, and many of them already have an RSS feed too. The downloading tracker would just issue one request and deal with the result. It would also make it easy to download the file manually and upload it to the other tracker later.
Query parameter since=timestamp could be specified for the main URL. This would cause the messages received before that timestamp to be omitted. A tracker that does not implement it would still be conforming, since it would only send unnecessary data and nothing would fail.
The Launchpad format is quite similar to what I proposed in the previous comment, except:
The idea of basing on Atom would be roughly like:
Added benefits would be:
The original example could look like:
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:bug="http://liw.fi/bug-exchange">
  <title>hellopy does not speak English</title>
  <updated>2009-07-26T15:15Z</updated>
  <bug:affects>hellopy</bug:affects>
  <bug:severity project="http://liw.fi/hellopy">minor</bug:severity>
  <bug:severity project="http://packages.debian.org/hellopy">serious</bug:severity>
  <bug:status>open</bug:status>
  <bug:assigned-to>
    <email>liw@liw.fi</email>
  </bug:assigned-to>
  <x:bounty-offered-by xmlns:x="http://example.com/bug-bounty">
    <email>verybigboss@example.com</email>
  </x:bounty-offered-by>
  <entry>
    <title>hellopy does not speak English</title>
    <published>2009-07-26T15:15Z</published>
    <author>
      <name>U. Ser</name>
      <email>u.ser@example.com</email>
    </author>
    <content type="text">
      hellopy says 'Huplo' where 'Hello' would be expected. I don't know what language that is supposed to be, but definitely does not sound like English.
    </content>
  </entry>
</feed>
We're working on designing a standard bug representation format using RDF and ontologies, in much the same way as you've been describing (for metadata).
More details at https://picoforge.int-evry.fr/cgi-bin/twiki/view/Helios_wp3/Web/HeliosBtOntology
Looking forward to being able to discuss that with you.
I don't think the protocol for transferring things is as important as the format at this stage. Consider the e-mail format (RFC822): it worked for SMTP as well as UUCP.
I also don't think a completely new format is warranted: re-inventing the "e-mails in Maildir" format in something based on XML may result in something powerful, but I'm afraid it will first result in taking a really long time.
It's fairly important to encode all the information in e-mail headers: it can be important to know, for example, who was Cc'd on the original e-mail. Adding all the details to a new format is going to take quite a while.
However, I'm a really strong believer that whoever does the work should make the decision. Since that's not going to be me, I'm only suggesting things here.