Human vs JSON output formatting: avoid mixing concerns

Two of the original ideas of Unix are that each program should do one thing and that programs should be combinable, so that they can consume each other's output. This led to the convention and tradition that Unix command line programs produce output that's relatively easy for other programs to parse.

In practice, this meant that output was line-based, one record per line, and columns on a line were separated by whitespace or other characters that were easy to match on, such as colons. In simple cases this is very easy to produce and to parse, and so it's common, but as the world gets more complicated, simple cases are sometimes not enough.
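To make the line-based convention concrete, here is a minimal sketch of a consumer parsing colon-separated records. The record layout (name, size, kind) and the names `Record` and `parse_line` are made up for illustration.

```rust
/// One record from a line-oriented, colon-separated listing.
/// The field names are hypothetical, chosen only for this sketch.
#[derive(Debug, PartialEq)]
struct Record<'a> {
    name: &'a str,
    size: u64,
    kind: &'a str,
}

/// Parse a single line such as "report.txt:1024:file".
/// Returns None unless the line has exactly three fields
/// and the second one is a number.
fn parse_line(line: &str) -> Option<Record<'_>> {
    let mut fields = line.split(':');
    let name = fields.next()?;
    let size: u64 = fields.next()?.parse().ok()?;
    let kind = fields.next()?;
    if fields.next().is_some() {
        return None; // too many fields
    }
    Some(Record { name, size, kind })
}

fn main() {
    let input = "report.txt:1024:file\nphotos:4096:dir\n";
    for line in input.lines() {
        match parse_line(line) {
            Some(record) => println!("{record:?}"),
            None => eprintln!("cannot parse: {line}"),
        }
    }
}
```

Note that a file name containing a colon is already enough to make this parser reject the line, which is the kind of complication that makes simple formats insufficient.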

Today, it's a common request that a Unix command line program should optionally produce output in a structured format, such as JSON.

Luckily, this is easy enough to do in most languages. In the Rust language, the powerful serde set of libraries makes this particularly easy.

However, adding JSON output support to an existing program can be tedious. A very common implementation approach is to mix the logic for figuring out what to output with the logic for how to format the output. If there's only one output format, mixing these concerns is often the simplest path forward. In very resource constrained environments it can be the only way, if there isn't enough memory to hold all of the output data at once.
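To illustrate what mixing the concerns tends to look like once a second format is bolted on, here is a rough sketch. The function `report` and its `json` flag are hypothetical, and the hand-written JSON is deliberately naive: Rust's `{:?}` quoting only approximates JSON string escaping.

```rust
use std::fmt::Write;

/// Hypothetical report function in the "mixed" style: every step decides
/// both what to output and how to format it.
fn report(name: &str, values: &[(&str, &str)], json: bool) -> String {
    let mut out = String::new();
    if json {
        // {:?} quoting is only an approximation of JSON string escaping.
        writeln!(out, "{{\"name\": {name:?}, \"values\": [").unwrap();
    } else {
        writeln!(out, "name: {name}").unwrap();
        writeln!(out, "values:").unwrap();
    }
    for (i, (key, value)) in values.iter().enumerate() {
        if json {
            // The format leaks into the loop bookkeeping: JSON needs a
            // comma between items but not after the last one.
            let comma = if i + 1 < values.len() { "," } else { "" };
            writeln!(out, "  {{\"key\": {key:?}, \"value\": {value:?}}}{comma}").unwrap();
        } else {
            writeln!(out, "  {key}={value}").unwrap();
        }
    }
    if json {
        writeln!(out, "]}}").unwrap();
    }
    out
}

fn main() {
    let values = [("foo", "bar"), ("yo", "yoyo")];
    print!("{}", report("example", &values, false));
    print!("{}", report("example", &values, true));
}
```

Every place that produces output now branches on the format, so each additional format touches the whole function.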

When multiple output formats need to be supported, and it's possible to store all of the output data in memory at once, I prefer to separate the concerns. First I collect all the data to be output, then I produce output in the desired output format.

As an example in the Rust language:

#![allow(dead_code)]

use serde::Serialize;

fn main() {
    // Collect all the data to be output first, without formatting it.
    let output = Output {
        name: "example".into(),
        values: vec![
            OutputValue {
                key: "foo".into(),
                value: "bar".into(),
            },
            OutputValue {
                key: "yo".into(),
                value: "yoyo".into(),
            },
        ],
    };

    println!("========================================");
    println!("humane output:");
    println!();
    println!("name: {}", output.name);
    println!("values:");
    for v in &output.values {
        println!("  {}={}", v.key, v.value);
    }

    println!("========================================");
    println!("debug output:");
    println!();
    println!("{output:#?}");

    println!("========================================");
    println!("JSON output:");
    println!();
    println!("{}", serde_json::to_string_pretty(&output).unwrap());
}

#[derive(Debug, Serialize)]
struct Output {
    name: String,
    values: Vec<OutputValue>,
}

#[derive(Debug, Serialize)]
struct OutputValue {
    key: String,
    value: String,
}

This is a very simplistic example, of course, but it shows how the two concerns can be separated.
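In a real program the format would be chosen at run time rather than all of them being printed. A minimal sketch of one way to do that follows, assuming a hypothetical `Format` enum and `render` function. To keep it free of external crates it covers only the human and debug formats; a JSON variant would be one more match arm calling `serde_json::to_string_pretty`.

```rust
use std::str::FromStr;

#[derive(Debug)]
struct Output {
    name: String,
    values: Vec<(String, String)>,
}

/// Output formats this sketch supports (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Format {
    Human,
    Debug,
}

impl FromStr for Format {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "human" => Ok(Format::Human),
            "debug" => Ok(Format::Debug),
            other => Err(format!("unknown output format: {other}")),
        }
    }
}

/// The only place that knows what each format looks like.
fn render(output: &Output, format: Format) -> String {
    match format {
        Format::Human => {
            let mut text = format!("name: {}\nvalues:\n", output.name);
            for (key, value) in &output.values {
                text.push_str(&format!("  {key}={value}\n"));
            }
            text
        }
        Format::Debug => format!("{output:#?}\n"),
    }
}

fn main() {
    // Step 1: collect the data, with no thought given to formatting.
    let output = Output {
        name: "example".into(),
        values: vec![("foo".into(), "bar".into()), ("yo".into(), "yoyo".into())],
    };
    // Step 2: pick a format. Hard-coded here; normally it would come
    // from the command line.
    let format: Format = "human".parse().expect("valid format name");
    print!("{}", render(&output, format));
}
```

With this shape, adding a format means adding one enum variant and one match arm, and the data collection code stays untouched.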

I've converted a few programs to this style over the years. The hard part is always teasing apart the data collection and the output formatting. It needs to be done carefully to avoid breaking anything that depends on the existing output format.
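One way to guard against such breakage is to capture the old output before the conversion and check that the new formatter reproduces it exactly. Here is a small sketch of that idea; the `render_human` function and the captured text are hypothetical.

```rust
struct Output {
    name: String,
    values: Vec<(String, String)>,
}

/// New-style formatter that works only from already collected data.
fn render_human(output: &Output) -> String {
    let mut text = format!("name: {}\nvalues:\n", output.name);
    for (key, value) in &output.values {
        text.push_str(&format!("  {key}={value}\n"));
    }
    text
}

fn main() {
    // Text captured from the program before the conversion
    // (made up for this sketch).
    let legacy = "name: example\nvalues:\n  foo=bar\n  yo=yoyo\n";

    let output = Output {
        name: "example".into(),
        values: vec![("foo".into(), "bar".into()), ("yo".into(), "yoyo".into())],
    };

    // Any difference, even in whitespace, could break a consumer
    // that parses the old format.
    assert_eq!(render_human(&output), legacy);
    println!("human output unchanged");
}
```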

In any new programs I write, I separate the concerns from the beginning to be kind to my future self.