Security isolation in CI engines

A continuous integration (CI) engine takes the source code for a software project and checks that it works. In less abstract terms, it builds the project and runs any automated tests the project has. The exact steps for that depend heavily on the CI engine and the project, but can be thought of as follows (with concrete examples of possible commands):

retrieve the desired revision of the source code (git clone, git checkout)
install build dependencies (dpkg-checkbuilddeps, apt install)
build (./configure, make)
test (make check)

This is dangerous stuff. In the specific case of an open, hosted CI service, it's especially dangerous: anyone can submit any build, and that build can do anything, including attack computers anywhere on the Internet. However, even in a CI engine that only builds projects for in-house developers, it's risky: many attacks on IT systems are carried out by insiders.

Apart from actual attacks, building software is also dangerous because of accidents: a mistake in the way software is built, or automatically tested, can result in what looks and behaves like an attack. For example, an infinite loop can use excessive amounts of CPU time, or block other projects from getting built.

I've been thinking about ways to deal with this, in the context of developing a CI engine, and here's a list of specific threats I've come up with:

excessive use of build host resources
- e.g., CPU, GPU, RAM, disk
- mitigation: use quotas or other hard limits that can't be exceeded (e.g., dedicated file system for build, virtual machine with a virtual memory limit)
- mitigation: monitor use, stop build if use goes over a limit, if a quota is infeasible (e.g., CPU time)
excessive use of network bandwidth
- mitigation: monitor use, stop build if it goes over a limit
denial of service attack on a networked target
- e.g., build joins a DDoS swarm, or sends fabricated SYN packets to prevent target from working
- mitigation: prevent direct network access for build, force all outgoing connections to go via a proxy that validates requests and stops build if anything looks suspicious
attack on build host, or other host, via network intrusion
- e.g., port scanning, probing for known vulnerabilities
- mitigation: prevent direct network access for build, force all outgoing connections to go via a proxy that validates requests and stops build if anything looks suspicious
attack on build host directly, without using the network
- e.g., by breaching security isolation using build host kernel or hardware vulnerabilities, or CI engine vulnerabilities
- this includes eavesdropping on the host, and stealing secrets
- mitigation: keep build host up to date on security updates
- mitigation: run build inside a VM controlled by CI engine (on the assumption that a VM provides better security isolation than a Linux container)
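
The "monitor use, stop build if it goes over a limit" mitigations above could be sketched as a small check that the CI engine runs on every monitoring sample. This is only a minimal sketch in Python: the names `Limits`, `Usage`, `violations`, and `should_stop`, the choice of resources, and the units are all assumptions made for illustration, and how the samples are actually measured is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Upper bounds for one build (hypothetical names and units)."""
    cpu_seconds: float
    network_bytes: int
    wall_seconds: float


@dataclass(frozen=True)
class Usage:
    """One monitoring sample: totals since the build started."""
    cpu_seconds: float
    network_bytes: int
    wall_seconds: float


def violations(usage: Usage, limits: Limits) -> list[str]:
    """Return the names of all resources whose use exceeds its limit."""
    exceeded = []
    for name in ("cpu_seconds", "network_bytes", "wall_seconds"):
        if getattr(usage, name) > getattr(limits, name):
            exceeded.append(name)
    return exceeded


def should_stop(usage: Usage, limits: Limits) -> bool:
    """The build is stopped as soon as any single limit is exceeded."""
    return bool(violations(usage, limits))
```

Returning the list of exceeded resources, rather than only a yes or no, would let the engine tell the developer why the build was stopped.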

I'm sure this is not an exhaustive list. If you can think of additional risks, do tell me.

My current plan for mitigating all of the above looks as follows:

there are two nested virtual machines
the outer VM is the manager, the inner VM is the builder
the manager creates, controls, monitors, and destroys the builder
the outer VM is probably Debian Linux, since that's what I know best, using libvirt with QEMU and KVM to manage the inner VM
the inner VM can be any operating system, as long as it can run as a QEMU/KVM guest and provides ssh access from the outer VM
the manager runs commands on the builder over ssh, or possibly via serial console (ssh would be simpler, though)
both VMs have a restricted number of CPUs, and a restricted amount of RAM and disk space
the manager monitors the builder's use of CPU time and network bandwidth
the manager proxies and firewalls all outgoing network access to prevent any access that isn't explicitly allowed
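
Running a command on the builder over ssh, with a time limit enforced by the manager, could look roughly like this. It is a minimal sketch, not a design: `builder_ssh_argv` and `run_on_builder` are hypothetical helper names, and the user name and address are up to the caller. `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```python
import shlex
import subprocess


def builder_ssh_argv(user: str, host: str, command: list[str]) -> list[str]:
    """Build the argv the manager would use to run one command on the builder."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt for a password
        "-o", "ConnectTimeout=10",  # give up if the builder is unreachable
        f"{user}@{host}",
        # ssh hands a single string to the remote shell, so quote each word.
        shlex.join(command),
    ]


def run_on_builder(user: str, host: str, command: list[str], timeout: float) -> int:
    """Run the command on the builder and return its exit code.

    subprocess.run raises subprocess.TimeoutExpired if the command takes
    longer than `timeout` seconds, which gives the manager a wall-clock limit.
    """
    argv = builder_ssh_argv(user, host, command)
    return subprocess.run(argv, timeout=timeout).returncode
```

Passing the command as a list and quoting it in one place avoids the build recipe being reinterpreted by the shell on the builder in surprising ways.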

The build steps from the top of this article would then work something like this:

retrieve the desired revision of the source code: the builder does this, but proxied via the manager, which checks that connections go only to servers listed as allowed for this project
install build dependencies: the builder downloads the build dependencies, but proxied via the manager, which checks that downloads come only from servers listed as allowed for this project
build: runs inside the builder
test: runs inside the builder
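
The check that the manager makes for the first two steps could be as simple as comparing the host in each requested URL against a per-project list. A minimal sketch, assuming plain HTTP or HTTPS URLs and exact host name matches; `is_allowed` is a hypothetical name, and a real proxy would need to handle more than this.

```python
from urllib.parse import urlsplit


def is_allowed(url: str, allowed_hosts: set[str]) -> bool:
    """Return True if the URL uses HTTP(S) and names an allowed host exactly."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # urlsplit lower-cases the host name and strips any user:password@ part,
    # so "allowed.example@evil.example" is correctly seen as evil.example.
    host = parts.hostname
    return host is not None and host in allowed_hosts
```

Exact matching is deliberately strict: a suffix or substring match would let a host such as `deb.debian.org.evil.example` slip through.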

It would be awesome if the manager could cut off the builder's network access once the build dependencies are installed. This would be feasible if the build recipe is structured in a way that lets the manager know which part does what. (If I'm designing the CI engine, then I can probably achieve that.)
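
One way a recipe could tell the manager which part does what is for each step to declare whether it needs the network. The following is a sketch under that assumption only; `Step`, `RECIPE`, and `network_cutoff` are hypothetical names, and the commands are placeholders taken from the examples at the top of this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    command: str         # placeholder; not executed in this sketch
    needs_network: bool


RECIPE = [
    Step("fetch", "git clone", True),
    Step("dependencies", "apt install", True),
    Step("build", "make", False),
    Step("test", "make check", False),
]


def network_cutoff(steps: list[Step]) -> int:
    """Return the index of the first step that can run with the network off.

    Every step from this index onward declares that it needs no network,
    so the manager can disable network access just before running it.
    """
    cutoff = 0
    for index, step in enumerate(steps):
        if step.needs_network:
            cutoff = index + 1
    return cutoff
```

For `RECIPE`, the cutoff falls between "dependencies" and "build", so both building and testing would run without any network access.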

It would be even more awesome if the manager could do all the downloading, but given the guest may need to use tools specific for its operating system, which might not be available on the operating system of the manager, this might not be feasible. A filtering HTTP or HTTPS proxy may need to be enough.
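
One thing to keep in mind with a filtering HTTPS proxy: unless it intercepts TLS, it only sees the `CONNECT host:port` request that opens the tunnel, not the URLs fetched inside it, so filtering is limited to host and port. A minimal sketch of that check, with hypothetical names `connect_target` and `allow_connect`:

```python
from typing import Optional


def connect_target(request_line: str) -> Optional[tuple[str, int]]:
    """Parse 'CONNECT host:port HTTP/1.1'; return (host, port) or None."""
    words = request_line.split()
    if len(words) != 3 or words[0] != "CONNECT":
        return None
    host, sep, port = words[1].rpartition(":")
    if not sep or not host or not (port.isascii() and port.isdigit()):
        return None
    return host.lower(), int(port)


def allow_connect(request_line: str, allowed: set[tuple[str, int]]) -> bool:
    """Allow the tunnel only if its (host, port) pair is explicitly listed."""
    target = connect_target(request_line)
    return target is not None and target in allowed
```

Listing ports along with hosts means an allowed package mirror on port 443 doesn't also grant access to, say, ssh on the same host.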

What threats am I missing? Are my mitigations acceptable?

If you want to comment on this blog post, please send me email (liw@liw.fi), or respond on the fediverse on this thread. Thank you!