What Does Pipelines as Code Really Mean

Mirroring the rise of interest in infrastructure as code, there has been a considerable interest in defining builds and pipelines as code. But unlike with infrastructure as code, we see a rise in the assumption that defining pipelines as code requires the use of a full programming language. This assumption puzzles us.

This post will attempt to:

Break down how we got to build and pipelines as code
Fight the assumption that pipelines as code equates to always configuring pipelines with a general purpose programming language
Call out that storing pipeline config in your team’s project repository is orthogonal to pipelines as code

Infrastructure as code

As infrastructure as code is the “as code” movement the majority of us are most familiar with, let’s take a quick look.

Martin Fowler’s bliki tells us that “Infrastructure as code is the approach to defining computing and network infrastructure through source code that can then be treated just like any software system.” We can keep the code in version control for audit and reproducible builds. As it’s code, we can also test it. Two of the bigger benefits are less stress when rolling out changes as well as reduced mean time to recovery when disaster strikes.

Infrastructure as code is enabled by countless libraries, scripts, and definition files. Configuration is typically via a declarative syntax such as YAML or JSON, a DSL, or a full-fledged programming language such as Ruby. Pick the right tool for the job.

Build as code and pipeline as code would have much the same definition. They are an approach to defining your builds and pipelines in diffable, readable files checked into source control that can then be treated just like any software system.

Build as code

The first CI server, CruiseControl appeared around 2000/01. Interestingly, CruiseControl fully supported build as code. CruiseControl ran a build by simply calling a specified target in a specified ant file that lived in the project’s source repository. The build was defined in code, an ant build file, and stored in the team’s source repository.

The next generation of CI servers, mostly notably Hudson/Jenkins (pre 2.0), quickly moved away from that style of build definition. They fully owned the build configuration, often allowed editing only via a GUI, and stored it outside the team’s source code. This caused a number of problems:

Inability to change code and build configuration in lockstep, forcing a lengthy game of whack-a-mole to get the build, the application, and tests in synch.
Inability to track changes to the build configuration, resulting in non-reproducible builds.
Inability to make build configuration changes atomically, resulting in unintended build executions and failures.
In an environment with central build servers, a team was not in control of its own build.

In 2011 Travis CI changed things. Travis builds could only be defined in a travis.yml file that lived in your source repository. Developers absolutely loved the travis.yml file. No doubt one reason is that it fixed the above list of problems. But we think it might have as much to do with devops culture and infrastructure as code being in vogue by this time.

Lessons from the Travis model were as follows:

Keep the build configuration for a specific project alongside the code for that project in the same version control system
Pick an external form for the build configuration that is plain text and is human readable so that the changes could be tracked and diffed much like the changes to any other part of the code.

Travis CI mainstreamed build as code. Note that we aren’t talking about utilizing a full programming language yet. Travis CI’s build as code was a YAML file.

Pipelines as code

Today, we see deployment pipelines and “CI/CD Servers” are mainstream. And pipelines as code requiring a full programming language is the de facto assumption. In fact, Jenkins has clearly set out to own the term “Pipelines as Code,” including popularizing that the configuration of pipelines should only be done via a programming language.

We can understand why equating pipelines as code with configuration via a programming language has happened. Deployment pipelines have more complex workflows. A typical microservice architecture will require many deployment pipelines that are for the most part identical. These are problems that are in fact well-solved by most programming languages. And “Pipelines as Code” makes for some great marketing.

But infrastructure as code doesn’t demand a programming language. Nor should pipelines as code. A CD tool with deployment pipelines really should support the reuse of build and pipeline configuration, parameterization, conditional execution and other common constructs via simple, declarative definitions. We reject the notion that pipelines as code requires writing imperative code.

Is configuring via a general purpose language wrong?

No! Absolutely not. We just don’t want it to be the starting point. It brings more complexity. More complexity brings more errors. And in the same way that complex Chef repositories need test-kitchen based unit tests to make sure that they are doing the right thing, a complex and programmed build configuration needs tests itself. Quis custodiet ipsos custodes?

GoCD tries very hard to support deployment pipelines in such a way that you only need to declare definitions to define your pipelines. But there are scenarios where GoCD’s limits are hit and then something like Gomatic, allowing configuration of pipelines via Python scripts, is useful. As with everything in software, you need to understand the trade-offs and choose the right tools.

Storing build and pipeline config in the main project repository

Lastly, there is great confusion between the concept of pipelines as code and the concept of storing the configuration in the team’s main code repository. While the two concepts are often in play together, they are in fact separate concerns. That the pipeline is stored in version control is non-negotiable. But storing pipeline config in the team’s main application repository is not a requirement of pipeline as code.

In the simple scenarios, where a single team owns everything, by all means store the pipeline config in your main application repository. But in a more complex system, with multiple repositories, multiple dependencies, or other intricacies, there is not enough information in any of the application code repositories to reason how the full pipeline should work. Here you will want to store your pipeline configuration in its own repository.

GoCD supports pipeline configuration via your application code repository, a repository external to your application code, or a central file that GoCD itself manages under git version control.

Summary

It’s great that we all are demanding builds and pipelines as code. Pipelines as code need not always be via a programming language. Don’t awkwardly force your build and pipeline config into your main source repository when it’s not the right thing to do.

We hope this post helped clear up some of the confusion surrounding pipelines as code. Do let us know by commenting below or tweet to us at @david_j_rice and @badrij.

Thanks to Paul Hammant for pushing us to offer more prescriptive guidance on where to store the pipeline configuration.

What Does Pipelines as Code Really Mean?

Badri Janakiraman and David Rice