Terraform has become the nearly ubiquitous way to provision services in the cloud native era. However, when we start to build our infrastructure using Terraform’s as-code approach, there are a few things we need to consider in order to manage these operations at scale, across a diversity of decentralized services, and for distributed teams. At Firefly, we often encounter the challenges of managing IaC at scale as part of our effort to help organizations discover and manage their many cloud assets.
Terraform Management at Scale
To manage cloud infrastructure at scale, most companies rely heavily on Terraform combined with an orchestration tool. These tools complement the suite of tools HashiCorp provides. They help teams get a handle on the many modules, providers, frameworks, and services being provisioned at the sheer scale of cloud operations, and they provide a remote backend to maintain the state of your infrastructure.
Companies that choose a fully managed orchestration tool will often select HashiCorp’s Terraform Cloud (a SaaS solution). However, as with all tools that gain widespread popularity, there are many non-HashiCorp alternatives available. Terraform Cloud's benefits are a fully remote backend, native integration with GitHub, state versioning, and advanced features for infrastructure stakeholders such as platform engineers, DevOps teams, and cloud engineers.
However, another option is gaining popularity for large-scale Terraform operations: the GitOps approach, for those who decide to deploy their infrastructure using GitHub Actions, GitLab CI, or other built-in CI pipeline applications. The most popular tool for this use case is Atlantis. Atlantis is a basic solution that integrates automatically with each pull request (PR) and enforces best practices for infrastructure deployments as defined in company policies, such as code owners, code reviewers, and unit tests using tools like Terratest, among others.
The GitOps method does not come with a managed backend, so anyone who wants to maintain state for their IaC still has to provide one. Terraform currently supports out-of-the-box backends including AWS S3, GCS, HashiCorp Consul, Kubernetes, and HTTP.
As we all know, though, many companies today have chosen to work with GitLab on-prem, for many reasons, and with this choice all of the GitHub and GitHub Actions integrations become less relevant.
Terraform States Using GitLab Enterprise
Companies that choose GitLab as their primary source code management (SCM) platform will often also choose to deploy their infrastructure using dedicated GitLab pipelines. This leaves us with the question: what about the Terraform state?
We discovered a new feature for remote backends inside GitLab. We knew this was just what we needed as a GitLab shop. However, when we tried to enable it, we found very little documentation to help us, and so we had to go down the rabbit hole of researching how to configure and set up remote backends with the specific requirements dictated by the GitLab API. We’d like to share with you some of the excellent intel we uncovered.
We’ll start with some of the challenges we immediately encountered. Configuring the GitLab backend proved quite complex: we had to understand the GitLab configuration syntax in depth, along with the various S3 configurations, to actually get this set up. Once we managed to configure our S3 bucket as the dedicated data store for our Terraform states, we found that GitLab encrypts all of them in S3 using AES-256. This means that, once encrypted, the state is no longer directly accessible to Terraform. You have to use the GitLab APIs to download the states before you can use them in your environment.
This is where it gets tricky. We’ve chosen to deploy and orchestrate our code using GitLab. Great. Next, we want to leverage its new capability of managing state, but we can’t actually manage our Terraform states if they are encrypted and not accessible to Terraform.
This adds a particular layer of complexity, because when you work in the modern engineering format of CI/CD, GitLab increments the state version number with each deployment to maintain the log and change history of deployed versions. All of this is fine, and important as an engineering best practice. However, it introduces a few gaps when it comes to Terraform state management.
According to the GitLab API documentation, you download a state by requesting a specific version of a named state within a project, authenticating with an access token.
This means that in order to be able to download the state you have to have a few critical pieces of information:
- Your Access Token
- Your Project ID
- Your State Name
- Your Version Number
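To make the four pieces concrete, here is a minimal Python sketch of how they combine into a download request. It assumes the state-version endpoint from GitLab's Terraform state documentation (`/api/v4/projects/:id/terraform/state/:name/versions/:serial`) and the `PRIVATE-TOKEN` header. The host, project ID, state name, and serial are hypothetical placeholders, so verify the path against the documentation for your GitLab version.

```python
from urllib.parse import quote


def state_version_url(base_url, project_id, state_name, serial):
    """Build the download URL for one version of a Terraform state.

    Assumed endpoint shape (check your GitLab version's docs):
    /api/v4/projects/:id/terraform/state/:name/versions/:serial
    """
    project = quote(str(project_id), safe="")  # also works for URL-encoded paths
    name = quote(state_name, safe="")
    return (
        f"{base_url.rstrip('/')}/api/v4/projects/{project}"
        f"/terraform/state/{name}/versions/{int(serial)}"
    )


def auth_headers(access_token):
    """GitLab personal/project access tokens are sent in this header."""
    return {"PRIVATE-TOKEN": access_token}


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(state_version_url("https://gitlab.example.com", 1234, "production", 7))
```

Note that the serial is the one value that changes on every deployment, which is why it is the hardest of the four to keep track of.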
Not only does one rarely know the specific name of a deployment, it is even rarer to know the latest version number, which changes constantly with continuous deployment. GitLab doesn’t expose this number in the UI; you only see it in the URL when you hover over the deployment or click on it.
In very large-scale operations, there are hundreds of environments running Terraform all the time, and new ones constantly being deployed. Not to mention the different kinds of environments (development, staging, production), each with multiple dev accounts. It’s a needle in a haystack.
We felt like we hit a wall. We knew there had to be a better way. We went back to researching.
GitLab GraphQL API for Terraform State Management
After digging deeper, we found a gold mine. There IS another way.
We found a hidden GraphQL API that reveals all of your GitLab environments built through GitLab Pipelines. It lets you extract, quite simply, all of the critical information you need to download and access the Terraform state.
See it in action: a single GraphQL query is enough to extract the required data.
This API returns the latest version of all environments in the project.
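As an illustration only, the following Python sketch shows what such a query and its handling might look like. The field names (`terraformStates`, `latestVersion`, `serial`, `downloadPath`) are taken from GitLab's public GraphQL reference as we understand it and should be treated as assumptions to check against your instance. The project path and the sample response are hypothetical.

```python
import json

# Assumed query shape; confirm the fields in your instance's GraphQL explorer.
TERRAFORM_STATES_QUERY = """
query ($fullPath: ID!) {
  project(fullPath: $fullPath) {
    terraformStates {
      nodes {
        name
        latestVersion {
          serial
          downloadPath
        }
      }
    }
  }
}
"""


def build_graphql_payload(project_full_path):
    """Return the JSON body to POST to <gitlab-host>/api/graphql."""
    return json.dumps(
        {"query": TERRAFORM_STATES_QUERY, "variables": {"fullPath": project_full_path}}
    )


def latest_serials(response):
    """Map each state name to its latest version serial.

    States that have no version yet (latestVersion is null) are skipped.
    """
    nodes = response["data"]["project"]["terraformStates"]["nodes"]
    return {
        node["name"]: node["latestVersion"]["serial"]
        for node in nodes
        if node.get("latestVersion")
    }
```

The payload would be sent with the same access token used for the REST API. The parser keeps only the state name and serial, which are exactly the two values that are hard to find in the UI.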
Using the response, we can download the latest version of each Terraform state through the previously mentioned GitLab API.
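A rough sketch of that last step in Python is shown here. It assumes the GraphQL response lists each state's `name` and `latestVersion.serial` (a hypothetical response shape to confirm on your instance) and that states are downloaded from `/api/v4/projects/:id/terraform/state/:name/versions/:serial`. The helper names, host, and project ID are illustrative and not part of any GitLab client library.

```python
import json
import urllib.request
from urllib.parse import quote


def latest_state_urls(base_url, project_id, graphql_response):
    """Turn the GraphQL response into one download URL per Terraform state."""
    nodes = graphql_response["data"]["project"]["terraformStates"]["nodes"]
    base = base_url.rstrip("/")
    urls = {}
    for node in nodes:
        version = node.get("latestVersion")
        if version is None:  # state exists but has never been written
            continue
        name = quote(node["name"], safe="")
        urls[node["name"]] = (
            f"{base}/api/v4/projects/{project_id}"
            f"/terraform/state/{name}/versions/{version['serial']}"
        )
    return urls


def fetch_state(url, access_token):
    """Download one state version and parse it as JSON (performs a network call)."""
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": access_token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the URLs is kept separate from fetching so that the mapping can be inspected, logged, or tested without touching the network or an access token.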
That’s it! It’s that easy.
If you use GitLab Enterprise and GitLab Pipelines, you have what you need to manage infrastructure as code via Terraform, with Terraform’s integration with S3. With access to the GraphQL API, you can retrieve the information required to download your state from storage via the GitLab API.
By adding Firefly, you can gain visibility into cloud infrastructure not yet managed by Terraform, automatically create the Terraform code, and quickly ramp up your efforts to manage your cloud infrastructure as IaC. Even the most stringent GitOps practitioners will encounter configuration drift and legacy unmanaged assets, where the Terraform state doesn’t match the underlying cloud configuration. Firefly detects drift, identifies its context, and finds the drift’s dependencies. Firefly integrates with Terraform Enterprise run tasks to help cloud teams cast a cloud safety net and foresee the implications of any change on unmanaged assets before deploying it, helping you move faster without breaking your cloud. Firefly also integrates with GitLab to automatically create a GitLab merge request whenever drift is detected, helping you easily follow best practices and enforce your cloud policies.
See Appsflyer's GitOps conference presentation on how they are using GitLab, Terraform, and Firefly to manage their cloud as IaC.
Image source: Appsflyer