How to Identify and Remediate Cloud Configuration Drift (and Implement Best Practices for Prevention)

By Firefly

Keeping your cloud infrastructure secure and stable, whether it runs on AWS, Google Cloud, or elsewhere, is important, but it becomes harder to manage as it grows. You might, for example, change an EC2 instance with the AWS CLI to test a fix and never carry that change into your automation. Small changes like these accumulate, and over time your live cloud setup can drift away from what’s defined in your IaC code, creating security risks and unexpected issues across your infrastructure.

In this blog, we’ll discuss configuration drift and identify its root causes. Then, we’ll explore why it’s important to spot config drift early and how you can prevent and remediate configuration drifts in your infrastructure.

What is Configuration Drift?

Cloud configuration drift, also referred to as cloud drift, happens when the actual configuration of your cloud resources no longer matches the configuration defined in your IaC, such as the .tf files you write for Terraform. For example, someone changes the security group of an EC2 instance through the AWS console to open additional ports. This kind of change could expose your infrastructure to security risks by allowing unauthorized access.

And since the change was not made through IaC, it leaves no record in your code or its version history, so you can’t track it there.

Cloud infrastructure drift can happen when you are adjusting an S3 bucket's access policy or changing instance types to meet a testing need directly using the AWS console, even though doing so seems like a great way to save time.

Eventually, these small changes can build up, making it harder to manage your infrastructure. And unless you monitor, detect, and address these drifts regularly, your automation will start to fail and become inconsistent.

What Causes Configuration Drift?

Configuration drift often occurs when changes made through the UI or CLI, whether to fix an urgent issue or to make a temporary adjustment, aren’t reflected in the IaC configuration.

Now let’s explore some of the main reasons why this happens.

Manual Changes

One of the most common causes of configuration drift is making changes directly through the cloud provider’s interface. These changes often happen when there's an urgent need, such as resolving business application downtime, quickly fixing an ongoing incident like a sudden spike in traffic, or applying an update to address a newly discovered security vulnerability. This might involve adjusting your infrastructure, such as increasing the storage capacity of an EC2 instance using the AWS Console. However, your infrastructure drifts apart if you don’t update your IaC code to align with these changes.

This means that the actual state of your EC2 instance no longer matches what’s defined in your IaC config files, such as those created with Terraform or Pulumi. (For example, if you initially defined the instance type as t2.micro in your Terraform code, but someone changes it to t3.micro via the AWS UI, this creates a configuration drift.)

Additionally, if someone on your team made these changes without informing the rest of the team or updating the IaC config, you might not even know that the drift occurred.

To see how configuration drift occurs, consider an example using Terraform. Changing an EC2 instance’s storage size through the AWS UI creates a mismatch between your cloud infrastructure and the configuration defined in your Terraform code. Let’s define an EC2 instance:

provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "firefly-d-01" {
  ami           = "ami-0e86e20dae9224db8"
  instance_type = "t2.micro"
  subnet_id     = "subnet-02efa144df0a77c13"

  root_block_device {
    volume_size = 10
  }
}

We’ve defined an EC2 instance with a root block device that has a storage size of 10 GB. Let’s say your teammate logs into the AWS Management Console, notices that 10 GB of storage isn’t sufficient for the workload running on this EC2 instance, and increases the volume size to 20 GB directly from the console without updating the Terraform code.

The EC2 instance now has a 20 GB volume, but the Terraform code still defines it as 10 GB, creating a configuration drift. We can detect this drift by running terraform plan. The command refreshes Terraform’s view of the real infrastructure and compares it with what’s defined in your Terraform code, revealing any configuration drift.

The output from terraform plan clearly shows a difference between the current state with 20 GB of storage and the Terraform code, which specifies 10 GB.

You can now decide what to do next:

Update the Terraform code: Adjust the Terraform code to reflect the new 20 GB volume size so your Terraform code and infrastructure match.
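Revert the change: If the 20 GB volume was set by mistake, bring the infrastructure back in line with your Terraform code. Be aware that AWS does not support shrinking an EBS volume in place, so for this particular attribute terraform apply alone is unlikely to restore 10 GB; you would typically need to replace the volume (or the instance) to return to the smaller size. For attributes that can be modified in both directions, such as tags or security group rules, running terraform apply reverts the change directly.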
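
As a minimal sketch of the first option, the instance definition from this example would look like the following once the new size is codified. Only volume_size changes; every other value is copied from the definition above. After this edit, a fresh terraform plan should no longer report a difference for the root volume.

```hcl
provider "aws" {
  region = "us-east-1"
}

# Same instance as before, with the console change codified.
resource "aws_instance" "firefly-d-01" {
  ami           = "ami-0e86e20dae9224db8"
  instance_type = "t2.micro"
  subnet_id     = "subnet-02efa144df0a77c13"

  root_block_device {
    volume_size = 20 # was 10; now matches the size set in the console
  }
}
```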

Using Scripts

Configuration drift doesn’t just happen due to changes via the cloud’s UI; it can also occur when using tools and scripts like PowerShell, AWS CLI, or Ansible because these tools don’t maintain state. Unlike IaC tools like Terraform or Pulumi, which track the state of your infrastructure, these other tools execute commands directly on the infrastructure.

For example, you might use the AWS CLI to update an S3 bucket’s settings or adjust security policies. This creates drift if the changes aren’t carried over to your IaC config. Suppose you have an S3 bucket that was created and managed by Terraform. Initially, everything is in sync, and your Terraform state file accurately reflects your infrastructure. However, instead of modifying the Terraform code, someone on your team uses the AWS CLI to update the tags on this bucket by running the following command:

aws s3api put-bucket-tagging --bucket my-terraform-firefly-bucket-integration-through --tagging 'TagSet=[{Key=Name,Value=MyBucket},{Key=Environment,Value=Development}]'

The S3 bucket now has tags that were added using the AWS CLI. However, these tags are not recorded in the Terraform state file, meaning the actual state of the bucket in AWS no longer matches what’s defined in your IaC config.

To identify the drift caused by using the AWS CLI, run terraform plan. For the S3 bucket, the plan lists an update wherever the current setup doesn’t match the Terraform code.

The output shows that the tags on the S3 bucket have been modified. Terraform detects that the actual tags on the bucket don’t match what’s specified in the Terraform configuration.

To resolve this drift, you have two main options:

Update the Terraform code: You can update the Terraform configuration to include the new tags. This ensures that your IaC configs are in sync with the actual state of the bucket, preventing future drifts.
Revert the changes: If the tags were added for testing or are not needed, you could remove them using the AWS CLI or by running terraform apply, which would revert the tags to your original Terraform configuration.
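
If you go with the first option, the codified bucket might look like the sketch below. The original bucket definition is not shown in this post, so the resource label example is an assumption; the bucket name and tag values are taken from the CLI command above.

```hcl
resource "aws_s3_bucket" "example" { # "example" is a hypothetical label
  bucket = "my-terraform-firefly-bucket-integration-through"

  # Tags that were applied out of band with put-bucket-tagging
  tags = {
    Name        = "MyBucket"
    Environment = "Development"
  }
}
```

Keep in mind that put-bucket-tagging replaces the bucket’s whole tag set, so any tags your Terraform code defined earlier need to stay in this map if you still want them.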

Why is it important to identify configuration drift?

Drift can create mismatches between your infrastructure and IaC configs, causing unexpected errors or downtime. It can also result in the configuration of a system falling out of compliance with regulations, leading to potential fines and security risks. Additionally, drift can negatively impact system reliability by causing automated deployments to fail and increasing costs due to misconfigured or over-provisioned resources. The right approach to configuration drift management, and the right configuration drift management tools, can make all the difference.

Let’s break down these challenges to understand why detecting and addressing drift is important:

1. You risk inconsistent IaC

Configuration drift happens when changes to your cloud infrastructure aren’t recorded in your IaC config. This can lead to deployment failures when you try to manage or update your infrastructure.

For example, you’ve set up an EC2 instance using Terraform, and the security group settings are defined in your IaC. Later, someone from your team changes the security group rules in the AWS UI to allow access from a different IP address. If this change isn’t updated in the code, your IaC will be out of sync with the actual setup.

The next time you deploy your infrastructure, Terraform will reset the security group rules to match what’s in the IaC config, removing the IP address that was added through the UI and cutting off the users who rely on it. This might cause business loss.

When your IaC doesn’t align with your actual infrastructure, the code no longer represents the real state of your resources, making it harder to track. For example, if there’s a problem with access or configuration, you might look at your IaC config expecting it to show the current setup, but it doesn’t. This mismatch can lead to delays in identifying and fixing the problem, as what’s in the code doesn’t reflect the actual environment.
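
One way to avoid this outcome is to codify the extra IP address before the next deployment. The sketch below is illustrative only: the resource label, group name, port, and both CIDR ranges are assumptions (the addresses come from ranges reserved for documentation), not values from a real environment.

```hcl
resource "aws_security_group" "app" { # hypothetical label and name
  name        = "app-sg"
  description = "Application access"

  ingress {
    description = "HTTPS from approved addresses"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [
      "203.0.113.10/32",  # address already defined in IaC
      "198.51.100.25/32", # address added through the AWS UI, now codified
    ]
  }
}
```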

2. You may have to deal with compliance violations

Many industries have strict guidelines on how infrastructure should be configured to ensure security, privacy, and reliability. When your actual infrastructure doesn't match what's written in your IaC, it can lead to non-compliance, which may result in fines, penalties, or legal trouble.

For example, your organization needs to follow regulations like GDPR or HIPAA, which require all data stored in S3 buckets, such as customer information or financial records, to be encrypted. Your IaC is set up to ensure that encryption is always turned on. But if someone disables encryption on a bucket through the AWS UI and this change isn’t updated in your IaC, the bucket is now out of compliance. This drift might not be noticed until an audit is done for the deployed resources, at which point your company could face fines or other penalties for not following the required standards.

Drift can also make it hard to pass security audits. Auditors will check if your actual setup matches your documented configuration. If they find differences, it could lead to deeper scrutiny and bigger problems.
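
To make the encryption requirement from this example concrete, here is a minimal sketch of how default encryption can be declared with the Terraform AWS provider. The resource labels and bucket name are hypothetical, and the choice of aws:kms over AES256 is an assumption; your compliance requirements may dictate a specific key.

```hcl
resource "aws_s3_bucket" "customer_data" {
  bucket = "example-customer-data-bucket" # hypothetical name
}

resource "aws_s3_bucket_server_side_encryption_configuration" "customer_data" {
  bucket = aws_s3_bucket.customer_data.id

  rule {
    apply_server_side_encryption_by_default {
      # Uses the AWS managed KMS key unless kms_master_key_id is also set
      sse_algorithm = "aws:kms"
    }
  }
}
```

With the setting declared in code, a change made to the bucket’s encryption configuration in the console shows up as a difference the next time terraform plan runs.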

3. It can cause deployment failures

Configuration drift can cause deployment failures by creating differences between your actual infrastructure and IaC config. You can manage your infrastructure deployments using the CI/CD pipelines that support continuous deployment. When drift happens, your deployment might fail due to duplicate resources or limits being exceeded.

For example, suppose you’ve set up a CI/CD pipeline that applies Terraform code to configure your environment with roles and users. One of your team members creates the ‘cost-optimizer’ role using the AWS Console to test access but doesn’t update the Terraform code. When you later deploy code that defines the same role, the deployment fails because the role already exists. This causes delays as you rush to fix the issue by importing the resource or removing the existing role. Regularly checking for drift and keeping your IaC updated is important to avoid these disruptions.
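
If the role created in the console should be kept, importing it brings it under Terraform management instead of colliding with it. The sketch below assumes Terraform 1.5 or later, which supports import blocks. The resource label and the trust policy are hypothetical and would need to match the real role; otherwise the plan will propose changing it.

```hcl
# Adopt the role that was created in the console.
import {
  to = aws_iam_role.cost_optimizer
  id = "cost-optimizer" # IAM roles are imported by role name
}

resource "aws_iam_role" "cost_optimizer" {
  name = "cost-optimizer"

  # Hypothetical trust policy; replace with the policy of the existing role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}
```

On older Terraform versions, the equivalent step is the terraform import aws_iam_role.cost_optimizer cost-optimizer command.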

4. It could lead to an increase in unnecessary operational costs

Configuration drift can lead to higher costs by making your cloud resources either misconfigured or over-provisioned. For example, if someone manually upgrades an EC2 instance from a t2.micro to an m5.large to handle a temporary surge in traffic, and this change isn’t updated in your IaC, you could end up paying for more resources than needed.

This drift can increase your cloud bills, as oversized resources keep running after the need for them has passed. In a cloud environment like AWS or GCP, where costs are based on usage, these unnoticed changes can lead to unnecessary expenses. Keeping your infrastructure aligned with your IaC helps you ensure that you’re only paying for what you need.

How to Prevent Configuration Drift?

To keep your infrastructure aligned with your IaC, it’s important to prevent configuration drift, which can cause unexpected errors, deployment failures, or security vulnerabilities.

But without proper governance over changes made to your cloud configuration, it’s near impossible. Detecting this drift problem isn’t easy, and remediation is, for some, not happening at all.

Firefly’s 2024 State of Infrastructure as Code report shows that:

Over the last year, many cloud practitioners have adopted dedicated tools to detect configuration drift
Still, 20% of survey respondents report that they can’t detect drift. And most others aren’t able to do so until their organization has been exposed to (and vulnerable as a result of) unauthorized changes for days or even weeks.

Want to ensure you stay on top of your drift management? Let’s look at steps you can take to keep your infrastructure in sync with your IaC:

Tip: Use IaC Consistently

One of the most effective ways to prevent configuration drift is to ensure that all infrastructure changes are made through IaC tools like Terraform rather than the cloud provider’s UI. This practice helps keep your infrastructure consistent and avoids the risks associated with untracked changes.

For example, you manage an RDS database instance using Terraform, defining its configurations, including storage size, instance class, and backup settings. One day, a team member realizes that the backup retention period needs to be extended to meet a sudden compliance request. To solve the problem, the team member logs into the AWS Management Console and changes the backup retention period through the UI.

This adjustment resolves the issue temporarily but introduces configuration drift. Now, the RDS instance’s backup settings don’t match what’s defined in the Terraform code. If you later run terraform apply, Terraform will revert the backup settings to the original configuration, shortening the retention period and risking data loss.

This situation shows why it’s important to make all changes through IaC tools like Terraform. When you consistently use IaC:

All changes are tracked and documented: Every change you make is reflected in the code, making it easier to review and understand what’s been done.
You avoid unexpected reversions: Since all changes are applied through Terraform, you won’t accidentally overwrite adjustments done through AWS UI, reducing the risk of downtime or other issues.
Team collaboration is smoother: Everyone on the team is working from the same set of configurations, ensuring consistency and reducing the chance of miscommunication.
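
Applied to the RDS example, the longer retention period would be codified rather than set in the console. This is a minimal sketch: the identifier, engine, sizes, username, and the 14-day value are assumptions chosen for illustration, and the password is passed in through a variable rather than hard-coded.

```hcl
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "app" { # hypothetical label and settings
  identifier        = "app-db"
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app_admin"
  password          = var.db_password

  # The compliance-driven change, made in code instead of the console
  backup_retention_period = 14 # days

  # Keeps the sketch short; reconsider for production databases
  skip_final_snapshot = true
}
```

Because the new value lives in the code, a later terraform apply keeps the extended retention period instead of rolling it back.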

Tip: Conduct Regular Audits

Regular audits are important for keeping your infrastructure in check and preventing configuration drift from going unnoticed. By routinely comparing your current setup with your IaC config, you can easily identify and address potential problems and fix any drifts.

Let’s say you manage a cloud environment with various resources like EC2 instances, S3 buckets, WAF rules, and RDS databases. Over time, small changes, along with gaps such as missing security patches, can accumulate, sometimes without your knowledge. For example, after an incident involving malicious API traffic, you created WAF rules in the AWS console to block it. This change can create drift, where your actual infrastructure no longer matches what’s defined in your IaC.

To stay ahead of these issues, you perform regular audits. For example, you might manually check whether your infrastructure matches your IaC config by verifying that resources carry the tags you set in your Terraform code. During one of these audits, you might find that an EC2 instance has a security group setting that doesn’t match your IaC config, causing unexpected blocked URLs or access issues in your applications.

Catching this drift early allows you to either update your Terraform code to match the current state or revert the change to align with your original plan. Regular audits like this help keep your infrastructure consistent, minimize risks, and ensure everything runs smoothly. With consistent audits, you can:

Spot Unapproved Changes: Audits help you catch changes made outside of your IaC process before they lead to bigger issues.
Stay Compliant: Regularly verifying your setup against standards ensures you’re meeting regulatory requirements.
Avoid Surprises: By detecting drift early, you can fix problems before they cause downtime, security gaps, or unexpected costs.

Tip: Use CI/CD Pipelines

Running drift checks in your CI/CD pipeline is another way to prevent configuration drift. By integrating checks such as a frequent terraform plan against the code in your Git repository, you can catch inconsistencies between your infrastructure and IaC early and often.

For example, you can set up your CI/CD pipeline to run terraform plan automatically at regular intervals and on each commit to the infrastructure code. This identifies drift by comparing the current state of your infrastructure with the intended state in the IaC configuration.

If any drift is detected, the pipeline can fail and alert you right away, allowing you to address the drift before it affects your deployment. One way to make the job fail is to run terraform plan -detailed-exitcode, which exits with code 2 when the plan contains changes. This practice helps your infrastructure stay in sync with your IaC, reducing the risk of errors, downtime, or security issues caused by drift.

Tip: Lean on Proactive Alerts and Monitoring

Using monitoring and alerting systems is an effective way to keep track of your infrastructure. These tools help you spot unplanned or unauthorized changes so you can address them quickly.

For example, if you’re managing a cloud setup with multiple team members making configuration changes, you can use tools like AWS Config or CloudWatch to monitor resources like EC2 instances and S3 buckets.

If someone changes the security settings on an S3 bucket through AWS UI, the monitoring system will send you an alert, allowing you to quickly detect and decide whether to approve or reverse it. Let’s discuss the importance of alerts and monitoring:

Instant alerts: You find out about changes as soon as they happen.
Quick fixes: You can quickly deal with any changes that weren’t planned.
Consistency: Regular monitoring helps keep your infrastructure consistent and aligned with your IaC.
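
As one possible way to set up this kind of monitoring, the sketch below declares an AWS Config managed rule that flags S3 buckets allowing public read access. It assumes an AWS Config configuration recorder is already enabled in the account and region. The resource label and rule name are hypothetical, and routing the findings to a notification channel is left out.

```hcl
# Assumes a configuration recorder already exists and is recording.
resource "aws_config_config_rule" "s3_public_read" {
  name = "s3-bucket-public-read-prohibited" # hypothetical rule name

  source {
    owner             = "AWS" # AWS managed rule
    source_identifier = "S3_BUCKET_PUBLIC_READ_PROHIBITED"
  }
}
```

Defining the rule in Terraform also keeps the monitoring setup itself under IaC, so it is less likely to drift.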

Why Choose Firefly Among Configuration Drift Management Tools?

We know that drift happens. Concerningly, however, drift remediation often doesn’t.

Data from our 2024 State of IaC report shows that when it comes to drift remediation, fewer than half of respondents can implement a fix within 24 hours. Even more worryingly, 13% do not fix the issue at all.

That’s never the case with Firefly.

Firefly helps you prevent configuration drift by automatically detecting drifts and misconfigurations, making it easier to keep your cloud environment consistent. With Firefly, you can monitor drifts, view change history, and roll back to previous settings if needed.

Let’s explore how Firefly can assist with Drift Detection:

Monitor Drift in Firefly's Dashboard

Firefly's centralized dashboard provides a clear and comprehensive view of your infrastructure, allowing you to monitor for configuration drift. It shows detailed insights into your cloud resources, such as the percentage of unmanaged assets, the occurrence of drift, and the status of various IaC stacks. The visual representation makes it easy to spot where drift has occurred and how your infrastructure is aligned with your IaC.

You can quickly identify issues like drift, unmanaged resources, or potential cost savings and take immediate action to address them. This dashboard not only shows the current state but also tracks changes over time, helping you maintain a consistent and secure cloud environment.

By clicking on the drifted data source, you can see exactly what has changed compared to your IaC. This provides a comprehensive breakdown of the drift, highlighting differences in properties, tags, and other key configurations. You can either codify these changes back into your IaC or revert the resource to match the original configuration, ensuring that your infrastructure remains consistent and secure. That's the power of configuration drift management tools.

With options like ‘Drift Details’, ‘Codify’, and ‘Migrate’, you can explore the specific changes that caused the drift, update your IaC to match the current state or revert the resource to its original configuration. This detailed view in Firefly allows you to quickly take precise actions, ensuring your infrastructure remains consistent with your IaC and functions smoothly without any deployment failures.

Codify your Drift

Firefly allows you to automatically generate codified versions of your infrastructure, including unmanaged resources. Firefly presents a detailed view of the infrastructure's code, highlighting key attributes such as instance type, storage settings, and security groups. This codified view makes it easy to incorporate any unmanaged resources back into your IaC setup.

You can directly export this codified configuration, create pull requests, or integrate it into your existing IaC tools like Terraform, Pulumi, or Ansible. This feature helps ensure that your infrastructure is always aligned with your desired state, across various environments, reducing the risk of drift and making it easier to manage and scale your cloud environment.

Stay Informed with Alerts

Firefly makes it easy to stay informed about any configuration drift by sending alerts directly to your Slack or email. This ensures you're immediately aware of any changes, allowing you to quickly address issues and keep your infrastructure stable. Let’s look at how you can set up a new notification in Firefly:

1. Navigate to ‘Notifications’.
2. Click on ‘+ Add New’.
3. From the ‘Event Type’ dropdown, select the event you want to be notified about.
4. Under ‘Criteria’, choose the relevant data source.
5. Select your notification ‘Destination’ (Slack or email) and click ‘Create’.

With these steps, you’ll receive timely alerts that help you maintain control over your cloud environment with quick feedback.

Firefly is a configuration drift management tool that simplifies cloud management by helping you detect and correct configuration drift. With features like monitoring, detailed codification, and instant alerts, you can keep your infrastructure aligned with your IaC. It makes it easy to manage your cloud environment, preventing costly mistakes using a single platform. By using Firefly, you can streamline your infrastructure, embrace best practices for configuration drift management, and keep your infrastructure running smoothly.
