A few months ago, a Reddit post perfectly captured a mistake many of us have either made or seen firsthand.

“$120,000 in 5 days. VP R&D set up the infra for a stress test… traffic out was insane… didn’t set up a VPC endpoint to S3. All the traffic was going through the internet.”


Here’s what happened.

The VP of R&D set up a few EC2 instances for a quick POC. It was meant to be done quietly, without looping in the rest of the team. But the setup had no VPC endpoint for S3, which would have kept traffic on the AWS backbone, so S3 access defaulted to the public internet path. As a result, all traffic between S3 and the EC2 instances was treated as internet egress.

Since this was spun up in a rush, no one added tags, set up cost alerts, or defined a budget. The Terraform code was merged without a single cost-related guardrail. By the time anyone noticed, the AWS bill had jumped by $120K without a single production user involved.
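For reference, the missing piece in a setup like this is small. The following is a minimal sketch of an S3 gateway endpoint, assuming a hypothetical VPC ID and route table list passed in as variables and a us-east-1 deployment; none of these names come from the original incident. Gateway endpoints for S3 have no hourly or per-gigabyte charge, and they only cover buckets in the same region as the VPC.

```hcl
# Hypothetical inputs: replace with the VPC and route tables used by the instances.
variable "vpc_id" {
  type        = string
  description = "ID of the VPC that hosts the EC2 instances"
}

variable "private_route_table_ids" {
  type        = list(string)
  description = "Route tables of the subnets that need S3 access"
}

# Gateway endpoint: adds a route for the S3 prefix list to each listed route
# table, so S3 requests no longer leave through a NAT or internet gateway.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.us-east-1.s3" # assumed region
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids

  tags = {
    Name = "s3-gateway-endpoint"
  }
}
```

Because the endpoint is attached through route tables, instances need no configuration change; they keep calling the regular S3 endpoint and the route does the rest.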

This wasn’t a one-off. It’s a common pattern: infrastructure gets built, costs get ignored, and then finance steps in when it’s already too late.

This blog walks through how to avoid incidents like this by catching cost risks early, right at the point of infrastructure planning. We’ll focus on how to make Terraform workflows enforce cost tagging, budget thresholds, and policy checks that block expensive changes before you run terraform apply, and therefore before they hit your cloud bill. We’ll also look at how Firefly fits into that picture to automate and scale these practices across teams.

Why Cloud Spend Needs to Shift Left

In most engineering teams, cloud costs are still treated as a backend finance problem. You ship infrastructure, someone checks the bill a few weeks later, and by the time a cost spike shows up, it’s already been running in production for several weeks.

That delay is exactly the problem.

The further to the right you catch a mistake (after infra is deployed, users are active, and systems are running), the harder it is to fix. The blast radius is larger, ownership is less clear, and in many cases you just live with the waste because a rollback would disrupt something else.

Shifting left means we start treating cloud costs like performance or security: a consideration at design time. Not an afterthought.

For infrastructure, that shift starts with Terraform. terraform plan is the best opportunity to inspect what’s about to change before it goes live. That’s when we should be asking:

  • Are all resources tagged correctly?
  • Are any large EC2 instances being introduced?
  • Is anything violating environment-specific budget thresholds?
  • Is this introducing drift or unmanaged infrastructure?

This is where cost governance belongs: not in a billing dashboard two weeks later, but in the same CI/CD pipelines where you review and approve infra changes.

By embedding these checks in your Terraform workflows, you reduce Total Cost of Ownership (TCO), avoid over-engineering, and align infrastructure choices with business outcomes. You make cost a design-time constraint, not a post-incident headache.
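Two of the questions above (missing tags and oversized instances) can be answered by Terraform itself at plan time. The following is a minimal sketch using variable validation; the allowlist of instance types and the variable names are hypothetical, and the required tag keys follow the tagging schema discussed later in this post.

```hcl
variable "instance_type" {
  type        = string
  description = "EC2 instance type for this module"
  default     = "t3.micro"

  # Hypothetical allowlist: anything larger fails during terraform plan.
  validation {
    condition     = contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
    error_message = "Instance type must be t3.micro, t3.small, or t3.medium. Larger sizes need a cost review."
  }
}

variable "tags" {
  type        = map(string)
  description = "Tags applied to every resource in this module"

  # Every required key must be present in the map passed by the caller.
  validation {
    condition = alltrue([
      for key in ["team", "env", "owner", "cost_center"] : contains(keys(var.tags), key)
    ])
    error_message = "Tags must include team, env, owner, and cost_center."
  }
}
```

Validation like this only covers modules that adopt these variables, so it complements pipeline-level policy checks rather than replacing them.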

Metrics to Measure Before Cloud Resources Are Deployed

If you're using Terraform to manage infrastructure, you already have a chance to inspect what’s about to be deployed. But it’s not enough to check for functional correctness; you also need to evaluate the cost impact of the resources you’re provisioning. The only way to do this effectively is by measuring the right things before the infrastructure hits production.

Here are the core metrics you should track inside your Terraform workflow:

Estimated Cost per Change

Every pull request or Terraform plan represents a change to your infrastructure, and most changes carry a cost. You need to surface that cost at review time, not after deployment.

For example, if a developer adds three t3.xlarge EC2 instances to a compute module, the pipeline should estimate how much those instances will cost per month. That number needs to show up directly in the pull request, so the reviewer understands the financial impact of the change.

Example: “+3 EC2 instances (t3.xlarge) in us-east-1 → projected cost increase: $280/month”

This allows reviewers to catch over-provisioning early and make informed decisions before merging.

Tagging Resources

Tagging is essential for effective cloud cost management. When resources like EC2 instances, S3 buckets, or RDS databases are created without proper tags, they show up in AWS or GCP cost reports under “unallocated” or “unknown.” This makes it difficult to assign ownership, allocate budgets, or report usage back to business units.

To prevent this, most organizations define a standard tagging schema, typically including keys like:

  • team
  • env (or environment)
  • owner
  • cost_center

Here’s how an EC2 instance in the eu-north-1 region can be tagged using Terraform to comply with the organization's tagging policy:

provider "aws" {
  region = "eu-north-1"
}

# Values that feed the cost-allocation tags below.
variable "environment" {
  type    = string
  default = "dev"
}

variable "cost_center" {
  type    = string
  default = "CC1234"
}

resource "aws_instance" "example" {
  ami           = "ami-00b3234e97386251c"
  instance_type = "t2.micro"

  # merge() combines the organization-wide tags with resource-specific ones.
  tags = merge(
    {
      Name        = "example-instance"
      Environment = var.environment
      CostCenter  = var.cost_center
      Owner       = "Chris Harris"
    },
    {
      SpecificTag = "value"
    }
  )
}

In the above example:

  • The EC2 instance includes tags for Environment, CostCenter, and Owner.
  • These tags enable accurate cost attribution, ownership tracking, and auditability.
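As an alternative to repeating the tag map on every resource, the AWS provider's default_tags block can apply the shared tags once. The following is a minimal sketch of the same instance written that way, assuming AWS provider 3.38 or later; the tag values are hypothetical.

```hcl
provider "aws" {
  region = "eu-north-1"

  # Applied to every taggable resource this provider creates.
  default_tags {
    tags = {
      team        = "platform" # hypothetical values
      env         = "dev"
      owner       = "chris.harris"
      cost_center = "CC1234"
    }
  }
}

resource "aws_instance" "example" {
  ami           = "ami-00b3234e97386251c"
  instance_type = "t2.micro"

  # Resource-level tags are merged with default_tags; on a key clash the
  # resource-level value wins.
  tags = {
    Name = "example-instance"
  }
}
```

This keeps the required keys in one place, so a new resource added in a hurry still inherits them.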

While tagging in Terraform is a good start, teams often forget or bypass tag policies in fast-moving environments. To ensure all resources are tagged consistently, teams can use Firefly Guardrails to enforce tagging policies for the whole infrastructure automatically.

The snapshot below shows a live Firefly Guardrail named required-tags-enforcement, configured to block any deployment that misses one or more required tags:


This Guardrail, named required-tags-enforcement, enforces tag compliance across all infrastructure changes. It is of type Tag and requires four tags: team, env, owner, and cost-center. The violation behavior is set to block deployments, so any resource creation or modification that lacks these tags is automatically denied. The Guardrail applies across all workspaces and repositories, ensuring organization-wide consistency.

Core Practices to Control Spend in Terraform

After establishing tagging and metric visibility in your Terraform plans, the next step is to apply real controls that prevent waste, enforce ownership, and block risky deployments before they go live.

These aren't just soft recommendations. They're guardrails that are defined, codified, and executed in both your Terraform code and your CI/CD workflows.

Here’s exactly how to do that:

Estimate and Show Cost in Code Reviews

Every infrastructure change should include a clear cost impact, visible right inside the PR. That means showing cost deltas during terraform plan, not in a separate billing dashboard two weeks later.

Use tools like Firefly that parse the planned resource changes and generate summaries like:

  • Cost Change: +$154.20/month
  • Resources Added: 3 EC2 m5.large in us-east-1
  • Tag Compliance: 100% tagged
  • Policy Violations per resource: associate_public_ip_address should be set to false.

With this summary, the engineer immediately knows what’s being added, whether it’s tagged, what it will cost, and whether it violates any infra policies. This prevents bad infra decisions from getting merged silently.

Set Budget Thresholds using aws_budgets_budget

To prevent uncontrolled spending, you can embed monthly budget limits directly into your Terraform code using the aws_budgets_budget resource. This lets you define cost thresholds for a specific team, project, or environment, and get alerts when actual or forecasted usage exceeds them. By codifying budgets, you shift cost control from the finance team into the infrastructure layer, where it is early, visible, and automated.

For a data pipeline tagged with team=data, you can define a $1,000/month budget. Alerts are triggered when actual spend goes above 80%, and again if forecasted spend hits 99%. Notifications go to the email addresses you list, or to a Slack channel for engineering or FinOps, depending on how you route them. Here’s an example:

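resource "aws_budgets_budget" "team_data_budget" {
  name         = "data-pipeline-prod"
  budget_type  = "COST"
  time_unit    = "MONTHLY"
  limit_amount = "1000"
  limit_unit   = "USD"

  # Scope the budget to resources tagged team=data. User-defined tags are
  # written as user:<key>$<value>, and the tag must be activated as a cost
  # allocation tag. The cost_filter block replaces the older cost_filters map,
  # which was deprecated in AWS provider v4.
  cost_filter {
    name   = "TagKeyValue"
    values = ["user:team$data"]
  }

  # Alert when actual spend passes 80% of the limit.
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["alerts@datateam.example.com"]
  }

  # Alert when forecasted spend passes 99% of the limit.
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 99
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = ["ops@datateam.example.com"]
  }
}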

This pattern works well for teams practicing chargebacks or who need guardrails in dev/test accounts. It’s low-effort to implement and runs natively within AWS, making it a simple but powerful FinOps control to stop overages before they land on the bill.
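If alerts should reach a chat channel rather than an inbox, one option is to publish them to an SNS topic and subscribe a chat integration to that topic. The following is a minimal sketch under that assumption; the topic and budget names are hypothetical, the tag filter from the example above is omitted for brevity, and the forwarding from SNS to Slack is left out.

```hcl
resource "aws_sns_topic" "budget_alerts" {
  name = "budget-alerts" # hypothetical topic name
}

# AWS Budgets must be allowed to publish to the topic.
data "aws_iam_policy_document" "budget_publish" {
  statement {
    effect    = "Allow"
    actions   = ["SNS:Publish"]
    resources = [aws_sns_topic.budget_alerts.arn]

    principals {
      type        = "Service"
      identifiers = ["budgets.amazonaws.com"]
    }
  }
}

resource "aws_sns_topic_policy" "budget_alerts" {
  arn    = aws_sns_topic.budget_alerts.arn
  policy = data.aws_iam_policy_document.budget_publish.json
}

resource "aws_budgets_budget" "team_data_budget_sns" {
  name         = "data-pipeline-prod-sns"
  budget_type  = "COST"
  time_unit    = "MONTHLY"
  limit_amount = "1000"
  limit_unit   = "USD"

  # Same 80% actual-spend alert, delivered to SNS instead of email.
  notification {
    comparison_operator       = "GREATER_THAN"
    threshold                 = 80
    threshold_type            = "PERCENTAGE"
    notification_type         = "ACTUAL"
    subscriber_sns_topic_arns = [aws_sns_topic.budget_alerts.arn]
  }
}
```

Keep in mind that a budget only notifies; it does not stop resources from running, which is why the plan-time checks in the rest of this post still matter.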

Automating Cost Visibility and Governance with Firefly

Once integrated, Firefly becomes your central FinOps engine, surfacing cost, compliance, and tag visibility directly to developers.

The core idea is simple: make cost and compliance feedback part of every change, in real time. Whether you're deploying a new environment, updating a module, or catching a drifted resource in production, Firefly provides actionable insights where they matter: your pull requests, pipelines, and dashboards.

What Firefly Surfaces in PRs and Pipelines

For every plan run, Firefly evaluates your infrastructure against defined Guardrails. These include tagging requirements, instance type standards, networking rules, and cost thresholds. If a change violates policy, it’s blocked before merge and surfaced with full context.

Here’s Firefly’s comment on a PR in the GitHub repo, posted before terraform apply in a workflow deployed using Firefly:


Firefly tells you what changed, how it affects your bill, and whether it aligns with your organization’s cost governance policies. Developers get everything in the same loop, no waiting on audits or guessing what went wrong.

Enforcing Cost Discipline with Firefly Guardrails

Let’s say your team wants to ensure that no infrastructure change, no matter how small, gets merged if it introduces unexpected cost increases. You can restrict this using a Guardrail rule in Firefly.

The workflow below illustrates how Firefly enforces cost guardrails during your Terraform workflow:


Here’s how the ec2-cost-control Guardrail is set up:

  • Rule Name: ec2-cost-control
  • Rule Type: Cost
  • Violation Behavior: Block deployment
  • Scope: All workspaces, repositories, and branches (or scoped specifically if needed)
  • Criteria: Cost Change → Exact amount > $50

Firefly uses the Terraform plan to calculate the estimated cost delta. If any PR or pipeline introduces a change that would increase cost by over $50/month, the deployment is blocked and flagged in CI.

Here’s how guardrails are listed in Firefly:


This may sound strict, but for production environments or cost-sensitive projects, it enforces discipline around changes. Developers can’t accidentally slip in a new EC2 instance, upscale an RDS tier, or provision EBS volumes with excessive IOPS unless it's intentional and reviewed.

Continuous Monitoring With Firefly Post-Deployment

Firefly doesn’t stop working once your code is merged or terraform apply completes. Post-deployment, it continues monitoring your cloud environments for any cost-impacting drift, misconfigured resources, or infrastructure that was created outside of Terraform (e.g., manually via the AWS Console or CLI).

This is critical for FinOps maturity: even if your provisioning process is controlled, ongoing cost optimization and governance enforcement are what prevent long-term cloud waste. One of the most impactful categories tracked under Governance is Cloud Waste, which covers issues ranging from low to high severity.

Here’s Firefly's Governance panel, filtered specifically by Cloud Waste policies:


These issues include:

  • Unused volumes
  • Unattached disks
  • Non-optimized instance families
  • Legacy configurations (e.g., gp2 instead of gp3)

These often go unnoticed but can silently accumulate unnecessary cloud spend.

A few months back, Firefly flagged a fleet of EBS volumes running on gp2. For most workloads, gp3 costs up to 20% less per GB than gp2. Firefly tagged the issue under the asset AWS GP2 Type EBS Volumes and showed that switching these to gp3 would result in immediate savings.

Here’s the snapshot showing how Firefly’s UI tagged it:


The volume group shown above was marked with a potential optimization of $17.60/month, simply by changing the volume type, without any downtime or re-provisioning. This wasn’t a single instance; Firefly generated a list of 20+ volumes across multiple regions, each with its associated CLI command and tagged with the projected savings.
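For volumes that are managed in Terraform, the same fix is a small edit to the resource. The following is a minimal sketch, assuming a hypothetical 100 GiB volume in us-east-1a; changing type from gp2 to gp3 should be applied as an in-place modification rather than a replacement, but it is worth confirming that in the plan output before applying.

```hcl
resource "aws_ebs_volume" "data" {
  availability_zone = "us-east-1a" # assumed AZ
  size              = 100          # GiB, hypothetical
  type              = "gp3"        # previously "gp2"

  # gp3 baseline performance; raise only if the workload needs more.
  iops       = 3000
  throughput = 125

  tags = {
    Name = "data-volume"
  }
}
```

Unlike gp2, gp3 decouples IOPS and throughput from volume size, so check that the baseline values cover what the workload was getting from its gp2 volume.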

FAQs

What is the Difference Between CloudOps and FinOps?

CloudOps is focused on keeping cloud infrastructure running: provisioning, monitoring, scaling, securing, and maintaining availability and performance. It’s about uptime, automation, and operational stability.

FinOps, on the other hand, is focused on cost accountability and optimization. It brings financial visibility into how cloud infrastructure is used, helping teams track spend, forecast usage, and eliminate waste, without slowing down engineering velocity.

What is FinOps in AWS?

FinOps in AWS means applying financial accountability and governance across your AWS infrastructure. It involves practices like:

  • Tagging resources for cost allocation (owner, cost_center, environment, etc.)

  • Setting budgets using aws_budgets_budget

  • Using tools like AWS Cost Explorer, AWS Budgets, or third-party platforms (e.g., Firefly) for visibility

  • Enforcing cost thresholds and policies in CI/CD before deployment

AWS supports this through native services, but FinOps maturity typically involves extending these with IaC automation and cost feedback embedded in engineering workflows (e.g., Terraform plans + guardrails).

What is the Lifecycle of Cloud FinOps?

The FinOps lifecycle typically follows this pattern:

  1. Inform – Gain visibility into who is spending what, where, and why (e.g., via tagging, reports, dashboards).

  2. Optimize – Identify and remediate waste (idle resources, oversized instances, unused volumes).

  3. Operate – Automate enforcement using policy-as-code, budget controls, and continuous monitoring to ensure alignment over time.

This cycle repeats continuously as cloud usage changes. Mature FinOps orgs integrate all three phases directly into dev and CI/CD workflows, not just finance teams.

Is FinOps Part of DevOps?

FinOps complements DevOps, but it’s not a sub-discipline. Where DevOps focuses on delivery speed, automation, and reliability, FinOps brings cost visibility and accountability into those same workflows.

In practice, FinOps becomes part of the platform engineering function, embedding cost controls, budgets, and tagging standards into Terraform modules, pipelines, and pull request reviews. This ensures infrastructure is not just secure and scalable, but also cost-efficient and aligned with business goals.
