TL;DR

  • Terraform is great at creating and managing infrastructure. But almost every real configuration also depends on infrastructure it didn't create and doesn't own: shared VPCs, org-level IAM roles, security-approved AMIs.
  • Data sources are how Terraform references infrastructure it didn't create. Instead of hardcoding IDs, you query for them. Find me the VPC tagged "prod", find me the latest AMI.
  • The problem is that these queries are fragile. A missing tag, multiple matches, or a resource that doesn't exist yet will fail your entire deployment.
  • In CI/CD pipelines, this gets worse. Different environments have different tags, modules race against each other, and API failures break builds in ways that are hard to debug.
  • Firefly adds a validation layer on top, ensuring the resources your queries return are correctly tagged, policy-compliant, and safe to use before anything gets deployed.

When you're managing infrastructure with Terraform, there's almost always existing infrastructure in the account that you didn't create and don't own: VPCs from the networking team, shared subnets, org-level IAM roles, and security-approved AMIs. You need to use these resources, but you're not the one managing them.

This is where data sources come in. Instead of hardcoding resource IDs, you query for them at runtime. Find me the VPC tagged "prod-vpc" and the latest approved AMI. It works, but it introduces a new set of problems. Your deployment now depends on tags being consistent, queries returning exactly one match, and external resources being in the right state before your plan runs.

This came up in a Reddit thread where someone had two modules in the same project. Module A creates a VPC, Module B needs that VPC ID and queries for it using a data source. Module B runs before the VPC exists, the query returns nothing, and the plan fails. They spent time debugging timing issues when the fix was simply passing the VPC ID as a module output. This is the core challenge with data sources: they tie your deployment to external state and timing that Terraform doesn't control, and the failure modes aren't always obvious.

This blog covers how data sources work, where they break in production, and how to make them reliable.

What Is a Terraform Data Source and When Should You Use It?

A data source is a read-only lookup. It queries the provider API (AWS, GCP, Azure) to fetch details about existing resources. It doesn't create anything, doesn't manage any lifecycle, and only retrieves data at runtime.

Example:

data "aws_vpc" "main" {
  tags = {
    Name = "prod-vpc"
  }
}

This calls the AWS API, finds VPCs with tag Name=prod-vpc, and returns attributes like data.aws_vpc.main.id and data.aws_vpc.main.cidr_block. You use data sources inside resource blocks:

resource "aws_subnet" "app" {
  vpc_id     = data.aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

Terraform resolves the data source first, then creates the subnet using that VPC ID.

How Data Sources Differ from Variables and Resources

Data sources look similar in usage to variables, but they behave very differently. A variable is a static value you pass into your config. A data source is a live API query that executes during terraform plan.

This means the data source depends on external state and API behavior. If the query fails, your plan fails. If the API returns different results, your plan changes. If multiple resources match your filter, Terraform errors out.

Think of it this way: variables are values you control, data sources are values you discover.
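
To make the contrast concrete, here's a minimal side-by-side sketch:

# A variable: a static value you supply and control
variable "vpc_id" {
  type = string
}

# A data source: a live query against the AWS API at plan time
data "aws_vpc" "discovered" {
  tags = {
    Name = "prod-vpc"
  }
}

Both hand you a VPC ID, but only the second one can fail or change between runs.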

When to Use Data Sources vs. Resources

Use a data source when the resource already exists, you don't manage it in this Terraform config, and you need runtime discovery.

Don't use a data source when you already control the resource in Terraform, when you can pass outputs between modules, or when you need reproducibility. If you own it, manage it with a resource block or wire it through module outputs. Data sources are for: "I don't own this, but I need to use it."

How Data Sources Execute During terraform plan

Understanding how data sources work internally is critical for debugging failures.

Data Sources Call Provider APIs, Not Local Lookups

When you write:

data "aws_vpc" "main" {
  tags = {
    Name = "prod-vpc"
  }
}

Terraform doesn't search locally. It calls the AWS API through the provider: "List VPCs where tag Name = prod-vpc."

The provider sends the request, gets the response, and maps it into Terraform attributes.

Data Sources Use Filters, Not Exact Lookups

Data sources aren't "Get me VPC with ID X." They're "Find resources matching these conditions."

Behavior depends on results:

  • 0 matches: error
  • 1 match: success
  • Multiple matches: often errors

That's why loose filters break things.
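
One way to avoid the zero-match and multiple-match cases is to stack filters until only one resource can satisfy the query; a minimal sketch:

data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = ["prod-vpc"]
  }

  filter {
    name   = "tag:Environment"
    values = ["prod"]
  }
}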

Data Sources Run at Plan Time, Not Apply Time

Data sources execute during terraform plan, not during apply.

Consequences:

  • They must succeed before Terraform can show a plan
  • They can't depend on resources created in the same run (by default)

Problem example:

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

data "aws_subnets" "all" {
  filter {
    name   = "vpc-id"
    values = [aws_vpc.main.id]
  }
}

This often fails because:

  • The data source tries to read before the VPC exists, so the API returns nothing
  • On newer Terraform versions the read is deferred to apply instead, so the plan can't show what the query will return
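
The simplest fix, when you manage the resource in the same configuration, is to drop the lookup and reference it directly; a minimal sketch:

resource "aws_subnet" "app" {
  vpc_id     = aws_vpc.main.id # direct reference; Terraform orders creation correctly
  cidr_block = "10.0.1.0/24"
}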

Data Sources Are Nodes in Terraform's Dependency Graph

When you write:

subnet_id = data.aws_subnet.main.id

Terraform builds a graph: data.aws_subnet.main → aws_instance.app

Data source runs first, resource runs after.

Data sources are upstream nodes in Terraform's execution graph. They feed values into resources.

Data Sources Have No Lifecycle or State Ownership

Resources:

  • Are stored in Terraform state
  • Have lifecycle (create/update/destroy)

Data sources:

  • Are not owned
  • Are re-fetched every run
  • Are not versioned

Every terraform plan:

  • Hits provider APIs
  • Re-evaluates queries
  • May return different results

Your Terraform run now depends on API availability, correct filters, and consistent external state. Not just your code.

When to Use Data Sources: Four Scenarios

Data sources make sense in messy, shared, multi-team environments. Not in isolated examples.

1. Referencing Shared Infrastructure (VPCs, Subnets, Security Groups)

Most common case. You don't own the network layer; it's provisioned separately, sometimes in a different repo or account.

data "aws_vpc" "shared" {
  tags = {
    Name = "prod-vpc"
  }
}

data "aws_subnets" "app" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.shared.id]
  }
}

resource "aws_instance" "app" {
  subnet_id = data.aws_subnets.app.ids[0]
}

What's happening:

  • You're discovering infrastructure, not defining it
  • You're depending on naming/tagging conventions
  • Any inconsistency there breaks your deployment

2. Dynamic Discovery (Latest AMIs, Availability Zones)

Some values change constantly. Hardcoding them isn't practical.

data "aws_ami" "latest" {
  most_recent = true
  owners      = ["amazon"]
}

resource "aws_instance" "app" {
  ami = data.aws_ami.latest.id
}

Another example:

data "aws_availability_zones" "available" {}

What's happening:

  • You're pulling "current state" from the cloud
  • Your config becomes environment-aware
  • But also non-deterministic

3. Cross-Stack Integration (Remote State)

How teams connect Terraform projects.

data "terraform_remote_state" "network" {
  backend = "s3"
  
  config = {
    bucket = "tf-state-prod"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  subnet_id = data.terraform_remote_state.network.outputs.subnet_id
}

What's happening:

  • You're consuming outputs from another stack
  • This is a tighter coupling than it looks
  • State changes upstream can break downstream plans
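
For outputs.subnet_id to resolve, the upstream stack must declare it explicitly; a minimal sketch of the network stack's side:

# In the network stack's configuration
output "subnet_id" {
  value = aws_subnet.app.id
}

If that output is renamed or removed upstream, every downstream plan that reads it breaks.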

4. Environment-Aware Configuration (Account ID, Region Metadata)

You don't want to hardcode account IDs or region-specific values.

data "aws_caller_identity" "current" {}

resource "aws_iam_policy" "example" {
  name = "policy-${data.aws_caller_identity.current.account_id}"
}

What's happening:

  • Config adapts based on where it runs
  • Useful in multi-account setups
  • But again, runtime dependency

Common Pattern Across All Use Cases

You don't control the source of truth; you query it.

Where teams get it wrong:

  • Using data sources where outputs should be used
  • Overusing "latest" for critical infrastructure
  • Relying on weak filters (tags like prod)
  • Mixing ownership (half resource, half data source)

Practical rule:

Use data sources when:

  • The resource already exists
  • You don't manage it in this config
  • You need runtime discovery

Avoid them when:

  • You control the resource
  • You can pass outputs cleanly
  • You need reproducibility

Why Data Sources Fail: Five Common Failure Modes

1. Plan-Time Failures from Timing Issues

Data sources run during terraform plan. If they depend on something that doesn't exist yet, they fail immediately.

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

data "aws_subnets" "all" {
  filter {
    name   = "vpc-id"
    values = [aws_vpc.main.id]
  }
}

Terraform tries to read the subnets before the VPC exists. The API returns nothing. The plan fails (or, on newer Terraform versions, the read is deferred to apply and comes back empty).

2. Fragile Queries That Break on Tag Changes

data "aws_vpc" "main" {
  tags = {
    Name = "prod"
  }
}

No match → hard failure. Multiple matches → ambiguous → failure. Tag changes → silent break. You're relying on conventions, not guarantees.

3. Non-Deterministic "Latest" Queries

data "aws_ami" "latest" {
  most_recent = true
  owners      = ["amazon"] # required by the provider, but still a loose query
}

Today → AMI A. Tomorrow → AMI B. Same code → different infrastructure. You lose reproducibility. This becomes a problem in rollbacks, debugging, and compliance audits.
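
The deterministic alternative is to pin the ID and change it deliberately; a minimal sketch with a placeholder AMI ID:

variable "ami_id" {
  type    = string
  default = "ami-0123456789abcdef0" # placeholder; pin your reviewed AMI here
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"
}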

4. External System Dependency and API Reliability

Every plan triggers API calls. If AWS/GCP/Azure throttles requests, has latency issues, or returns inconsistent results, your Terraform run fails. This shows up under load or in large organizations.

5. Performance Overhead at Scale

Large configs with many data sources slow down terraform plan, increase API calls, and hit rate limits. Especially bad in CI pipelines or parallel jobs.

Data sources introduce runtime uncertainty into Terraform. Instead of everything being defined in code, part of your system depends on external state, external APIs, and external ownership.

Where Data Sources Break in Production CI/CD Pipelines

Everything we've discussed becomes very real once you run Terraform in pipelines, across environments, and with multiple teams. Data sources stop being "just lookups" and start breaking builds.

1. CI/CD Failures Across Environments

Typical pipeline: PR → terraform plan → approval → apply

Now introduce a data source:

data "aws_subnet" "main" {
  tags = { Name = "app-subnet" }
}

What goes wrong:

  • Tag doesn't exist in staging → plan fails
  • Multiple matches in prod → plan fails
  • Resource renamed → pipeline breaks

Same code, different environment → different result.

This is one of the biggest sources of "it works locally but not in CI."

2. Race Conditions Between Modules

Very common in modular setups.

Example:

  • Module A → creates VPC
  • Module B → uses a data source to fetch that VPC:

data "aws_vpc" "main" {
  tags = { Name = "prod-vpc" }
}

Problem:

  • Module B runs before A finishes
  • Data source returns nothing
  • Plan fails

Even worse in parallel pipelines.

3. Environment-Specific Tag Mismatches

You assume:

tags = { Name = "prod-vpc" }

But reality:

  • dev → dev-vpc
  • staging → staging-vpc
  • prod → prod-vpc

Now your data source works only in one environment. People patch this with variables, but complexity increases, and consistency decreases.
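
The variable patch, sketched here with a hypothetical environment variable, looks like this:

variable "environment" {
  type = string # "dev", "staging", or "prod"
}

data "aws_vpc" "main" {
  tags = { Name = "${var.environment}-vpc" }
}

Every environment's tags now have to match the interpolation exactly, so the convention problem has just moved into your variables.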

4. Drift in External Resources

Data sources depend on things you don't control.

Example: Someone modifies a subnet, replaces it, or deletes/recreates it.

Next terraform plan:

  • Data source resolves differently
  • Resource behavior changes
  • Plan shows unexpected diffs

5. "Latest" AMI Breaks Production Deploys

data "aws_ami" "latest" {
  most_recent = true
  owners      = ["amazon"]
}

Works fine until:

  • New AMI released
  • Your next deploy uses it
  • Something breaks

Now, rollback becomes difficult, and debugging becomes messy.

6. Hidden Coupling Across Teams

You rely on:

data "aws_vpc" "main" {
  tags = { Name = "prod-vpc" }
}

Another team:

  • Renames tag
  • Restructures network
  • Creates multiple VPCs

Your pipeline breaks without warning.

7. IAM Permission Issues in CI Pipelines

Data sources need read access. In CI, IAM roles might be restricted and API calls fail.

Error: AccessDenied: not authorized to describe VPCs

Your entire plan fails, not because of your code, but because of permissions.
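
The fix is to give the pipeline role the read-only Describe permissions its data sources need. A minimal sketch covering the VPC and subnet lookups above (the policy name and broad scope are illustrative):

data "aws_iam_policy_document" "plan_read" {
  statement {
    effect    = "Allow"
    actions   = ["ec2:DescribeVpcs", "ec2:DescribeSubnets"]
    resources = ["*"] # EC2 Describe* actions don't support resource-level scoping
  }
}

resource "aws_iam_policy" "plan_read" {
  name   = "terraform-plan-read" # illustrative name
  policy = data.aws_iam_policy_document.plan_read.json
}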

8. Debugging Non-Obvious Failures

When a data source fails, the error is usually vague, API-level, and not tied clearly to your logic.

You end up debugging:

  • Filters
  • API responses
  • Environment state

Not just Terraform code.

What This Boils Down To

Data sources move part of your system outside Terraform's control. You're now depending on external infrastructure, naming conventions, timing between systems, and API behavior.

In real systems, data sources are necessary, but they are a point of failure. Treat them as integration points, not simple lookups.

How Firefly Prevents Data Source Failures Through Governance

At this point, the gap should be clear:

  • Terraform fetches data (data sources)
  • Terraform uses that data (resources)
  • But nothing checks if that data is actually safe or compliant

That's where Firefly comes in.

What Firefly Does for Data Sources

Firefly sits on top of your workflow and answers: "Given what Terraform is about to use, is this allowed?"

It doesn't replace data sources. It doesn't change how Terraform works. It adds a layer that understands your infrastructure, evaluates policies, and enforces rules.

Terraform fetches data and uses it. Firefly validates that data before you use it, enforces policies on it, and monitors it continuously.

1. Validating Data Source Lookups Before They Break

Take this example:

data "aws_subnet" "main" {
  tags = { Name = "app-subnet" }
}

resource "aws_instance" "app" {
  subnet_id = data.aws_subnet.main.id
}

Terraform fetches the subnet and uses it. But there's no validation that the subnet is actually safe to use. The data source returns whatever matches the tag—whether that subnet is public, has the right security groups, or belongs to an approved VPC.

Firefly adds a validation layer. Before you use that subnet, it checks: Is this subnet private? Does it have flow logs enabled? Does it belong to an approved VPC? If the subnet violates any security policies, the deployment is blocked before the instance gets created in the wrong place.

2. Solving the Tagging Problem That Breaks Data Sources

Most data sources rely on tags to find resources:

data "aws_vpc" "main" {
  tags = { Name = "prod-vpc" }
}

This assumes:

  • Every resource is tagged correctly
  • Tags are consistent across environments
  • Naming conventions are enforced

Reality:

  • Some resources are missing tags
  • Tags are inconsistent (prod-vpc vs production-vpc)
  • Different teams follow different standards
  • Multi-cloud setups make this worse

Result:

  • Data source fails
  • Wrong resource gets picked
  • Pipelines break

Why This Is Hard to Fix Manually

You can't realistically ensure:

  • Every resource across AWS, Azure, GCP is tagged properly
  • Every team follows the same tagging rules
  • No one creates resources outside Terraform

This becomes a governance problem, not just a Terraform problem.

How Firefly Solves This

Firefly's governance dashboard lets you define and enforce tagging policies across all your infrastructure. Here's how it works in practice:

[Screenshot: Firefly governance dashboard showing tagging policy compliance across AWS, GCP, and Azure]

In this example, "Tagging Policies" are selected, showing policies like "AWS DB Instances without any tags", "AWS EC2 Instances without any tags", and "Google Cloud Compute Instances without tags". Each policy displays:

  • Severity level (LOW, TRACE)
  • Data source (AWS, Google Cloud, Azure)
  • Asset types affected
  • Compliance percentage (100% for passing policies, N/A for policies with no violations)
  • Violating assets count
  • AI remediation button for generating fixes

The dashboard shows an overall compliance posture of 100% for tagging policies across AWS, GCP, and Azure resources. Filters let you narrow down by frameworks (CIS, SOC 2, HIPAA), categories (security, cost, tagging), providers, and severity levels.

From the governance layer, Firefly lets you define policies like:

  • Every resource must have Environment, Owner, Project tags
  • Tag values must follow a standard format
  • Missing tags are flagged automatically

The Governance page filtered for "subnet" policies shows security-focused policies:

  • "VPC Subnet Assigns Public IP" (MEDIUM severity, AWS)
  • "AWS Subnet Without VPC Flow Logs Enabled" (LOW severity)
  • "AWS Subnet Associated With NACL That Allows All Ingress" (HIGH severity)
  • "RDS Associated with Public Subnet" (CRITICAL severity)
  • "Google Compute Subnetwork with Private Google Access Disabled" (LOW severity)

Each policy shows compliance percentage, violating asset counts, and remediation options. The "RDS Associated with Public Subnet" policy shows 0 violating assets and 100% compliance, meaning all RDS instances are properly placed in private subnets.

This directly solves the data source problem: when you query for a subnet with a data source, Firefly has already validated that:

  • The subnet exists and is tagged correctly
  • The subnet is properly configured (private, flow logs enabled, correct NACL)
  • The subnet belongs to an approved VPC

What this changes:

Before:

  • Data source depends on tags → tags may or may not exist → failures

After:

  • Firefly ensures tags exist → data sources become reliable

3. Pre-Deployment Enforcement in CI/CD Pipelines

In a pipeline:

terraform plan → Firefly Guardrails check → approval → apply

Firefly evaluates the plan output with Guardrails, automated policy checks that run before apply. Here's what this looks like for a Terraform run:

A Terraform workflow run shows the Guardrails step blocked with 2 failures:

  • The left panel shows the workflow stages: Code, Init, Plan, Guardrails (blocked), and Apply
  • Tags Coverage: 0% (1 taggable resource lacks tags for google_storage_bucket.default)
  • Policy Violations: 2 High violations, 1 Medium violation for Code phase; 1 High violation for Plan phase
    • Code violations: "google-storage-buckets-tags" (google_storage_bucket.default uniform_bucket_level_access should be defined and not null)
    • Plan violation: "Compliance check for Google storage buckets" (Google Storage Bucket Level Access Disabled)
  • Cost Estimation: Was $0/m, Remains $0/m

The Summary panel on the right breaks down:

  • Tags Coverage: 0% with the specific resource lacking tags
  • Policy Violations by severity (High: 1, Medium: 1)
  • The violated resource (Google Cloud Storage Bucket) with the address and the missing tags indicator
  • Remediate button available for AI-powered fix generation

This shows Guardrails in action: the deployment is blocked because the storage bucket violates tagging policy and security policy. The engineer gets immediate feedback in the PR with specific remediation steps.

Firefly evaluates:

  • Terraform configuration
  • Resolved data source values
  • Target infrastructure

If something violates policy:

  • Deployment is blocked, or
  • Flagged for review

Example:

data "aws_ami" "latest" {
  most_recent = true
  owners      = ["amazon"]
}

Policy: Only approved AMIs are allowed

Terraform: Fetches AMI

Firefly: Checks compliance → Blocks if needed

4. Continuous Monitoring After Deployment

Data sources don't just affect deployment—they affect future runs.

Firefly continuously evaluates:

  • Existing resources
  • Their configuration
  • Policy compliance

Example scenario:

  • Subnet was private yesterday
  • Someone modifies routing today
  • Firefly detects a violation immediately

Why this matters: Without this, your system depends on external state, but no one is validating that state.

What Firefly Actually Solves

If you're using data sources heavily (which most teams do), you need:

  • Reliable tagging
  • Consistent resource metadata
  • Validation before usage

Because without that, data sources become a source of instability. With Firefly, they become predictable and safe to use at scale.

Best Practices: Making Data Sources Reliable in Production

Data sources become fragile when filters are too broad, ownership is unclear, or you're querying for values that change frequently.

Use Specific Filters, Not Broad Tags

Avoid queries that can match multiple resources:

data "aws_vpc" "main" {
  tags = { Environment = "prod" }
}

This breaks when multiple VPCs match or tagging changes. Use specific identifiers:

data "aws_vpc" "main" {
  id = var.vpc_id
}

Or combine multiple filters:

data "aws_vpc" "main" {
  tags = {
    Name        = "prod-vpc"
    Environment = "prod"
  }
}

Data source queries must return exactly one result, every time.
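
On Terraform 1.2 or later, you can also assert on what the query returns, so a bad match fails the plan with a clear message instead of a vague API error; a minimal sketch:

data "aws_vpc" "main" {
  tags = {
    Name        = "prod-vpc"
    Environment = "prod"
  }

  lifecycle {
    postcondition {
      condition     = self.state == "available"
      error_message = "The VPC tagged prod-vpc is not in the available state."
    }
  }
}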

Pass Module Outputs Instead of Querying

If you control both modules, pass values directly:

module "network" {
  source = "./network"
}

module "app" {
  source  = "./app"
  vpc_id  = module.network.vpc_id
}

This removes runtime lookups, external state dependencies, and filter fragility.

Pin AMI IDs in Production

Don't use most_recent = true for production infrastructure:

data "aws_ami" "latest" {
  most_recent = true
  owners      = ["amazon"]
}

Pin AMI IDs, store approved AMIs in SSM Parameter Store, or pass them via variables. Determinism matters more than convenience.
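
With the SSM route, the lookup is still a data source, but it reads a value your team publishes deliberately; a sketch assuming a hypothetical parameter name:

data "aws_ssm_parameter" "approved_ami" {
  name = "/platform/approved-ami" # hypothetical; updated when a new AMI is approved
}

resource "aws_instance" "app" {
  ami           = data.aws_ssm_parameter.approved_ami.value
  instance_type = "t3.micro"
}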

Standardize Tagging Across Teams

Most data sources rely on tags. Without consistent tagging (same keys, same values, enforced across teams), data sources become unreliable and lookups fail randomly.
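
On AWS, the provider's default_tags block gives you a cheap baseline inside the config itself: every resource the configuration creates gets the required tags automatically. A minimal sketch with illustrative values:

provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Environment = var.environment
      Owner       = "platform-team"
      Project     = "core-network"
    }
  }
}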

Use governance tools like Firefly to enforce required tags, validate tag values, and detect inconsistencies before they break pipelines.

Frequently Asked Questions (FAQs)

What is the terraform_data resource in Terraform?

The terraform_data resource is a built-in Terraform resource used for storing values, triggering dependencies, and replacing many null_resource use cases. It does not create real cloud infrastructure but exists only in Terraform state. Teams commonly use it for workflow orchestration, input tracking, and lifecycle-based automation inside Terraform.
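
A minimal sketch of the replace-trigger pattern it's commonly used for (the app_version and ami_id variables are assumed):

resource "terraform_data" "app_version" {
  input = var.app_version
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  lifecycle {
    replace_triggered_by = [terraform_data.app_version]
  }
}

Whenever app_version changes, Terraform replaces the instance on the next apply.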

What is a data source in Terraform?

A data source in Terraform lets you fetch information about infrastructure that already exists outside your current Terraform configuration. It is read-only and does not create, update, or delete resources. Common examples include looking up existing VPCs, AMIs, IAM roles, or Kubernetes clusters.

What is the difference between a resource and a data source in Terraform?

A Terraform resource creates and manages infrastructure, while a data source only reads information about existing infrastructure. Resources are fully controlled through Terraform state and lifecycle operations like create, update, and destroy. Data sources are mainly used for referencing infrastructure managed elsewhere.

What is the difference between Terraform import and a data source?

Terraform import brings existing infrastructure under Terraform management by adding it to Terraform state. A data source only references existing infrastructure without managing it. Import is used when adopting unmanaged resources, while data sources are used for read-only access and integration.