There's a moment in almost every platform engineering call that plays out the same way.
It happens around the fifteen-minute mark, after the small talk, the agenda, and the first demo. A platform engineer gets quiet and then says some version of the same sentence.
"Look, we have Terraform, but..."
What follows is an honest, slightly tired admission:
- A big chunk of their cloud is not actually codified.
- The code they do have has drifted.
- Nobody is quite sure what would happen if they had to rebuild a region from scratch, and nobody wants to be the one to test it.
This is not a knock on platform teams. The problem is that for years, the industry sold them on "adopt IaC" as the destination. Nobody prepared them for what comes next, when IaC becomes the thing you need to trust with your company's survival.
That gap is what Firefly was built to close. Here are four conversations we keep having, and what actually changes when the tooling catches up.
Admission #1: "We know we have cloud drift. We just don't know how much."
This one comes up before anything else.
Here's what happens. A team adopts Terraform and coverage numbers go up. Everyone feels good. Then, slowly, reality starts diverging from code. An engineer fixes something in the console during an incident and forgets to bring the change back into IaC. A contractor spins up a test environment that never gets cleaned up.
All of it compounds.
The traditional approach:
Run terraform plan across every workspace, triage the output, file Jira tickets, and hope someone works through them. Repeat next quarter. Meanwhile, resources that were never in IaC to begin with do not even show up in the plan. You cannot detect drift on code that does not exist.
What Firefly does:
Scans your entire cloud footprint, not just what is in your state files. It shows you three buckets: codified, drifted, and ungoverned. It generates IaC for ungoverned resources and remediation PRs for drifted ones. That means you stop guessing at your coverage number and start seeing it.
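The three-bucket idea can be illustrated with a toy classifier. This is a minimal sketch, not Firefly's implementation: it assumes two hypothetical maps keyed by resource ID, one holding attributes observed in the live cloud account and one holding what IaC state declares. The first branch is the important one: a resource absent from state lands in "ungoverned", which a plan-only workflow never reports.

```python
# Minimal sketch: classify live cloud resources against IaC state.
# The input maps, resource IDs, and attribute names are hypothetical.


def classify(live: dict[str, dict], state: dict[str, dict]) -> dict[str, list[str]]:
    """Sort live resources into codified, drifted, and ungoverned buckets.

    live:  resource ID -> attributes observed in the cloud account
    state: resource ID -> attributes declared in IaC state
    Resources present in state but deleted from the cloud are ignored for brevity.
    """
    buckets: dict[str, list[str]] = {"codified": [], "drifted": [], "ungoverned": []}
    for resource_id, observed in sorted(live.items()):
        declared = state.get(resource_id)
        if declared is None:
            # Never codified: terraform plan has nothing to compare it against.
            buckets["ungoverned"].append(resource_id)
        elif declared != observed:
            # Codified, but the live resource no longer matches its code.
            buckets["drifted"].append(resource_id)
        else:
            buckets["codified"].append(resource_id)
    return buckets


def coverage(buckets: dict[str, list[str]]) -> float:
    """Share of live resources that are codified AND match their code (one possible definition)."""
    total = sum(len(ids) for ids in buckets.values())
    return len(buckets["codified"]) / total if total else 0.0


live = {
    "vpc-1": {"cidr": "10.0.0.0/16"},
    "sg-1": {"ingress": "0.0.0.0/0"},  # opened up in the console during an incident
    "i-test": {"type": "t3.micro"},    # contractor test box, never codified
}
state = {
    "vpc-1": {"cidr": "10.0.0.0/16"},
    "sg-1": {"ingress": "10.0.0.0/8"},
}

buckets = classify(live, state)
print(buckets)                     # {'codified': ['vpc-1'], 'drifted': ['sg-1'], 'ungoverned': ['i-test']}
print(f"{coverage(buckets):.0%}")  # 33%
```

Note that the console fix and the forgotten test box both pull the coverage number down, but only the first would ever appear in a terraform plan.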
The first time a team sees their real coverage percentage in Firefly, there is usually a long pause. It is almost never what they told leadership it was.
Admission #2: "Our DR plan is a Confluence page nobody has tested since 2023."
The AWS outage last October made this one public, even though every platform team already knew it privately.
Backups protect data, but they do not protect infrastructure. If your primary region goes down, you do not need your database snapshots. You need the VPC, the subnets, the route tables, the security groups, the IAM roles, the load balancers, the DNS records, the Kubernetes clusters, and the hundred other things that make the database reachable in the first place.
Most teams have a runbook for this. Very few have tested it end-to-end in the last year. Fewer are confident it would work if they needed it to.
The traditional approach:
Maintain a manual runbook, run a tabletop exercise once a year, and hope for the best.
What Firefly does:
Treats recovery as code. The same IaC that defines your environment becomes the instrument that rebuilds it, automatically and cross-region, from a last-known-good state. The agents know your dependencies because they have been watching them the whole time.
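Knowing dependencies matters because infrastructure has to come back in order: subnets need their VPC, and a load balancer needs its subnets and security groups. At its core, a dependency-aware rebuild is a topological sort. The sketch below runs Python's standard-library graphlib over a hand-written, hypothetical dependency map; it illustrates the ordering problem and is not how Firefly models dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each resource lists what must exist before it.
dependencies: dict[str, set[str]] = {
    "vpc": set(),
    "iam_roles": set(),
    "subnets": {"vpc"},
    "security_groups": {"vpc"},
    "route_tables": {"vpc", "subnets"},
    "load_balancer": {"subnets", "security_groups"},
    "kubernetes_cluster": {"subnets", "security_groups", "iam_roles"},
    "dns_records": {"load_balancer"},
}


def rebuild_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every resource comes after its dependencies.

    Raises graphlib.CycleError if the dependency map contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


order = rebuild_order(dependencies)
print(order)
```

Resources with no path between them (the IAM roles and the VPC, for instance) can be recreated in parallel; TopologicalSorter's get_ready() and done() methods expose those waves if you need them.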
DR tooling has been stuck for a decade. Firefly is the first serious attempt to make recovery a first-class, automated workflow instead of a document no one reads.
Admission #3: "Security wants forty new policies enforced by the end of the quarter."
Every platform team is also, on some level, a compliance team.
The friction usually looks like this: security defines a policy (no public S3 buckets, all databases encrypted at rest, no IAM users without MFA), and the platform is supposed to enforce it. But the policy engine only sees what is in IaC, and a meaningful slice of the cloud is not in IaC, so violations slip through.
…Then the audit shows up and everyone is exporting CSVs from three different cloud consoles at 11 PM.
The traditional approach:
Write Rego policies for OPA, wire them into your IaC pipeline, and hope the things outside your pipeline are fine. Build separate scripts for uncodified resources. Reconcile findings across tools by hand.
What Firefly does:
Applies one set of policies across both IaC and live cloud resources. Violations surface in a single dashboard regardless of where the resource came from. Remediation is automated where it is safe and guided where it is not. When the auditor shows up, you open one dashboard.
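The "one set of policies" idea is easiest to see with a toy evaluator. This sketch assumes a hypothetical normalized inventory in which IaC-declared and live resources share one record shape; the policy names, resource kinds, and attribute keys are invented for illustration. It is plain Python, not Rego, and not Firefly's policy engine.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Resource:
    resource_id: str
    kind: str    # hypothetical type names, e.g. "s3_bucket", "db_instance", "iam_user"
    source: str  # "iac" if declared in code, "live" if only discovered in the cloud
    attrs: dict = field(default_factory=dict)


# A policy: name, the resource kind it applies to, and a predicate that is True when compliant.
Policy = tuple[str, str, Callable[[dict], bool]]

POLICIES: list[Policy] = [
    ("no-public-s3", "s3_bucket", lambda a: not a.get("public", False)),
    ("db-encrypted-at-rest", "db_instance", lambda a: a.get("encrypted") is True),
    ("iam-user-mfa", "iam_user", lambda a: a.get("mfa_enabled") is True),
]


def evaluate(resources: list[Resource], policies: list[Policy] = POLICIES) -> list[tuple[str, str, str]]:
    """Apply the same policies to every resource, whatever its source."""
    violations = []
    for r in resources:
        for name, kind, is_compliant in policies:
            if r.kind == kind and not is_compliant(r.attrs):
                violations.append((name, r.resource_id, r.source))
    return violations


inventory = [
    Resource("logs-bucket", "s3_bucket", "iac", {"public": False}),
    Resource("tmp-export", "s3_bucket", "live", {"public": True}),
    Resource("orders-db", "db_instance", "live", {"encrypted": False}),
    Resource("ci-bot", "iam_user", "iac", {"mfa_enabled": True}),
]

violations = evaluate(inventory)
print(violations)
# [('no-public-s3', 'tmp-export', 'live'), ('db-encrypted-at-rest', 'orders-db', 'live')]
```

Both violations come from resources marked "live", which is exactly the slice a pipeline-only policy check would miss.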
For teams under FedRAMP, CMMC, DORA, SOC 2, or ISO, this matters. The teams that survive audits cleanly are the ones who can produce evidence on demand. Firefly makes that evidence continuous instead of quarterly.
Admission #4: "We have six IaC tools and no single source of truth."
Multi-cloud is a fact of life. So is multi-IaC.
Teams that started on CloudFormation migrated some workloads to Terraform, tried Pulumi for a side project, and kept a few Ansible playbooks around because nobody wanted to touch them.
Standardization is the dream.
The traditional approach:
Write a platform engineering standards doc, watch it get ignored, mandate reviews, become a bottleneck, give up, and spend a quarter every year trying to produce a unified inventory for leadership.
What Firefly does:
Gives you a single inventory across all clouds, all IaC tools, and all teams. It surfaces the patterns already in use so you can codify them as reusable modules. Self-service provisioning with guardrails means teams get speed and the platform team gets consistency. Nobody has to be the bad guy.
This use case gets underrated. Teams focus on the recovery features, which are real and important. But the day-to-day value is simply having one place where you can see your entire cloud honestly.
What changes when you see the whole picture
The thread across all four conversations is the same. Platform teams are not short on talent or tools; they are short on a trustworthy picture of reality.
Drift, ungoverned resources, untested runbooks, scattered policies, multi-tool sprawl: all of it is the same underlying problem that looks slightly different on the outside.
Firefly's pitch is not "adopt IaC."
The teams we talk to already did that years ago. The pitch is that IaC is only as valuable as your ability to trust it. And trust requires three things: continuous discovery, so you know what is really out there; continuous governance, so you know it stays right; and continuous recovery, so you know you can rebuild when you need to.
That is what Thinkerbell AI and the rest of the platform are built to deliver. It is also why Gartner recognized Firefly in the 2026 Market Guide for AI SRE Tooling, and why Cloud Resilience Posture Management is emerging as a distinct category. The industry is catching up to what platform engineers have been worrying about for years.
A note on the voice behind this post
Before joining Firefly, I spent most of my career close to platform and infrastructure teams. I spent a few years at a major developer platform, working with federal and enterprise customers on IaC adoption and compliance. Before that, I spent several years in federal consulting, supporting agencies where "rebuild the environment" was not a hypothetical. And for a decade before that, I ran an infrastructure services company, where I was personally on the receiving end of every 3 AM page.
The reason I joined Firefly is that the fifteen-minute admission at the top of this post is one I used to make myself, just on the other side of the table. The tooling to solve it did not exist then. It does now.
If any of these scenarios hit close to home, schedule a demo.
