Earlier this month, in just nine seconds, an AI agent deleted a production database and then wrote a confession explaining exactly which safety rules it had broken. The token was real and the permissions were real, but no human asked for any of it. It happened anyway. That’s today’s reality for cloud practitioners. (PocketOS founder Jer Crane published a public post-mortem, which you can read here.)

Now, let’s be clear. The ClickOps problem isn't solved. But it has been lapped. 

While platform teams were restricting console access and writing drift detection policies, agents became a third class of cloud operator: one that doesn't wait for a PR, doesn't tag its resources, and doesn't show up in CloudTrail with a name you recognize. (The 2 a.m. rogue engineer who caused your ClickOps issues is still out there, but now they've got company.)

Agents didn’t invent this problem, of course. But they certainly turned up the volume on it. 

For platform teams wondering what it takes to keep their cloud estate observable and governable (and to stay sane while doing it), here’s a look at what matters as AgentOps overtakes ClickOps as cloud governance’s biggest looming threat.

Before Agents: ClickOps and the Unmanaged Sprawl Problem

For the better part of a decade, platform teams have been wrestling with two related issues. 

  1. The first is unmanaged resources, meaning any cloud asset that exists in your account but isn’t represented in your infrastructure-as-code. Across the hundreds of cloud environments we see, the share of resources sitting outside IaC routinely lands somewhere between 30% and 95%.

  2. The second is ClickOps, the practice of making changes through the console (or an interactive CLI) instead of through versioned, reviewed code. ClickOps persists because it’s faster in the moment. The cost shows up later: drift between code and reality, institutional knowledge trapped in whoever made the change, and an environment that no one fully understands.

Together they break the audit story (you can’t prove a resource is compliant if you can’t prove it’s intentional), the change story (the next terraform apply either ignores the drift or reconciles it destructively), and the cost story (the resources nobody owns are usually the resources nobody turns off). 

The bar platform teams should hold themselves to is simple: every resource should have a known source of truth, in code or on an explicit exception list with an owner and an expiration date. Anything else is technical debt accruing interest.
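
To make that bar concrete, here’s a minimal sketch of the reconciliation check in Python. The inventory, state, and exception-list inputs are hypothetical placeholders; in practice they’d come from your cloud APIs, your Terraform state, and wherever you keep exceptions:

```python
from datetime import date

# Hypothetical inputs: resource IDs from a live inventory pull, and the IDs
# that appear in your Terraform state.
live_inventory = {"i-0abc123", "i-0def456", "sg-0legacy"}
iac_managed = {"i-0abc123"}

# The exception list: every entry carries an owner and an expiration date.
exceptions = {
    "i-0def456": {"owner": "data-team", "expires": date(2026, 9, 1)},
}

for resource_id in sorted(live_inventory):
    if resource_id in iac_managed:
        continue  # source of truth is code; nothing to flag
    entry = exceptions.get(resource_id)
    if entry is None:
        print(f"UNMANAGED: {resource_id} has no source of truth")
    elif entry["expires"] < date.today():
        print(f"EXPIRED: {resource_id} exception lapsed (owner: {entry['owner']})")
```

Everything that prints is debt. Everything else has an answer to “why does this exist?”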

Why AgentOps Is the New Frontier

The standard playbook (restrict console access, require IaC for production, run drift detection, codify what slipped through) actually worked. Or rather, it worked until the operator stopped being human.

Here’s what’s different now. 

An AI agent acting on your cloud is not a person and is not a pipeline. It’s a third category, and our governance models weren’t built for it.

This is no longer a future-tense conversation. The hyperscalers themselves are now shipping agentic capabilities natively. 

  • AWS DevOps Agent, generally available since early 2026, is positioned as an autonomous SRE: it responds to a CloudWatch alarm, PagerDuty alert, or ServiceNow ticket and begins investigating without human prompting. AWS reports it can detect and diagnose a production incident in around four minutes and post a complete root cause analysis with mitigation recommendations to Slack, including suggesting a rollback or capacity change. Notably, it now reaches into Azure and on-prem environments through the Model Context Protocol. 
  • On the Microsoft side, the Azure MCP Server exposes 276 MCP tools across 57 Azure services, letting any MCP-aware agent provision storage, query databases, and operate resources in natural language under Entra ID and Azure RBAC. 

When the cloud provider itself is shipping the agent, you can no longer treat AgentOps as someone else’s problem.

Lessons Learned: Governance Enforcement with AgentOps in 2026

Two structural lessons sit at the heart of this post: a token scoped to a principal is not the same as a token scoped to an operation, and a safety rule that lives only in a system prompt is advisory, not enforced.
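
In IAM terms, the first lesson looks something like this. A minimal sketch with placeholder resources, not a recommended policy:

```python
# Principal-scoped: whoever (or whatever) assumes this role can do anything
# RDS allows, including the destructive call nobody meant to delegate.
principal_scoped = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "rds:*", "Resource": "*"}],
}

# Operation-scoped: the agent gets exactly the operations its task needs,
# and the destructive operation is denied outright.
operation_scoped = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["rds:DescribeDBInstances", "rds:CreateDBSnapshot"],
            "Resource": "*",
        },
        {"Effect": "Deny", "Action": "rds:DeleteDBInstance", "Resource": "*"},
    ],
}
```

The incident in the opening is what the first kind of token makes possible: the safety rule said don’t, but the permissions said may.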

Three properties of agent-driven changes make AgentOps a distinct discipline.

  • The first is velocity. Agents don’t wait. A team that previously made ten infrastructure changes a week can, with agents in the loop, make hundreds. Drift accumulates faster than weekly scans can detect it.
  • The second is opacity. A human ClickOps change is at least attributable to a human. An agent change, depending on how the agent is wired, may show up in CloudTrail as a service principal, a shared role, or a generic CI identity. “Who did this and why?” becomes a much harder question to answer.
  • The third is non-determinism. The same prompt to the same agent on two different days can produce two slightly different Terraform modules. Even when agents work through IaC, the IaC itself is now a moving target.

Visibility is the precondition for managing any of this. You cannot govern what you cannot see, and the half-life of “I know what’s in my cloud” is now measured in hours, not weeks. The platform teams that are getting this right have made a deliberate shift. Instead of asking “what did my pipelines deploy?”, they ask “what changed in my cloud, by whom or what, and is it represented in code?” The aperture has to widen to include every actor: human, pipeline, and agent.
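
As a starting point, the “what changed, and by whom” question is answerable from CloudTrail alone, though only as well as your identity hygiene allows. A rough sketch with boto3, covering management events over the last 24 hours:

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudtrail = boto3.client("cloudtrail")
start = datetime.now(timezone.utc) - timedelta(hours=24)

# Pull every write (non-read-only) management event from the last day and
# bucket it by identity, so humans, pipelines, and agents land side by side.
changes_by_actor = {}
for page in cloudtrail.get_paginator("lookup_events").paginate(
    LookupAttributes=[{"AttributeKey": "ReadOnly", "AttributeValue": "false"}],
    StartTime=start,
):
    for event in page["Events"]:
        actor = event.get("Username", "unknown")
        changes_by_actor.setdefault(actor, []).append(event["EventName"])

for actor, events in sorted(changes_by_actor.items(), key=lambda kv: -len(kv[1])):
    print(f"{actor}: {len(events)} write events (e.g. {events[0]})")
```

If every agent in your fleet shows up here as the same shared role, the report tells you almost nothing, which is exactly the opacity problem above.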

A Look at Governance and Tracking in Practice

A workable AgentOps governance model doesn’t try to slow agents down. It tries to make their work legible. A few practices that hold up well in production:

  • Make the IaC pipeline the agent’s default path. When an agent needs to change the cloud, it should go through the same code-and-pipeline path your humans use: open a pull request, pass your CI checks, deploy through your pipeline, and leave a reviewable artifact behind. This is the only way to keep gating, policy enforcement, peer review, and rollback intact. Direct API calls from an agent should be the rare exception, not the default workflow. The PocketOS incident is what happens when that line is reversed. (A sketch of this PR-first path follows this list.)
  • Give every agent a distinct identity. Don’t let agents share a generic CI role. A separate IAM principal per agent (or per agent capability) makes CloudTrail and audit logs actually useful. When something odd shows up, you want to know which agent did it, on whose behalf. This applies equally to provider-native agents like AWS DevOps Agent and to internal copilots wired into the Azure MCP Server. (See the identity sketch after this list.)
  • Treat the cloud as the source of truth, then reconcile. Continuously inventory what’s actually running, classify each resource by whether it’s covered by IaC, and surface the delta. This is the inversion of the traditional IaC-first mental model, and it’s the only one that survives contact with high-velocity agent activity.
  • Codify aggressively, by default. When an agent (or human) creates an unmanaged resource, the right default is not “leave it” or “delete it.” It’s “generate the IaC for it and open a PR.” Codification turns ad-hoc changes into reviewable artifacts and pulls them back into the governed surface area.
  • Apply policy at the change boundary, not the deploy boundary. Tools like OPA and your provider’s native policy engines should be evaluating what an agent is about to do, not just what your pipeline is about to deploy. Agents that bypass pipelines bypass any policy that lives only in pipelines. (The guard sketch after this list shows the shape of such a check.)
  • Get unified visibility across IaC, ClickOps, and AgentOps. For any window of time, you should be able to answer what changed in your cloud, who or what changed it, and which path it took: a Terraform run, a console click, or an agent’s API call. If those three streams aren’t visible side by side, you can’t tell whether agents are reinforcing your governed paths or quietly replacing them. Treat “changes by agent X over the last 24 hours” the way you treat error rates or deploy frequency. If an agent’s blast radius starts to grow, you want to know before the bill or the incident does.
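
First, the PR-first path, sketched with PyGithub. The token, repo, branch, file, and ticket names are all hypothetical, and the Terraform content is elided; the point is the shape, where the agent’s output is a pull request rather than an API call:

```python
from github import Github  # PyGithub; assumes a token with repo scope

repo = Github("<token>").get_repo("acme/cloud-iac")

# Instead of hitting the cloud API directly, the agent writes Terraform to a
# branch and opens a PR, so CI checks and human review stay in the loop.
branch = "agent/resize-cache-cluster"
repo.create_git_ref(ref=f"refs/heads/{branch}", sha=repo.get_branch("main").commit.sha)
repo.create_file(
    path="modules/cache/main.tf",
    message="agent: resize cache cluster for INC-4821",
    content='resource "aws_elasticache_cluster" "main" { ... }',
    branch=branch,
)
repo.create_pull(
    title="Agent proposal: resize cache cluster",
    body="Generated by the incident-responder agent for INC-4821.",
    base="main",
    head=branch,
)
```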
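
Next, the identity sketch, with boto3. The account ID, role names, and ticket reference are placeholders:

```python
import boto3
import json

iam = boto3.client("iam")

# One IAM role per agent capability, so CloudTrail attribution is unambiguous.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:role/agent-runtime"},
        "Action": "sts:AssumeRole",
    }],
}
iam.create_role(
    RoleName="agent-incident-responder",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Tags=[{"Key": "operator-class", "Value": "agent"}],
)

# At runtime, the session name records what the agent was acting on, so
# "which agent, on whose behalf" is answerable from the log line alone.
boto3.client("sts").assume_role(
    RoleArn="arn:aws:iam::123456789012:role/agent-incident-responder",
    RoleSessionName="ticket-INC-4821",
)
```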
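
And finally, the guard at the change boundary. In production this is where OPA or a provider policy engine sits; the hard-coded deny list here is purely illustrative:

```python
# A guard between the agent and the cloud API: every proposed action is
# evaluated before it runs, whether or not it arrived through a pipeline.
DENIED_IN_PROD = {"rds:DeleteDBInstance", "s3:DeleteBucket", "ec2:TerminateInstances"}

def authorize(action: str, resource_tags: dict) -> bool:
    """Return True only if the proposed change passes policy."""
    if resource_tags.get("env") == "production" and action in DENIED_IN_PROD:
        return False
    return True

proposed = {"action": "rds:DeleteDBInstance", "tags": {"env": "production"}}
if not authorize(proposed["action"], proposed["tags"]):
    raise PermissionError(f"Policy denied {proposed['action']} at the change boundary")
```

Note what this buys you that a system prompt cannot: the deny happens in code the agent cannot rewrite, which is the second structural lesson from above.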

You might realize upon scanning this list that it’s the same governance discipline platform teams have been building for years, but with one important update. The actor list now includes software that thinks.

Why Firefly for Well-Governed AgentOps?

The AgentOps shift is the defining cloud governance challenge of the next several years. 

Firefly is the solution that continuously inventories every resource across your clouds and Kubernetes clusters, tells you exactly what is and isn’t represented in IaC, codifies unmanaged resources into Terraform, Pulumi, or CloudFormation with one click, and detects drift in near real time. Plus: 

  • Firefly Event Center gives you a single timeline of every change to your cloud, attributed by source, so ClickOps and AgentOps activity show up next to your IaC runs instead of disappearing into CloudTrail. 
  • And the Firefly MCP server lets agents provision and modify cloud resources through your governed IaC path, with policy and review intact, instead of calling provider APIs directly. The pattern we want to make easy is the one that should have been easy in the PocketOS story: agents that build through the pipeline, not around it.

If you’re starting to see agents touch your cloud and you want a clear picture of your estate before the velocity goes up another order of magnitude, start a free trial of Firefly. Connect a single account, and within minutes you’ll see your full inventory, your codification gaps, and your drift across humans, pipelines, and agents alike.

The cloud is about to get a lot more crowded. Make sure you can still see it clearly, so you can manage and automate it cleanly.