During a last-minute integration test with a third-party service, an engineer manually opens port 22 (SSH) to a production VM instance by modifying a VPC firewall rule via the Google Cloud Console. The change is made temporarily to allow debugging access. However, after the troubleshooting is completed, the firewall rule is forgotten, leaving the production VM exposed until the next scheduled update.

Because the modification was made manually through the Google Cloud Console UI, it isn’t captured by any version control system, CI/CD pipeline, or infrastructure monitoring tool, even though it was only meant to be temporary. Days later, the team runs terraform apply, which reverts the firewall configuration to its original state. While this secures the VM again, the period during which port 22 was exposed goes unnoticed, raising compliance concerns and security risks. The key issue is that the Console change was made outside the defined Infrastructure as Code (IaC) process, bypassing all tracking mechanisms and leaving the infrastructure out of sync with the intended configuration.

This shows how making changes directly in the Google Cloud Console, or what we call ClickOps, can cause configuration drift. It's something that tends to happen a lot when you’re in the thick of high-pressure debugging. Configuration drift occurs when the actual state of infrastructure diverges from the defined state in IaC, creating a disconnect between the current environment and the version-controlled configurations. These manual changes, if not properly tracked, pose significant challenges in terms of security, compliance, and cost governance. The assumption that the infrastructure is always in sync with the IaC repository is broken, leaving the organization vulnerable to potential misconfigurations that may go unnoticed for extended periods.

To address these challenges, Firefly's Event Center can be integrated with Google Cloud services to provide real-time visibility into infrastructure changes. Because Firefly offers features such as automated event tracking, detailed change auditing, and real-time monitoring out of the box, teams can spot manual changes and detect configuration drift before it becomes a critical issue. We'll discuss each of these in more detail throughout this blog.

How ClickOps Introduces Configuration Drift

Before exploring the impact of ClickOps and how it leads to configuration drift, let's clarify what ClickOps actually means.

What is ClickOps?

ClickOps refers to manual infrastructure changes made directly through a cloud provider’s user interface (UI), like the Google Cloud Console, instead of using Infrastructure as Code (IaC) tools like Terraform or CloudFormation. These changes are made to address immediate needs but bypass the automation and version control that IaC tools provide. As a result, they aren't tracked in the same way as changes made through IaC, creating potential gaps in visibility and control.

When engineers manually adjust configurations, such as modifying firewall rules, adding or removing resources, or changing VM settings directly through the UI, these actions aren't captured in version control or automated pipelines. This means the live environment can diverge from the desired state defined in code, creating configuration drift.

ClickOps creates issues like:

  • Audit Challenges: It’s difficult to track and review changes made manually, as they aren't recorded in version control.
  • Compliance Risks: Without visibility into these changes, compliance requirements that demand a full audit trail are hard to meet.
  • Operational Disruptions: When an IaC tool such as Terraform reapplies the original configuration, it reverts any manual changes that aren't reflected in the codebase, which can cause unexpected behavior or downtime. For example, Terraform could undo a temporary firewall modification that someone still relies on, causing access issues that the IaC configuration never accounted for.

Why Configuration Drift Is a Problem for Infrastructure

Configuration drift occurs when the actual state of infrastructure deviates from the intended state defined in your Infrastructure as Code (IaC) configurations. This typically happens when manual changes are made outside of the IaC pipeline, like through ClickOps, or when modifications are made via APIs or SDKs that aren’t captured in version control. These manual or untracked changes can remain undetected until an automation tool, such as Terraform, is executed, potentially leading to critical failures.
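To make the definition concrete, here is a minimal Python sketch of what a drift check does at its core: compare the desired configuration with the live one, field by field. It assumes both sides have already been flattened into plain dictionaries; the firewall-rule field names and values are illustrative, not Terraform's or Google Cloud's exact schema.

```python
from typing import Any, Dict, Tuple

# Hypothetical, flattened view of a firewall rule. In practice the desired side
# would come from your IaC definitions and the live side from the cloud API.
Config = Dict[str, Any]


def detect_drift(desired: Config, live: Config) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (desired_value, live_value)} for every field that differs.

    A field present on only one side is reported with None on the missing side.
    Values are compared as-is, so list order matters.
    """
    drift: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(desired.keys() | live.keys()):
        want, have = desired.get(key), live.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


desired = {"name": "allow-https", "ports": ["443"], "source_ranges": ["10.0.0.0/8"]}
live = {"name": "allow-https", "ports": ["443", "22"], "source_ranges": ["0.0.0.0/0"]}

for field, (want, have) in detect_drift(desired, live).items():
    print(f"{field}: IaC declares {want!r}, live environment has {have!r}")
```

Running it prints one line per drifted field (here, the extra port 22 and the widened source range). A production tool would also normalize values first, since this comparison treats ["22", "443"] and ["443", "22"] as different.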

Key Impacts of Configuration Drift:

  • Security Risks:
    Configuration drift can introduce security vulnerabilities that go unnoticed until a breach occurs. For instance, when a network engineer opens a port (e.g., port 22 for SSH) on a production firewall for debugging purposes via the Google Cloud Console, the change is not captured by IaC tools like Terraform. If this modification is left untracked, it becomes a potential attack vector. 

Terraform may later reapply the original state (removing the open port), but the window of exposure could have allowed external access, leading to a potential security breach. Without visibility into these types of manual changes, security teams cannot ensure that all configurations are aligned with security best practices.

  • Increased Costs:
    Configuration drift can increase operational costs. For example, if a DevOps engineer manually scales up a Kubernetes cluster in Google Cloud to handle a traffic spike but forgets to scale it down, the excess resources continue to incur costs. 

Since this drift isn’t tracked by IaC tools like Terraform, it leads to inefficiencies and inflated cloud bills, especially in environments where scaling should be automated and version-controlled.

  • Compliance Issues:
    Compliance frameworks, such as SOC 2 or ISO 27001, require that all infrastructure changes be documented and auditable. Configuration drift undermines this by allowing manual changes outside of IaC systems, which aren’t logged in version control or CI/CD pipelines. 

For example, modifying access control settings directly in the cloud console won't be reflected in the IaC repo or automated tools. During audits, this gap in the audit trail can lead to compliance violations and expose the organization to non-compliance penalties.
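As an illustration of the security case above, the sketch below flags firewall rules that expose a port (SSH on 22 by default) to the whole internet. It assumes rules in the JSON shape returned by gcloud compute firewall-rules list --format=json (fields such as direction, disabled, sourceRanges, and allowed[].IPProtocol/ports). It is a narrow heuristic, not a complete firewall analyzer: it ignores target tags, priorities, and deny rules, and the two sample rules are hypothetical.

```python
from typing import Any, Dict

# Source ranges that mean "anyone on the internet" (IPv4 and IPv6).
PUBLIC_RANGES = {"0.0.0.0/0", "::/0"}
# Protocol values that cover TCP: the name, the catch-all, or TCP's protocol number.
TCP_PROTOCOLS = {"tcp", "all", "6"}


def _port_matches(spec: str, port: int) -> bool:
    """Match a port spec that is either a single port ("22") or a range ("20-25")."""
    if "-" in spec:
        low, high = spec.split("-", 1)
        return int(low) <= port <= int(high)
    return int(spec) == port


def allows_public_port(rule: Dict[str, Any], port: int = 22) -> bool:
    """True if an enabled ingress allow rule admits TCP `port` from a public range."""
    if rule.get("disabled", False) or rule.get("direction", "INGRESS") != "INGRESS":
        return False
    if not PUBLIC_RANGES & set(rule.get("sourceRanges", [])):
        return False
    for entry in rule.get("allowed", []):
        if entry.get("IPProtocol") not in TCP_PROTOCOLS:
            continue
        ports = entry.get("ports")
        # An allowed entry without "ports" applies to every port of that protocol.
        if not ports or any(_port_matches(p, port) for p in ports):
            return True
    return False


rules = [
    {"name": "debug-ssh", "direction": "INGRESS", "sourceRanges": ["0.0.0.0/0"],
     "allowed": [{"IPProtocol": "tcp", "ports": ["22"]}]},
    {"name": "allow-internal", "direction": "INGRESS", "sourceRanges": ["10.0.0.0/8"],
     "allowed": [{"IPProtocol": "tcp", "ports": ["22", "443"]}]},
]
print([r["name"] for r in rules if allows_public_port(r)])  # ['debug-ssh']
```

Only debug-ssh is reported; the internal rule stays quiet because its source range is private. Run on a schedule, a check like this would have caught the forgotten port 22 rule without waiting for the next terraform apply.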

Drift Prevention in Google Cloud Console Requires More Than Just the Console

So, when it comes to preventing configuration drift in Google Cloud, it goes beyond just monitoring changes made through the Google Cloud Console. Tools like Cloud Audit Logs and Cloud Monitoring provide valuable insights, but they don't give you the full context needed to detect and manage drift effectively.

Cloud Audit Logs

Cloud Audit Logs record administrative activity on your Google Cloud resources, detailing who did what, where, and when. However, while they record that an event happened, they fall short of the detail needed to detect drift:

  • Limited Visibility into Configuration Changes: Cloud Audit Logs record the API calls that modify configurations, but they don’t give you the full picture. For example, if a storage bucket’s access permissions are changed manually through the Google Cloud Console to allow temporary external access, the audit log shows that a change was made and by whom. Some entries, such as IAM policy changes, include a delta of what was modified, but the log doesn't give you a complete before/after view of the resource or compare it against your IaC definition. As a result, you can’t easily assess the full impact of the change on your system's security posture.

  • Data Access Logs: By default, Cloud Audit Logs capture Admin Activity logs, but Data Access logs, which record read and write operations on user data, are disabled unless explicitly enabled (BigQuery is the main exception). Without these logs, you miss part of the set of actions that affect your data, which can leave gaps in your audit trails.
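Data Access logs are turned on through the auditConfigs section of an IAM policy. A minimal sketch, assuming you export the project policy with gcloud projects get-iam-policy PROJECT_ID --format=json, pass it through this function, and write the result back with gcloud projects set-iam-policy PROJECT_ID policy.json. Enabling allServices is the broadest option; Data Access logs can be high-volume and incur Cloud Logging charges, so you may prefer to name individual services instead.

```python
import copy
import json
from typing import Any, Dict

# The three log types that make up Data Access audit logging.
LOG_TYPES = ("ADMIN_READ", "DATA_READ", "DATA_WRITE")


def enable_data_access_logs(policy: Dict[str, Any], service: str = "allServices") -> Dict[str, Any]:
    """Return a copy of an IAM policy with Data Access logs enabled for `service`.

    Existing bindings, etag, and any other audit configs are preserved.
    """
    updated = copy.deepcopy(policy)
    configs = updated.setdefault("auditConfigs", [])
    entry = next((c for c in configs if c.get("service") == service), None)
    if entry is None:
        entry = {"service": service, "auditLogConfigs": []}
        configs.append(entry)
    log_configs = entry.setdefault("auditLogConfigs", [])
    present = {c.get("logType") for c in log_configs}
    for log_type in LOG_TYPES:
        if log_type not in present:
            log_configs.append({"logType": log_type})
    return updated


# Example with a minimal policy; a real exported policy also carries bindings and an etag.
print(json.dumps(enable_data_access_logs({"bindings": [], "version": 1}), indent=2))
```

The function works on a copy and only appends missing log types, so running it twice leaves the policy unchanged.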

Cloud Monitoring

Cloud Monitoring gives you insight into the performance and health of your cloud resources, tracking metrics like CPU usage, memory, and latency. But it doesn’t directly address configuration drift:

  • Lack of Configuration Context: While Cloud Monitoring can alert you to performance issues, such as a sudden increase in CPU usage, it won’t tell you if that increase is due to an untracked manual configuration change, like resizing a VM or adjusting instance types. Cloud Monitoring only looks at the operational state, not the configuration changes themselves, which leaves out important context for drift detection.

  • Operational Focus: Cloud Monitoring is essential for spotting performance problems, but its scope is limited to metrics that indicate how resources are being utilized. It doesn’t provide the visibility needed to compare your infrastructure against the desired configuration, which is critical for detecting drift.

Cloud Audit Logs and Cloud Monitoring are important, but they don’t provide the level of detail needed to manage configuration drift proactively. That’s where Firefly's Event Center comes in. By tracking infrastructure changes in real time, Firefly offers a complete view of both manual (ClickOps) and IaC-driven changes.

Tracking Changes in Google Cloud Console with Firefly's Event Center

While Cloud Audit Logs and Cloud Monitoring offer visibility into your Google Cloud activity, they often require manual setup and don’t show the full picture of what changed. Firefly's Event Center builds on top of these logs to surface infrastructure changes more clearly, grouping events, showing who made the change, and highlighting both manual (ClickOps) and automated updates. This helps teams spot drift and trace changes without digging through raw logs.

Here’s how Firefly’s Event Center processes changes, from manual (ClickOps) updates to real-time visibility and resolution:

[Figure: How Firefly's Event Center processes changes, from manual (ClickOps) updates to real-time visibility and resolution]

Firefly's Event Center provides:

  • Chronological Event Tracking
    View a timeline of all changes made to your Google Cloud resources, whether initiated by Terraform, the Google Cloud Console, SDKs, or APIs. This comprehensive timeline helps teams understand the sequence of events and identify manual changes that might have caused drift.

  • Detailed Metadata
    Access full context for each event, including:

    • Event ID: For traceability and deeper investigation.

    • User: Who made the change, for accountability.

    • Source IP: Provides insight into the origin of the change.

    • Before/After State: Shows the full impact of the change, helping teams understand what exactly was modified.

  • Filtering Capabilities
    Narrow down events by:

    • Asset Type: For example, filter to see all storage bucket changes or firewall modifications.

    • User: Track actions taken by specific users or service accounts.

    • Region and Timeframe: Isolate events based on location or date range, making it easier to spot specific changes over time.
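The filtering described above boils down to simple predicates over event records. The sketch below is hypothetical: ChangeEvent and its fields are invented for illustration and are not Firefly's schema or API, and the users are placeholders. It applies optional filters by asset type, user, region, and time window, and returns matches oldest first, like a timeline view.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; field names are illustrative, not Firefly's schema."""
    timestamp: datetime
    asset_type: str
    user: str
    region: str
    source: str  # e.g. "console", "terraform", "sdk"


def filter_events(
    events: Iterable[ChangeEvent],
    asset_type: Optional[str] = None,
    user: Optional[str] = None,
    region: Optional[str] = None,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
) -> List[ChangeEvent]:
    """Keep events matching every filter that is set, in chronological order."""

    def keep(e: ChangeEvent) -> bool:
        return (
            (asset_type is None or e.asset_type == asset_type)
            and (user is None or e.user == user)
            and (region is None or e.region == region)
            and (since is None or e.timestamp >= since)
            and (until is None or e.timestamp <= until)
        )

    return sorted((e for e in events if keep(e)), key=lambda e: e.timestamp)


def utc(*args: int) -> datetime:
    """Build a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


events = [
    ChangeEvent(utc(2025, 6, 13, 9, 5), "google_compute_firewall", "alice@example.com", "global", "console"),
    ChangeEvent(utc(2025, 6, 13, 7, 47), "google_storage_bucket", "bob@example.com", "us", "console"),
    ChangeEvent(utc(2025, 6, 12, 18, 0), "google_compute_firewall", "ci-bot@example.com", "global", "terraform"),
]

# Timeline of firewall changes, whether they came from Terraform or the Console.
for e in filter_events(events, asset_type="google_compute_firewall"):
    print(e.timestamp.isoformat(), e.user, e.source)
```

Sorting after filtering keeps the timeline chronological regardless of the order in which events arrive.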

Viewing a ClickOps Event in Firefly's Event Center

Let's take a look at an example where Firefly’s Event Center captures a ClickOps event, specifically, the creation of a Google Cloud Storage bucket. This event shows how Firefly logs manual changes and gives full visibility into the impact of those changes.

Event: Storage Bucket Creation

  • Event Name: storage.buckets.create
  • Timestamp: 2025-06-13T07:47:11.440747413Z
  • Region: us
  • Service: storage.googleapis.com
  • Asset Name: web-logs-storage
  • Owner: codetocloud.dev@gmail.com

This event captures the creation of the web-logs-storage bucket in the US region.

Detailed Event Metadata:

  • Event ID: 651539618d05bbd925bdb0ed-684a7b11b5d201fa54b4d9a7-google_storage_bucket-us-1wk303feb82lv
  • Source IP: 182.69.177.170
  • Request Parameters: The event includes ACL settings for the bucket, detailing which roles were assigned to various members, such as roles/storage.legacyBucketOwner and roles/storage.legacyObjectOwner.

Raw Event Data:

[Screenshot: Firefly Event Center raw event data]

The raw event data includes:

  • Method Name: storage.buckets.create
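  • Policy Changes: The event shows the IAM binding changes, including roles granted to project convenience values such as projectEditor:sound-habitat-462410-m4.
  • Caller User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36, a browser user agent, which is consistent with a change made through the Google Cloud Console.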

The snapshot above from Firefly’s Event Center shows how the event is logged in real time. The metadata includes the event ID, the user responsible for the change, the affected service, and the full request details.
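The same who/what/where/when fields can also be read from a raw Cloud Audit Logs entry, for example one exported with gcloud logging read and --format=json. A minimal sketch, assuming the LogEntry JSON layout used by Cloud Audit Logs (protoPayload.methodName, authenticationInfo.principalEmail, requestMetadata.callerIp, and requestMetadata.callerSuppliedUserAgent). The sample entry is hand-trimmed and modeled on the event above, and the likely_clickops flag is a heuristic assumption (a browser user agent suggests the Console), not how Firefly classifies events.

```python
from typing import Any, Dict


def summarize_audit_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Extract who/what/where/when from a Cloud Audit Logs LogEntry (as parsed JSON)."""
    payload = entry.get("protoPayload", {})
    meta = payload.get("requestMetadata", {})
    user_agent = meta.get("callerSuppliedUserAgent", "")
    return {
        "when": entry.get("timestamp"),
        "who": payload.get("authenticationInfo", {}).get("principalEmail"),
        "what": payload.get("methodName"),
        "resource": payload.get("resourceName"),
        "caller_ip": meta.get("callerIp"),
        # Heuristic assumption: a browser user agent suggests a Console (ClickOps)
        # change rather than one from Terraform, gcloud, or an SDK.
        "likely_clickops": user_agent.startswith("Mozilla/"),
    }


# Hand-trimmed sample modeled on the event above, not a complete log entry.
sample_entry = {
    "timestamp": "2025-06-13T07:47:11.440747413Z",
    "protoPayload": {
        "serviceName": "storage.googleapis.com",
        "methodName": "storage.buckets.create",
        "resourceName": "projects/_/buckets/web-logs-storage",
        "authenticationInfo": {"principalEmail": "codetocloud.dev@gmail.com"},
        "requestMetadata": {
            "callerIp": "182.69.177.170",
            "callerSuppliedUserAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        },
    },
}

print(summarize_audit_entry(sample_entry))
```

Missing fields come back as None rather than raising, which matters because not every audit entry carries every field.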

Firefly doesn't just show the event; it also links it to the affected asset’s IaC status in the Inventory. In this case, the web-logs-storage bucket appears in the Inventory marked as unmanaged, since it was created outside of Terraform. This gives immediate context about how the asset is managed and whether it aligns with your IaC-defined infrastructure. The dashboard below visualizes real-time data for ClickOps events, IaC coverage, and asset management, including key metrics such as ClickOps Detection, the number of events logged, and their impact on your environment, such as drift and cost leakage.
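The managed/unmanaged distinction can be approximated by comparing what exists in the cloud with what Terraform's state records. A minimal sketch, assuming a state file in Terraform's version 4 JSON format (as stored in terraform.tfstate or printed by terraform state pull) and a list of bucket names from your cloud inventory. The bucket app-assets and the Terraform resource name assets are hypothetical, the sketch only handles google_storage_bucket, and it is not Firefly's implementation.

```python
from typing import Any, Dict, Iterable, Set


def managed_bucket_names(tfstate: Dict[str, Any]) -> Set[str]:
    """Collect google_storage_bucket names recorded in a Terraform state (format version 4)."""
    names: Set[str] = set()
    for resource in tfstate.get("resources", []):
        if resource.get("mode") != "managed" or resource.get("type") != "google_storage_bucket":
            continue
        for instance in resource.get("instances", []):
            name = instance.get("attributes", {}).get("name")
            if name:
                names.add(name)
    return names


def unmanaged_buckets(inventory: Iterable[str], tfstate: Dict[str, Any]) -> Set[str]:
    """Buckets that exist in the cloud but are absent from Terraform state."""
    return set(inventory) - managed_bucket_names(tfstate)


state = {
    "version": 4,
    "resources": [
        {
            "mode": "managed",
            "type": "google_storage_bucket",
            "name": "assets",
            "instances": [{"attributes": {"name": "app-assets", "location": "US"}}],
        }
    ],
}
inventory = ["app-assets", "web-logs-storage"]

print(sorted(unmanaged_buckets(inventory, state)))  # ['web-logs-storage']
```

A set difference is enough here because bucket names are globally unique; other resource types would need a more careful identity key, such as a full resource ID.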

[Screenshot: Firefly dashboard showing real-time data for ClickOps events]

Best Practices for ClickOps and Drift in Google Cloud Console

To effectively manage ClickOps and prevent configuration drift in Google Cloud, it’s essential to follow best practices that combine proactive monitoring, effective change management, and continuous visibility into infrastructure. These practices ensure that manual changes are tracked and that the environment remains consistent with the desired configurations defined by Infrastructure as Code (IaC).

Enable Drift Prevention

To prevent drift caused by manual changes, enabling drift prevention is crucial. For Kubernetes objects on GKE clusters, this can be achieved with Google Cloud's Config Sync, which syncs cluster configuration from a Git repository, by setting up the appropriate validation mechanisms.

  • Activate the Config Sync Admission Webhook: The admission webhook validates requests that would modify objects Config Sync manages, such as a kubectl edit or an edit made through the GKE pages of the Google Cloud Console, and rejects changes that don't come from your Git repository, your source of truth.
  • Set spec.configSync.preventDrift to true: This is the setting that turns the admission webhook on, so out-of-band changes to managed objects are blocked rather than merely flagged, keeping the cluster in sync with the Git repository that holds your desired configuration.
  • Apply the Configuration Using gcloud or kubectl: Once the configuration is set, apply it to your cluster with gcloud or kubectl so that future changes to managed objects are checked by the webhook.
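Conceptually, the webhook makes a yes/no decision for each request that would change a managed object. The sketch below is a deliberately simplified model, not Config Sync's code: it assumes managed objects carry the configmanagement.gke.io/managed: enabled annotation and that only the root reconciler's service account (named here as an assumption) may change them. The real webhook checks more than this.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumptions for this simplified model (not Config Sync's actual logic):
# managed objects carry this annotation, and only the root reconciler's
# service account is allowed to change them.
MANAGED_ANNOTATION = "configmanagement.gke.io/managed"
RECONCILER_USER = "system:serviceaccount:config-management-system:root-reconciler"


@dataclass
class ChangeRequest:
    """The parts of an admission request this sketch looks at."""
    username: str
    annotations: Dict[str, str] = field(default_factory=dict)


def admit(request: ChangeRequest) -> bool:
    """Allow changes to unmanaged objects; allow managed ones only from the reconciler."""
    is_managed = request.annotations.get(MANAGED_ANNOTATION) == "enabled"
    return not is_managed or request.username == RECONCILER_USER


managed_object = {MANAGED_ANNOTATION: "enabled"}
print(admit(ChangeRequest("alice@example.com", managed_object)))  # False: out-of-band edit
print(admit(ChangeRequest(RECONCILER_USER, managed_object)))      # True: sync from Git
```

The design point is that the decision depends on who is making the change, not on what the change is: any edit to a managed object that doesn't arrive through the Git-driven reconciler is treated as drift.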

Monitor and Audit Configurations

Preventing changes is only part of the solution; monitoring what actually happens in your environment is just as important:

  • Use the nomos CLI Tool: The nomos tool (for example, nomos status) shows the sync status of your resources so you can confirm they match your repository’s configuration. Running it periodically helps catch drift before it causes operational issues.
  • Set Up Cloud Monitoring and Cloud Logging: Integrating Firefly’s Event Center with Cloud Monitoring and Cloud Logging gives you enhanced visibility into the health of your resources, so you can capture events in real time and track configuration changes more effectively.
  • Configure Alerts for Drift and Reconciliation Failures: Set up alerts to notify your team whenever a change is made outside of the IaC process or when a resource deviates from the desired state. Alerts can be based on specific asset types, users, or regions, making it easier to catch unauthorized changes.

Implement Periodic Re-Sync

Periodic re-sync is another important practice for keeping your cloud resources in sync with the desired state, even when manual changes are made:

  • Enable Periodic Re-Sync: Re-applying the configuration from your Git repository on a regular schedule reverts changes that slipped past drift prevention, so your infrastructure stays aligned with the source of truth.
  • Monitor the root-reconciler Deployment: The root-reconciler Deployment, in the config-management-system namespace, syncs the configuration from your root Git repository to the cluster. Making sure it runs properly is crucial for keeping your live infrastructure aligned with your Git-based IaC configurations.
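The idea behind periodic re-sync is a reconcile loop: on each tick, fetch the desired state, compare it with the live state, and re-apply whatever drifted. This is a toy Python sketch in which dictionaries stand in for the Git source and the cluster; all names and values are hypothetical, and a real reconciler applies objects through the Kubernetes API rather than mutating a dict.

```python
import time
from typing import Any, Callable, Dict

# Toy stand-in: a dict of declared settings, for both the Git source and the cluster.
State = Dict[str, Any]


def reconcile_once(desired: State, live: State) -> State:
    """Re-apply every declared value that is missing or drifted; return what changed.

    Keys the source does not declare are left alone.
    """
    changed: State = {}
    for key, value in desired.items():
        if key not in live or live[key] != value:
            live[key] = value
            changed[key] = value
    return changed


def run_periodically(fetch_desired: Callable[[], State], live: State,
                     period_s: float, ticks: int) -> None:
    """Reconcile on a fixed interval, for a bounded number of ticks in this sketch."""
    for _ in range(ticks):
        changed = reconcile_once(fetch_desired(), live)
        if changed:
            print(f"re-applied: {sorted(changed)}")
        time.sleep(period_s)


desired = {"replicas": 3, "image": "web:1.4"}
live = {"replicas": 8, "image": "web:1.4", "note": "manual scale-up"}
run_periodically(lambda: desired, live, period_s=0.0, ticks=2)
print(live)
```

The first tick reverts the manual scale-up of replicas and the second finds nothing to do, which is the steady state a healthy reconciler should report.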

FAQs

Which is Better, CloudOps or DevOps?

DevOps focuses on faster software delivery through CI/CD pipelines, while CloudOps manages and optimizes cloud infrastructure. Both are important but serve different purposes: DevOps for application delivery and CloudOps for cloud infrastructure management. They often complement each other in modern workflows.

What is OIDC in GCP?

OIDC (OpenID Connect) is an authentication protocol built on OAuth 2.0, used in Google Cloud for secure user and application authentication. It allows seamless sign-ins using Google accounts or other OIDC providers, enabling secure service access and service-to-service authentication.

What is the GCP AI Platform?

Google Cloud’s AI Platform, since succeeded by Vertex AI, is a managed service for training, deploying, and managing machine learning models. It supports custom training with TensorFlow, PyTorch, or scikit-learn, and lets you serve models using scalable endpoints without managing infrastructure.

What is the Difference Between Skew Detection and Drift Detection?

Skew detection compares training data with evaluation or test data to find mismatches before deployment. Drift detection looks at live (production) data over time to catch changes in input or prediction patterns that might impact model performance.