Why do kubectl-based Kubernetes deployments break down at scale?

In early-stage environments - typically a single development cluster managed by one or two engineers - it's common to deploy workloads using kubectl apply -f against local manifest files. These YAML definitions might live in a Git repository, but just as often, they're pulled from old commits, edited manually, or passed around in Slack threads. There's no real deployment pipeline, no approval process, and no enforcement of consistency. Engineers apply changes directly to the cluster with minimal oversight.

This works when you're managing a single service in one environment. But once your platform grows and you’re juggling dev, staging, and production, things start to get messy. Say you have 15 manifest files for a basic service—now multiply that across three environments. That’s 45 YAML files to track, keep in sync, and apply manually. We've all been there in the early days—duplicating configs, patching over inconsistencies, and hoping nothing breaks in prod. The more your stack grows, the more brittle this approach becomes.

Inconsistent Environments Lead to Fragile Releases

Changes applied directly to one environment don’t always make it to others. You might patch a config in staging and forget to update production. Or a teammate applies a fix in QA but never commits it back to Git. These inconsistencies create hidden differences between clusters that only surface during incident response.

There’s No Reliable Audit Trail

Without Git-based deployments, you lose the ability to trace what was deployed, who deployed it, and why. You can't diff two environments, and you can't revert confidently because there's no record of the previous state. All you have is kubectl rollout history - if that.

Rollbacks Are Manual, Slow, and Risky

In a traditional setup without Git-backed deployments, rollbacks are ad hoc and error-prone. There’s no versioned trace of what was applied and when. If a production deployment introduces issues, the typical fallback is to manually re-apply a previous version of the manifest using kubectl apply -f. But which version? From where? Unless you've maintained strict discipline around tagging and archiving YAML files, you're relying on memory, Slack threads, or stale Git branches.

This approach is not only time-consuming but fundamentally unreliable. Even if you find an old manifest, there’s no guarantee it matches the actual state that was previously running—especially if there were hotfixes or manual patches applied directly in the cluster.

A reliable rollback strategy needs to be declarative, version-controlled, and automated. You should be able to revert to a known good Git commit, trigger a sync, and trust that the cluster state will be restored exactly as defined. Without that foundation, you're left troubleshooting production issues with a blindfold on.

CI/CD Pipelines Become a Single Point of Failure

Many teams rely on their CI system to trigger deployments, which couples build and deploy tightly. If the pipeline breaks, nothing goes out. If someone disables a stage or forgets to update a job, the cluster drifts from Git - and no one knows.

Infrastructure Drift

When changes are applied directly to the cluster without being tracked in Git, the cluster starts drifting from the declared infrastructure. And without automated drift detection, those differences remain unnoticed - until a deployment fails.

This is where Argo CD and GitOps workflows start to prove their value: bringing structure, visibility, and repeatability to Kubernetes deployments as your setup scales.

What Problems Does Argo CD Actually Solve in Production?

The limitations of manual deployment workflows become increasingly apparent as your Kubernetes architecture expands. Argo CD addresses these challenges directly by enforcing a Git-centric deployment model, ensuring that what runs in the cluster is always backed by version-controlled infrastructure.

Maintains Declarative Synchronisation Between Git and Cluster

Argo CD treats Git as the single source of truth for Kubernetes workloads. Every configuration - Deployments, Services, Ingress, RBAC - is stored in a Git repository. Argo CD continuously compares this declared state to the live state of the cluster. If discrepancies are found, it either alerts the team or automatically reconciles the difference, depending on the configured sync strategy. This eliminates the risk of environment drift.

Detects and Highlights Drift in Real Time

Changes made directly to the cluster using kubectl or other tools are immediately detected. Argo CD marks the affected application as "OutOfSync" and provides a diff view to inspect the changes. This gives teams immediate visibility into untracked modifications and ensures that clusters do not silently deviate from intended configurations.

Enables Predictable and Controlled Rollbacks

Because every deployment is driven from Git, reverting to a known-good state is straightforward. You simply revert the commit or redeploy a tagged version from the repository. There's no need to reconstruct YAML files or guess what the last working state looked like - it's all versioned and recoverable.
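
One way to make this concrete is to pin each Application's targetRevision to a release tag or commit SHA rather than a branch; rolling back is then just a change to that field (or a git revert) followed by a sync. A minimal sketch, with hypothetical repository, path, and tag:

# Fragment of an Argo CD Application spec pinned to a known-good release
spec:
  source:
    repoURL: https://github.com/example-org/app-configs.git   # hypothetical config repo
    path: charts/my-service                                    # hypothetical chart path
    targetRevision: v1.4.2   # change this back to the last good tag to roll back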

Decouples Build and Deployment Pipelines

Argo CD decouples building and publishing artifacts from deploying them. CI pipelines can focus solely on building, testing, and publishing artifacts. Deployment decisions - when and how to promote changes - are handled by Argo CD based on Git state. This separation reduces coupling and makes deployments more predictable.

Provides Operational Visibility and Control

The Argo CD UI and CLI give engineering teams a real-time view of what is deployed, where, and in what state. You can inspect health status, sync history, and differences across environments without logging into the cluster. This visibility is essential for teams operating across multiple environments or regions.

Scales to Multi-App, Multi-Team Workflows

With Projects, Argo CD allows you to define access boundaries between teams, clusters, and Git repositories. This structure ensures that different teams can deploy independently while still adhering to global governance policies. It's particularly valuable in larger organisations where application isolation and role-based access control are essential.
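
A rough sketch of such a boundary, using hypothetical team, repository, and namespace names:

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: payments-team
  namespace: argocd
spec:
  description: Applications owned by the payments team
  sourceRepos:
    - https://github.com/example-org/payments-*   # repos this team may deploy from
  destinations:
    - server: https://kubernetes.default.svc
      namespace: payments-*                       # namespaces this team may target
  clusterResourceWhitelist: []                    # no cluster-scoped resources allowed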

Deploy to Kubernetes Faster with GitOps Argo CD Automation

We've covered why traditional deployment strategies start to fail at scale and how Argo CD brings predictability and structure. Now let's walk through a practical implementation. We'll provision an EKS cluster using Terraform, set up Argo CD, and deploy a Helm-based application using a GitOps workflow.

Provisioning EKS with Terraform

We start by creating a new EKS cluster using Terraform. The configuration uses the terraform-aws-modules/eks/aws module for setting up the control plane and managed node groups. Here's a simplified version of the Terraform code we used:

provider "aws" {
  region = "us-east-1"
}

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "firefly-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  public_subnets  = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  private_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  cluster_name    = "firefly-eks"
  cluster_version = "1.27"

  subnets = module.vpc.private_subnets
  vpc_id  = module.vpc.vpc_id

  node_groups = {
    default = {
      desired_capacity = 2
      max_capacity     = 3
      min_capacity     = 1
      instance_types   = ["t3.medium"]
    }
  }
}

Once this code is applied using terraform init && terraform apply, the EKS cluster is provisioned. The output confirms the cluster name, endpoint, and node group status.

Configuring Kubectl Access

With the EKS cluster in place, we configure our local kubeconfig so we can interact with the new cluster using kubectl:

aws eks update-kubeconfig --region us-east-1 --name firefly-eks

We then validate that the node group is active and ready:

kubectl get nodes

Installing Argo CD on the Cluster

Next, we install Argo CD in a dedicated namespace using Helm. This gives us greater control over configuration and easier future upgrades.

kubectl create namespace argocd
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
helm install argocd argo/argo-cd -n argocd

We expose the argocd-server service as a LoadBalancer so the UI is reachable from outside the cluster. For this to work on AWS, our public subnets must be tagged appropriately so the AWS load balancer controller can discover them and provision an external load balancer:

aws ec2 create-tags \
  --resources subnet-0cd021006b24fde2a subnet-05630f0f88b012a61 subnet-0c6c5cc7db4c2d313 \
  --tags Key=kubernetes.io/cluster/firefly-eks,Value=shared \
         Key=kubernetes.io/role/elb,Value=1 \
  --region us-east-1

We then annotate the Argo CD service to explicitly request an NLB instead of the default classic load balancer:

kubectl annotate svc argocd-server -n argocd \
  service.beta.kubernetes.io/aws-load-balancer-type=nlb --overwrite

To verify that the LoadBalancer has been assigned an external address:

kubectl get svc -n argocd argocd-server

Logging into Argo CD

Once the external address is available, we log in to the Argo CD UI. To do that, we retrieve the initial admin password:

kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d && echo

We then visit the ELB hostname in the browser, enter admin as the username, and use the retrieved password to log in.

Deploying an Application Using a Helm Chart via GitOps

With Argo CD installed and operational, we move to the core of the GitOps workflow - deploying a real application using Helm charts as the packaging format.

From the Argo CD UI, we create a new Application resource that points to a public GitHub repository containing several Helm-based workloads. In this case, we’re using the helm-guestbook example, which packages a simple Kubernetes application using Helm.

We use the following values:

  • Repo URL: https://github.com/argoproj/argocd-example-apps.git
  • Path: helm-guestbook
  • Revision: HEAD
  • Cluster: https://kubernetes.default.svc
  • Namespace: default

We specify the Git repository, the path to the Helm chart within that repo, and the target Kubernetes namespace. Argo CD then takes over: it renders the Helm chart, compares the rendered manifests against what’s running in the cluster, and shows a visual diff if they’re not in sync.
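
The same Application can also be defined declaratively and committed to Git rather than created through the UI; a minimal sketch using the values above:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: helm-guestbook
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/argoproj/argocd-example-apps.git
    targetRevision: HEAD
    path: helm-guestbook          # directory containing the Helm chart
  destination:
    server: https://kubernetes.default.svc
    namespace: default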

Once created, the application will show as OutOfSync. That means Argo CD sees a difference between the declared Git state and what’s actually deployed. We click “Sync” to apply the Helm chart.

After syncing, the app transitions to Synced and Healthy. The topology view shows the relationship between the Service, Deployment, ReplicaSet, and Pod.

Now the entire deployment lifecycle - from cluster provisioning to application delivery - is fully automated and version-controlled. This setup is reproducible across environments and gives you complete visibility into what’s running in production.

Below that, Argo CD’s application topology view provides a real-time visual map of the deployed resources. Each component - Service, ReplicaSet, Deployment, Pod - is laid out clearly, with health and sync statuses shown inline.

Once the application is created and synced, Argo CD gives a complete breakdown of the deployment status directly in the UI. You can view metadata like revision number, sync status, who initiated the action, and timestamps for every stage of the operation.

This architecture ensures that every deployment is traceable to a Git commit, every cluster state is continuously validated, and rollbacks are deterministic. You're no longer just deploying to Kubernetes - you're operating it in a structured, auditable way.

What Are the Best Practices for Running Argo CD in Production?

After seeing how Argo CD can drive GitOps workflows end to end, the next step is to run it reliably in production. That’s where operational hygiene, security, and observability practices come into play.

Let’s break down some of the most effective strategies teams use to scale Argo CD in production setups:

Use auto-sync in non-prod, manual sync in prod

In lower environments like dev or QA, always enable auto-sync. This allows every Git push to be reflected in the cluster immediately, which is useful for testing workflows and CI triggers. In production, disable auto-sync and rely on manual syncs gated by approvals. This gives you control over what gets deployed and when.
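
In Application terms, this is just the presence or absence of the automated sync policy; a sketch:

# Non-prod Application: every Git push is applied automatically
spec:
  syncPolicy:
    automated:
      selfHeal: true   # also revert changes made directly in the cluster

# Prod Application: omit syncPolicy.automated entirely, so nothing is applied
# until someone triggers a manual sync after approval.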

Define custom health checks and lifecycle hooks

Define custom health checks and lifecycle hooks - especially if you run workloads that Argo CD doesn’t understand out of the box. This lets you gate rollout progress on actual app readiness, not just resource status.
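
Custom health checks are registered in the argocd-cm ConfigMap as small Lua scripts. A rough sketch for a hypothetical custom resource (the example.com/Widget group, kind, and status fields below are assumptions, not a real CRD):

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  resource.customizations.health.example.com_Widget: |
    hs = {}
    hs.status = "Progressing"
    hs.message = "Waiting for Widget to become ready"
    if obj.status ~= nil and obj.status.ready == true then
      hs.status = "Healthy"
      hs.message = "Widget is ready"
    end
    return hs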

Export metrics to Prometheus and create Grafana dashboards

For visibility, export Argo CD metrics to Prometheus and build dashboards in Grafana. Monitor sync durations, status history, and reconciliation errors. This data is critical for debugging and alerting during incidents.
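
If Argo CD was installed with the argo/argo-cd Helm chart (as above) and the cluster runs a Prometheus Operator, the chart can expose metrics endpoints and create ServiceMonitors for them. A sketch of the Helm values, assuming the chart's metrics options:

# values.yaml passed to helm install/upgrade of argo/argo-cd
controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
server:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
repoServer:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true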

Restrict direct cluster access

Restrict direct access to production clusters. Argo CD should be the only system applying manifests. If engineers need kubectl access, it should go through a controlled access layer - ideally read-only unless escalated.

Use Git as the source of truth

Always treat Git as the record of truth. All config changes should go through PRs. Avoid making changes directly in the Argo CD UI. Even a quick manual patch can create drift and be wiped out during the next sync.

Enable pruning for orphaned resources

Finally, enable pruning. Argo CD can delete orphaned resources - those that exist in the cluster but are no longer declared in Git - as part of a sync. This prevents stale workloads from lingering and keeps the cluster clean over time.
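
Pruning is switched on per Application, and orphaned-resource warnings can be enabled on the owning Project; a sketch:

# Application: remove resources that have been deleted from Git on the next sync
spec:
  syncPolicy:
    automated:
      prune: true

# AppProject: warn about resources in managed namespaces that no Application declares
spec:
  orphanedResources:
    warn: true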

How Can Firefly Help You Catch What Argo CD Misses?

Even with Argo CD in place, your visibility is limited to resources that are explicitly defined and tracked in your Git repositories. It has no awareness of infrastructure created manually through the AWS console, provisioned via ad-hoc CLI scripts, or spun up during on-call incidents and never codified afterward. Legacy infrastructure - like EC2 instances, S3 buckets, IAM roles, and security groups - often predates any GitOps workflow. These resources remain unmanaged, untracked, and outside of your version control system.

On top of that, changes made directly to Kubernetes clusters using tools like kubectl patch, kubectl edit, Lens, or even Terraform applied manually from someone’s laptop can all introduce configuration drift. Argo CD is not designed to detect or manage these changes - it only reconciles the desired state defined in Git with what’s running in the cluster. If a resource was never in Git to begin with, Argo CD won’t even know it exists.

That’s where Firefly fits in.

Firefly continuously scans your cloud accounts and Kubernetes clusters to discover unmanaged infrastructure. It compares what’s running in your environments against what’s defined in IaC and flags anything that’s been created outside your pipelines.

This image shows a live EC2 instance tagged with cluster metadata but flagged as unmanaged by Firefly because it wasn’t created through Terraform or included in version control.

Firefly also detects configuration drift in managed resources. For example, if a security group is defined in Terraform but someone modifies it through the AWS console, Firefly alerts you that the current state has diverged from the declared state.

Even better, Firefly can codify these unmanaged resources into Terraform configuration blocks using its Codification feature.

In this example, a ClusterRoleBinding not tracked in Git is automatically converted into a Terraform or YAML configuration, making it easy to bring the resource under management.

This codification process supports Terraform, Pulumi, Helm, and Kubernetes manifests - so you can choose your preferred tooling and bring everything into version control.

Finally, Firefly helps teams enforce governance policies. It lets you define policies for infrastructure hygiene - like disallowing public S3 buckets or ensuring all VPCs have flow logs enabled - and scans your environment continuously.

This dashboard shows misconfigurations across multiple AWS accounts with options for AI remediation, alerts, and policy enforcement.

Together, Firefly and Argo CD form a complete GitOps workflow:

  • Argo CD enforces the declared state for your Kubernetes workloads.
  • Firefly discovers and codifies what’s missing from that declared state, including cloud-native resources.

This pairing gives you a full map of your operational footprint - both what's declared and what's not - so you can close the gaps, prevent drift, and bring true consistency to production.

Frequently Asked Questions, Answered 

What’s the best way to manage secrets when using Argo CD?

Use external secret managers like HashiCorp Vault, AWS Secrets Manager, or Sealed Secrets with Argo CD. Avoid storing plain secrets in Git—always encrypt or reference them securely.
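
For example, with the External Secrets Operator the repository holds only a reference to the secret, never its value. A rough sketch (the store name and remote key are hypothetical, and the API group assumes the external-secrets.io operator is installed):

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager      # hypothetical store backed by AWS Secrets Manager
    kind: ClusterSecretStore
  target:
    name: app-db-credentials       # Kubernetes Secret created by the operator
  data:
    - secretKey: password
      remoteRef:
        key: prod/app/db-password  # hypothetical key in the secret manager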

How do I handle multiple environments like dev, staging, and prod with Argo CD?

Use separate Argo CD Applications or Projects for each environment. Keep environment-specific overlays in Git, and isolate namespaces and cluster targets to reduce risk.
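
A common layout is one Application (and Project) per environment, each pointing at its own overlay or values path in the same repository; a sketch with hypothetical names:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service-staging
  namespace: argocd
spec:
  project: staging                 # separate Project per environment
  source:
    repoURL: https://github.com/example-org/app-configs.git
    targetRevision: HEAD
    path: envs/staging/my-service  # staging-specific overlay or values
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service-staging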

Do I need to expose the Argo CD UI to the public internet?

No. It’s best to expose Argo CD via a private load balancer or through a secure VPN. If public access is required, use ingress with TLS and enforce SSO/OIDC for RBAC.

What should I monitor to ensure Argo CD is healthy?

Track reconciliation metrics, sync durations, error rates, and OutOfSync events. Export these to Prometheus and use Grafana dashboards for visual monitoring and alerting.