Cloud operations are complex. There are many reasons for this complexity, but in this post, I want to focus on how resources and services are managed in today’s clouds. Today’s cloud environments are often composed of a large number of heterogeneous resources, each with its own, often entirely different, method of management.
This diversity of resources is largely a byproduct of cloud practices that predate infrastructure as code (IaC). Before automation and IaC, many companies configured resources and services manually, following internal processes unique to the organization rather than any alignment with best practices. As companies evolved and adopted IaC to codify and manage cloud resources, the result was a mishmash of managed and unmanaged services.
In practice, this means that while some services are still managed through legacy, manual methods, others are aligned with industry best practices and are codified and managed in ways considered cloud-native. This lack of alignment and heterogeneity adds layers of complexity to already complex cloud operations.
This matters because "uncodified" resources miss out on all the benefits of being deployed and managed through CI/CD and automated processes, which have built-in gates and safety mechanisms before anything reaches production. When resources are managed through code, important checks and balances can be applied, both from a security perspective (such as code scanning, secret scanning, and authentication checks) and as general good practice (such as peer code review, usability checks, bug testing, and more). On top of this, infrastructure as code is built on the concept of immutability to provide greater resilience for cloud operations, including the ability to restart and redeploy unhealthy assets, and unmanaged resources lack these benefits as well.
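To make one of these gates concrete, here is a minimal sketch of a pre-deployment secret-scanning step that fails a pipeline when an IaC file appears to contain a hardcoded AWS access key ID. It assumes a single regex for the documented key ID format (a prefix such as `AKIA` or `ASIA` followed by 16 uppercase letters or digits); real secret scanners check many more secret types and use entropy heuristics, and the function names here are hypothetical.

```python
import re

# Hypothetical gate: matches the AWS access key ID format (AKIA for
# long-term keys, ASIA for temporary ones, then 16 uppercase letters/digits).
ACCESS_KEY_PATTERN = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_hardcoded_keys(text: str) -> list[tuple[int, str]]:
    """Return (line_number, match) pairs for suspected access key IDs."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_PATTERN.finditer(line):
            findings.append((lineno, match.group(0)))
    return findings


def gate(files: dict[str, str]) -> int:
    """Return a CI-style exit code: 0 if every file is clean, 1 otherwise."""
    exit_code = 0
    for path, text in files.items():
        for lineno, key in find_hardcoded_keys(text):
            # Print only a prefix so the log does not leak the full key.
            print(f"{path}:{lineno}: possible hardcoded access key {key[:8]}...")
            exit_code = 1
    return exit_code


if __name__ == "__main__":
    # Example input; in a real pipeline these would be read from the repo.
    sample = {
        "main.tf": 'provider "aws" {\n  access_key = "AKIAABCDEFGHIJKLMNOP"\n}\n',
        "variables.tf": 'variable "region" {\n  default = "us-east-1"\n}\n',
    }
    print("gate exit code:", gate(sample))
```

In CI, the returned code would be passed to the process exit status so the deployment stage never runs when the gate fails.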
We have found that many common AWS services that should be codified have been "left behind" in the move to IaC. In this post, we’ll explain why they should be.
Route53 DNS Records
As we recently witnessed in the October 4, 2021, Facebook outage, even the most robust services can be taken down by DNS. Plenty of cynical jokes circulated on Tech Twitter about how it’s ALWAYS DNS.
It’s always DNS… (see "Understanding How Facebook Disappeared from the Internet")
DNS sits at the edge of your services, and leaving it unmanaged carries high risk. Even though DNS has the power to take down your entire operation, even at a global scale, it ironically still very often receives low priority in the grand scheme of large cloud operations.
We’ve found that, more often than not, Route53 DNS records are not defined as code. In many cases, DNS records are even managed through manual copy-and-paste, and human error is one of the greatest risks to any kind of operation. Removing that risk is the backbone of automation and early configuration management: configure everything in exactly the same canonical way, over and over again, so that no human error is baked into the configuration.
Leveraging IaC lets you manage your DNS records automatically and safely, and it should be standard practice in any production engineering group. This is a great read on how to get started with such a migration with Terraform.
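As a minimal sketch of what "DNS records as code" buys you, the snippet below declares records as data, normalizes them into one canonical form, renders them as a change batch, and detects drift against what is live in the zone. The dictionary follows the shape of the Route 53 `ChangeResourceRecordSets` request body (`Changes` entries with `Action` and a `ResourceRecordSet`); actually submitting it (for example with boto3's `change_resource_record_sets`) and fetching the live records are left out. The `Record` class, helper names, and the `example.com` records are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A simple (non-alias) DNS record as it would be declared in code."""
    name: str
    type: str
    ttl: int
    values: tuple[str, ...]

    def canonical(self) -> "Record":
        # DNS names are case-insensitive, and Route 53 reports fully
        # qualified names with a trailing dot, so normalize both.
        name = self.name.lower()
        if not name.endswith("."):
            name += "."
        return Record(name, self.type.upper(), self.ttl, tuple(sorted(self.values)))


def to_change_batch(records: list[Record], comment: str) -> dict:
    """Render desired records as an UPSERT change batch."""
    changes = []
    for r in (rec.canonical() for rec in records):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": r.name,
                "Type": r.type,
                "TTL": r.ttl,
                "ResourceRecords": [{"Value": v} for v in r.values],
            },
        })
    return {"Comment": comment, "Changes": changes}


def drift(desired: list[Record], observed: list[Record]) -> dict[str, set]:
    """Compare declared records with the records that are live in the zone."""
    want = {r.canonical() for r in desired}
    have = {r.canonical() for r in observed}
    return {"missing": want - have, "unmanaged_or_changed": have - want}


if __name__ == "__main__":
    # Hypothetical zone contents for illustration only.
    desired = [
        Record("www.example.com", "A", 300, ("192.0.2.10",)),
        Record("api.example.com", "CNAME", 300, ("lb.example.net",)),
    ]
    observed = [
        Record("www.example.com.", "A", 300, ("192.0.2.10",)),
        Record("api.example.com.", "CNAME", 60, ("lb.example.net",)),  # TTL edited by hand
    ]
    print(json.dumps(to_change_batch(desired, "managed by IaC"), indent=2))
    print(drift(desired, observed))
```

Because every record passes through `canonical()`, a hand-edited TTL shows up as drift instead of silently diverging from the declared configuration.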
IAM Users, Roles, and Policies
Managing roles and policies in the cloud remains a pain point for most cloud ops teams. From onboarding to offboarding to enforcing security measures such as MFA, IAM is always difficult to manage and maintain in large, distributed, high-scale environments. The IAM attack surface is wide: overly privileged access can have a disastrous impact on production services, and least-privilege practices have their own learning curve. It’s no surprise, then, that the majority of breaches are still related to authentication and authorization.
To combat this challenge, popular open-source tools like Open Policy Agent (OPA) enable organizations to quickly and easily define organizational policies as code (this is a great read on other ways to leverage OPA for Policy as Code).
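OPA policies are written in Rego; purely to illustrate the policy-as-code idea in this post's example language, here is a minimal Python analog of one such rule: flag any `Allow` statement in an IAM policy document that grants a wildcard action or a wildcard resource. The rule set and function names are hypothetical and far narrower than a real policy engine.

```python
# Hypothetical policy-as-code rules, written in Python for illustration;
# OPA would express the same checks in Rego.


def _as_list(value):
    """IAM allows Action/Resource to be a single string or a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def violations(policy: dict) -> list[str]:
    """Return human-readable findings for overly broad Allow statements."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be an object
        statements = [statements]
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue  # broad Deny statements are not a least-privilege problem
        sid = stmt.get("Sid", str(i))
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        # "*" or "service:*" grants every action (of that service).
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append(f"Statement {sid}: wildcard action {actions}")
        if "*" in resources:
            findings.append(f"Statement {sid}: wildcard resource")
    return findings


if __name__ == "__main__":
    too_broad = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
    }
    for finding in violations(too_broad):
        print(finding)
```

Run in CI against every policy file, a non-empty result would block the merge, which is the same gatekeeping role OPA plays in a pipeline.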
Even so, like DNS, IAM is rarely managed as code, although it could benefit immensely from the rigorous, mechanical checks and balances that machines apply when such processes are automated. Codifying IAM can be done with CloudFormation, Terraform, Ansible, or any other automation tooling, and it should be on your to-do list if you haven’t yet codified your IAM roles and policies. Here is another great post on how to get started with moving to codified IAM in your organization.
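Here is a minimal sketch of codified IAM: the snippet generates a least-privilege policy document that only takes effect when the caller signed in with MFA (via the `aws:MultiFactorAuthPresent` condition key), then wraps it in a CloudFormation `AWS::IAM::ManagedPolicy` resource attached to a group. The policy name, the `analysts` group, and the bucket ARNs are hypothetical, and generating JSON from Python is just one option; the same definition could be written directly in CloudFormation or Terraform.

```python
import json


def policy_document(actions, resources, require_mfa=True) -> dict:
    """Build a least-privilege IAM policy document from explicit lists."""
    statement = {
        "Effect": "Allow",
        "Action": sorted(actions),      # sorted for stable, reviewable diffs
        "Resource": sorted(resources),
    }
    if require_mfa:
        # Only grant the actions when the caller authenticated with MFA.
        statement["Condition"] = {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
    return {"Version": "2012-10-17", "Statement": [statement]}


def managed_policy_resource(name, group, actions, resources) -> dict:
    """Wrap the document in a CloudFormation AWS::IAM::ManagedPolicy resource."""
    return {
        "Type": "AWS::IAM::ManagedPolicy",
        "Properties": {
            "ManagedPolicyName": name,
            "Groups": [group],
            "PolicyDocument": policy_document(actions, resources),
        },
    }


if __name__ == "__main__":
    # Hypothetical names; replace with your own group and buckets.
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "ReportsReadOnly": managed_policy_resource(
                "reports-read-only",
                "analysts",
                ["s3:GetObject", "s3:ListBucket"],
                ["arn:aws:s3:::example-reports", "arn:aws:s3:::example-reports/*"],
            )
        },
    }
    print(json.dumps(template, indent=2))
```

Onboarding or offboarding then becomes a reviewed change to group membership or to this definition rather than a console edit.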
CloudFront Distributions
The benefits of a CDN are well known: it enables faster access to web content and improves web performance. However, the back-end operations are less exciting and often involve a lot of configuration to deliver the results companies are looking for.
That’s why AWS CloudFront distributions are often configured directly in the AWS console UI. Given the sheer number of lines of code needed to configure a CDN, the console’s step-by-step process makes it the go-to choice in cloud operations. However, the benefits of managing CloudFront through IaC are tremendous; it follows a write-once, deploy-anywhere model. Code makes it far easier to reroute traffic, update URLs, and much more, without having to manually search through the console UI to manage this resource.
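To illustrate "write once, deploy anywhere", here is a minimal sketch that keeps one shared distribution definition and derives per-environment configurations from it, so only the alias and origin differ between staging and production. The field names loosely follow CloudFront's `DistributionConfig`, but this is an incomplete, hypothetical subset that the API would not accept as-is; the domains and bucket names are also hypothetical.

```python
import copy

# Hypothetical base definition, written once. Field names loosely follow
# CloudFront's DistributionConfig, but this is an incomplete subset.
BASE = {
    "Comment": "static site",
    "Enabled": True,
    "DefaultCacheBehavior": {
        "TargetOriginId": "site-origin",
        "ViewerProtocolPolicy": "redirect-to-https",
    },
}

# Hypothetical per-environment values: the only things that differ.
ENVIRONMENTS = {
    "staging": {"alias": "staging.example.com", "origin": "staging-site.s3.amazonaws.com"},
    "prod": {"alias": "www.example.com", "origin": "prod-site.s3.amazonaws.com"},
}


def distribution_config(env: str) -> dict:
    """Derive one environment's distribution config from the shared base."""
    settings = ENVIRONMENTS[env]
    config = copy.deepcopy(BASE)  # never mutate the shared definition
    config["CallerReference"] = f"site-{env}"
    config["Comment"] = f"{BASE['Comment']} ({env})"
    config["Aliases"] = {"Quantity": 1, "Items": [settings["alias"]]}
    config["Origins"] = {
        "Quantity": 1,
        "Items": [{"Id": "site-origin", "DomainName": settings["origin"]}],
    }
    return config


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        print(env, distribution_config(env)["Aliases"]["Items"])
```

With this structure, rerouting production to a new origin is a one-line change to `ENVIRONMENTS` that goes through review like any other code, instead of a hunt through console screens.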
S3 Bucket Configurations
The S3 object storage service from AWS is one of the most commonly used services, with at least some part of every project or architecture relying on it and big data architectures relying quite heavily on it. However, we still see all too often that this critical service is managed through the AWS console, and not via infrastructure as code.
As with every storage service, S3 tends to contain the most sensitive and vital data, serving all users of a cloud app. Therefore, managing your S3 buckets with IaC not only gives this critical piece of your operation the peace of mind that comes with best-practice configurations, but also provides the guardrails required to prevent changes that can break apps and expose users to security threats.
IaC provides an easy way to enforce policies, for example ensuring that:
- All S3 buckets employ encryption-at-rest.
- The S3 bucket policy is set to deny HTTP requests.
- MFA delete is enabled on your S3 buckets.
- "Block public access (bucket settings)" is configured on buckets.
These are just a few of the ways you can practically leverage IaC to ensure that the right configuration policies are applied.
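The four policies above can be sketched as an automated check. This is a minimal sketch assuming a hypothetical, pre-normalized bucket configuration dictionary (not a raw AWS API response); the setting names inside it (`SSEAlgorithm`, `MFADelete`, and the four Block Public Access flags) and the `aws:SecureTransport` condition key are real S3/IAM names. The `deny_http_statement` helper produces the commonly used bucket policy statement that rejects plain-HTTP requests.

```python
PUBLIC_ACCESS_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)
ENCRYPTION_ALGORITHMS = ("AES256", "aws:kms", "aws:kms:dsse")


def deny_http_statement(bucket: str) -> dict:
    """Bucket policy statement that rejects any request not sent over TLS."""
    return {
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }


def check_bucket(cfg: dict) -> list[str]:
    """Return the rules that a (hypothetically shaped) bucket config violates."""
    problems = []
    rules = cfg.get("encryption_rules", [])
    if not any(r.get("SSEAlgorithm") in ENCRYPTION_ALGORITHMS for r in rules):
        problems.append("encryption at rest not configured")
    statements = cfg.get("policy", {}).get("Statement", [])
    denies_http = any(
        s.get("Effect") == "Deny"
        and s.get("Condition", {}).get("Bool", {}).get("aws:SecureTransport") == "false"
        for s in statements
    )
    if not denies_http:
        problems.append("bucket policy does not deny HTTP requests")
    if cfg.get("versioning", {}).get("MFADelete") != "Enabled":
        problems.append("MFA delete not enabled")
    block = cfg.get("public_access_block", {})
    if not all(block.get(flag) is True for flag in PUBLIC_ACCESS_FLAGS):
        problems.append("block public access not fully enabled")
    return problems


if __name__ == "__main__":
    compliant = {
        "encryption_rules": [{"SSEAlgorithm": "aws:kms"}],
        "policy": {"Version": "2012-10-17", "Statement": [deny_http_statement("example-bucket")]},
        "versioning": {"Status": "Enabled", "MFADelete": "Enabled"},
        "public_access_block": {flag: True for flag in PUBLIC_ACCESS_FLAGS},
    }
    print(check_bucket(compliant))  # no violations
    print(check_bucket({}))         # every rule violated
```

Because the check runs on the declared configuration, a violating change is caught in review instead of after it reaches a live bucket.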
IaC also allows you to manage cross-bucket policies and to create immutable protection and controls out of the box, such as object permissions (with granularity for specific buckets and file extensions), bucket access for specific IAM users and groups, and more. Check out how to get started in this post.
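As a minimal sketch of granular, cross-bucket controls, the snippet generates one consistent bucket policy per bucket, granting a single IAM user read access only to objects whose keys end with a given file extension (using a `*` wildcard in the object part of the resource ARN). The bucket names, account ID, and user are hypothetical. Note that bucket policies can name IAM users or roles as principals but not IAM groups, so group access would be granted through identity-based policies instead.

```python
import json


def extension_read_policies(buckets, user_arn, extension) -> dict[str, dict]:
    """Return one bucket policy per bucket, granting a single IAM user read
    access only to objects whose key ends with `extension`."""
    policies = {}
    for bucket in buckets:
        policies[bucket] = {
            "Version": "2012-10-17",
            "Statement": [{
                "Sid": "ReadByExtension",
                "Effect": "Allow",
                # IAM users or roles can be principals here; IAM groups cannot.
                "Principal": {"AWS": user_arn},
                "Action": "s3:GetObject",
                # Wildcard inside the object-key part of the ARN.
                "Resource": f"arn:aws:s3:::{bucket}/*{extension}",
            }],
        }
    return policies


if __name__ == "__main__":
    # Hypothetical buckets and user for illustration only.
    policies = extension_read_policies(
        ["example-logs-us", "example-logs-eu"],
        "arn:aws:iam::123456789012:user/analyst",
        ".csv",
    )
    print(json.dumps(policies, indent=2))
```

Generating every bucket's policy from the same function keeps the buckets consistent: changing the rule once changes it everywhere.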
KMS Service Configuration (When Not Using Vault)
While Vault has become quite popular as a cross-platform, cross-service key management tool, in cloud operations, and particularly AWS operations, AWS’s built-in KMS (Key Management Service) remains fairly ubiquitous. However, like all AWS services and tools, KMS can be configured either through the console UI or through code, and configuring it through the UI is a very common practice.
Cloud data encryption should be top of mind for every developer and security engineer, and this is especially true for the keys used to encrypt that data. But, as with everything done at scale, managing many keys across a variety of services is a complex operation, and migrating away from manual key management is imperative. This automation can be achieved through AWS KMS (or other services like Vault or even Akeyless), and it should be integrated into your IaC as well to provide the level of security and governance that key management requires.
Not only does IaC enable you to manage your key configurations in individual files (which can be deleted or updated rapidly in the event of a security issue, unauthorized access, and more), but it also provides added benefits like automatic key rotation, alias creation, key policies, access to data and persistent resources, and more. When this is done through IaC, listing, creating, and revoking grants on keys becomes a single, easy line of configuration, enabling you to rest a bit easier from a security perspective.
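Here is a minimal sketch of a codified KMS key: the snippet generates a CloudFormation template with an `AWS::KMS::Key` that has automatic rotation enabled and a key policy, plus an `AWS::KMS::Alias` pointing at it. It assumes a key policy that keeps the account root principal able to manage the key and lets a single (hypothetical) application role only encrypt, decrypt, and generate data keys. The key name, account ID, and role ARN are hypothetical.

```python
import json


def kms_key_template(name: str, account_id: str, app_role_arn: str) -> dict:
    """CloudFormation template for a rotated KMS key plus an alias."""
    alias = f"alias/{name}"
    if alias.startswith("alias/aws/"):
        raise ValueError("the alias/aws/ prefix is reserved for AWS managed keys")
    key_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Keep the account able to manage the key through IAM.
                "Sid": "EnableRootAccountPermissions",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "kms:*",
                "Resource": "*",
            },
            {
                # The application role may use the key, but not manage it.
                "Sid": "AllowUseByAppRole",
                "Effect": "Allow",
                "Principal": {"AWS": app_role_arn},
                "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey*"],
                "Resource": "*",
            },
        ],
    }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "AppKey": {
                "Type": "AWS::KMS::Key",
                "Properties": {
                    "Description": f"Data key for {name}",
                    "EnableKeyRotation": True,
                    "KeyPolicy": key_policy,
                },
            },
            "AppKeyAlias": {
                "Type": "AWS::KMS::Alias",
                "Properties": {"AliasName": alias, "TargetKeyId": {"Ref": "AppKey"}},
            },
        },
    }


if __name__ == "__main__":
    template = kms_key_template(
        "orders", "123456789012", "arn:aws:iam::123456789012:role/orders-app"
    )
    print(json.dumps(template, indent=2))
```

Revoking the application's access is then a reviewed edit that removes one statement from the key policy, rather than a console change that leaves no trace in version control.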
Codifying Your AWS Resources
Infrastructure as code and automated delivery processes have brought a lot of benefits for embedding good coding and security practices into your development processes, and these should be applied to all your cloud resources. While many of these resources may have been created before IaC became the de facto way to manage cloud operations, they should not be left behind. This becomes especially important as your cloud operations grow and scale. So I’d suggest reviewing the state of your cloud operations and starting to migrate critical services to code.
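A first step in reviewing the state of your cloud operations is finding which resources are not codified at all. Here is a minimal sketch, assuming you already have an inventory of ARNs in the account (for example, exported from AWS Config or the Resource Groups Tagging API) and the JSON produced by `terraform show -json`; it walks the root module and any child modules and reports inventory ARNs that no Terraform-managed resource accounts for. Resources whose attributes have no `arn` value (some DNS records, for instance) are not matched by this approach, and the sample data is hypothetical.

```python
import json


def managed_arns(show_json: dict) -> set[str]:
    """Collect ARNs from `terraform show -json` output, including child modules."""
    arns = set()

    def walk(module: dict) -> None:
        for resource in module.get("resources", []):
            arn = (resource.get("values") or {}).get("arn")
            if arn:
                arns.add(arn)
        for child in module.get("child_modules", []):
            walk(child)

    walk(show_json.get("values", {}).get("root_module", {}))
    return arns


def unmanaged(inventory: list[str], show_json: dict) -> list[str]:
    """ARNs that exist in the account but are not tracked by IaC."""
    return sorted(set(inventory) - managed_arns(show_json))


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    state = {"values": {"root_module": {
        "resources": [{"address": "aws_s3_bucket.logs",
                       "values": {"arn": "arn:aws:s3:::example-logs"}}],
        "child_modules": [],
    }}}
    inventory = [
        "arn:aws:s3:::example-logs",
        "arn:aws:cloudfront::123456789012:distribution/EXAMPLE",
    ]
    print(json.dumps(unmanaged(inventory, state), indent=2))
```

The resulting list is a natural migration backlog: each entry is a resource that still lives only in the console.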
This post is part of a series focusing on unmanaged resources across platforms and frameworks, so stay tuned for future posts on Kubernetes, additional clouds, and best practices for codifying and automating your cloud management.
Photo by Antonis Spiridakis on Unsplash