February 23, 2021

AWS CloudFormation: IaC at Enterprise Scale

Chris Belyea

One of the first IaC solutions, AWS CloudFormation, provisions infrastructure according to a declarative template that defines the desired state. Originally JSON only, these templates can now be written in YAML, which is easier to read and write by hand. Templates define resources (e.g., Lambda functions, load balancers, queues) and their properties. Subsets of these resources and properties are exactly what you want to share across teams. CloudFormation provides quite a few ways to accomplish this.

Copy/Paste    

This isn’t a viable strategy at any scale, but it is mentioned because it happens so commonly with CloudFormation. It’s fast and understandable, but completely uncontrollable. Keeping code snippets up to date across templates is impossible. It’s where many start when learning IaC, but this is the absolute lowest-maturity option. Don’t do this.

Nested Stacks    

Nested Stacks were one of CloudFormation’s earliest attempts at modularity, and at that, they succeed. You define a nested stack by declaring one or more AWS::CloudFormation::Stack resources in your template, with each one pointing to a “child” CloudFormation template in an S3 bucket. Each child template can define a small, logically discrete piece of infrastructure. What qualifies as “logically discrete” is subjective: having a nested stack for a single resource is far too granular, but including a complete three-tier application’s worth of infrastructure is too much. The goal is to enable composability. Give engineers pre-built components they can plug into their architecture, not a box full of tiny blocks to be assembled. A parameterized template containing a load balancer with appropriate TLS ciphers, listeners, and target groups is representative of the right scale.

Now that they’re compatible with change sets, nested stacks have become more practical. However, they can still be cumbersome to work with, especially for significant stack updates with internal dependencies that take a long time. Nested stacks can contain other nested stacks, but if you go beyond one or two levels it quickly descends into overcomplexity. And just because nested stacks make it easier to combine multiple infrastructure components into a giant stack, it doesn’t mean that you should. A CloudFormation stack provides a good blast radius boundary. Use stacks accordingly.

Also, mind the default limit of 200 CloudFormation stacks in an account. In a heavily used account with deeply nested stacks, that ceiling is not hard to hit, especially if you are building a stack per code branch!

Finally, nested stacks suffer in two areas key for adoption at enterprise scale. Their discoverability is poor because the child templates must live in an S3 bucket. And managing and specifying versions of child templates requires some rigor.

One way to mitigate the discoverability issue is to build an online catalog of what templates are available. However, it’s important that each child template is thoroughly tested and provides release notes for each version. Collecting all child template repositories under a single GitHub organization is recommended.

It’s essential that child templates be published to S3 with a version identifier (for example, com.example.alb-1.77.2.yaml) so that consumers can pin their dependence on a specific version of that template. Without this, teams could find that their infrastructure changes unexpectedly every time they update their stack and CloudFormation pulls in the latest version of a child template. The best way to do this is for a pipeline to test each template with aws cloudformation validate-template and cfn-lint and then publish releases to S3. Because CloudFormation has no concept of dependency management, templates declaring AWS::CloudFormation::Stack resources must reference a specific version of the child template in the resource’s TemplateURL property.
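As a rough illustration of version pinning, here is a minimal Python sketch that builds a parent template containing one AWS::CloudFormation::Stack resource. The bucket name, the VpcId parameter, and the helper function names are assumptions made up for this example; only the com.example.alb-1.77.2.yaml naming comes from the text above.

```python
import json


def child_template_url(bucket: str, name: str, version: str) -> str:
    """Build the S3 URL for one pinned release of a child template."""
    # Object key follows the <name>-<version>.yaml convention described above.
    return f"https://{bucket}.s3.amazonaws.com/{name}-{version}.yaml"


def nested_stack_resource(bucket: str, name: str, version: str, parameters: dict) -> dict:
    """Return an AWS::CloudFormation::Stack resource pinned to one version."""
    return {
        "Type": "AWS::CloudFormation::Stack",
        "Properties": {
            "TemplateURL": child_template_url(bucket, name, version),
            "Parameters": parameters,
        },
    }


# Hypothetical parent template: the bucket and parameter values are placeholders.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "LoadBalancer": nested_stack_resource(
            "example-cfn-templates",
            "com.example.alb",
            "1.77.2",
            {"VpcId": "vpc-0123456789abcdef0"},
        )
    },
}

if __name__ == "__main__":
    # CloudFormation accepts JSON templates, so the dict can be dumped as-is.
    print(json.dumps(template, indent=2))
```

Because the version is part of the object key, upgrading the child template is an explicit, reviewable change to the parent template rather than something that happens silently on the next stack update.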

Transforms & Macros    

Macros provide a way for CloudFormation to preprocess a template before executing it. They can be used to expand shorthand resource definitions into ones compliant with your company’s standards, or to import standard resources into a stack without having to explicitly define them. You can also use macros to provide CloudFormation with new functionality, such as the ability to provision multiple resources by specifying a count parameter (similar to Terraform), letting your templates better follow the DRY principle.

Macros consist of two parts: a Lambda function that implements the logic, and an AWS::CloudFormation::Macro resource. You can deploy the Lambda function that implements a macro into a single account and then share access with other accounts in your organization, but the macro resource itself has to be deployed into every account in which you plan to use it (Stack Sets can help here). Be sure to take cross-region availability considerations into account when deploying macros. Also, note that Stack Sets themselves cannot use macros.
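To make the Lambda half of a macro more concrete, here is a minimal Python sketch of a handler that expands a count-style shorthand. The top-level "Count" key and the numbered logical IDs are conventions assumed for this example, not built-in CloudFormation syntax. The event and response fields (requestId, fragment, status) follow the macro interface as I understand it.

```python
import copy


def expand_counts(fragment: dict) -> dict:
    """Replace each resource carrying a top-level "Count" key with N numbered copies."""
    result = copy.deepcopy(fragment)  # leave the caller's template untouched
    expanded = {}
    for logical_id, resource in result.get("Resources", {}).items():
        count = resource.pop("Count", None)
        if count is None:
            # No shorthand used: pass the resource through unchanged.
            expanded[logical_id] = resource
            continue
        for index in range(1, int(count) + 1):
            expanded[f"{logical_id}{index}"] = copy.deepcopy(resource)
    result["Resources"] = expanded
    return result


def handler(event, context):
    """Lambda entry point: receive the template fragment, return the processed one."""
    try:
        fragment = expand_counts(event["fragment"])
        status = "success"
    except (TypeError, ValueError):
        # Any status other than "success" makes CloudFormation fail the operation.
        fragment, status = event["fragment"], "failure"
    return {"requestId": event["requestId"], "status": status, "fragment": fragment}
```

This sketch deliberately ignores references: a Ref or Fn::GetAtt pointing at the original logical ID would no longer resolve after expansion, which hints at why even simple macros need careful testing before many stacks depend on them.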

If you’ve used the AWS Serverless Application Model (SAM), then you’ve used a transform: AWS::Serverless-2016-10-31. Transforms are simply macros hosted by AWS with no additional setup required.

Because macros have to be centrally deployed and maintained, and because modifying a macro’s Lambda function will affect all stacks that depend upon it during their next update, we don’t see macros commonly used beyond the simplest use cases.

CloudFormation Registry

Terraform’s biggest strength is its provider system. These plugins enable Terraform to interact with multiple public cloud providers, on-premises VMware clusters, network equipment, Git platforms, monitoring tools, and more. Each provider makes Terraform aware of relevant resources and data sources used to manage... whatever Terraform is managing.

The CloudFormation Registry achieves similar capabilities for CloudFormation, but as you’ll see, it isn’t an ideal solution, nor is it intended to be used explicitly for IaC code reuse. The Registry stores resource types, which are themselves an extension of custom resources. Resource types define new CloudFormation resources that you can use in your templates. They are specifically geared toward using CloudFormation to provision third-party resources outside of AWS, for example, to create Datadog monitors for the infrastructure in your stack. Every resource type you use must be registered in each region of each account in which you plan to use it.

Since you can pass an IAM execution role to resource types when they’re executed, you could conceivably use this to provision AWS resources that comply with your organization’s standards as well. Your customers would be able to use your resource types in their CloudFormation templates, making the transition to standard resource types that are shared across your company easier.

The CloudFormation Registry is intended for custom resource types that manage third-party resources; however, you could also use it as a way to define AWS infrastructure standards for your organization. We’ve worked with a lot of customers and have never seen this approach in the wild, so if you’ve already gone down this path let us know. If you haven’t, then CloudFormation Modules are a better option.

CloudFormation Modules

While the CloudFormation Registry with resource types was really intended to be used for provisioning resources outside of AWS, CloudFormation Modules are a new feature that provides the closest analog to Terraform’s modules, which are one of Terraform’s most-touted features (and rightly so).

CloudFormation Modules allow you to build reusable collections of resources that can be used like building blocks in your CloudFormation templates. Modules are similar to nested stacks; they can contain multiple resources, parameters, outputs, and other standard CloudFormation constructs. Modules can be nested. And stack-level tags also apply to resources created by a module. Syntactically, their use in a template is straightforward and similar to declaring any other CloudFormation resource. Modules themselves are defined as familiar CloudFormation template "fragments" (in JSON only, however).
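To show roughly what “similar to declaring any other CloudFormation resource” means, here is a small Python sketch that builds a template entry for a module. The module name Example::S3::Bucket::MODULE and its BucketName parameter are hypothetical; the sketch assumes that registered modules are referenced by a type name ending in ::MODULE and that their parameters are passed as Properties.

```python
def module_resource(type_name: str, properties: dict) -> dict:
    """Return a template resource entry that consumes a registered module."""
    # Module type names are assumed to carry the ::MODULE suffix.
    if not type_name.endswith("::MODULE"):
        raise ValueError(f"{type_name!r} does not look like a module type name")
    return {"Type": type_name, "Properties": properties}


# Hypothetical consumer template: the module and its parameter are made up.
template = {
    "Resources": {
        "CompliantBucket": module_resource(
            "Example::S3::Bucket::MODULE",
            {"BucketName": "example-team-artifacts"},
        )
    }
}
```

Note that nothing in the consuming template says which version of the module will be used.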

Despite these appealing characteristics, CloudFormation Modules have some serious shortcomings that make them unsuitable for widespread adoption. It’s likely AWS will resolve these issues over time (modules are, after all, a brand-new feature) but for now, caveat emptor. So, what are these issues?

Similar to CloudFormation Registry resource types, Modules must be deployed into a region in an AWS account before use. The CloudFormation Registry is the central store for Modules. So, unlike Terraform modules, which behave similarly to traditional software libraries that are defined by their consumer as a dependency, Modules have to be pre-provisioned into an account before they’re available to be consumed in CloudFormation templates. In a typical enterprise with dozens or even hundreds of AWS accounts, this becomes a logistical nightmare even with the best CI/CD pipelines.

Another serious wrinkle with Modules is that their versions are tracked as simple integers (no semver) and are incremented automatically by AWS as part of each deployment. Consumers of a module are forced to use the version of the module specified as the “default”, and neither the version numbers nor the "default" version are guaranteed to be consistent across accounts. It’s easy to imagine scenarios where module versions get out of sync. This is a major impediment to reproducibility, and if you use Stack Sets to deploy across accounts your risk exposure grows even more.
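The following toy model, written in Python, illustrates how per-account integer versions can drift apart. It is not an AWS API: the ToyModuleRegistry class and the release labels are invented for this example, and the model assumes only that each registration in an account receives the next integer.

```python
class ToyModuleRegistry:
    """Toy stand-in for one account's registry; each registration gets the next integer."""

    def __init__(self) -> None:
        self._releases = []  # position + 1 is the integer version in this account

    def register(self, source_release: str) -> int:
        """Record a release of the module source and return its integer version."""
        self._releases.append(source_release)
        return len(self._releases)

    def version_of(self, source_release: str) -> int:
        """Look up which integer version holds a given source release."""
        return self._releases.index(source_release) + 1


# Account A has received every release of the module over time.
account_a = ToyModuleRegistry()
for release in ["1.0.0", "1.1.0", "1.2.0"]:
    account_a.register(release)

# Account B was onboarded later and only ever received the latest release.
account_b = ToyModuleRegistry()
account_b.register("1.2.0")

if __name__ == "__main__":
    print("account A:", account_a.version_of("1.2.0"))
    print("account B:", account_b.version_of("1.2.0"))
```

In this model the same module source ends up as version 3 in one account and version 1 in the other, so the integer alone says nothing about which code a stack actually received.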

CloudFormation Modules are a good evolution of the nested stacks concept that arguably should have arrived a few years ago. Unfortunately, in their current state, they aren’t suitable for use in an enterprise (or any) environment.

If you want to learn more about scaling your IaC approach, I wrote an article explaining some general guidelines and available tools. SingleStone helps companies of all sizes, from start-up to Fortune 500, with getting into the cloud and using it to effectively drive real business agility—and results. If your company is ready to manage its infrastructure code like a high-performing organization, we can help.
