What You Should Know About Scaling AWS Accounts
Scaling an enterprise’s cloud services from one Amazon Web Services (AWS) account to more than 500 is not easy. There are challenges you will not foresee, and you will learn something new every day along the way. One of the best decisions an organization using AWS can make is to centralize the management of AWS Identity and Access Management (IAM). This approach goes hand in hand with establishing strong identity federation for controlling access to AWS resources.
With centralized management, the advantages far outweigh the disadvantages, but it does not come without its own challenges. How do you create a one-size-fits-all configuration for an enterprise that is unpredictable and massively diverse in its missions and cloud experience? The organizational challenges far outweigh the technical ones, but in this post I am going to focus on how we design, develop, and maintain the technical configuration of centralized and federated IAM.
At the beginning of your journey, you usually start with IAM users and groups. This works… for maybe a handful of accounts. Once you get beyond 20 to 30 accounts, manually keeping all of them up to date and in sync is no longer fun or feasible, because every change has to be repeated by hand in every account.
In comes infrastructure as code and automation to save the day. Our approach uses a CI tool (like Jenkins) and AWS CloudFormation. We start by converting IAM users and groups to federated roles and building a single CloudFormation template that encompasses all the roles and policies. We create a Security Assertion Markup Language (SAML) identity provider in each account and reference its ARN in the trust policy of each role in the template.
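To make the trust relationship concrete, here is a minimal sketch in Python that generates such a template as JSON. The provider name, role names, and helper functions are hypothetical and not taken from our actual template. It also assumes the commercial sign-in audience URL, which differs in other partitions such as GovCloud.

```python
import json


def saml_trust_policy(saml_provider_arn):
    """Trust policy that lets users federated through the SAML provider assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": saml_provider_arn},
                "Action": "sts:AssumeRoleWithSAML",
                # Assumption: commercial sign-in endpoint; other partitions use a different URL.
                "Condition": {
                    "StringEquals": {"SAML:aud": "https://signin.aws.amazon.com/saml"}
                },
            }
        ],
    }


def build_template(provider_name, roles):
    """Build one CloudFormation template holding every federated role.

    `roles` maps a logical ID to (role name, list of AWS managed policy names).
    """
    # Fn::Sub resolves the partition and account ID at deploy time, so the
    # same template can be deployed unchanged in every account.
    provider_arn = {
        "Fn::Sub": "arn:${AWS::Partition}:iam::${AWS::AccountId}:saml-provider/"
        + provider_name
    }
    resources = {}
    for logical_id, (role_name, policy_names) in roles.items():
        resources[logical_id] = {
            "Type": "AWS::IAM::Role",
            "Properties": {
                "RoleName": role_name,
                "AssumeRolePolicyDocument": saml_trust_policy(provider_arn),
                "ManagedPolicyArns": [
                    {"Fn::Sub": "arn:${AWS::Partition}:iam::aws:policy/" + name}
                    for name in policy_names
                ],
            },
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


def render_template(provider_name, roles):
    """Serialize the template so it can be passed to CloudFormation as a template body."""
    return json.dumps(build_template(provider_name, roles), indent=2)
```

Calling `render_template("ExampleIdP", {"ReadOnlyRole": ("Federated-ReadOnly", ["ReadOnlyAccess"])})` would produce a template with a single federated read-only role. Adding a role then becomes a one-line change to the `roles` mapping rather than a manual edit in each account.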
As the roles and policies change, it becomes much faster to update a single template and run a stack update in each account. But how do you manage and deploy updates to these stacks across all accounts? At this point, we turn to STS AssumeRole (also known as switch role), a simple API call that returns temporary credentials for an IAM role in each account, which we then use to make the API calls to CloudFormation. With a simple shell script in a Jenkins job, we can loop through a list of AWS account IDs, assume our cross-account role in each account, and perform the updates.
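The loop described above is a shell script in our Jenkins job; the sketch below shows the same idea in Python with boto3 instead, purely as an illustration. The role name, stack name, session name, and function names are hypothetical. The client factory parameter is an assumption added here so the logic can be exercised without real AWS credentials.

```python
def default_client_factory(service, **kwargs):
    """Create a real boto3 client; imported lazily so the sketch loads without boto3."""
    import boto3

    return boto3.client(service, **kwargs)


def update_stack_in_account(account_id, role_name, stack_name, template_body,
                            client_factory=default_client_factory, partition="aws"):
    """Assume the cross-account role, then start a CloudFormation stack update."""
    sts = client_factory("sts")
    creds = sts.assume_role(
        RoleArn=f"arn:{partition}:iam::{account_id}:role/{role_name}",
        RoleSessionName="federation-update",  # hypothetical session name
    )["Credentials"]

    # The temporary credentials scope this client to the target account.
    cfn = client_factory(
        "cloudformation",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    cfn.update_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        # Needed because the template creates IAM roles with explicit names.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )


def update_all_accounts(account_ids, role_name, stack_name, template_body,
                        client_factory=default_client_factory):
    """Update accounts one after another, recording the outcome per account."""
    results = {}
    for account_id in account_ids:
        try:
            update_stack_in_account(
                account_id, role_name, stack_name, template_body, client_factory
            )
            results[account_id] = "update started"
        except Exception as exc:  # keep going so one bad account does not stop the run
            results[account_id] = f"failed: {exc}"
    return results
```

Note that `update_stack` only starts the update; a fuller version would also wait for each stack to finish and check its final status, which is where most of the sequential run time goes.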
At some point, maybe around 100 accounts, running a bash script that updates accounts sequentially becomes a bottleneck. It takes a long time to get through all the accounts via the CLI, sometimes up to an hour to update every account in Commercial and GovCloud, and it doesn’t offer robust error handling or rollback capability. Rather than rewriting the jobs to multithread or otherwise optimize them, we turn to serverless.
We already use AWS Step Functions and Lambda for another workflow process, so we apply a similar framework to federation updates. Using a Map state in Step Functions, we feed in our list of AWS account IDs and iterate over it in parallel, invoking a Lambda function for each account to perform the update. This state machine can perform updates on hundreds of accounts in both Commercial and GovCloud simultaneously. This architecture can reduce your deployments from hours to under 5 minutes. In my next posts, I will speak to this method in more detail for other use cases.
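For a sense of what the Map state looks like, here is a minimal sketch in Python that builds a state machine definition in Amazon States Language. The state names, the `accountIds` input key, the Lambda ARN, and the concurrency and retry values are all illustrative assumptions, not our production configuration.

```python
import json


def build_state_machine(lambda_arn, max_concurrency=40):
    """Return a definition that fans out one Lambda invocation per account ID."""
    return {
        "Comment": "Update the federation stack in every account in parallel",
        "StartAt": "UpdateAccounts",
        "States": {
            "UpdateAccounts": {
                "Type": "Map",
                # Each element of $.accountIds becomes the input of one iteration.
                "ItemsPath": "$.accountIds",
                # Caps how many iterations run at the same time (illustrative value).
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "UpdateOneAccount",
                    "States": {
                        "UpdateOneAccount": {
                            "Type": "Task",
                            "Resource": lambda_arn,
                            # Retry a failed account a couple of times with backoff.
                            "Retry": [
                                {
                                    "ErrorEquals": ["States.TaskFailed"],
                                    "IntervalSeconds": 5,
                                    "MaxAttempts": 2,
                                    "BackoffRate": 2.0,
                                }
                            ],
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }


def render_definition(lambda_arn, max_concurrency=40):
    """Serialize the definition to the JSON string that Step Functions expects."""
    return json.dumps(build_state_machine(lambda_arn, max_concurrency), indent=2)
```

An execution of this state machine would be started with input such as `{"accountIds": ["111111111111", "222222222222"]}`, and the Lambda function would receive a single account ID as its event. The retry block is one way to get the per-account error handling that the sequential script lacked.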
Thank you for reading!
About the Author
Rob is a Chief Cloud Architect for E-INFOSOL. He received his B.S. in Information Sciences and Technology (IST) with a Minor in Security & Risk Analysis from The Pennsylvania State University. He went on to obtain an M.S. in Systems Engineering from Southern Methodist University with a focus on Systems Architecture Development. His background in Systems Engineering and an interest in Web Application Development naturally led to a passion for working with Amazon Web Services. His appetite to constantly learn and build is a perfect fit for working with the cloud services of today’s world.
Rob’s interests outside of work include traveling, food, running, hiking, skiing, and an unhealthy obsession with cars.