Architecting Cloud Account Provisioning

In this post I discuss considerations for architecting an account provisioning system for three major cloud providers at an enterprise level. Regardless of the industry, time to market is very important, and as an account custodian you want your customers to be up and running in the shortest time possible.

The three providers of note are Amazon Web Services, Google Cloud and Microsoft Azure. There are similarities that span all three providers, which makes architecting a single system straightforward from a logical perspective. In reality, the physical implementation of each differs quite a bit, which is a hurdle one will have to cross and which should not be underestimated. With a sound design up front and an understanding of the options, one can make informed decisions as to which approach or approaches to follow.

The main considerations when embarking on such a venture can be classified into the "what's" and "how's". The important thing (as with any good business requirement review) is understanding the what, even before considering the how.

A summary of some considerations in terms of "what's" and "how's":

1. What is the desired end user experience? For example, how do you know a customer needs a GCP project or an AWS account, and what information do you need from them to provision said account (think budgeting, governance, security)?

2. What are the business goals that need to be satisfied with regard to billing, security, governance, and support?

3. What are the limitations with regards to time and resources?

4. How will the end user experience be provided?

5. How will the business goals be met?

6. How will the limitations of time and resources be overcome?
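The first question in the list, what information you need from a customer, can be made concrete as a request record that is validated before anything is provisioned. The following is a minimal sketch only: the field names, the allowed classifications, and the validation rules are all assumptions for illustration, not any provider's schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical values; substitute your own governance model.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential"}
PROVIDERS = {"aws", "gcp", "azure"}


@dataclass(frozen=True)
class ProvisioningRequest:
    """What the custodian needs to know before creating anything."""
    provider: str             # "aws", "gcp" or "azure"
    display_name: str         # human-readable name for the account or project
    owner_email: str          # who is accountable for it
    cost_center: str          # billing: where the spend is charged
    monthly_budget_usd: int   # budgeting: could drive alerts later
    data_classification: str  # security and governance input


def validate(request: ProvisioningRequest) -> List[str]:
    """Return a list of problems; an empty list means the request is usable."""
    problems = []
    if request.provider not in PROVIDERS:
        problems.append(f"unknown provider: {request.provider}")
    if "@" not in request.owner_email:
        problems.append("owner_email does not look like an email address")
    if not request.cost_center.strip():
        problems.append("cost_center is required")
    if request.monthly_budget_usd <= 0:
        problems.append("monthly_budget_usd must be positive")
    if request.data_classification not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"unknown data_classification: {request.data_classification}")
    return problems
```

Collecting all problems in one pass, rather than failing on the first, lets the user experience show the customer everything that needs fixing at once.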

Once it is understood what the system should do and what the desired end state is, the technical aspects come into play, especially with regards to how it will be implemented for each cloud provider.

The cloud providers make available a variety of methods to interact with their systems, and major props need to be given to them for that. Each vendor provides REST APIs, SDKs in multiple languages, a CLI, and a framework for "infrastructure as code".

The CLIs are the easiest to get up and running with but don't scale very well. You can create a script to (for example) create an account, but it is tedious to fill in the variables or provide the parameters each time, and then assign IAM permissions right after the account is created. In most cases I have found a CLI script is basically an alternative to something you can do from the provider's GUI console, as it requires someone to run the scripts by hand. Sure, it is faster, but it still takes manual input.
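To illustrate the kind of parameter filling a CLI script involves, here is a small sketch that builds the command lines for creating an AWS account and a GCP project, followed by one IAM role binding on the project. The function names are hypothetical, and the sketch only builds argument lists; it does not run anything. Azure is left out to keep the example short.

```python
from typing import List


def aws_create_account_cmd(email: str, account_name: str) -> List[str]:
    """argv for creating a member account in an AWS Organization."""
    return ["aws", "organizations", "create-account",
            "--email", email, "--account-name", account_name]


def gcp_create_project_cmds(project_id: str, name: str, folder_id: str,
                            owner_email: str, role: str) -> List[List[str]]:
    """argv lists for creating a GCP project and then granting one role on it."""
    return [
        ["gcloud", "projects", "create", project_id,
         f"--name={name}", f"--folder={folder_id}"],
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member=user:{owner_email}", f"--role={role}"],
    ]
```

Each list could be handed to `subprocess.run` as is. Someone still has to supply every value and start the script, which is the scaling limit described above.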

The REST APIs are arguably the most flexible, depending on your ultimate solution, and are simple to use, since HTTP support is ubiquitous nowadays. An obstacle to overcome is figuring out how authentication works for each provider. GCP is very straightforward if you use a Service Account, but AWS requires you to sign requests with Signature Version 4. There are open source libraries for this, but it is quite an exercise should you have to implement it yourself (not recommended, even by AWS), in which case you might as well use the SDK. The asynchronous nature of tasks is one of the major drawbacks, and you will probably end up polling the status of an operation if following this approach.
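The polling mentioned above can be sketched as a small loop with capped exponential backoff. This is one possible shape, not a provider API: `get_status` is a hypothetical callable that you would implement with whatever status request the provider offers, and the state names and default timings are assumptions.

```python
import time
from typing import Callable

# Assumed terminal states; map each provider's own values onto these.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_operation(get_status: Callable[[], str],
                       timeout_s: float = 300.0,
                       initial_delay_s: float = 1.0,
                       max_delay_s: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it returns a terminal state or time runs out."""
    delay = initial_delay_s
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"operation still {status} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)  # back off, but never beyond the cap
```

Passing `sleep` as a parameter keeps the loop testable without real waiting, and the timeout is measured in requested sleep time rather than wall-clock time to keep the sketch simple.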

The SDKs are great for programmatically interacting with the cloud providers and can integrate seamlessly with your experience application. For example, if you are using an AWS Lambda project running on Node.js for the main user experience, you can simply use the JavaScript/TypeScript SDK to do whatever tasks you need as part of the experience. One drawback I found was that the SDK support for AWS account provisioning was lacking, and I had to use the REST API. I could not find a JavaScript function to create an account, and it is not clear to me why.

Infrastructure as code or a similar framework comes highly recommended as the best practice for automation. The promise is big: you can easily and seamlessly automate tasks such as account provisioning and adding guardrails. From preliminary research it appears to be a sound approach, but I still have questions with regard to how it interacts with the overall end user experience. In my mind it is the ultimate goal, but it will require a larger time and resource commitment to get there.

A final consideration is understanding and navigating the topology and terminology each platform uses to structure things. They are all similar but use different language and have their own nuances. For example, the smallest unit of work in GCP is a project, whereas in AWS you have an account, and in Azure you have subscriptions.
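One way to keep a single logical system over three different physical implementations is to hide each provider behind a common interface. The sketch below is an assumption about how that could look: `Provisioner`, `FakeProvisioner`, and `provision` are hypothetical names, and the fake backend records calls instead of talking to any cloud API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# The unit each provider hands to a customer, as named in the text above.
UNIT_NAME: Dict[str, str] = {"gcp": "project", "aws": "account", "azure": "subscription"}


class Provisioner(ABC):
    """Common interface; one concrete subclass per cloud provider."""

    @abstractmethod
    def create_unit(self, display_name: str) -> str:
        """Create the provider's unit and return its identifier."""


class FakeProvisioner(Provisioner):
    """Stand-in backend for illustration; a real one would call the provider."""

    def __init__(self, provider: str) -> None:
        if provider not in UNIT_NAME:
            raise ValueError(f"unknown provider: {provider!r}")
        self.provider = provider
        self.created: List[str] = []

    def create_unit(self, display_name: str) -> str:
        self.created.append(display_name)
        # Made-up identifier format, e.g. "gcp-project-1".
        return f"{self.provider}-{UNIT_NAME[self.provider]}-{len(self.created)}"


def provision(backends: Dict[str, Provisioner], provider: str, display_name: str) -> str:
    """Single logical entry point; the physical work is delegated per provider."""
    if provider not in backends:
        raise ValueError(f"no provisioner registered for {provider!r}")
    return backends[provider].create_unit(display_name)
```

The user experience layer then only needs to know `provision`, while the differences in terminology and implementation stay inside each backend.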

Closely related to the above is IAM. Each provider's IAM works differently, but they all ultimately end up doing the same thing. Navigating this is a difficult but important task when implementing account provisioning. The core principle of providing only the necessary permissions, nothing less and nothing more, is very valuable when designing one solution across all providers.
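The "nothing less and nothing more" principle can be checked mechanically once permissions are expressed as sets. A minimal sketch, assuming provider-neutral permission names that are made up for illustration; a real system would first translate each provider's roles or policies into such a set.

```python
from typing import Set, Tuple


def permission_gap(required: Set[str], granted: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, excess): what must be added and what should be removed."""
    missing = required - granted  # needed but not yet granted ("nothing less")
    excess = granted - required   # granted but not needed ("nothing more")
    return missing, excess


# Hypothetical permission names, not taken from any provider.
missing, excess = permission_gap(
    required={"storage.read", "logs.write"},
    granted={"storage.read", "storage.delete"},
)
print("add:", sorted(missing), "remove:", sorted(excess))
```

An account is at least privilege when both sets come back empty.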

This post did not go into detail about what the one ultimate solution is, and in my experience a combination of the different approaches is fine. For example, getting up and running with the REST APIs is a quick way to get to market, but ultimately you would want to get to a place where you can integrate more seamlessly with the SDKs or frameworks the providers offer.