Consistent Design
We never design for a single capability or service in isolation. The same holds at a higher level of abstraction: the major cloud providers publish guidance and positions on “How to Design” and “Patterns”, and these are a very useful resource. They do not spare you from building a governance model, though, one that guides the main patterns for making the best use of cloud service providers.
A very relevant aspect is to start with the logical model once the problem statement is defined.
Below are some aspects to consider:
Location is a very relevant aspect, even more so because cloud providers naturally tend to converge: they deploy regions close to their competitors, since the best connectivity offerings tend to be concentrated in the same places. Use this in your favor, and do not waste the resulting savings in cost and complexity.
When planning the main region, start from placement. When defining the disaster recovery (DR) solution, take this into consideration as well: choose the best regions and clearly define how to build and orchestrate the DR plan.
Today it seems easier for newer applications to define boundaries, because they are born that way. For services that follow an older architecture paradigm, however, defining boundaries can be more complex, because they are easily described as a multi-client-to-server model. To approach them better we can do some exercises, one in particular:
Level of dependency
- Communication dependency: REST, SOAP, HTTP
- The most demanding: data to data (hard replication or log transport)
- As provider: which service, via which protocol?
- As consumer: which service, via which protocol?
How communication is handled:
- Timeout
- Retry
- Encryption
Now in practice
Services that handle timeouts and retries well and have no data-to-data integrations can form an initial cluster of services.
Services that have higher intra-dependencies can establish a second cluster.
A high number of services pointing to a single entity can also define a common service, or cross service, that sustains more than one workload. This cluster of services will define our service support.
Another cluster of services, called “enterprise class”, can be challenging in terms of volume, data-to-data scenarios, and services that are more sensitive to latency because they have no support for controlling timeouts and retries. This cluster will require more elaborate strategies to mitigate operational impact and add some areas of improvement.
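The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Service fields, the two thresholds and the cluster label "intra-dependent" are hypothetical names that do not come from any provider or tool, and the order in which the rules are checked is one possible choice.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
SHARED_INBOUND_THRESHOLD = 3    # callers needed to count as a common service
INTRA_DEPENDENCY_THRESHOLD = 2  # outbound dependencies that count as "higher"


@dataclass(frozen=True)
class Service:
    name: str
    handles_timeouts: bool
    handles_retries: bool
    data_to_data: bool = False           # hard replication or log transport
    depends_on: frozenset = frozenset()  # names of the services it consumes


def classify(services):
    """Return {service name: cluster name} for the four groups in the text."""
    # Count how many services point to each service.
    inbound = {s.name: 0 for s in services}
    for s in services:
        for dep in s.depends_on:
            if dep in inbound:
                inbound[dep] += 1

    clusters = {}
    for s in services:
        resilient = s.handles_timeouts and s.handles_retries
        if inbound[s.name] >= SHARED_INBOUND_THRESHOLD:
            clusters[s.name] = "service support"
        elif s.data_to_data or not resilient:
            clusters[s.name] = "enterprise class"
        elif len(s.depends_on) >= INTRA_DEPENDENCY_THRESHOLD:
            clusters[s.name] = "intra-dependent"
        else:
            clusters[s.name] = "initial"
    return clusters


# Made-up inventory, only to exercise the rules.
services = [
    Service("auth", True, True),
    Service("catalog", True, True),
    Service("web", True, True, depends_on=frozenset({"auth"})),
    Service("api", True, True, depends_on=frozenset({"auth", "catalog"})),
    Service("billing", False, False, data_to_data=True,
            depends_on=frozenset({"auth"})),
]
for name, cluster in classify(services).items():
    print(f"{name}: {cluster}")
```

In a real inventory the inputs would come from the dependency exercise above (who provides what, who consumes what, and how timeouts and retries are handled), and the thresholds would need tuning to the landscape.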
A last consideration: we need to consider what is simpler and what can bring extra improvement to our services in the future. Once we migrate to the cloud and define boundaries, plan to leverage:
- Encryption
- Decoupling
- Getting rid of latency-sensitive dependencies (consumer-provider scenarios…)
Communication
Once we enter a cloud deployment scenario, we need to consider the communication scenarios, flows and volume.
Scenarios
All cloud providers offer:
- VPN: IPsec tunnel over the internet.
- Dedicated communication: some providers offer a direct connection through a third-party provider, or through a connection partner with preexisting physical connections and an automation framework to create private or public virtual circuits.
- Internet: make the cloud service available over the internet.
- Internal cloud-to-cloud communication (same provider): uses the cloud provider's internal backbone to connect services deployed in a region.
Factors to consider:
- Outbound communication: outbound traffic is everything that leaves a region, such as:
  - Geo replication
  - Remote bucket copy
  - Backup copy
  - Remote peering
- Pay attention to less obvious scenarios as well, such as “cold” services: when you retrieve data you also pay communication fees.
- Check whether your cloud provider offers a traffic allowance for this deployment, for example the first xx GB of monthly traffic being free. This can help you architect the scenario better or avoid some unnecessary costs.
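A rough estimate of the outbound bill can be sketched as follows. All figures here (flow volumes, free allowance, price per GB) are made-up placeholders, and the flat per-GB price is a simplifying assumption; real providers publish their own allowances and tiered rates.

```python
def monthly_egress_cost(flows_gb, free_allowance_gb, price_per_gb):
    """Estimate the monthly outbound cost for one region.

    flows_gb: {flow name: GB per month leaving the region}
    free_allowance_gb: monthly traffic the provider does not charge for
    price_per_gb: flat price applied to the remaining traffic (assumption)
    """
    total_gb = sum(flows_gb.values())
    billable_gb = max(0.0, total_gb - free_allowance_gb)
    return {
        "total_gb": total_gb,
        "billable_gb": billable_gb,
        "cost": billable_gb * price_per_gb,
    }


# Hypothetical monthly volumes for the outbound flows listed above.
flows = {
    "geo_replication": 500,
    "remote_bucket_copy": 120,
    "backup_copy": 180,
    "cold_storage_retrieval": 50,
}
estimate = monthly_egress_cost(flows, free_allowance_gb=100, price_per_gb=0.02)
print(estimate)
```

Listing each flow separately makes it easier to see which replication or copy job dominates the bill before the design is fixed.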
Remembering my time as a consultant … we normally tend to say that there is no one size fits all. Still, there are some initial design approaches that help build a virtual network design that stays consistent across more than one player.
Generally, using any tool will require different adjustments to the virtual network design. Today, and certainly in the future, we will continue to have impressive capabilities and nice products from vendors that offer them on the major cloud providers. Some capabilities will require a design specific to one cloud provider, changing the regular way the network is defined.
For the initial models, leveraging the native capabilities of the cloud provider gives a fluid experience in automation and configuration management, built natively on cloud attributes. Later you can also pursue other alternatives, linking the cloud attributes to the specific capabilities of the third-party vendor.
Some points that guide the initial design:
- Commonly you will not be able to change the design once it is defined, although some cloud providers allow it. So observe:
- Ground zero rule: zero trust design, where communication is denied by default.
- Do not underestimate the cloud environment: try to fit it into several virtual networks that you can manage.
- Do not overestimate either and consume the largest possible segment. Remember that companies normally tend to use multi-cloud.
- Virtual networks will be subdivided. Check the number of networks as well, and divide them using segmentation based on exposure.
- Design communication by allowing traffic only from specific sources and VNICs, being as specific as possible.
- When designing the traffic flow, use the special gateways available from the provider. A very important aspect here: use the least exposure for the initial design. For example, avoid internet gateways and prefer something more specific, such as an internal service gateway if the provider has one, and then a NAT gateway.
- Once internet-facing services are in question, protect them not only with firewall services; add more specific protections such as DDoS and
- At this point we have a VCN with its specific division: networks using gateways, plus services that add security and gateways to establish communication.
- What about the transit scenario?
- Some cloud providers use terms like hub and spoke, others transit. The concept is to use a VCN to handle dedicated communication or IPsec connections, and to configure pairs of transitive gateways to route traffic.
- Should you use it? Yes. Reason: better control and isolation.
- Where it starts to get complex: when the number of peered connections grows in order to reach the VCNs behind the VCN that holds the dedicated communication and IPsec. This can indicate other issues and can raise the need for a detailed review.
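Some of the points above (subdividing the virtual network by exposure, not colliding with address space used elsewhere, and denying communication by default) can be checked with Python's standard ipaddress module. This is a minimal sketch: the CIDR ranges, tier names and rule format are hypothetical and do not correspond to any provider's API.

```python
import ipaddress
from itertools import islice


def plan_subnets(vcn_cidr, tiers, new_prefix):
    """Carve one subnet per exposure tier out of the virtual network."""
    vcn = ipaddress.ip_network(vcn_cidr)
    candidates = list(islice(vcn.subnets(new_prefix=new_prefix), len(tiers)))
    if len(candidates) < len(tiers):
        raise ValueError("virtual network is too small for the requested tiers")
    return dict(zip(tiers, candidates))


def overlaps(cidr_a, cidr_b):
    """True when two ranges collide, e.g. with a network in another cloud."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def is_allowed(rules, source_ip, dest_tier, port):
    """Zero trust: traffic passes only if an explicit rule matches."""
    src = ipaddress.ip_address(source_ip)
    return any(
        src in ipaddress.ip_network(rule["source"])
        and rule["dest"] == dest_tier
        and rule["port"] == port
        for rule in rules
    )


# Hypothetical layout: three tiers ordered from most to least exposed.
plan = plan_subnets("10.0.0.0/16", ["public", "private", "data"], new_prefix=24)

# Only the private tier may reach the data tier, and only on one port.
rules = [{"source": str(plan["private"]), "dest": "data", "port": 1521}]

print(plan)
print(is_allowed(rules, "10.0.1.15", "data", 1521))  # source in private tier
print(is_allowed(rules, "10.0.0.15", "data", 1521))  # source in public tier
print(overlaps("10.0.0.0/16", "10.0.128.0/17"))
```

An empty rule list denies everything, which is the ground zero rule in practice: every flow has to be justified by a rule that is as specific as possible.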
This topic is present in different cloud providers, each with its own approach to organizing the resources in a cloud deployment.
The idea here is not to describe any of them specifically but to expose some relevant aspects.
When designing this isolation, exercise the extreme conditions and check how it works for what you manage. Consider:
- Smaller teams tend to be more transverse.
- Bigger teams go deeper and focus on smaller, well-defined areas.
- Will you allow any shared administration, or will the end-to-end be managed by one team?
Understanding the nature of the deployment:
- Centric interactions: in an enterprise deployment, for example, services have several complex interactions.
- Dispersed interactions: other scenarios can be more independent and consume other flavors of integration, such as data streams.
Now we know the following characteristics:
- Will the managing team be dedicated, and distinct for each environment?
- Will the deployment be shared, or will there be a hand-off?
- How will the resources interact, or will they be isolated?
Based on this we can start defining whether it will be:
- More centralized
- More decentralized
- Highly decentralized
- Completely isolated
To define who can do what…
We need to have the isolation defined beforehand and open to update or revision, because it is normally used as the base for writing the policy definitions that allow users to perform operations on cloud resources within that logical isolation.
The concept here is the same as for the network: zero trust. In general terms, every cloud provider has this definition: you MUST write a policy to allow a user to perform any operation beyond being able to log in.
A good place to start this design is the role: define the minimum capabilities that a set of users needs to have.
Add the environment view to check whether the initial design is valid.
Restrict the use of global/root/master admins through a break-glass process.
Errors can and will happen. Consider:
- When creating a sub-admin, do not grant delete permission.
- Restrict updates on constructs that can add or change policy definitions, labels, security rules and, most importantly, policy controls …
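The deny-by-default idea and the sub-admin restrictions above can be sketched as a tiny policy check. This is an illustration only: the Policy fields, verbs, group names and resource types are hypothetical and are not the policy language of any cloud provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    group: str           # set of users sharing a role
    verbs: frozenset     # operations the group may perform
    resource_type: str   # kind of resource the policy covers
    scope: str           # logical isolation unit the policy targets


def is_permitted(policies, group, verb, resource_type, scope):
    """Zero trust: an operation is allowed only if a policy grants it."""
    return any(
        p.group == group
        and verb in p.verbs
        and p.resource_type == resource_type
        and p.scope == scope
        for p in policies
    )


# Hypothetical sub-admin role: no delete, and no policy on the constructs
# that control access (policies, labels, security rules).
policies = [
    Policy("dev-sub-admins", frozenset({"read", "create", "update"}),
           "compute", "dev"),
    Policy("dev-sub-admins", frozenset({"read"}), "policy", "dev"),
]

print(is_permitted(policies, "dev-sub-admins", "create", "compute", "dev"))
print(is_permitted(policies, "dev-sub-admins", "delete", "compute", "dev"))
print(is_permitted(policies, "dev-sub-admins", "update", "policy", "dev"))
```

Because nothing is granted implicitly, a user who can log in but matches no policy can perform no operation, and the sub-admin role above can neither delete resources nor change policy definitions.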
Audit is a requirement here: check how you can retain the audit log for all create, update and delete actions.
Continuity strategies will drive the consumption of more services. When designing continuity, the first and main driver is the level of continuity that the service requires. Some services can support active-active models, others cold ones… What matters, however, is your business requirement: this will drive how to choose the best strategy.
Let's define the main options, from cold to active:
- Backup based: backups are copied to another region and no service is active there. When required, the service is provisioned and the backup is restored. Active pointers: DNS, LBS updated.
- Dormant sync: data is replicated to block volumes using third-party tools. When required, the service is provisioned using the volumes with the latest updates from the sync tools. Active pointers: DNS, LBS updated.
- Live sync: data is replicated with a solution-specific mechanism against provisioned resources. Here the resources are running; you can also consider dehydration to mitigate cost, then apply the changes required to activate the copy and adjust the compute power. Active pointers: DNS, LBS updated.
- Live balanced: the service runs in more than one region, with a mechanism that controls traffic to all live instances in their respective regions. This traffic control is policy based and normally offers capabilities to analyze traffic patterns and build more specific policies, such as geographic area, circuit… Here data replication can be complex, and how to handle it properly must be evaluated.
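The four options above can be written down as a small catalog, ordered from cold to active, together with the steps each one needs at activation time. This is a sketch: the enum and step names are hypothetical, and the empty step list for the live balanced option is an assumption (the traffic policy is taken to be already distributing load, so nothing has to be provisioned).

```python
from enum import IntEnum


class Strategy(IntEnum):
    """Continuity options ordered from cold (1) to active (4)."""
    BACKUP_BASED = 1
    DORMANT_SYNC = 2
    LIVE_SYNC = 3
    LIVE_BALANCED = 4


# Steps required when the secondary region has to take over.
ACTIVATION_STEPS = {
    Strategy.BACKUP_BASED: [
        "provision service", "restore backup", "update DNS and LBS",
    ],
    Strategy.DORMANT_SYNC: [
        "provision service from replicated volumes", "update DNS and LBS",
    ],
    Strategy.LIVE_SYNC: [
        "activate the copy", "adjust compute power", "update DNS and LBS",
    ],
    # Assumption: all regions are already live behind the traffic policy.
    Strategy.LIVE_BALANCED: [],
}


def activation_steps(strategy):
    """Return a copy of the activation steps for one strategy."""
    return list(ACTIVATION_STEPS[strategy])


def options_at_least(minimum):
    """List the strategies that are at least as active as the required one."""
    return [s for s in Strategy if s >= minimum]


for strategy in Strategy:
    print(strategy.name, activation_steps(strategy))
```

Starting from the business requirement, options_at_least gives the candidates to compare, and the length of each activation list is a rough indicator of how much has to happen before the service is back.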
Now we have the main alternatives for composing the availability strategies. Keep in mind that cost increases as we add higher coverage for continuity.