Management and Governance

AWS CloudFormation

AWS CloudFormation is a service that provides Infrastructure as Code.

CloudFormation templates are used to provision the infrastructure.
JSON and YAML formats are supported.
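
As a rough illustration, a minimal template can be assembled as a plain dictionary and serialized to JSON. The logical ID `NotesBucket` and the helper name `build_template` are assumptions made for this sketch; only the top-level keys and the `AWS::S3::Bucket` resource type come from CloudFormation itself.

```python
import json


def build_template() -> dict:
    """Return a minimal CloudFormation template as a Python dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Minimal example: a single S3 bucket",
        "Resources": {
            # "NotesBucket" is a hypothetical logical ID chosen for this sketch.
            "NotesBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"}
                },
            }
        },
    }


# Serialize to the JSON form that CloudFormation accepts.
template_json = json.dumps(build_template(), indent=2)
print(template_json)
```

The same structure could equally be written by hand in YAML; generating it from code is only one option.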

A stack is a group of resources that is created, updated and deleted as a single unit.

StackSets allow users to create, update, or delete stacks across multiple accounts and regions with a single operation.
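
A StackSet operation can be pictured as fanning one template out to every account and region pair. The sketch below only enumerates those pairs; the function name, account IDs and regions are placeholders invented for illustration, and no AWS API is called.

```python
from itertools import product


def plan_stack_instances(accounts, regions):
    """Return one (account, region) pair per stack instance."""
    return list(product(accounts, regions))


# Hypothetical account IDs and regions, for illustration only.
instances = plan_stack_instances(
    ["111111111111", "222222222222"],
    ["us-east-1", "eu-west-1"],
)
for account, region in instances:
    print(f"stack instance: account={account} region={region}")
```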

Change Sets allow users to preview how proposed changes to a stack might impact their running resources before implementation.
When you edit a template and want to update a stack, you can first create a change set, which describes how the change will affect your resources. The update is applied only when you execute the change set.
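
To get a feel for what a change set reports, the sketch below compares the `Resources` sections of two templates and labels each logical ID as Add, Modify or Remove. This is a simplified stand-in written for these notes, not how CloudFormation computes change sets; real change sets also report details such as whether a resource will be replaced.

```python
def diff_resources(current: dict, proposed: dict) -> dict:
    """Map each changed logical ID to "Add", "Modify" or "Remove"."""
    old = current.get("Resources", {})
    new = proposed.get("Resources", {})
    changes = {}
    for logical_id in sorted(old.keys() | new.keys()):
        if logical_id not in old:
            changes[logical_id] = "Add"
        elif logical_id not in new:
            changes[logical_id] = "Remove"
        elif old[logical_id] != new[logical_id]:
            changes[logical_id] = "Modify"
    return changes


# Hypothetical templates, trimmed to the parts that matter for the diff.
current = {"Resources": {
    "Bucket": {"Type": "AWS::S3::Bucket"},
    "Queue": {"Type": "AWS::SQS::Queue"},
}}
proposed = {"Resources": {
    "Bucket": {"Type": "AWS::S3::Bucket",
               "Properties": {"BucketName": "example-notes-bucket"}},
    "Topic": {"Type": "AWS::SNS::Topic"},
}}

changes = diff_resources(current, proposed)
print(changes)
```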

Templates can be stored in CodeCommit.
CodePipeline and CodeBuild can be used for CI/CD.

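Stack deployments can be scheduled to run at specific times by triggering them from another service, e.g. an EventBridge schedule that starts a pipeline.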

Deleting the stack deletes all the resources it created, except those with a DeletionPolicy of Retain.

AWS Cloud Development Kit (CDK)

CDK is used to define your infrastructure using programming languages.

cdk synth generates the CloudFormation templates.
cdk deploy deploys the infrastructure.

Amazon CloudWatch

Amazon CloudWatch monitors your AWS resources and applications in near real time.

Metrics are recorded and alarms can be generated on thresholds. Logs of all services can be collated.

Events can be configured in EventBridge to respond to state changes in AWS resources.

Dashboards are available, including a customizable home page.

A common use-case is for EC2 instances to auto-scale based on CloudWatch alarms.

CloudWatch Metric Stream allows CloudWatch metrics to be sent to Kinesis Data Firehose and 3rd party service providers like Datadog, Dynatrace, New Relic, Splunk, Sumo Logic.

CloudWatch Logs
CloudWatch Logs are organized into log groups, which have arbitrary names.
A log stream is a sequence of log events from the same source, e.g. one instance of an application.
Log expiration policies can be set.
CloudWatch Logs can send logs to S3, Kinesis Data Streams, Firehose, Lambda and OpenSearch.

CloudWatch Logs Insights can query CloudWatch Logs.

Subscription filters can be set on the CloudWatch Logs.

By default, EC2 does not send logs to CloudWatch. It needs a CloudWatch Agent with the necessary IAM permissions.

AWS CloudWatch Agent can collect system-level metrics from on-premises servers to view them alongside AWS metrics for comprehensive monitoring.

CloudWatch Alarms
CloudWatch Alarms can monitor metrics.
They can be used to stop, terminate, reboot or recover an EC2 instance.
They can be configured to work with autoscaling groups or to send a message to SNS.
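
The threshold logic behind an alarm can be sketched as "M out of the last N datapoints breach the threshold". The function below is a simplified model written for these notes; the parameter names loosely mirror alarm settings, and real alarms also handle missing data and other comparison operators.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Return "ALARM" if enough recent datapoints exceed the threshold."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        # Not enough history yet to evaluate the alarm.
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Hypothetical CPU utilization samples (percent), oldest first.
cpu = [35.0, 42.0, 81.0, 90.0, 88.0]
state = alarm_state(cpu, threshold=80.0, evaluation_periods=3, datapoints_to_alarm=3)
print(state)
```

An alarm that enters the ALARM state could then invoke an EC2 action, a scaling policy or an SNS notification.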

CloudWatch alarms can be used to mitigate sudden traffic spikes at unpredictable times, by monitoring CPU usage and network traffic and triggering scaling policies of ASGs.

Composite Alarms combine more than one alarm into a single alarm using AND/OR conditions.
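
A composite alarm can be thought of as a boolean expression over the states of other alarms. The sketch below models only the two simplest cases (all children in alarm, or any child in alarm); the function and the alarm names `HighCpu` and `HighNetworkIn` are hypothetical.

```python
def composite_state(child_states: dict, require_all: bool = True) -> str:
    """Combine child alarm states with AND (require_all) or OR logic."""
    in_alarm = [state == "ALARM" for state in child_states.values()]
    if not in_alarm:
        # No child alarms configured, so nothing can fire.
        return "OK"
    fired = all(in_alarm) if require_all else any(in_alarm)
    return "ALARM" if fired else "OK"


children = {"HighCpu": "ALARM", "HighNetworkIn": "OK"}
both = composite_state(children, require_all=True)
either = composite_state(children, require_all=False)
print(both, either)
```

Requiring several alarms at once is one way to reduce noise from a single flapping metric.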

CloudWatch Container Insights is used to collect, aggregate and summarize metrics and logs from containers. It works for ECS, EKS and Kubernetes on EC2.

CloudWatch Lambda Insights aggregates metrics and diagnostic information, such as cold starts, for Lambda functions.

CloudWatch Contributor Insights analyzes log data and creates time series that display the top contributors and their usage. E.g. the IP addresses generating the highest traffic.

CloudWatch Application Insights provides automatic dashboards to troubleshoot your application and related AWS services.

Amazon EventBridge

Amazon EventBridge facilitates decoupling and scalability of applications. It provides event processing capabilities at scale, event routing and filtering.
It is used to schedule cron jobs, e.g. running a script every hour, and to react to events, e.g. running a script when a service changes state.

Every account has a default Event Bus that receives events from AWS services. Partner Event Buses receive events from 3rd party services, and custom Event Buses can be created for your own applications.
Rules can be set up to filter and send events to targets.

Pipes route events from one source to one target.

Scheduler is responsible for scheduling events.
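
Schedules are written as `rate(...)` or `cron(...)` expressions. The helper below is a small sketch, with a made-up function name, that builds a rate expression and handles the singular and plural unit forms.

```python
def rate_expression(value: int, unit: str) -> str:
    """Build a rate expression such as "rate(5 minutes)"."""
    if value < 1:
        raise ValueError("value must be a positive integer")
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be minute, hour or day")
    # The unit is singular for a value of 1 and plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


hourly = rate_expression(1, "hour")
every_five_minutes = rate_expression(5, "minute")

# A cron expression that fires at minute 0 of every hour.
# Fields: minutes, hours, day-of-month, month, day-of-week, year.
hourly_cron = "cron(0 * * * ? *)"

print(hourly, every_five_minutes, hourly_cron)
```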

Some sources include an EC2 instance change, CodeBuild build failures, S3 object uploads etc.

Filters can be set to determine which events to process.
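
Filters are expressed as event patterns: JSON documents whose leaf values are lists of accepted values. The matcher below is a deliberately simplified sketch that supports only exact-value lists and nested objects; real event patterns also support operators such as prefix and numeric matching, which are not modeled here. The instance ID is a placeholder.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Return True if every field in the pattern accepts the event's value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested part of the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the event value must be one of the listed values.
            return False
    return True


pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}

print(matches(pattern, event))
```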

A JSON document is generated which can integrate with destinations.

EventBridge Archive and Replay feature is the most efficient and cost-effective way to store EventBridge events and use them later.

AWS CloudTrail

AWS CloudTrail provides governance, compliance and audit for your AWS accounts.
It allows you to get a history of events or API calls made within your AWS Account.

Actions taken through the SDK, CLI or Console, and by IAM users and roles, are audited.

If events are needed for more than 90 days, then logs can be delivered to CloudWatch Logs or S3 buckets, and Athena can be used to query the data in S3.
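
Log files delivered to S3 are JSON documents with a top-level `Records` array. As a small sketch of the kind of question one might otherwise ask with Athena, the code below counts API calls by `eventName`. The sample records are made up for illustration and trimmed to a few fields.

```python
import json
from collections import Counter


def count_event_names(log_file_text: str) -> Counter:
    """Count CloudTrail records in one log file by their eventName."""
    records = json.loads(log_file_text).get("Records", [])
    return Counter(record["eventName"] for record in records)


# Made-up sample data in the shape of a CloudTrail log file.
sample = json.dumps({"Records": [
    {"eventTime": "2024-01-01T10:00:00Z",
     "eventSource": "ec2.amazonaws.com", "eventName": "RunInstances"},
    {"eventTime": "2024-01-01T10:05:00Z",
     "eventSource": "s3.amazonaws.com", "eventName": "PutObject"},
    {"eventTime": "2024-01-01T10:07:00Z",
     "eventSource": "ec2.amazonaws.com", "eventName": "RunInstances"},
]})

counts = count_event_names(sample)
print(counts.most_common())
```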

CloudTrail Insights detects unusual activity in your account.

AWS Config

AWS Config helps with auditing and recording compliance of your AWS resources.
E.g. it can identify security groups that allow unrestricted SSH access to all IP addresses.

We can create rules for compliance and AWS Config can show all the resources which violate compliance.
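
The unrestricted SSH check mentioned above can be sketched as a pure function over a security group description. The input shape follows the fields returned when describing security groups (`IpPermissions`, `FromPort`, `ToPort`, `IpRanges`, `Ipv6Ranges`); the function itself is an illustration written for these notes, not the implementation of any AWS Config rule.

```python
def allows_unrestricted_ssh(security_group: dict) -> bool:
    """Return True if port 22 is open to the whole internet."""
    for permission in security_group.get("IpPermissions", []):
        protocol = permission.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            covers_ssh = True
        elif protocol == "tcp":
            low = permission.get("FromPort", 0)
            high = permission.get("ToPort", 65535)
            covers_ssh = low <= 22 <= high
        else:
            covers_ssh = False
        if not covers_ssh:
            continue
        open_v4 = any(r.get("CidrIp") == "0.0.0.0/0"
                      for r in permission.get("IpRanges", []))
        open_v6 = any(r.get("CidrIpv6") == "::/0"
                      for r in permission.get("Ipv6Ranges", []))
        if open_v4 or open_v6:
            return True
    return False


# Hypothetical security group with SSH open to all IPv4 addresses.
open_group = {"GroupId": "sg-0123456789abcdef0", "IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]}
print(allows_unrestricted_ssh(open_group))
```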

AWS Config Remediations can automatically fix non-compliant resources, e.g. re-configure your Security Groups to their correct state.

AWS Config Notifications can notify you, e.g. over email through SNS, when someone modifies your EC2 instances' security group.

AWS Config managed rule can check if any third-party SSL/TLS certificates imported into ACM are marked for expiration within 30 days. The rule can trigger an Amazon SNS notification to the security team.

AWS X-Ray

AWS X-Ray is a tracing tool that receives traces from applications and AWS services.
Traces are a collection of Segments.
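
To make the trace and segment relationship concrete, the sketch below builds two segment documents that share one trace ID. The ID formats follow the X-Ray conventions (a trace ID made of a version, the epoch time as 8 hex digits and 24 random hex digits; a segment ID of 16 hex digits). The helper names and the service names `frontend` and `orders-api` are assumptions for illustration.

```python
import secrets
import time


def new_trace_id(now=None) -> str:
    """Build a trace ID of the form 1-<8 hex epoch>-<24 hex random>."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{secrets.token_hex(12)}"


def new_segment(name: str, trace_id: str, start: float, end: float) -> dict:
    """Build a minimal segment document belonging to the given trace."""
    return {
        "name": name,
        "id": secrets.token_hex(8),  # 16 hex digits
        "trace_id": trace_id,
        "start_time": start,
        "end_time": end,
    }


trace_id = new_trace_id(now=1700000000)
# Two segments from hypothetical services, tied together by the trace ID.
segments = [
    new_segment("frontend", trace_id, 1700000000.0, 1700000000.4),
    new_segment("orders-api", trace_id, 1700000000.1, 1700000000.3),
]
print(trace_id, [s["name"] for s in segments])
```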

AWS X-Ray helps developers analyze and debug production and distributed applications by providing insights into the performance and errors.

AWS Health Dashboard

AWS Health Dashboard provides alerts and guidance for changes that can affect your AWS environment.
E.g. AWS maintenance activities

A public event is one that affects all customers. A private event is one that affects only your account or a region that you use.

Amazon Managed Service for Prometheus

Prometheus is a time series database and an open source monitoring solution. Amazon Managed Service for Prometheus is Amazon’s managed offering for Prometheus. It collects metrics for your application.

It is based on the PromQL language.

Targets are services from which Prometheus should collect data. Retriever collects the metrics from the target.

It works well with dashboard visualization tools like Grafana.

Prometheus uses service discovery to identify all services to collect metrics from.

Prometheus can integrate with CloudWatch for alerts.

Alert Manager is a component in Prometheus which can trigger an event in SNS which can send an alert.

Amazon Managed Grafana

Grafana is an open source project that provides high quality dashboards to visualize data.
It provides improved capabilities as compared to CloudWatch.

Grafana can be used to query metrics stored in data sources such as Prometheus and CloudWatch and view them in custom dashboards.

Amazon Managed Grafana is Amazon’s managed offering for Grafana.

AWS Trusted Advisor

AWS Trusted Advisor is a tool that helps you follow AWS best practices.

It can advise on the following:

  • Cost Optimization: helps reduce costs by identifying idle resources

  • Performance: by reviewing usage and configurations

  • Security: recommending best practices

  • Fault tolerance: checks autoscaling groups, backups

  • Service Quotas: monitors usage against the maximum allowed resources and alerts when usage reaches 80% of a quota
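
The 80% quota check in the last item amounts to a simple ratio test. The function below is a sketch with a made-up name and made-up numbers; it is not how Trusted Advisor is implemented.

```python
def quota_warning(usage: int, quota: int, warn_ratio: float = 0.8) -> bool:
    """Return True once usage reaches the warning share of the quota."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    return usage / quota >= warn_ratio


# Hypothetical example: 17 of 20 allowed resources are in use.
print(quota_warning(17, 20))
```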

AWS Launch Wizard

AWS Launch Wizard simplifies the process of deploying well known 3rd party applications through pre-configured templates.

Launch Wizard itself comes at no extra charge.

AWS Compute Optimizer

AWS Compute Optimizer provides optimization for services like EC2, ECS, Fargate, Lambda.

AWS Compute Optimizer recommends optimal AWS Compute resources for your workloads to reduce costs and improve performance by using machine learning to analyze historical utilization metrics. Compute Optimizer helps you choose the optimal Amazon EC2 instance types, including those that are part of an Amazon EC2 Auto Scaling group, based on your utilization data.

Its features include:

  1. Performance Risk Analysis

  2. Cost-saving Recommendations

  3. EC2 Instance Type Recommendations

  4. EBS Volume Recommendations

  5. Optimization for Fargate

AWS Organizations

AWS Organizations is a global service.
It simplifies the process of managing multiple AWS accounts.

It offers centralized billing across all accounts.

The organization has a Root, which is the parent container for all accounts and organizational units.

Organizational units (OUs) help group the accounts within an organization.

The management account handles the management activities like adding and removing accounts.

Member accounts can only be part of one organization.

Reserved Instances and Savings Plans discounts can be shared across accounts.

Service Control Policies (SCPs) are a type of organization policy that you can use to manage permissions in your organization.
SCPs do not apply to the management account.

SCPs establish permission boundaries for all accounts. They can be enforced regardless of the permissions assigned to IAM entities (users or roles).
SCPs alone are not sufficient in granting permissions to the accounts in your organization. No permissions are granted by an SCP. An SCP defines a guardrail, or sets limits, on the actions that the account’s administrator can delegate to the IAM users and roles in the affected accounts. The administrator must still attach identity-based or resource-based policies to IAM users or roles, or to the resources in your accounts to actually grant permissions. The effective permissions are the logical intersection between what is allowed by the SCP and what is allowed by the IAM and resource-based policies.
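
The "logical intersection" above can be sketched with plain sets. This is a heavily simplified model that compares exact action strings only: it ignores wildcards, conditions, resources and the rest of policy evaluation, and the function name and action lists are chosen purely for illustration.

```python
def effective_permissions(scp_allowed, identity_allowed, scp_denied=frozenset()):
    """Actions allowed by both the SCP and IAM, minus any SCP denies."""
    return (set(scp_allowed) & set(identity_allowed)) - set(scp_denied)


scp_allowed = {"s3:GetObject", "s3:PutObject", "ec2:RunInstances"}
identity_allowed = {"s3:GetObject", "s3:PutObject", "iam:CreateUser"}

# iam:CreateUser is granted by IAM but outside the SCP guardrail, and
# ec2:RunInstances is inside the guardrail but never granted by IAM.
allowed = effective_permissions(scp_allowed, identity_allowed)
print(sorted(allowed))

# An explicit deny in an SCP removes the action even though IAM grants it.
after_deny = effective_permissions(
    scp_allowed, identity_allowed, scp_denied={"s3:PutObject"}
)
print(sorted(after_deny))
```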

By default, the FullAWSAccess SCP is attached to the Root, and to every OU and account.

Deny policies on Org Unit cannot be overridden at account level.

It integrates with other services like IAM.

It is provided at no additional charge.

AWS Systems Manager

AWS Systems Manager is used to manage a large number of servers in AWS and on premises.

It requires the Systems Manager Agent (SSM Agent), which is software that must be installed on all of your servers.

AWS Control Tower

AWS Control Tower helps you manage an environment having multiple accounts. Think of it as an AWS Organizations orchestrator. It offers automated provisioning and governance.

Preventive Guardrails block actions that would violate policy (implemented with SCPs), whereas Detective Guardrails detect and report non-compliant resources after the fact (implemented with AWS Config rules).

Account Factory helps automate setting up the new accounts.

AWS Service Catalog

AWS Service Catalog is a curated collection of IT services.
An IT administrator manages the catalog.

Roles define what a user is allowed to use.

Some challenges that occur without a Service Catalog are:

  • Inconsistent deployments

  • Lack of Governance

  • Uncontrolled spending

  • Complexity in managing accounts

  • Slow deployment of resources

Products can be thought of as templates to deploy a set of resources. A Product is typically defined by a CloudFormation template.

Portfolios manage who can access which Product.
Portfolio is a group of Products.

Catalog Administrator manages the catalogs and products. End users use the products.

AWS License Manager

AWS License Manager manages software licenses from various vendors, in the cloud as well as on premises.

License manager can prevent software from launching if it does not conform to a valid license.

It prevents over-use of a license, works across AWS accounts and enforces license rules.

AWS Resource Groups and Tag Editor

Resource Groups and Tag Editor are used to group resources by their tags.

One good idea is to tag resources per environment.
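
Grouping by an environment tag can be sketched as below. The resource records, ARNs and the `Environment` tag key are made-up examples, and the function works on plain dictionaries rather than calling any AWS API.

```python
from collections import defaultdict


def group_by_tag(resources, tag_key):
    """Group resource ARNs by the value of one tag."""
    groups = defaultdict(list)
    for resource in resources:
        value = resource.get("Tags", {}).get(tag_key, "untagged")
        groups[value].append(resource["Arn"])
    return dict(groups)


# Hypothetical resources with placeholder ARNs.
resources = [
    {"Arn": "arn:aws:s3:::example-prod-assets", "Tags": {"Environment": "prod"}},
    {"Arn": "arn:aws:s3:::example-dev-assets", "Tags": {"Environment": "dev"}},
    {"Arn": "arn:aws:sqs:us-east-1:111111111111:example-queue",
     "Tags": {"Environment": "prod"}},
    {"Arn": "arn:aws:s3:::example-scratch", "Tags": {}},
]

groups = group_by_tag(resources, "Environment")
print(groups)
```

Resources without the tag end up in an "untagged" group, which makes gaps in tagging easy to spot.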

AWS Proton

AWS Proton allows a platform team to create environments using Infrastructure as Code (IaC) templates. It automates deployments and can have flexible definitions.

AWS Resilience Hub

AWS Resilience Hub helps you set up your disaster recovery process by continuously tracking the application.

It has the capability to alert during an outage.

It provides SOP (Standard Operating Procedure) for recovery.

Its features include:

  • Define the Resilience Policies

  • Run Assessments

  • Review Assessments

  • Implement Recommendations

  • Setup Alarms

  • Review SOPs

AWS Resource Explorer

AWS Resource Explorer simplifies the search and discovery of your AWS resources across AWS regions.

It has two types of users, an administrator and a normal user.

Index is a collection of information about AWS resources in a specific region.
Local index is specific to a region whereas an Aggregator index collects data from all regions.

Every region replicates its local index to the aggregator index.

AWS Resource Access Manager

AWS Resource Access Manager allows sharing of resources across AWS accounts.

Its features include:

  1. Creating a resource share

  2. Selecting resources to share

  3. Choosing principals

  4. Accepting the resource share request at receiving account

  5. Monitoring and managing the resource share

VPC sharing (part of Resource Access Manager) allows multiple AWS accounts to create their application resources such as Amazon EC2 instances, Amazon RDS databases, Amazon Redshift clusters, and AWS Lambda functions, into shared and centrally-managed Amazon Virtual Private Clouds (VPCs). To set this up, the account that owns the VPC (owner) shares one or more subnets with other accounts (participants) that belong to the same organization from AWS Organizations. After a subnet is shared, the participants can view, create, modify, and delete their application resources in the subnets shared with them. Participants cannot view, modify, or delete resources that belong to other participants or the VPC owner.