Application Integration

AWS Auto Scaling

Auto Scaling groups are used to scale EC2 instances dynamically, on demand.

Scaling is driven by policies, called scaling policies.

We configure the minimum, desired, and maximum number of EC2 instances for the group.

Scaling Policy Types
Manual

manually change the desired count of instances

Dynamic
Target tracking

tracks a target value for a specific metric; e.g. the average CPU utilization of your Auto Scaling group.

Step

based on CloudWatch alarms; you can define different step adjustments based on the breach size of the alarm.

Simple

based on CloudWatch alarms. Simple scaling policies are similar to step scaling policies, except they’re based on a single scaling adjustment, with a cooldown period between each scaling activity.

Scheduled

scale on a schedule, for predictable load changes at known dates and times.
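
To make the dynamic policies concrete, here is a minimal boto3 sketch that attaches a target tracking policy to an existing group; the group name and target value are placeholders:

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Keep the group's average CPU utilization at 50% (placeholder target).
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="my-asg",       # hypothetical group name
        PolicyName="cpu-50-target-tracking",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 50.0,
        },
    )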

Instance warm-up

For step scaling, you can optionally specify the number of seconds that it takes for a newly launched instance to warm up. Until its specified warm-up time has expired, an instance is not counted toward the aggregated EC2 instance metrics of the Auto Scaling group.

Auto Healing is a feature of ASG that automatically replaces unhealthy EC2 instances with new ones.

A launch template needs to be configured to define the EC2 instances the group launches.

Cooldown periods

Amazon EC2 Auto Scaling cooldown periods help you prevent Auto Scaling groups from launching or terminating additional instances before the effects of previous activities are apparent.
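
As a sketch of how the capacity settings, launch template, and default cooldown come together when creating a group with boto3 (the group name, template name, and subnet IDs are placeholders):

    import boto3

    autoscaling = boto3.client("autoscaling")

    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="my-asg",                      # hypothetical name
        LaunchTemplate={"LaunchTemplateName": "my-template",
                        "Version": "$Latest"},
        MinSize=1,
        DesiredCapacity=2,
        MaxSize=4,
        DefaultCooldown=300,                                # seconds between scaling activities
        VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",    # placeholder subnets in two AZs
    )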

Elastic Load Balancing (ELB)

Also refer to this.

ELBs can forward traffic to more than 1 AZ.

A listener checks for connection requests on a configured protocol and port; routing rules are attached to the listener.

A target group is a set of targets to route to: EC2 instances, IP addresses, Lambda functions, or another ALB.

Load Balancers can be either Private or Public.

Rules can be based on the following (see the sketch after this list):

  • Host Header

  • Path

  • HTTP Method

  • Source IP

  • HTTP Header

  • Query String
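
For illustration, a hedged boto3 sketch of a listener rule combining a path condition and an HTTP-header condition; the listener and target group ARNs are assumed to exist:

    import boto3

    elbv2 = boto3.client("elbv2")

    elbv2.create_rule(
        ListenerArn="arn:aws:elasticloadbalancing:...:listener/...",   # placeholder ARN
        Priority=10,
        Conditions=[
            {"Field": "path-pattern", "Values": ["/api/*"]},
            {"Field": "http-header",
             "HttpHeaderConfig": {"HttpHeaderName": "X-Env", "Values": ["beta"]}},
        ],
        Actions=[{"Type": "forward",
                  "TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/..."}],
    )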

ELBs can integrate with EC2, ECS, Lambda, WAF, Route 53 and Auto Scaling groups.

Deregistration delay

By default, Elastic Load Balancing waits 300 seconds before the completion of the deregistration process, which can help in-flight requests to the target become complete. To change the amount of time that Elastic Load Balancing waits, update the deregistration delay value.
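
A minimal sketch of tuning this value with boto3 (the target group ARN is a placeholder):

    import boto3

    elbv2 = boto3.client("elbv2")

    # Lower the deregistration delay from the default 300s to 60s.
    elbv2.modify_target_group_attributes(
        TargetGroupArn="arn:aws:elasticloadbalancing:...:targetgroup/...",  # placeholder
        Attributes=[{"Key": "deregistration_delay.timeout_seconds", "Value": "60"}],
    )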

Amazon API Gateway

Amazon API Gateway is a serverless offering that provides HTTP endpoints for clients.

It can invoke a Lambda function, start a Step Functions workflow, or send a message to an SQS queue.

Endpoint Types include:

  • Edge Optimized

  • Regional

  • Private

An Edge-Optimized API Gateway is best for geographically distributed clients. API requests are routed to the nearest CloudFront Edge Location which improves latency. The API Gateway still lives in one AWS Region.

Caching is available out of the box.

To prevent your API from being overwhelmed by too many requests, Amazon API Gateway throttles requests to your API using the token bucket algorithm, where a token counts for a request.
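
As a sketch of the HTTP-endpoint-in-front-of-Lambda pattern, the quick-create path of the apigatewayv2 API proxies all routes to a single function; the API name and function ARN are placeholders:

    import boto3

    apigw = boto3.client("apigatewayv2")

    resp = apigw.create_api(
        Name="orders-http-api",                                           # hypothetical name
        ProtocolType="HTTP",
        Target="arn:aws:lambda:us-east-1:123456789012:function:orders",  # placeholder ARN
    )
    print(resp["ApiEndpoint"])   # the HTTP endpoint handed to clients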

Amazon AppFlow

Amazon AppFlow facilitates the exchange of data between SaaS applications and AWS services.

Examples include transferring data from Salesforce to Redshift and moving conversations from Slack to S3.

It requires connectors to be established between the source and the destination.

Data mapping between fields and filtering is supported.

The data transfer is encrypted.

Amazon Simple Notification Service (SNS)

Amazon SNS facilitates the delivery of messages from publishers to subscribers. One message can be received by many receivers.

SNS works on the basis of a Topic.

Messages are published to a Topic and are delivered to all subscribers of the Topic.

Examples of subscribers include an SQS queue or a Lambda function.

Publishers can be applications / services or programs.

Types of topics
Standard topic

Messages may not show up in order
Messages may show up more than once
Offers the maximum throughput

FIFO topic

Follows a strict first-in, first-out order.
Used in cases where the order of messages is critical.
Duplicates are not possible
Provides lesser throughput as compared to standard topics
Only SQS FIFO queues can subscribe
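
A minimal sketch of publishing to a FIFO topic with boto3 (the topic ARN and IDs are placeholders); MessageGroupId scopes the ordering, and MessageDeduplicationId can be omitted when content-based deduplication is enabled:

    import boto3

    sns = boto3.client("sns")

    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:orders.fifo",  # placeholder ARN
        Message='{"orderId": 42}',
        MessageGroupId="order-42",           # messages in one group stay ordered
        MessageDeduplicationId="evt-0001",   # suppresses duplicates within 5 minutes
    )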

By default, SNS supports a maximum message size of 256 KB. Larger payloads must be stored in S3 and referenced from the message.

Amazon Simple Queue Service (SQS)

Amazon SQS is one of the oldest services provided by AWS.

It facilitates asynchronous communication between services.

It promotes decoupling, load levelling and helps establish scalable architectures.

SQS works on the basis of a Queue.
Messages are published to a queue.

By default, SQS supports a maximum message size of 256 KB with a default retention of 4 days and a maximum retention of 14 days.

Producers send messages to SQS queue whereas Consumers receive those messages.
Consumers should delete the message once they process it.

Messages are sent via the SDK using the SendMessage API.
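
A minimal boto3 sketch of the full produce/consume/delete cycle, assuming a queue named my-queue already exists:

    import boto3

    sqs = boto3.client("sqs")
    queue_url = sqs.get_queue_url(QueueName="my-queue")["QueueUrl"]   # assumed queue

    # Producer side: SendMessage, with an optional message attribute.
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody="hello",
        MessageAttributes={"type": {"DataType": "String", "StringValue": "greeting"}},
    )

    # Consumer side: long-poll, process, then delete.
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1,
                               WaitTimeSeconds=10)
    for msg in resp.get("Messages", []):
        print(msg["Body"])                   # stand-in for real processing
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])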

Message attributes consist of name/value pairs.

A dead-letter queue (DLQ) holds messages that could not be processed successfully after a configured number of attempts.

Visibility timeout is the period during which a message, once retrieved by a consumer, is hidden from other consumers.

This effectively locks a message while it is being processed.

Types of queues
Standard Queues

best effort ordering, at least once delivery

FIFO Queues

exactly once processing, no duplicates
3000 messages / sec with batching, else 300 messages / sec

It is useful for decoupling applications like microservices.

SQS can work with an Auto Scaling group on the basis of the CloudWatch metric ApproximateNumberOfMessagesVisible (the queue length).

To migrate from a standard queue to a FIFO queue, the following should be considered:

  • The name of the FIFO queue must end with the .fifo suffix

  • Standard queue must be deleted and recreated as a FIFO queue. An existing standard queue cannot be converted to a FIFO queue

  • The throughput for the target FIFO queue cannot exceed 3,000 messages per second
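
A hedged sketch of creating the target FIFO queue with boto3; the queue name is a placeholder, and content-based deduplication is optional:

    import boto3

    sqs = boto3.client("sqs")

    sqs.create_queue(
        QueueName="orders.fifo",                      # name must end in .fifo
        Attributes={
            "FifoQueue": "true",
            "ContentBasedDeduplication": "true",      # dedup by message body hash
        },
    )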

SQS is a good option for managing user sessions in an application that experiences high load. The EC2 instances can forward session data to an SQS queue, and worker instances can poll the queue and write the session data to the database.

Another example is an online voting system for a live television program. The ingestion of votes can be decoupled from the database to allow the voting system to continue processing votes without waiting for the database writes. Dedicated workers can be added to read from SQS queues to allow votes to be entered into the database at a controllable rate. This will ensure that no votes will be lost.

Amazon Kinesis

Amazon Kinesis makes it easy to collect, process and analyze data in real time.

Kinesis Data Streams

A massively scalable and durable real-time data streaming service.
It can continuously capture gigabytes of data per second from hundreds of sources such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events.
It can deliver data to multiple consuming applications, with a replay feature.

A stream consists of multiple shards. Data is split among the shards.

Producers send data to Kinesis data streams. Examples include applications, clients, SDK, Kinesis Producer Library (KPL) or Kinesis agents.
Producers send Records to the stream.
Records are made up of a partition key and a data blob. The partition key determines which shard the record is stored in.
Ingestion capacity is 1 MB/sec or 1,000 records/sec per shard.
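
A minimal producer sketch with boto3, assuming a stream named clickstream; routing all records for one user through the same partition key keeps their order within a shard:

    import json

    import boto3

    kinesis = boto3.client("kinesis")

    kinesis.put_record(
        StreamName="clickstream",                                    # hypothetical stream
        Data=json.dumps({"userId": "u1", "page": "/home"}).encode(),
        PartitionKey="u1",    # same key -> same shard -> per-user ordering
    )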

Consumers consume data from the stream. Examples include applications based on SDK or the Kinesis Client Library (KCL), Lambda Functions, Kinesis Data Firehose or Kinesis Data Analytics.
Once stored, each record carries a sequence number that determines its relative position within the shard.
Consumption modes include 2 MB/sec per shard shared across all consumers (shared mode) or 2 MB/sec per shard per consumer (enhanced mode)

With enhanced fan-out developers can register stream consumers to use enhanced fan-out and receive their own 2MB/second pipe of read throughput per shard, and this throughput automatically scales with the number of shards in a stream.
Amazon Kinesis Data Streams thus both processes the data streams and decouples the producers from the consumers of a real-time data processor.
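
For the shared-throughput mode, a hedged sketch of a basic SDK consumer that reads one shard from the beginning of the retention window (stream name as above; production consumers would typically use the KCL instead):

    import boto3

    kinesis = boto3.client("kinesis")

    shard_id = kinesis.describe_stream(StreamName="clickstream")[
        "StreamDescription"]["Shards"][0]["ShardId"]

    iterator = kinesis.get_shard_iterator(
        StreamName="clickstream",
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",    # start at the oldest retained record
    )["ShardIterator"]

    resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in resp["Records"]:
        print(record["SequenceNumber"], record["Data"])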

Retention can be set between 1 and 365 days.
Once inserted, data cannot be deleted; it expires only after the retention period.
Messages with the same partition key are directed to the same shard.

Capacity Modes include Provisioned Mode and On-demand mode.

Access is controlled using IAM policies.

Encryption in flight uses TLS; encryption at rest uses KMS.

VPC Endpoints are available for Kinesis access within a VPC.

API calls can be monitored using CloudTrail.

Kinesis Data Firehose

Fully managed, serverless service, with automatic scaling.

It is a near real time service.

Kinesis Data Firehose has the same producers as Kinesis Data Streams. In addition, a Kinesis Data Stream can itself be a source for Firehose, as can Amazon CloudWatch (Logs and Events) and AWS IoT.

Records can be up to 1 MB in size.

Firehose can optionally choose to transform the data using Lambda functions.

AWS Lambda functions can be used to process records in the Kinesis Data Firehose delivery stream on-the-fly, with retention period adjusted based on the processing speed requirement.
Separate Amazon S3 buckets can be configured in respective regions for Kinesis Data Firehose to deliver data ensuring compliance with data sovereignty regulations i.e. data from each country must remain within that region.

Then Firehose writes data in batches into destinations.

Firehose Destinations:

  • AWS Destinations: Amazon S3, Amazon Redshift (via S3 using the copy command), Amazon OpenSearch

  • 3rd Party Partner Destinations: Datadog, Splunk, New Relic, MongoDB

  • Custom Destinations: HTTP Endpoint

Data can be sent to S3 bucket as a backup.
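
Writing directly into a delivery stream is a single call; a minimal sketch, assuming a delivery stream named logs-to-s3:

    import boto3

    firehose = boto3.client("firehose")

    firehose.put_record(
        DeliveryStreamName="logs-to-s3",                          # hypothetical stream
        Record={"Data": b'{"level": "INFO", "msg": "hello"}\n'},  # newline-delimited JSON
    )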

Kinesis Data Analytics
Analyzes data streams with SQL or Apache Flink (Amazon Managed Service for Apache Flink)

Analytics for SQL can read data from Kinesis Data Streams or Kinesis Data Firehose.
Destinations include Kinesis Data Streams or Kinesis Data Firehose

Flink uses Java, Scala or SQL to process and analyze streaming data
Flink cannot read from Firehose. Only from Kinesis Data Streams or Amazon MSK
To read from Firehose use Kinesis Analytics for SQL instead.

Flink has advanced capabilities as compared to SQL.

Kinesis Video Streams

captures, processes and stores video streams

Amazon MQ

Amazon MQ is a managed message broker for ActiveMQ and RabbitMQ.
It supports open protocols like MQTT, AMQP, STOMP, OpenWire and WSS.

Amazon EventBridge

See this

Amazon Simple Email Service (SES)

SES is used for automatic transactional messages or bulk email notifications.
For a marketing firm conducting large-scale email campaigns, using a dedicated IP pool in Amazon SES is a key strategy for maintaining high email delivery rates.
A dedicated IP pool allows the firm to manage its own email sending reputation, which is crucial for ensuring that their emails are not marked as spam and that they reach their intended recipients.

AWS Step Functions

See this

Simple Workflow Service

It is a fully managed State tracker and Task Coordinator in the cloud.

It is similar to Step Functions.

It should be preferred over Step Functions if you want to integrate external signals or launch child processes from parent processes.

Workflow logic is written in regular programming languages.

Amazon Managed Workflows for Apache Airflow (MWAA)

A fully managed service to programmatically author workflows for Apache Airflow.

Big Data Ingestion Pipeline

The flow is as such:

IoT devices → Kinesis Data Streams → Kinesis Data Firehose → S3