Databases

Amazon Relational Database Service (Amazon RDS)

Amazon RDS is a managed service for popular relational database engines. It simplifies routine operations such as high availability, fault tolerance, scalability, backup and restore, monitoring, performance tuning, security and compliance.

You cannot SSH into the underlying EC2 instance.

To facilitate secure access to the database, Amazon RDS should be configured to use SSL/TLS for data in transit.

The following database engines are supported:

MySQL
PostgreSQL
MariaDB
Oracle
Microsoft SQL Server

General Purpose and Memory Optimized instances can be provisioned.

Deployment Types

Single AZ: for dev and staging environments
Multi AZ: for production environments; a standby in another AZ is kept current through synchronous replication, with automatic failover if the primary's AZ goes down.
It provides one DNS name across AZs, so applications do not change their connection string on failover.

Read Replicas are used to load balance read requests; writes happen only on the primary instance.
Up to 15 read replicas can be provisioned within the same AZ, across AZs or across Regions.
A read replica can be promoted to a standalone database of its own.
Read replicas follow an asynchronous replication model.
Read replicas can themselves be set up as Multi AZ.
They should be used only for SELECT statements.

Storage Types

General Purpose SSD: cost-effective, development purpose
Provisioned IOPS SSD: I/O intensive, low latency, for production environments
Magnetic: obsolete, hard disk drives, being phased out

RDS Configurations

options and parameters to customize RDS

DB Parameter Groups: collections of engine parameters that fine tune performance, security and resource allocation
DB Option Groups: enable and configure optional engine-specific features, such as Transparent Data Encryption for Oracle
Subnet groups: define the subnets where the DB will be deployed and help control the network configuration of the database
Security groups: control inbound and outbound traffic
DB snapshots: backups of your DB instances
Parameter Store: stores configuration data
Performance insights: analyze performance of database with visuals
Enhanced monitoring: assist with troubleshooting
Audit and log data: to help track database activity
Supports SSL and encryption: to protect data at rest and in transit

RDS Backups: Transaction logs are backed up every 5 minutes, and automated backups have a maximum retention period of 35 days.

Amazon RDS Custom provides access to the underlying EC2 instance for Oracle and Microsoft SQL Server. It allows the Database Administrator (DBA) to access and customize the database environment and the underlying operating system.

RDS at-rest encryption must be chosen at launch time. If the primary is not encrypted, its read replicas cannot be encrypted.

In-flight encryption is available. Clients must use the AWS TLS root certificates.

Authentication can be through username / password or IAM roles.

Network access is governed through security groups.

Amazon Aurora

Amazon Aurora is a cloud-native relational database engine, offered through RDS, that is compatible with MySQL and PostgreSQL.

The main difference between traditional RDS and Aurora is in how the storage is maintained.

Aurora’s features include

Purpose-built Log-structured distributed storage
Storage volume is striped across hundreds of storage nodes
Storage nodes with locally attached SSDs
Continuous backup to S3
Storage volume spread across AZs, offering built-in HA

Storage is divided into 10 GB protection groups (PGs).
Data in each protection group is replicated across 6 storage nodes, two in each of 3 AZs.
The storage volume grows automatically by adding PGs, up to 128 TB.

Aurora follows a quorum model.
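The quorum model can be illustrated with a few lines of arithmetic. This is a minimal sketch, not Aurora's implementation. The quorum sizes (4 of 6 copies to write, 3 of 6 to read) are not stated in these notes; they are taken from Aurora's published design and are labeled as assumptions in the code.

```python
COPIES = 6        # copies per protection group: two in each of three AZs
WRITE_QUORUM = 4  # assumption from Aurora's published design: 4 of 6
READ_QUORUM = 3   # assumption from Aurora's published design: 3 of 6


def can_write(healthy_copies: int) -> bool:
    """A write succeeds once a write quorum of copies acknowledges it."""
    return healthy_copies >= WRITE_QUORUM


def can_read(healthy_copies: int) -> bool:
    """A quorum read needs enough copies to overlap with the latest write."""
    return healthy_copies >= READ_QUORUM


# Losing a whole AZ removes 2 copies; losing an AZ plus one more node removes 3.
after_az_loss = COPIES - 2
after_az_plus_one = COPIES - 3

print(can_write(after_az_loss), can_read(after_az_loss))
print(can_write(after_az_plus_one), can_read(after_az_plus_one))
```

Under these assumptions the volume stays writable after the loss of one AZ and stays readable after the loss of one AZ plus one additional node. Because the read and write quorums together exceed 6, any read quorum overlaps the most recent write quorum.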

Each Aurora cluster has 1 Primary Instance and other replica instances.
Each Aurora cluster can have up to 15 replicas

Two flavours:

Provisioned - Fixed capacity
Serverless - on-demand scaling

Aurora Global Database links multiple database clusters in different Regions so that they operate as one database. It is used to enable fast local reads with low latency in each Region.

For Aurora Serverless, the unit of capacity is the Aurora Capacity Unit (ACU).
Capacity ranges from a minimum of 0.5 ACU to a maximum of 128 ACUs (1 ACU = approximately 2 GiB of memory, with corresponding CPU and networking).
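As a quick worked example of the ACU figures above, the helper below converts a capacity setting into its approximate memory. The function name is hypothetical, and the 2 GiB figure is treated as exact only for the sake of the arithmetic.

```python
GIB_PER_ACU = 2.0             # 1 ACU is roughly 2 GiB of memory
MIN_ACU, MAX_ACU = 0.5, 128.0  # range given in the notes above


def acu_memory_gib(acu: float) -> float:
    """Approximate memory, in GiB, available at a given ACU setting."""
    if not MIN_ACU <= acu <= MAX_ACU:
        raise ValueError(f"ACU must be between {MIN_ACU} and {MAX_ACU}")
    return acu * GIB_PER_ACU


print(acu_memory_gib(0.5))   # smallest setting
print(acu_memory_gib(128))   # largest setting
```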

Writer Endpoint is a DNS name that always points to the primary (writer) instance.
Reader Endpoint load balances connections across the replicas.
If a Custom Endpoint is defined then the Reader Endpoint is generally not used anymore.

To separate the read requests from the write requests in Aurora, a read replica is set up and the application is modified to use the appropriate endpoint.
This option should be used as a way to offload read traffic for read-heavy workloads that require scalability and high availability.
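The application-side change described above can be sketched as a small routing function. This is an illustration only: the hostnames are placeholders, `pick_endpoint` is a hypothetical helper, and a real application would also keep every statement of a transaction on the writer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuroraEndpoints:
    writer: str  # writer endpoint, always points to the primary
    reader: str  # reader endpoint, load balanced across replicas


# Placeholder hostnames, not real clusters.
ENDPOINTS = AuroraEndpoints(
    writer="mycluster.cluster-example.us-east-1.rds.amazonaws.com",
    reader="mycluster.cluster-ro-example.us-east-1.rds.amazonaws.com",
)


def pick_endpoint(sql: str, endpoints: AuroraEndpoints = ENDPOINTS) -> str:
    """Send plain SELECTs to the reader endpoint, everything else to the writer."""
    normalized = " ".join(sql.lower().split())
    # SELECT ... FOR UPDATE takes locks, so it must go to the writer.
    is_plain_select = (
        normalized.startswith("select") and "for update" not in normalized
    )
    return endpoints.reader if is_plain_select else endpoints.writer


print(pick_endpoint("SELECT * FROM orders"))
print(pick_endpoint("UPDATE orders SET status = 'shipped' WHERE id = 1"))
```

Since replication to the replicas is asynchronous, a read sent to the reader endpoint immediately after a write may not yet see that write.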

Aurora Database Cloning can be used to create a new DB cluster from an existing one

Amazon Aurora Replicas can make an application more resilient to periodic spikes in request rates.

RDS Proxy

RDS Proxy establishes a pool of database connections and reuses those.
It can use IAM for database access.
It can retrieve database credentials from AWS Secrets Manager.
It reduces failover times by up to 66%. It supports serverless and auto-scaling. It is never publicly accessible; it can be reached only from within the VPC.
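The connection pooling idea behind RDS Proxy can be shown with a toy pool. This is a simplified sketch, not how the proxy is built: `ConnectionPool` and `fake_connect` are hypothetical names, and the "connections" are plain Python objects standing in for real database connections.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Opens a fixed number of connections once and hands them out for reuse."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._idle.get(timeout=timeout)  # wait for a free connection
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it to the pool instead of closing it


opened = []


def fake_connect():
    """Stand-in for an expensive database connect call."""
    opened.append(object())
    return opened[-1]


pool = ConnectionPool(fake_connect, size=2)
for _ in range(10):
    with pool.connection() as conn:
        pass  # run a query here

print(len(opened))  # number of underlying connections actually opened
```

Ten requests are served by the two connections opened at start-up, which is the effect that protects a database from connection storms, for example from many short-lived Lambda invocations.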

Amazon Redshift

Amazon Redshift is a fully managed, petabyte-scale data warehouse.

It is a relational, columnar data store based on PostgreSQL.

It comprises two types of nodes:

Leader node: manages query coordination, compilation and optimization.
Compute node: stores data and executes queries and computations.

It provides automatic data sampling and compression.

It offers encryption at rest

It offers fine-grained access control

Data can be loaded from:

Amazon Kinesis Data Firehose
S3 using COPY command
EC2 using JDBC driver

Redshift Spectrum lets you define external tables in Amazon Redshift that query data in S3 directly, without loading it into Redshift tables.

Enhanced VPC Routing in Redshift forces all COPY and UNLOAD traffic moving between your cluster and data repositories through your VPC.

Redshift Serverless allows you to pay only for what you use.
It measures data warehouse capacity in Redshift Processing Units (RPUs).

DynamoDB

Amazon DynamoDB is a NoSQL database with transaction support.
It is the best fit to handle workloads in which the schema is rapidly evolving.

It is schemaless.

There are two types of Read/Write capacity settings available, a Provisioned Mode and an On-Demand Mode.

The maximum size of an item in a DynamoDB table is 400 KB

Two types of Primary Keys are available:

Partition Key - regular primary key
Composite Primary Key - Partition Key + Sort Key
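The difference between the two key types is easiest to see in how items are looked up. The sketch below is an in-memory stand-in, not DynamoDB: `CompositeKeyTable` is a hypothetical class that groups items by partition key and returns them ordered by sort key, which is what a query on a composite primary key gives you.

```python
from collections import defaultdict


class CompositeKeyTable:
    """Toy table: partition key selects a group, sort key orders items in it."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # partition key -> {sort key: item}

    def put(self, pk, sk, item):
        # Same partition key and sort key overwrites the existing item.
        self._partitions[pk][sk] = item

    def query(self, pk, sk_from=None, sk_to=None):
        """Return items for one partition key, ordered by sort key."""
        rows = self._partitions.get(pk, {})
        return [
            rows[sk]
            for sk in sorted(rows)
            if (sk_from is None or sk >= sk_from)
            and (sk_to is None or sk <= sk_to)
        ]


orders = CompositeKeyTable()
orders.put("user#1", "2024-01-05", {"total": 20})
orders.put("user#1", "2024-03-10", {"total": 35})
orders.put("user#1", "2024-02-01", {"total": 15})
orders.put("user#2", "2024-02-15", {"total": 99})

print(orders.query("user#1", sk_from="2024-02-01"))
```

With a partition key alone, each key value identifies exactly one item. With a composite key, many items share a partition key and the sort key allows range queries within that group.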

Two types of Secondary Index:

Global Secondary Index - Partition key and sort key different from base table (Limit - 20)
Local Secondary Index - Same partition key as the table but a different sort key (Limit - 5). It must be configured at the time of table creation.

DynamoDB Accelerator (DAX) is an in-memory cache for DynamoDB.
It delivers up to a 10x performance improvement. It maintains a cluster of nodes and is easy to integrate. It is a write-through cache: writes go to both the cache and the table. It lowers cost at scale.

DynamoDB can have Global Tables. They provide active-active replication, which means you can read and write from any table. This enables low-latency access to data globally and ensures high availability.

DynamoDB is serverless with no servers to provision, patch, or manage and no software to install, maintain or operate.

It automatically scales tables up and down to manage capacity and to maintain performance in response to traffic patterns.

It provides both provisioned (specify RCU & WCU) and on-demand (pay for what you use) capacity modes.

RCU and WCU are decoupled. Each value can be adjusted independently.
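Sizing provisioned capacity comes down to simple arithmetic. The unit definitions used below are not stated in these notes; they follow the DynamoDB documentation as best understood here: one RCU covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one WCU covers one write per second of an item up to 1 KB. The function names are hypothetical.

```python
import math


def read_capacity_units(item_kb, reads_per_sec, strongly_consistent=True):
    """RCUs needed: item size rounds up to the next 4 KB per read."""
    units_per_read = math.ceil(item_kb / 4)
    total = units_per_read * reads_per_sec
    # Eventually consistent reads cost half as much.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_kb, writes_per_sec):
    """WCUs needed: item size rounds up to the next 1 KB per write."""
    return math.ceil(item_kb) * writes_per_sec


print(read_capacity_units(6, 10))                             # 6 KB items, 10 reads/s
print(read_capacity_units(6, 10, strongly_consistent=False))
print(write_capacity_units(2.5, 10))                          # 2.5 KB items, 10 writes/s
```

Because the two values are independent, a read-heavy table can be given a high RCU setting while keeping WCU low, and the reverse.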

DynamoDB Streams captures a time-ordered sequence of item-level modifications in a DynamoDB table. It is integrated with AWS Lambda so that you can create triggers that automatically respond to events in near real time. The stream gives DynamoDB a changelog, which it uses to replicate data across replica tables in other AWS Regions.
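A Lambda trigger on a stream receives batches of change records. The handler below is a minimal sketch that only counts the kinds of changes in a batch. The sample event is hand-written and trimmed to the fields used here (`Records`, `eventName`, `dynamodb.Keys`), and the `session_id` attribute is a made-up example.

```python
def handler(event, context=None):
    """Count item-level changes in one batch of stream records."""
    summary = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        name = record.get("eventName")
        if name in summary:
            summary[name] += 1
    return summary


# Hand-written sample batch, trimmed to the fields the handler reads.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {"Keys": {"session_id": {"S": "abc"}}},
        },
        {
            "eventName": "MODIFY",
            "dynamodb": {"Keys": {"session_id": {"S": "abc"}}},
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"session_id": {"S": "abc"}}},
        },
        {
            "eventName": "INSERT",
            "dynamodb": {"Keys": {"session_id": {"S": "def"}}},
        },
    ]
}

print(handler(sample_event))
```

A real handler would typically inspect the new or old item image in each record and act on it, for example by updating a search index or sending a notification.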

A DynamoDB table with TTL (Time to Live) is a good fit for storing user sessions, because expired sessions are deleted automatically.
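A session item for such a table carries its own expiry time. The sketch below assumes the TTL attribute is a number holding a Unix epoch timestamp in seconds; the attribute name `expires_at`, the 30-minute lifetime and the helper names are all illustrative choices, not part of these notes.

```python
import time

SESSION_LIFETIME_SECONDS = 30 * 60  # hypothetical 30-minute session


def new_session_item(session_id, user_id, now=None):
    """Build a session item whose TTL attribute is an epoch timestamp."""
    now = int(time.time()) if now is None else int(now)
    return {
        "session_id": session_id,
        "user_id": user_id,
        "expires_at": now + SESSION_LIFETIME_SECONDS,  # the TTL attribute
    }


def is_live(item, now=None):
    """Treat an item as expired once its TTL timestamp has passed."""
    now = int(time.time()) if now is None else int(now)
    return item["expires_at"] > now


item = new_session_item("s-1", "u-42", now=1_000)
print(item["expires_at"])
print(is_live(item, now=2_000), is_live(item, now=3_000))
```

TTL deletion runs in the background and is not instantaneous, so it is safer for the application to apply a check like `is_live` on read rather than assume an expired item is already gone.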

The following table classes are available:

Standard
Standard Infrequent Access (Standard-IA)

Either table class can be combined with On-Demand or Provisioned capacity.

Import from S3 and export to S3 do not consume RCUs or WCUs.

Amazon OpenSearch

Amazon OpenSearch Service runs OpenSearch, a fork of Elasticsearch that was created for licensing reasons.

An OpenSearch cluster is provisioned as an OpenSearch Service domain.

OpenSearch Ingestion pipelines transfer data from sources such as Fluent Bit, OpenTelemetry (OTel) and S3 to OpenSearch Service domains.

It can operate as a managed cluster or serverless.

OpenSearch Dashboards can be used for visualization.

Amazon ElastiCache

Amazon ElastiCache is a fully managed, highly performant cache based on Redis and Memcached.

It is highly available as well as HIPAA-compliant and can be used as an in-memory database that supports caching results of SQL queries.

ElastiCache for Redis

Multi AZ deployment with auto-failover; supports read replicas, data persistence and encryption at rest.
Supports pub/sub messaging.
Passwords/tokens can be set for authentication.
Supports SSL/TLS in-flight encryption.

ElastiCache for Memcached

Multi-AZ deployment with auto discovery, data partitioning and sharding.
Supports SASL-based authentication

Application changes are needed to enable the cache.

Supports Lazy Loading, Write Through and Session Store.
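Lazy Loading and Write Through differ in when the cache is populated. The sketch below uses plain dictionaries as stand-ins for the database and the cache, so it shows only the control flow; `CachedStore` is a hypothetical class, and a real application would call a cache client and a database driver instead.

```python
class CachedStore:
    """Toy model of an application using a cache in front of a database."""

    def __init__(self, database):
        self.database = database  # stand-in for the database (e.g. RDS)
        self.cache = {}           # stand-in for ElastiCache
        self.db_reads = 0

    def read_lazy(self, key):
        """Lazy Loading: fill the cache only on a cache miss."""
        if key in self.cache:
            return self.cache[key]  # cache hit
        self.db_reads += 1          # cache miss, go to the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write_through(self, key, value):
        """Write Through: update the cache on every database write."""
        self.database[key] = value
        self.cache[key] = value


store = CachedStore({"user:1": "Ada"})
store.read_lazy("user:1")  # miss, loads from the database
store.read_lazy("user:1")  # hit, served from the cache
store.write_through("user:1", "Grace")
print(store.read_lazy("user:1"), store.db_reads)
```

Lazy Loading caches only what is actually requested but can serve stale data after a write that bypasses the cache. Write Through keeps the cache current at the cost of an extra write on every update. The two are often combined, usually with a TTL on cached entries.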

Gaming Leaderboard is a good use-case with Redis Sorted Sets
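The leaderboard pattern can be modeled without a Redis server. The class below is a pure-Python stand-in that mimics the behaviour of the sorted-set commands named in its comments; it is not a Redis client, `Leaderboard` is a hypothetical name, and the tie-breaking rule is a simplification.

```python
class Leaderboard:
    """In-memory stand-in for a Redis sorted set of player scores."""

    def __init__(self):
        self._scores = {}

    def add_score(self, player, score):
        """Set a player's score (the role ZADD plays in Redis)."""
        self._scores[player] = score

    def top(self, n):
        """Highest scores first (the role ZREVRANGE ... WITHSCORES plays)."""
        ordered = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ordered[:n]

    def rank(self, player):
        """Zero-based position from the top (the role ZREVRANK plays)."""
        ordered = [name for name, _ in self.top(len(self._scores))]
        return ordered.index(player)


board = Leaderboard()
board.add_score("alice", 120)
board.add_score("bob", 300)
board.add_score("carol", 210)

print(board.top(2))
print(board.rank("alice"))
```

This toy version re-sorts on every call. A Redis sorted set keeps members ordered by score as they are added, which is why it suits leaderboards with frequent updates.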

We cannot use SQL on ElastiCache.

Setting up an Amazon ElastiCache in front of Amazon RDS can help to deal with the high volume of read traffic, reduce latency and downsize the instance size to cut costs.

Amazon MemoryDB for Redis

Amazon MemoryDB for Redis gives you the ability to use Redis as your main DB along with the Caching features.

A cluster is made up of shards, and each shard is made up of nodes. The nodes run on EC2.

Each shard consists of one primary node and optional read replicas.

It stores data in-memory, uses a Multi-AZ transaction log for durability, and can scale to hundreds of nodes.

Amazon DocumentDB

Amazon DocumentDB is a MongoDB compatible managed database.

It stores 6 copies of data in 3 AZs.

It can have 0 - 16 instances.

It has one primary instance with multiple replicas.

It can offer a maximum storage of 128 TB.

DocumentDB can be provisioned with a Multi-AZ deployment for high availability and automatic failover.

Amazon Keyspaces

Amazon Keyspaces is a scalable and highly available Apache Cassandra service.

It handles deployment, management and scaling of Cassandra tables.

It is serverless and uses the Cassandra Query Language (CQL).

It offers On-demand and Provisioned capacity modes.

It has the capability of global replication with active-active support.

It is used for applications that require low latency.

Some common use cases include trade monitoring, fleet management and route optimization.

Amazon Neptune

Amazon Neptune is a fully managed graph database with a serverless option. It is good for knowledge graphs, fraud detection, recommendation engines and social networking.

Its storage is spread across 3 AZs, and a cluster supports up to 15 read replicas.

Neptune ML uses graph neural networks (GNNs) to provide fast and accurate predictions on graph data.

It supports open graph APIs such as Gremlin, openCypher and SPARQL.

At a social media company, Neptune can handle complicated queries such as returning the number of likes on the videos that have been posted by friends of a user.
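The query in that example is a two-hop traversal: from a user to their friends, then from each friend to the videos they posted. The sketch below walks a tiny hand-made graph held in dictionaries. It only shows the shape of the traversal; it is not a Neptune query, and all names and data are invented for illustration.

```python
# Toy graph: who is friends with whom, who posted what, likes per video.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice", "dave"},
}
videos_by_user = {
    "bob": ["v1", "v2"],
    "carol": ["v3"],
    "dave": ["v4"],
}
likes = {"v1": 10, "v2": 5, "v3": 7, "v4": 100}


def likes_on_friends_videos(user):
    """Hop 1: user -> friends. Hop 2: friend -> posted videos. Sum the likes."""
    return sum(
        likes.get(video, 0)
        for friend in friends.get(user, ())
        for video in videos_by_user.get(friend, ())
    )


print(likes_on_friends_videos("alice"))
```

In a relational schema the same question needs several joins across friendship, video and like tables. A graph database stores the relationships directly, so traversals like this stay simple as more hops are added.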

Quantum Ledger Database (QLDB)

QLDB is used as a ledger or book for financial transactions.

It can be queried using SQL-like statements.

Its main features are that it is transparent, immutable and cryptographically verifiable.

PartiQL is the query language.

It provides a current table as well as a history table.
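The idea of an immutable, cryptographically verifiable history can be shown with a simple hash chain, where every entry's hash covers the previous entry's hash. This is a deliberately simplified sketch and does not reproduce QLDB's actual journal format. The `Ledger` class and its methods are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash, document):
    """Hash the document together with the hash of the previous entry."""
    payload = json.dumps(document, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


class Ledger:
    def __init__(self):
        self.entries = []  # append-only list of (document, hash)

    def append(self, document):
        prev = self.entries[-1][1] if self.entries else GENESIS
        self.entries.append((document, entry_hash(prev, document)))

    def verify(self):
        """Recompute the chain; any edit to past data breaks a later hash."""
        prev = GENESIS
        for document, stored_hash in self.entries:
            if entry_hash(prev, document) != stored_hash:
                return False
            prev = stored_hash
        return True


ledger = Ledger()
ledger.append({"account": "A", "amount": 100})
ledger.append({"account": "A", "amount": -30})
print(ledger.verify())

ledger.entries[0][0]["amount"] = 999  # tamper with history
print(ledger.verify())
```

In this model the latest hash acts as a fingerprint of the whole history, so anyone holding it can detect a change to any earlier entry.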

Amazon Timestream

Amazon Timestream is a time-series database used in cases such as IoT telemetry and security camera inputs.

It can capture trillions of events a day.

Recent data is stored in-memory and historical data is stored in a cost-optimized storage tier.