Databases
Amazon Relational Database Service (AWS RDS)
AWS RDS is a managed service for popular relational database engines. It simplifies routine operations such as high availability, fault tolerance, scaling, backup and restore, monitoring, performance, security and compliance.
You cannot SSH into the underlying EC2 instance.
The following database engines are supported:
- MySQL
- PostgreSQL
- MariaDB
- Oracle
- Microsoft SQL Server
General Purpose and Memory Optimized instances can be provisioned.
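- Deployment Types
  - Single AZ: for dev and staging environments
  - Multi AZ: for production environments. Data is replicated synchronously to a standby instance in another AZ, RDS fails over to the standby automatically if the primary or its AZ goes down, and automated backups are taken from the standby.
It provides a single DNS name across AZs; on failover the name is repointed to the standby, so applications keep the same connection string.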
Read Replicas are used to load balance read requests; writes happen only on the primary instance.
Up to 15 read replicas (MySQL, MariaDB, PostgreSQL) can be provisioned within an AZ, across AZs, or across Regions.
A read replica can be promoted to a standalone database of its own.
Read replicas follow an asynchronous replication model, so reads from a replica are eventually consistent.
A read replica can itself be set up as Multi AZ.
Read replicas should be used only for SELECT statements.
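Because writes must go to the primary and replicas should serve only SELECT statements, applications usually split traffic themselves. The sketch below shows one way to do that; the endpoint host names and the `choose_endpoint` helper are hypothetical, and a real application must also tolerate replica lag caused by asynchronous replication.

```python
import itertools

# Hypothetical endpoint names, for illustration only.
PRIMARY = "mydb-primary.example.internal"
REPLICAS = [
    "mydb-replica-1.example.internal",
    "mydb-replica-2.example.internal",
]

# Round-robin iterator over the read replicas.
_replica_cycle = itertools.cycle(REPLICAS)


def choose_endpoint(sql: str) -> str:
    """Route plain SELECT statements to a read replica and everything else
    (INSERT, UPDATE, DELETE, DDL, SELECT ... FOR UPDATE) to the primary."""
    text = sql.strip().upper()
    if text.startswith("SELECT") and "FOR UPDATE" not in text:
        return next(_replica_cycle)
    return PRIMARY


print(choose_endpoint("SELECT * FROM orders WHERE id = 42"))  # a replica
print(choose_endpoint("UPDATE orders SET status = 'shipped' WHERE id = 42"))  # the primary
```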
- Storage Types
  - General Purpose SSD: cost-effective, suited to development and general workloads
  - Provisioned IOPS SSD: I/O-intensive, low-latency workloads, for production environments
  - Magnetic: legacy hard disk drives, being phased out
- RDS Configurations: options and parameters to customize RDS
  - DB Parameter Groups: collections of engine parameters that fine-tune performance, security and resource allocation
  - DB Option Groups: enable and configure additional engine features, e.g. for encryption, performance and security
  - Subnet Groups: define the subnets where the DB will be deployed, controlling the network configuration of the database
  - Security Groups: control inbound and outbound traffic
  - DB Snapshots: backups of your DB instances
  - Parameter Store (AWS Systems Manager): store configuration data
  - Performance Insights: analyze database performance with visuals
  - Enhanced Monitoring: real-time OS metrics to assist with troubleshooting
  - Audit and log data: help track database activity
  - SSL and encryption support: protect data at rest and in transit
RDS automated backups: a daily snapshot plus transaction logs backed up every 5 minutes, which enables point-in-time restore; the retention period is at most 35 days.
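RDS at-rest encryption must be decided at launch time. If the primary instance is not encrypted, its read replicas cannot be encrypted. To encrypt an existing unencrypted database, take a snapshot, copy the snapshot with encryption enabled, and restore a new instance from the encrypted copy.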
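Since encryption cannot be switched on for an existing instance, the usual workaround is to snapshot the instance, copy the snapshot with a KMS key, and restore a new instance from the encrypted copy. A sketch of that flow, assuming the boto3 RDS client (`create_db_snapshot`, `copy_db_snapshot`, `restore_db_instance_from_db_snapshot` and the `db_snapshot_available` waiter); the naming scheme, the `encrypt_via_snapshot` helper and the KMS alias are hypothetical. A small recording client is included so the flow can be exercised without AWS credentials.

```python
def encrypt_via_snapshot(rds, instance_id: str, kms_key_id: str) -> str:
    """Create an encrypted copy of an unencrypted RDS instance.

    `rds` is expected to behave like boto3.client("rds"). Returns the
    identifier of the new, encrypted instance; applications then need to be
    pointed at its endpoint before the old instance is retired.
    """
    snap = f"{instance_id}-unencrypted-snap"   # hypothetical naming scheme
    enc_snap = f"{instance_id}-encrypted-snap"
    new_instance = f"{instance_id}-encrypted"

    # 1. Snapshot the unencrypted instance and wait until it is available.
    rds.create_db_snapshot(DBSnapshotIdentifier=snap, DBInstanceIdentifier=instance_id)
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snap)

    # 2. Copy the snapshot; passing a KMS key makes the copy encrypted.
    rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=snap,
        TargetDBSnapshotIdentifier=enc_snap,
        KmsKeyId=kms_key_id,
    )
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=enc_snap)

    # 3. Restore a new instance from the encrypted snapshot.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_instance,
        DBSnapshotIdentifier=enc_snap,
    )
    return new_instance


class RecordingClient:
    """Offline stand-in that records method names instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(*args, **kwargs):
            self.calls.append(name)
            return self  # get_waiter(...) returns self, so .wait(...) works too
        return record


client = RecordingClient()
print(encrypt_via_snapshot(client, "mydb", "alias/my-key"))  # mydb-encrypted
print(client.calls)
```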
In-flight (TLS) encryption is available; clients must use the AWS TLS root certificates.
Authentication can use a username and password or IAM database authentication (IAM roles).
Network access is governed through security groups.
Amazon Aurora
Amazon Aurora is a cloud-native relational database engine, offered through RDS, that is compatible with MySQL and PostgreSQL.
The main difference between traditional RDS and Aurora is in how the storage is maintained.
Aurora’s features include:
- Purpose-built, log-structured distributed storage
- Storage volume striped across hundreds of storage nodes
- Storage nodes with locally attached SSDs
- Continuous backup to S3
- Storage volume spread across AZs, offering built-in HA
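The storage volume is divided into 10 GB segments called protection groups (PGs).
Data in each protection group is replicated across 6 storage nodes, two in each of three AZs.
The storage volume grows automatically by adding PGs, up to 128 TB.
Aurora follows a quorum model: a write must be acknowledged by 4 of the 6 copies and the read quorum is 3 of the 6, so writes survive the loss of an entire AZ and reads survive the loss of an AZ plus one more node.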
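The quorum numbers can be checked with simple arithmetic. With V copies, a write quorum Vw and a read quorum Vr, consistency requires Vr + Vw > V (every read overlaps the latest write) and 2 * Vw > V (two conflicting writes cannot both succeed). The sketch below encodes Aurora's 6 copies, write quorum of 4 and read quorum of 3; the class and method names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    copies: int        # V: copies of each 10 GB protection group
    write_quorum: int  # Vw: copies that must acknowledge a write
    read_quorum: int   # Vr: copies that must answer a quorum read

    def is_consistent(self) -> bool:
        # Vr + Vw > V: any read quorum overlaps the most recent write quorum.
        # 2 * Vw > V: two write quorums always overlap.
        return (self.read_quorum + self.write_quorum > self.copies
                and 2 * self.write_quorum > self.copies)

    def can_write(self, copies_lost: int) -> bool:
        return self.copies - copies_lost >= self.write_quorum

    def can_read(self, copies_lost: int) -> bool:
        return self.copies - copies_lost >= self.read_quorum


AURORA = QuorumConfig(copies=6, write_quorum=4, read_quorum=3)

print(AURORA.is_consistent())  # True
print(AURORA.can_write(2))     # True: losing one AZ (2 copies) still allows writes
print(AURORA.can_write(3))     # False: an AZ plus one more node blocks writes...
print(AURORA.can_read(3))      # True: ...but reads still succeed
```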
Each Aurora cluster has one primary (writer) instance and up to 15 Aurora Replicas (readers).
Two flavours:
- Provisioned: fixed capacity
- Serverless: on-demand scaling
For Aurora Serverless, capacity is measured in Aurora Capacity Units (ACUs): minimum 0.5, maximum 128, where 1 ACU is about 2 GiB of memory with corresponding CPU and networking.
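Since 1 ACU corresponds to roughly 2 GiB of memory, a rough minimum capacity can be derived from a memory estimate. This is a sketch assuming scaling in 0.5 ACU steps within the 0.5 to 128 ACU range; `acu_for_memory` is a hypothetical helper, and real sizing should also account for CPU and connection load.

```python
import math

MIN_ACU, MAX_ACU = 0.5, 128.0
ACU_STEP = 0.5    # capacity moves in 0.5 ACU increments
GIB_PER_ACU = 2   # 1 ACU is roughly 2 GiB of memory


def acu_for_memory(required_gib: float) -> float:
    """Smallest ACU value (rounded up to a 0.5 step and clamped to the
    0.5-128 range) whose memory covers required_gib."""
    steps = math.ceil(required_gib / GIB_PER_ACU / ACU_STEP)
    return min(max(steps * ACU_STEP, MIN_ACU), MAX_ACU)


print(acu_for_memory(10))    # 5.0 ACU (about 10 GiB)
print(acu_for_memory(3.1))   # 2.0 ACU: 1.55 rounds up to the next 0.5 step
print(acu_for_memory(0.2))   # 0.5 ACU: the minimum
print(acu_for_memory(1000))  # 128.0 ACU: the maximum
```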
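The Writer (cluster) Endpoint is a DNS name that always points to the primary instance, even after a failover.
The Reader Endpoint load-balances connections (not individual queries) across the Aurora Replicas.
A Custom Endpoint points to a chosen subset of instances; when custom endpoints are defined, the Reader Endpoint is generally no longer used.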
Aurora Database Cloning creates a new DB cluster from an existing one using a copy-on-write protocol, which is faster than restoring a snapshot.
RDS Proxy
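RDS Proxy establishes a pool of database connections and shares and reuses them across application requests.
It can enforce IAM authentication for database access.
It can store database credentials in AWS Secrets Manager.
It reduces failover times by up to 66% for RDS and Aurora.
It is serverless and scales automatically.
It is never publicly accessible; it must be reached from within the VPC.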
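The core idea behind RDS Proxy is connection pooling: open a fixed number of connections once and hand them out in turn, instead of opening and closing one per request. RDS Proxy does this as a managed service between the application and the database; the sketch below shows the same technique client-side, with a hypothetical `fake_connect` factory standing in for a real database driver.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, factory, size: int):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # open all connections up front

    @contextmanager
    def connection(self, timeout: float = 5.0):
        # Blocks until a connection is free; raises queue.Empty after timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for reuse instead of closing it


opened = []


def fake_connect():
    """Hypothetical stand-in for a real driver's connect() call."""
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
for _ in range(10):
    with pool.connection() as conn:
        pass  # run a query with conn here
print(len(opened))  # 2: ten requests were served by two reused connections
```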
Amazon Redshift
Amazon Redshift is a fully managed, petabyte-scale data warehouse.
It is based on PostgreSQL, but uses columnar storage and is built for analytics (OLAP) rather than transactional (OLTP) workloads.
It comprises two types of nodes:
- Leader node: manages query coordination, compilation and optimization.
- Compute node: stores data and executes queries and computations.
It provides automatic data sampling and compression.
It offers encryption at rest
It offers fine-grained access control
Data can be loaded from:
- Amazon Kinesis Data Firehose (now Amazon Data Firehose)
- S3, using the COPY command
- EC2 applications, using the JDBC driver
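A COPY from S3 is typically run with an IAM role that the cluster can assume. The helper below only assembles the SQL text, which can then be run through any SQL client or driver connected to the cluster; the table name, S3 prefix and role ARN are hypothetical placeholders, and values are interpolated directly, so it must not be used with untrusted input.

```python
def build_copy_sql(table: str, s3_prefix: str, iam_role_arn: str,
                   file_format: str = "CSV") -> str:
    """Assemble a Redshift COPY statement that loads the files under s3_prefix."""
    # Values are interpolated as-is: only use with trusted, validated input.
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {file_format};"
    )


print(build_copy_sql(
    "sales",
    "s3://my-bucket/sales/",
    "arn:aws:iam::123456789012:role/MyRedshiftRole",
))
```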
Enhanced VPC Routing in Redshift forces all COPY and UNLOAD traffic between your cluster and your data repositories to go through your VPC.
Redshift Serverless lets you pay only for what you use; data warehouse capacity is measured in Redshift Processing Units (RPUs).
DynamoDB
Amazon DynamoDB is a NoSQL key-value and document database with transaction support.
It is schemaless (only the primary key is defined up front), which makes it the best fit for workloads in which the schema is rapidly evolving.
Two Read/Write capacity modes are available:
- Provisioned Mode: you specify RCUs and WCUs. 1 RCU is one strongly consistent read per second (or two eventually consistent reads) of an item up to 4 KB; 1 WCU is one write per second of an item up to 1 KB. RCU and WCU are decoupled, so each value can be adjusted independently.
- On-Demand Mode: pay for what you use.
The maximum size of an item in a DynamoDB table is 400 KB.
Two types of Primary Keys are available:
- Partition Key: a single-attribute primary key
- Composite Primary Key: Partition Key + Sort Key
Two types of Secondary Index:
- Global Secondary Index: partition key and sort key can differ from the base table (default limit: 20 per table)
- Local Secondary Index: same partition key as the table but a different sort key (limit: 5 per table). It must be configured at table creation time.
DynamoDB is serverless, with no servers to provision, patch or manage and no software to install, maintain or operate.
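Provisioned capacity can be estimated from item size and request rate. The sketch below uses the standard unit sizes (4 KB per read unit, 1 KB per write unit, eventually consistent reads at half cost, transactional requests at double cost) and rounds item sizes up to the next unit; the function names are illustrative.

```python
import math


def read_capacity_units(item_kb: float, reads_per_sec: int,
                        consistency: str = "strong") -> int:
    """RCUs needed; consistency is 'strong', 'eventual' or 'transactional'."""
    if consistency not in ("strong", "eventual", "transactional"):
        raise ValueError(f"unknown consistency: {consistency}")
    units_per_read = max(1, math.ceil(item_kb / 4))  # 4 KB per read unit
    if consistency == "eventual":
        return math.ceil(units_per_read * reads_per_sec / 2)
    if consistency == "transactional":
        return 2 * units_per_read * reads_per_sec
    return units_per_read * reads_per_sec


def write_capacity_units(item_kb: float, writes_per_sec: int,
                         transactional: bool = False) -> int:
    """WCUs needed for writes of items of size item_kb."""
    units_per_write = max(1, math.ceil(item_kb))  # 1 KB per write unit
    factor = 2 if transactional else 1
    return factor * units_per_write * writes_per_sec


print(read_capacity_units(6, 10))                         # 20: 6 KB rounds up to 2 x 4 KB
print(read_capacity_units(6, 10, "eventual"))             # 10
print(write_capacity_units(2.5, 10))                      # 30: 2.5 KB rounds up to 3 x 1 KB
print(write_capacity_units(2.5, 10, transactional=True))  # 60
```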
DynamoDB Streams capture a time-ordered sequence of item-level modifications (inserts, updates, deletes) in a DynamoDB table, retained for 24 hours. Streams are integrated with AWS Lambda, so you can create triggers that automatically respond to changes in near real time. DynamoDB global tables use Streams as a changelog to replicate data to replica tables in other AWS Regions.
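A stream-triggered Lambda function receives batches of change records. The sketch below assumes the standard DynamoDB Streams event shape delivered to Lambda (`Records`, each with `eventName` and a `dynamodb` section holding `Keys` and, depending on the stream view type, `NewImage`/`OldImage`); the handler name, the per-event counting and the sample event contents are illustrative only.

```python
def handler(event, context):
    """Count and log the item-level changes in one stream batch."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        change = record["eventName"]      # INSERT, MODIFY or REMOVE
        data = record["dynamodb"]
        keys = data["Keys"]               # DynamoDB JSON, e.g. {"pk": {"S": "user#1"}}
        new_image = data.get("NewImage")  # absent for REMOVE or KEYS_ONLY streams
        counts[change] += 1
        print(change, keys, new_image)
    return counts


sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"pk": {"S": "user#1"}},
                      "NewImage": {"pk": {"S": "user#1"}, "name": {"S": "Alice"}}}},
        {"eventName": "REMOVE",
         "dynamodb": {"Keys": {"pk": {"S": "user#2"}}}},
    ]
}
print(handler(sample_event, None))  # {'INSERT': 1, 'MODIFY': 0, 'REMOVE': 1}
```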
A DynamoDB table with TTL (Time to Live) is a good fit for storing user sessions: expired sessions are deleted automatically, without consuming write capacity.
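TTL works on a per-item timestamp attribute that you nominate when enabling TTL on the table. A sketch, assuming a hypothetical attribute name `expires_at` and a 30-minute session lifetime; writing the item to DynamoDB (for example with boto3's `put_item`) is left out so the example runs offline.

```python
import time
from typing import Optional

SESSION_TTL_SECONDS = 30 * 60  # hypothetical 30-minute session lifetime


def new_session_item(session_id: str, user_id: str,
                     now: Optional[float] = None) -> dict:
    """Build a session item whose 'expires_at' attribute is used as the TTL."""
    now = time.time() if now is None else now
    return {
        "session_id": session_id,
        "user_id": user_id,
        # The TTL attribute must be a Number holding Unix epoch time in seconds.
        "expires_at": int(now) + SESSION_TTL_SECONDS,
    }


def is_expired(item: dict, now: Optional[float] = None) -> bool:
    """TTL deletion runs in the background, so an expired item can still be
    returned by reads for a while; filter it out in the application."""
    now = time.time() if now is None else now
    return item["expires_at"] <= int(now)


item = new_session_item("s-123", "u-42", now=1_700_000_000)
print(item["expires_at"])                   # 1700001800
print(is_expired(item, now=1_700_000_900))  # False
print(is_expired(item, now=1_700_001_800))  # True
```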
The following table classes are available:
- Standard
- Standard-Infrequent Access
Both table classes work with either On-Demand or Provisioned capacity.
Import from S3 and export to S3 do not consume RCUs or WCUs.
Amazon OpenSearch
Amazon OpenSearch Service is based on OpenSearch, the AWS-led open-source fork of Elasticsearch, created after Elastic changed Elasticsearch's license.
In Amazon OpenSearch Service, an OpenSearch cluster is called a domain.
OpenSearch Ingestion pipelines transfer data from sources such as Fluent Bit, OpenTelemetry (OTel) collectors and Amazon S3 to OpenSearch Service domains.
It can run as a managed cluster or as OpenSearch Serverless.
OpenSearch Dashboards can be used for visualization.
Amazon ElastiCache
Amazon ElastiCache is a fully managed, high-performance in-memory cache based on Redis or Memcached.
ElastiCache for Redis
Multi-AZ deployment with automatic failover; supports read replicas, data persistence and encryption at rest.
Supports pub/sub messaging.
Redis AUTH passwords/tokens can be set for authentication.
Supports SSL/TLS in-flight encryption.
ElastiCache for Memcached
Nodes can be spread across AZs, with auto discovery and data partitioning (sharding); there is no replication or automatic failover.
Supports SASL-based authentication
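Using ElastiCache requires application code changes: the application must read from and write to the cache explicitly.
Supported caching patterns include Lazy Loading (cache-aside), Write Through and Session Store.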
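Both patterns live in application code. The sketch below shows lazy loading (read from the cache, fall back to the database on a miss, then populate the cache) and write-through (update the database and the cache together). Plain dictionaries stand in for ElastiCache and the database, and the `CachedRepository` class is hypothetical; a real deployment would also set a TTL on cached keys.

```python
from typing import Any, Dict, Optional


class CachedRepository:
    """Cache-aside reads and write-through writes, with dicts standing in
    for the cache (ElastiCache) and the database."""

    def __init__(self, db: Dict[str, Any], cache: Dict[str, Any]):
        self.db = db
        self.cache = cache
        self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        # Lazy loading: only data that is actually read ends up in the cache.
        if key in self.cache:
            return self.cache[key]
        self.misses += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key: str, value: Any) -> None:
        # Write-through: the cache never holds stale data for this key,
        # at the cost of caching values that may never be read.
        self.db[key] = value
        self.cache[key] = value


repo = CachedRepository(db={"user:1": "Alice"}, cache={})
repo.get("user:1")         # miss: loaded from the DB and cached
repo.get("user:1")         # hit: served from the cache
repo.put("user:2", "Bob")  # write-through: DB and cache both updated
print(repo.misses)         # 1
```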
A gaming leaderboard is a good use case for Redis Sorted Sets.
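A sorted set keeps members ordered by score, so ranking queries are cheap. With redis-py this maps to calls such as `zincrby(name, amount, value)` and `zrevrange(name, 0, 9, withscores=True)`; the sketch below is an in-memory stand-in that mimics those semantics so it runs without a Redis server. The `Leaderboard` class and player names are hypothetical.

```python
from collections import defaultdict


class Leaderboard:
    """In-memory stand-in for a Redis sorted set. Redis keeps members sorted
    on every update; this sketch simply sorts when the ranking is requested."""

    def __init__(self):
        self.scores = defaultdict(float)

    def zincrby(self, member: str, amount: float) -> float:
        # Like ZINCRBY: add to the member's score, creating it at 0 if absent.
        self.scores[member] += amount
        return self.scores[member]

    def top(self, n: int):
        # Like ZREVRANGE ... WITHSCORES: highest score first, ties ordered
        # by member name in reverse lexicographic order.
        ranked = sorted(self.scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        return ranked[:n]


lb = Leaderboard()
lb.zincrby("alice", 50)
lb.zincrby("bob", 70)
lb.zincrby("alice", 30)
print(lb.top(2))  # [('alice', 80.0), ('bob', 70.0)]
```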
We cannot use SQL on ElastiCache.
Amazon MemoryDB for Redis
Amazon MemoryDB for Redis is a durable, Redis-compatible in-memory database: it lets you use Redis as your primary database while keeping its caching features.
A cluster is made up of shards, and each shard consists of one primary node and optional read replica nodes; the nodes run on EC2 instances.
It stores data in memory and uses a distributed Multi-AZ transactional log for durability; a cluster can scale to hundreds of nodes.
Amazon DocumentDB
Amazon DocumentDB is a MongoDB compatible managed database.
It stores 6 copies of data in 3 AZs.
A cluster can have 0 to 16 instances: one primary instance and up to 15 replicas.
It can store up to 128 TB.
Amazon Keyspaces
Amazon Keyspaces is a scalable, highly available, Apache Cassandra-compatible database service.
It handles deployment, management and scaling of Cassandra tables.
It is serverless and uses the Cassandra Query Language (CQL).
It offers On-Demand and Provisioned capacity modes.
It supports multi-Region replication with active-active writes.
It is used for applications that require low latency at scale.
Some common use cases include trade monitoring, fleet management and route optimization.
Amazon Neptune
Amazon Neptune is a fully managed graph database, with a serverless option (Neptune Serverless). It is good for knowledge graphs, fraud detection, recommendation engines and social networking.
Data is replicated across 3 AZs, and a cluster can have up to 15 read replicas.
Neptune ML uses graph neural networks (GNNs) to make fast, accurate predictions on graph data.
It supports open graph APIs: Apache TinkerPop Gremlin, openCypher and SPARQL.