Migration and Transfer

At a high level, the following steps are involved in planning a migration from on-premise to AWS:

  • Assessment and Inventory

  • Categorization

  • Determining Cloud Services

  • Migration Planning

  • Migration Execution

AWS Migration Hub

AWS Migration Hub assists in planning the migration activities to the AWS cloud.

It helps to discover details of the on-premise environment through either agent-based or agentless discovery strategies.

It generates an inventory of servers along with their CPU, memory, and network utilization, and recommends right-sized target EC2 instances with their associated cost.

AWS Application Discovery Service

AWS Application Discovery Service helps gather the necessary information about your on-premise applications and infrastructure.
It helps identify relationships and dependencies between servers.
The data is pushed from agents to ADS every 15 minutes.
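A rough sketch of driving this discovery flow programmatically with boto3 (assuming agents are already installed; the region is illustrative):

```python
import boto3

# Application Discovery Service client (boto3 client name is "discovery")
ads = boto3.client("discovery", region_name="us-east-1")

# List the discovery agents registered from the on-premise environment
agents = ads.describe_agents()["agentsInfo"]
agent_ids = [a["agentId"] for a in agents]

# Start collecting server, process, and network data from those agents
ads.start_data_collection_by_agent_ids(agentIds=agent_ids)

# Later: list the discovered server configurations (data arrives every ~15 minutes)
servers = ads.list_configurations(configurationType="SERVER")
for config in servers["configurations"]:
    print(config)
```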

AWS Application Migration Service

AWS Application Migration Service (MGN) performs lift-and-shift migrations of on-premise VMs to the cloud.
MGN uses a staging area and a migrated-resources area in your AWS account.

A Replication agent needs to be installed on the on-premise servers.

It integrates with AWS Systems Manager, S3 and Elastic Disaster Recovery.

It provides continuous replication from on-premise to the staging area. At cutover, production instances are launched from the replicated data in the staging area.
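A minimal sketch of driving MGN through boto3 (assuming the Replication Agent is installed; server IDs are discovered at runtime):

```python
import boto3

mgn = boto3.client("mgn", region_name="us-east-1")

# List source servers that have the Replication Agent installed
servers = mgn.describe_source_servers(filters={})
ids = [s["sourceServerID"] for s in servers["items"]]

# Launch test instances from the replicated data in the staging area
mgn.start_test(sourceServerIDs=ids)

# After validation, perform the actual cutover into production
mgn.start_cutover(sourceServerIDs=ids)
```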

AWS Database Migration Service

AWS Database Migration Service helps migrate databases from on-premise to the cloud.

AWS DMS enables you to seamlessly migrate data from supported sources to relational databases, data warehouses, streaming platforms, and other data stores in AWS cloud.
Example: DMS can act as a bridge between Amazon S3 and Amazon Kinesis Data Streams

DMS Fleet Advisor discovers and inventories your on-premise databases and recommends suitable migration targets.

AWS Schema Conversion Tool helps convert the database schema between different engines. It can handle complex database configurations such as secondary indexes, foreign keys, and stored procedures.

A Replication Instance helps to replicate the data.

A Replication Task defines how the replication should occur.

The replication instance is a managed EC2 instance, provisioned through DMS, that runs the migration tasks.
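A minimal sketch of creating and starting a replication task with boto3 (the endpoint and replication instance ARNs are hypothetical and assumed to be created beforehand; the migration types are described below):

```python
import json

import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Create a task on a pre-provisioned replication instance (hypothetical ARNs)
task = dms.create_replication_task(
    ReplicationTaskIdentifier="onprem-to-aws",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SRC",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TGT",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
    MigrationType="full-load-and-cdc",  # bulk load plus ongoing change capture
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)

# Start the replication
dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)
```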

Migration Types
Full Load

migrates all existing data in one pass; there is associated downtime while the copy runs

Full Load + Change Data Capture (CDC)

the bulk load is followed by continuous replication of ongoing changes, so there is no downtime

CDC only

a native tool copies the existing data, then DMS replicates only the delta (ongoing changes)

Migration could be homogeneous (e.g. Oracle → Oracle) or heterogeneous (e.g. MongoDB → DocumentDB)

Use Case:
RDS MySQL → Aurora
Option 1: DB Snapshots from RDS MySQL are restored as an Aurora MySQL DB
Option 2: Create an Aurora Read Replica from your RDS MySQL and, when replication lag is 0, promote it as its own DB cluster (can take time and $$)
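A minimal sketch of Option 2 with boto3 (cluster identifiers and the source ARN are hypothetical):

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Create an Aurora cluster that replicates from the existing RDS MySQL instance
rds.create_db_cluster(
    DBClusterIdentifier="aurora-replica-cluster",
    Engine="aurora-mysql",
    ReplicationSourceIdentifier="arn:aws:rds:us-east-1:123456789012:db:my-rds-mysql",
)

# Add a reader instance to the new cluster
rds.create_db_instance(
    DBInstanceIdentifier="aurora-replica-1",
    DBClusterIdentifier="aurora-replica-cluster",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-mysql",
)

# Once replication lag reaches 0, promote the cluster to a standalone DB
rds.promote_read_replica_db_cluster(DBClusterIdentifier="aurora-replica-cluster")
```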

Use Case:
External MySQL → Aurora
Option 1: Use Percona XtraBackup to create a file backup in S3. Create Aurora MySQL DB from Amazon S3
Option 2: Create an Aurora MySQL DB. Use the mysqldump utility to migrate the MySQL data into Aurora - slower than the S3 approach
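A minimal sketch of Option 1 with boto3 (bucket, role, and credentials are hypothetical placeholders):

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Restore an Aurora MySQL cluster from a Percona XtraBackup stored in S3
rds.restore_db_cluster_from_s3(
    DBClusterIdentifier="aurora-from-backup",
    Engine="aurora-mysql",
    MasterUsername="admin",
    MasterUserPassword="REPLACE_ME",  # placeholder credential
    SourceEngine="mysql",
    SourceEngineVersion="5.7.40",     # version the backup was taken from
    S3BucketName="my-backup-bucket",  # hypothetical bucket holding the backup
    S3IngestionRoleArn="arn:aws:iam::123456789012:role/aurora-s3-import",
)
```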

Use DMS if both databases are up and running.

Use Case:
RDS PostgreSQL→ Aurora
Option 1: DB Snapshots from RDS PostgreSQL are restored as an Aurora PostgreSQL DB
Option 2: Create an Aurora Read Replica from your RDS PostgreSQL and when replication lag is 0, promote it as its own DB cluster (can take time and $$)

Use Case:
External PostgreSQL→ Aurora
Create a backup and put it in Amazon S3. Import it using the aws_s3 Aurora extension.
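A minimal sketch of the import step (assuming psycopg2, a hypothetical endpoint and table, and an IAM role attached to the cluster that allows reading from the bucket):

```python
import psycopg2

# Connect to the Aurora PostgreSQL cluster (hypothetical endpoint/credentials)
conn = psycopg2.connect(
    host="aurora-pg.cluster-xyz.us-east-1.rds.amazonaws.com",
    dbname="appdb",
    user="admin",
    password="REPLACE_ME",
)

with conn, conn.cursor() as cur:
    # The extension must be installed once per database
    cur.execute("CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;")
    # Import a CSV backup from S3 into an existing table
    cur.execute("""
        SELECT aws_s3.table_import_from_s3(
            'orders', '', '(format csv)',
            aws_commons.create_s3_uri('my-backup-bucket', 'orders.csv', 'us-east-1')
        );
    """)
```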

Use DMS if both databases are up and running.

AWS Elastic Disaster Recovery

Managing a disaster recovery site on-premise can be a costly affair.

AWS Elastic Disaster Recovery is a fully managed disaster recovery service for physical, virtual, and cloud-based servers.

Disaster Recovery Types
On-premise to On-premise

traditional DR, very expensive

On-premise to Cloud

hybrid recovery

Cloud to Cloud

DR between two cloud regions

The following terminology is important for managing DR:

Recovery Point Objective (RPO)

Defines how much data loss is acceptable: how often you need to run backups and how far back in time you go when you recover

Recovery Time Objective (RTO)

Defines how much downtime is acceptable: the time it takes to recover after a disaster

Strategies for DR
Backup / Restore

High RPO, high RTO; the cheapest option

Pilot Light

A small deployment of the critical core is always running on the cloud, e.g. the RDS database is running while the EC2 instances are not; they are launched only on failover.

Warm Standby

The full system is up and running, but at minimal size. In case of disaster, it can be scaled up to production load.

Hot Site / Multi Site Approach

Very low RTO, very expensive; a full production-scale environment runs on AWS and on-premise simultaneously, aka active-active.

An AWS Replication Agent needs to be installed on any source server that needs to be backed up.
The staging area is the location where AWS receives the replicated data.
A launch template is used to configure the specifications of the recovery servers.

Elastic Disaster Recovery can minimize downtime and data loss with reliable recovery.

On the Cloud we have:

  1. The staging subnet, which receives replicated data onto one EBS volume per source disk

  2. The recovery subnet
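A minimal sketch of launching a recovery drill with boto3 (server IDs are discovered at runtime; the region is illustrative):

```python
import boto3

drs = boto3.client("drs", region_name="us-east-1")

# List the source servers replicating into the staging area
servers = drs.describe_source_servers(filters={})
ids = [{"sourceServerID": s["sourceServerID"]} for s in servers["items"]]

# Launch a non-disruptive recovery drill in the recovery subnet
drs.start_recovery(sourceServers=ids, isDrill=True)

# In an actual disaster, launch real recovery instances instead:
# drs.start_recovery(sourceServers=ids, isDrill=False)
```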

S3 can be used for Disaster Recovery and it offers 11 9s of data durability.

EBS Snapshots can be used for DR. The snapshots are incremental.
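A minimal sketch of an incremental snapshot plus a cross-region copy for DR (the volume ID is hypothetical):

```python
import boto3

ec2_primary = boto3.client("ec2", region_name="us-east-1")
ec2_dr = boto3.client("ec2", region_name="us-west-2")

# Take an (incremental) snapshot of an EBS volume
snap = ec2_primary.create_snapshot(
    VolumeId="vol-0123456789abcdef0",  # hypothetical volume ID
    Description="Nightly DR snapshot",
)
ec2_primary.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# Copy the completed snapshot to a second region for disaster recovery
ec2_dr.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId=snap["SnapshotId"],
    Description="Cross-region DR copy",
)
```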

AWS Backup

AWS Backup is a fully managed service that centrally manages and automates backups across AWS services.

Backup Vault stores your data.
Backup Plan defines the configuration for backup.
A Recovery Point is a point in time to which data can be restored.
AWS Backup Vault Lock enforces the WORM model i.e. Write Once Read Many.

With Vault Lock enabled in compliance mode, backups cannot be deleted, even by the root user.
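A minimal sketch with boto3 (vault and plan names are hypothetical; the retention values are illustrative):

```python
import boto3

backup = boto3.client("backup", region_name="us-east-1")

# Create a vault and lock it into WORM mode
backup.create_backup_vault(BackupVaultName="dr-vault")
backup.put_backup_vault_lock_configuration(
    BackupVaultName="dr-vault",
    MinRetentionDays=30,   # recovery points cannot be deleted earlier than this
    ChangeableForDays=3,   # the lock becomes immutable after this grace period
)

# A daily backup plan targeting the locked vault
backup.create_backup_plan(BackupPlan={
    "BackupPlanName": "daily-plan",
    "Rules": [{
        "RuleName": "daily",
        "TargetBackupVaultName": "dr-vault",
        "ScheduleExpression": "cron(0 5 * * ? *)",  # every day at 05:00 UTC
        "Lifecycle": {"DeleteAfterDays": 90},
    }],
})
```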

AWS Mainframe Modernization

AWS Mainframe Modernization helps migrate on-premise mainframe workloads to the cloud, replacing the runtime with a modern managed environment.

Supports Refactoring and Re-platforming.

AWS DataSync

AWS DataSync helps to move large amounts of data to and from AWS.

Components include:
Agent: to be installed on the source
Location: defines the source and destination
Task: describes the transfer (blueprint)
Task Execution: actual execution of a task

A Task could be in one of the following states:

  • Available

  • Running

  • Unavailable (agent offline)

  • Queued (another task is using the agent)

A Task Execution could be in one of the following states:

  • Queued

  • Launching

  • Preparing

  • Transferring

  • Verifying

  • Success

  • Error
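A minimal sketch of creating and running a task with boto3 (the location ARNs are hypothetical and assumed to be created beforehand):

```python
import boto3

datasync = boto3.client("datasync", region_name="us-east-1")

# Create a task between pre-created source and destination locations
task = datasync.create_task(
    SourceLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-src",
    DestinationLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-dst",
    Name="onprem-nfs-to-efs",
)

# Kick off a task execution and inspect its state
execution = datasync.start_task_execution(TaskArn=task["TaskArn"])
status = datasync.describe_task_execution(
    TaskExecutionArn=execution["TaskExecutionArn"]
)["Status"]  # e.g. QUEUED, LAUNCHING, PREPARING, TRANSFERRING, VERIFYING, SUCCESS
print(status)
```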

A DataSync agent connects to the on-premise data center and sends data to the DataSync Discovery service so that it can provide recommendations for your setup.

Example 1. On-premise data to EFS using DataSync

Configure an AWS DataSync agent on the on-premise server that has access to the on-premise NFS file system. Transfer data over the AWS Direct Connect connection to an AWS PrivateLink interface VPC endpoint for AWS DataSync by using a private virtual interface (VIF). Set up an AWS DataSync scheduled task to send the files to the Amazon EFS file system every 24 hours.

AWS Transfer Family

AWS Transfer Family is a secure transfer service that enables you to transfer files in and out of AWS storage services.

It works with S3 and EFS.

A Transfer Family server is a fully managed and highly available file transfer endpoint through which files can be uploaded and downloaded.

It supports the following protocols:

  • SFTP

  • FTP

  • FTPS

  • AS2

When backed by S3, AWS Transfer Family can be combined with automated S3 lifecycle policies to transition older data to more cost-effective storage classes.
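A minimal sketch of a managed SFTP endpoint backed by S3 (the IAM role, bucket, and user name are hypothetical):

```python
import boto3

transfer = boto3.client("transfer", region_name="us-east-1")

# Create a managed SFTP endpoint backed by S3
server = transfer.create_server(
    Protocols=["SFTP"],
    Domain="S3",
    IdentityProviderType="SERVICE_MANAGED",
    EndpointType="PUBLIC",
)

# Add a user whose home directory maps into an S3 bucket
transfer.create_user(
    ServerId=server["ServerId"],
    UserName="partner-a",
    Role="arn:aws:iam::123456789012:role/transfer-s3-access",
    HomeDirectory="/my-transfer-bucket/partner-a",
)
```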

AWS Snow Family

AWS Snow Family consists of a set of physical hardware devices that facilitate copying data to and from AWS.

They are rugged, portable and highly secure.

They are recommended when it would take more than a week to transfer the data over the network.

Apart from migrating data, some devices can collect and process data at the edge.

The following devices support Data Migration:

Snowcone, Snowball Edge, and Snowmobile

The following devices support Edge computing:

Snowcone and Snowball Edge

Snowball Edge

Can move terabytes or petabytes of data. It can provide block storage and Amazon S3 compatible object storage.
It weighs 50 pounds and has compute power.
It is generally used for large data cloud migrations, decommissioning a data center or in disaster recovery setup.

Snowball Edge Storage Optimized

80 TB of HDD capacity for block volume and S3 compatible object storage, 40 vCPU, 80 GiB memory.

Snowball Edge Compute Optimized

42 TB of HDD or 28 TB NVMe capacity for block volume and S3 compatible storage, 104 vCPUs, 416 GiB memory, optional GPU, storage clustering available (up to 16 nodes).

Snowcone

A small and portable device with compute power. It weighs only 4.5 pounds and is available in HDD and SSD variants.
Snowcone is used in scenarios where Snowball does not fit (usually space-constrained environments).
Customers must provide their own batteries and cables.
Data can be sent to AWS either offline or by connecting it to the internet and using AWS DataSync.

Snowcone

8 TB of HDD storage, 2 CPUs, 4 GB memory

Snowcone SSD

14 TB of SSD storage, 2 CPUs, 4 GB memory

Snowmobile

A truck that can transfer exabytes of data.
1 EB = 1,000 PB = 1,000,000 TB.
Each Snowmobile has a capacity of 100 PB.
It is highly secure, with temperature control, GPS tracking, and 24/7 video surveillance.
It is preferred over Snowball when transferring more than 10 PB of data.

Edge computing is processing data where it is created, at an edge location. Edge locations often have limited or no internet access.
They can run EC2 Instances and AWS Lambda Functions.

AWS OpsHub is a graphical user interface (GUI) application used to manage your Snow Family devices.

Snowball family cannot import to S3 Glacier directly. The flow is as follows:

Snowball → S3 → S3 Lifecycle Policy → S3 Glacier
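A minimal sketch of the lifecycle step with boto3 (the bucket name is hypothetical; Days=0 transitions objects as soon as possible):

```python
import boto3

s3 = boto3.client("s3")

# After the Snowball import lands in S3, transition objects to Glacier
s3.put_bucket_lifecycle_configuration(
    Bucket="snowball-import-bucket",  # hypothetical bucket
    LifecycleConfiguration={"Rules": [{
        "ID": "to-glacier",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # apply to the whole bucket
        "Transitions": [{"Days": 0, "StorageClass": "GLACIER"}],
    }]},
)
```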

Snowball devices can also be used to replicate data from AWS back to on-premises.
Example: from an S3 bucket to storage in an on-premise datacenter.