Storage

Elastic Block Store (EBS)

In EBS, data is stored in blocks. A collection of blocks is presented as a volume.

EBS is mountable as a file system as well as bootable for an OS.

EBS is a network drive attached to an instance.

By default, the root volume for an AMI backed by Amazon EBS is deleted when the instance terminates. You can change the default behavior to ensure that the volume persists after the instance terminates. Non-root EBS volumes remain available even after you terminate an instance to which the volumes were attached.

Multiple EBS volumes can be attached to a single instance, but a volume is normally attached to only one instance at a time and is bound to a single AZ. The exception is multi-attach volumes, which can be attached to several instances if the application is smart enough to have only one instance actively writing.

Instances communicate with EBS via a network.

EBS volumes can be detached from one instance and attached to another.

EBS volumes are redundant within an AZ.

EBS Snapshots are used to transfer EBS data from one AZ to another. The snapshots are stored in S3.

Volume Types
Solid state drive (SSD) volumes

SSD-backed volumes are optimized for transactional workloads involving frequent read/write operations with small I/O size, where the dominant performance attribute is IOPS.

General Purpose SSD (gp2 and gp3)

Good balance between price and performance. gp3 is the latest generation of general purpose volumes and is the cheapest.
Baseline I/O performance for gp2 storage is 3 IOPS for each GiB; gp3 volumes instead start with a baseline of 3,000 IOPS regardless of size.
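As a small sketch of the gp2 baseline rule above (the 100 IOPS floor and the 16,000 IOPS cap are gp2's documented limits):

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume: 3 IOPS per GiB,
    with a floor of 100 IOPS and a cap of 16,000 IOPS."""
    return max(100, min(16_000, 3 * size_gib))
```

For example, a 1,000 GiB gp2 volume gets a 3,000 IOPS baseline, while a 10 GiB volume still gets the 100 IOPS floor.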

Provisioned IOPS SSD (io1 and io2)

Provide the highest performance. They are designed for I/O-intensive workloads such as databases and support the multi-attach feature.
A Provisioned IOPS SSD EBS volume provides up to 64,000 IOPS for each volume.
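The IOPS you can provision also depends on volume size: io1 allows up to a 50:1 IOPS-to-GiB ratio. A minimal sketch of that cap:

```python
def io1_max_provisioned_iops(size_gib: int) -> int:
    """Maximum IOPS that can be provisioned on an io1 volume:
    a 50:1 IOPS-to-GiB ratio, capped at 64,000 per volume."""
    return min(64_000, 50 * size_gib)
```

io2 allows a higher 500:1 ratio, so smaller io2 volumes can reach the same IOPS ceiling.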

Hard disk drive (HDD) volumes

HDD-backed volumes are optimized for large streaming workloads where the dominant performance attribute is throughput.

Throughput Optimized HDD volumes (st1)

low-cost HDD storage for frequently accessed, throughput-intensive workloads; slower than SSD volumes.

Cold HDD volumes (sc1)

used for infrequently accessed data, where having the lowest cost is important.

Magnetic

Previous-generation volumes backed by magnetic drives. Used when data is accessed infrequently and performance is not a concern.
Magnetic storage does not support IOPS as a configurable parameter.

EBS volumes and EBS snapshots are charged per GB-month.
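Since billing is per GB-month, the charge is prorated by how long the volume is provisioned. A rough estimator (the price is a placeholder, not a real AWS rate, and a 720-hour billing month is assumed):

```python
def ebs_monthly_charge(size_gb: float, hours_provisioned: float,
                       price_per_gb_month: float) -> float:
    """Prorated GB-month charge for an EBS volume.
    Assumes a 720-hour billing month; the price is a placeholder."""
    return size_gb * (hours_provisioned / 720) * price_per_gb_month
```

For example, a 100 GB volume provisioned for half a month costs half the full GB-month rate for 100 GB.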

Amazon EBS volumes can be encrypted. The following are the features of an encrypted EBS volume:

  • Data at rest inside the volume is encrypted

  • Any snapshot created from the volume is encrypted

  • Data moving between the volume and the instance is encrypted

Instance Store

Instance store provides temporary block level storage. It is located physically on disks attached to the host computer that your EC2 instance runs on.

The instance store has no additional cost, compared with the regular hourly cost of the instance.
Not all EC2 instance types support instance store; the EC2 instance type documentation lists which ones do.

Elastic File System (EFS)

Amazon Elastic File System (Amazon EFS) provides serverless, fully elastic file storage so that you can share file data without provisioning or managing storage capacity and performance.

It supports NFSv4.

EFS does not work with Windows-based EC2 instances.

EFS can be shared across multiple EC2 instances.

EFS is VPC specific.

EFS is made available through a mount target. A mount target is deployed in a subnet; it is essentially just an IP address.

EFS file system types
Regional

data is available even if one or more AZs in a region are unavailable.

One Zone

data is stored within a single AZ, at a lower cost.

EFS storage classes
EFS Standard

uses solid state drive (SSD) storage for lowest latency. Good for frequently accessed files.

EFS Infrequent Access

cost-optimized storage class for data that is accessed only a few times each quarter.

EFS Archive

cost-optimized storage for data that is accessed a few times each year or less.

EFS Performance
General Purpose Performance Mode

Ideal for latency sensitive applications

Max I/O Performance Mode

can scale to higher levels of aggregate throughput and operations per second

Elastic Throughput Mode

automatically scales throughput performance up or down to meet the needs of the workload

Provisioned Throughput Mode

level of throughput the file system can drive independent of the file system size

Bursting Throughput Mode

Scales with the amount of storage in your file system and supports bursting to higher levels for up to 12 hours per day

As EFS is file system storage, you cannot boot an operating system from it.

FSx

A fully managed service that provides a high performance file storage. It takes care of file servers and volumes, replication of data, patching the file server, addressing hardware issues and backups.

FSx Flavours
FSx for Windows File Server

fully compatible with Windows file servers, based on Windows Server OS. Supports Server Message Block (SMB) protocol. Can integrate with Microsoft Active Directory.

FSx for Lustre

optimized for high-performance parallel file processing. It is built on top of the Lustre file system and is good for ML and data analytics workloads. Supports only single-AZ deployment (both SSD and HDD options are available).

FSx for NetApp ONTAP

It is built on top of NetApp's file system. It is compatible with the NFS, SMB, and iSCSI protocols.

FSx for OpenZFS

a managed OpenZFS file system, compatible with NFS only.

FSx for Lustre provides the ability to process both the 'hot data' in a parallel and distributed fashion as well as easily store the 'cold data' on Amazon S3.

S3 - Simple Storage Service

S3 is a service that provides object-based storage. It is a global service; however, each bucket is created within a specific region.

S3 cannot be used for file-based or block-based storage.

Objects created in S3 have a key, a value (the data itself), metadata, and optionally a version ID.

S3 has a flat structure; folders are an illusion created by key prefixes.

S3 is designed for high availability.

S3 bucket names must be unique globally across all AWS accounts.

S3 can handle an unlimited number of objects.

The maximum size of a single S3 object is 5 TB.

100 buckets are supported by default. However, this number can be increased to 1000 by requesting a service limit increase.

Multipart upload breaks an object into parts before uploading them to S3. It is recommended once the object size exceeds 100 MB.

S3 Transfer acceleration uses the Amazon Edge locations for faster uploads and downloads.

Multipart uploads and S3 Transfer Acceleration are useful to mitigate delays in uploading large files to the destination S3 bucket.
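A sketch of the arithmetic behind multipart uploads; the 5 MiB-5 GiB part-size range and the 10,000-part maximum are S3's documented limits:

```python
import math

def multipart_part_count(object_size: int, part_size: int) -> int:
    """Number of parts needed to upload an object in parts of
    part_size bytes (the last part may be smaller). S3 allows
    parts of 5 MiB-5 GiB and at most 10,000 parts per upload."""
    MiB = 1024 ** 2
    if not 5 * MiB <= part_size <= 5 * 1024 * MiB:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    parts = math.ceil(object_size / part_size)
    if parts > 10_000:
        raise ValueError("too many parts; use a larger part size")
    return parts
```

For example, a 1 GiB object uploaded in 100 MiB parts needs 11 parts.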
All Amazon S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are strongly consistent. What you write is what you will read, and the results of a LIST will be an accurate reflection of what’s in the bucket.

S3 Byte Range fetches can fetch a part of a file.
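Byte-range fetches work by sending ranged GET requests. A sketch of how an object might be split into HTTP Range header values for parallel download:

```python
def byte_ranges(object_size: int, chunk_size: int) -> list[str]:
    """Split an object of object_size bytes into HTTP Range header
    values, one per chunk, for parallel ranged GETs."""
    return [f"bytes={start}-{min(start + chunk_size, object_size) - 1}"
            for start in range(0, object_size, chunk_size)]
```

For example, a 10-byte object split into 4-byte chunks yields the ranges bytes=0-3, bytes=4-7, and bytes=8-9.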

S3 Select and Glacier Select can filter data on the server side using SQL.

Only empty S3 buckets can be deleted.

S3 Batch Operations is used for bulk operations like encrypting all unencrypted objects. S3 Inventory can be used to generate the object list to operate on.

The aws s3 sync command copies data from a source bucket to a destination bucket.

Example
aws s3 sync s3://DOC-EXAMPLE-BUCKET-SOURCE s3://DOC-EXAMPLE-BUCKET-TARGET
S3 Storage Classes
S3 Standard

This is the default storage class and offers replication across at least 3 AZs. The customer is charged for the data stored and the egress fees.

S3 Standard IA

This is for data that is accessed less frequently, but requires rapid access when needed. It has a retrieval fee, minimum duration charge of 30 days, minimum size charge of 128 KB.

S3 One Zone IA

cheaper cost as data is stored only in one AZ. It has a retrieval fee, minimum duration charge of 30 days, minimum size charge of 128 KB. It should be used for objects that are easily recoverable.

S3 Glacier Instant Retrieval

Low-cost option for rarely accessed data. Minimum duration charge of 90 days, minimum size charge of 128 KB.

S3 Glacier Flexible Retrieval

Objects are not immediately available and are not publicly accessible. Minimum duration charge of 90 days, minimum size charge of 40 KB. (Provides Expedited, Standard, and Bulk retrieval options.)

S3 Glacier Deep Archive

Cheapest option; objects are not immediately available. Minimum duration of 180 days, minimum size charge of 40 KB. (Provides Standard and Bulk retrieval options.)

S3 Intelligent-Tiering

Intelligently moves data to the most cost-effective tier. A small per-object monitoring and automation charge applies.

Objects must remain in S3 Standard for a minimum of 30 days before they can be transitioned to S3 One Zone-IA.

The request header used to select the storage class is x-amz-storage-class.

Lifecycle rules can be created to automatically move objects or even to delete objects.
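As an illustrative sketch, a lifecycle configuration is a JSON document of rules. The prefix, rule ID, and day counts below are hypothetical, and the file would be applied with the aws s3api put-bucket-lifecycle-configuration command:

```json
{
  "Rules": [
    {
      "ID": "archive-then-expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```

This rule moves objects under logs/ to Standard-IA after 30 days, to Glacier after 90 days, and deletes them after a year.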

S3 Analytics can be used for Storage Class Analysis.

S3 Versioning is used to maintain a history of objects and helps to recover deleted objects. Versioning is enabled at a bucket level, not at object level. Once versioning is enabled, it cannot be disabled, it can only be suspended. You are charged for all versions of your object.

MFA Delete provides extra security: it requires MFA to change the versioning state of your bucket or to permanently delete an object version. It can be enabled only through the CLI.

Versioning and MFA delete when used together can provide adequate protection against accidental deletion of objects.

3 states in versioning are not versioned, versioning enabled, versioning suspended.

S3 Replication is used to create copies of objects. It requires versioning to be enabled on both the source and destination buckets. Replication is asynchronous.

There are two modes of replication:

  1. Cross Region Replication (CRR)

  2. Same Region Replication (SRR)

Once replication is enabled, only the new objects are replicated. For existing objects there is an option to enable it through S3 Batch Replication.

Amazon S3 Batch Replication provides you a way to replicate objects that existed before a replication configuration was in place, objects that have previously been replicated, and objects that have failed replication. This is done through the use of a Batch Operations job.

By default, delete markers are not replicated, but delete marker replication can be enabled. Only delete markers are replicated; permanent deletions of object versions are not.
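A hedged sketch of a replication configuration, applied with the aws s3api put-bucket-replication command; the role and bucket ARNs are placeholders:

```json
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [
    {
      "ID": "replicate-everything",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": { "Prefix": "" },
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": { "Bucket": "arn:aws:s3:::example-destination-bucket" }
    }
  ]
}
```

Setting DeleteMarkerReplication to Enabled would also replicate delete markers to the destination.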

S3 ACLs, IAM and Resource Policies

By default, all buckets are private. Only the user who created the bucket and the root user have access to it.

S3 Bucket Policies determine who can access the bucket. Bucket policies in Amazon S3 can be used to add or deny permissions across some or all of the objects within a single bucket. Policies can be attached to users, groups, or Amazon S3 buckets, enabling centralized management of permissions. With bucket policies, you can grant users within your AWS Account or other AWS Accounts access to your Amazon S3 resources.
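A minimal bucket-policy sketch granting a hypothetical other account read access; the account ID and bucket name are placeholders, and the policy would be applied with aws s3api put-bucket-policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadFromOtherAccount",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
```

Note that s3:ListBucket applies to the bucket ARN while s3:GetObject applies to the object ARN, which is why both resources are listed.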

IAM Policy determines what a user can do. With IAM policies, you can only grant users within your own AWS account permission to access your Amazon S3 resources.

Resource Policy determines what can be done on a resource.

For a user in another account to access a bucket, both the user's IAM policy and the bucket's resource policy must allow the request; within the same account, an allow in either is sufficient as long as nothing explicitly denies it.

Access Control Lists (ACLs) give read or write access on buckets or objects to groups of users. With ACLs, you can only grant other AWS accounts (not specific users) access to your Amazon S3 resources.

ACLs are deprecated. IAM Policies are preferred.
Static Hosting

S3 provides the ability to host static websites and provides a URL for the website. Every request to the website is charged. For a custom domain, the bucket name should match the domain name.

S3 is the preferred service for distributing static content.
Amazon CloudFront can be used with Amazon S3 for huge improvements in an application’s performance to serve static content.
Pre-Signed URLs

Pre-Signed URLs inherit the permissions of the user who generated them and can be shared with people who do not have AWS accounts. A Pre-Signed URL must have an expiration time, with a maximum of 7 days.

Access Points

Access Points can be used to create tailored access for a group such as developers or managers. Each Access Point gets its own ARN, and an Access Point can cover a group of resources. The Access Point policy works together with the bucket policy, so the bucket policy must permit access through the Access Point. Access Points can be reached from within a VPC using a Gateway or Interface endpoint; the VPC endpoint policy must allow access to both the target bucket and the Access Point.

Requester Pays is an option to bill the requester for the network cost.

Event Notifications send information about bucket events to SQS, SNS, or Lambda functions. Another option is to route the events through EventBridge.
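As an illustrative sketch, a notification configuration routing object-created events to an SQS queue; the queue ARN is a placeholder, and the document would be applied with aws s3api put-bucket-notification-configuration:

```json
{
  "QueueConfigurations": [
    {
      "QueueArn": "arn:aws:sqs:us-east-1:123456789012:example-queue",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}
```

The SQS queue's own access policy must also allow S3 to send messages to it.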

S3 Encryption can be enabled for both Client side and Server side.
Server side encryption has the following options:

  1. Server side encryption with S3 Managed keys - default (SSE-S3)

  2. Server side encryption with KMS Keys stored in AWS KMS (SSE-KMS)

  3. Server side encryption with customer provided keys (SSE-C)

Both server side encryption with customer provided keys (SSE-C) and client side encryption ensure that the encryption keys will be stored on-premises at all times.
If a client makes a cross-origin request to our S3 bucket, we need to enable the correct CORS headers; this mechanism is Cross-Origin Resource Sharing (CORS).
Example: web browsers block a script that originates from a server whose domain name differs from the webpage's. Amazon S3 can be configured with CORS to send HTTP headers that allow the script to run.
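A minimal CORS configuration sketch, applied with aws s3api put-bucket-cors; the allowed origin is a placeholder:

```json
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://www.example.com"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}
```

With this in place, S3 responds to requests from the allowed origin with the Access-Control-Allow-Origin header the browser needs.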

Glacier Vault Lock and S3 Object Lock are used to prevent deletion of objects.

S3 Object Lambda can be used to change the object just before it is retrieved.

S3 shards data across multiple nodes behind the scenes; it manages the sharding and reassembly of large files that are uploaded in parts.

Storage Gateway

Storage Gateway is a hybrid cloud storage service. It allows your on-premises applications to integrate with and use cloud storage services.
It can be used as an extension of your on-premises storage.
It can also assist migrations into the cloud.
Other use cases include backups and disaster recovery.
The Storage Gateway is either a physical machine or a virtual machine provisioned in your on-premises environment.

Storage Gateway flavours

The flavour of storage gateway to be used depends on the current set-up where the file is stored.

Volume

for files stored on block storage. Uses the iSCSI protocol.

Cached mode

primary data is stored in S3; only a cache of frequently accessed data is kept locally on-premises.

Stored mode

mimics the behaviour of the network-attached storage present on-premises. The entire dataset is stored locally and is replicated asynchronously to Amazon S3.

File

for files stored on file storage. Uses the NFS or SMB protocol.

Amazon S3 File Gateway

uses NFS or SMB protocol for the application server on-premise, in the background it is using S3

Amazon FSx File Gateway

native access to Amazon FSx for Windows File Server

Tape

for files stored on magnetic storage. Uses the iSCSI protocol.

Use cases:

  • Disaster recovery

  • Backup and restore

  • Tiered storage

  • On-premises cache and low-latency file access

The Storage Gateway has to be installed in the corporate data center. If you don't have the hardware on-premises, you can order it from Amazon as the Storage Gateway Hardware Appliance.