Storage
Elastic Block Store (EBS)
In EBS, data is stored in blocks. A collection of blocks is presented as a volume.
An EBS volume can be mounted as a file system and can also serve as the boot volume for an OS.
EBS is a network drive attached to an instance.
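An EBS volume can be attached to only one instance at a time (an instance can have several volumes attached), and a volume is bound to a single AZ. The exception is Multi-Attach, available on Provisioned IOPS volumes, which lets several instances in the same AZ attach one volume; the application must then coordinate writes itself, for example by having only one instance actively writing.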
Instances communicate with EBS via a network.
EBS volumes can be detached from one instance and attached to another.
EBS volumes are redundant within an AZ.
EBS snapshots are used to move EBS data from one AZ to another: snapshot the volume, then create a new volume from that snapshot in the target AZ. The snapshots are stored in S3.
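The snapshot-based move between AZs can be sketched as follows. This is a minimal sketch, assuming `ec2` is a boto3 EC2 client passed in by the caller; the function name `move_volume_to_az` and its arguments are hypothetical and chosen only for illustration.

```python
def move_volume_to_az(ec2, volume_id, target_az):
    """Recreate an EBS volume in another AZ by going through a snapshot.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the ID of the new volume in `target_az`.
    """
    # 1. Snapshot the source volume.
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"Move {volume_id} to {target_az}",
    )
    snapshot_id = snapshot["SnapshotId"]

    # 2. Wait until the snapshot has completed before restoring from it.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # 3. Create a new volume from the snapshot in the target AZ.
    new_volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=target_az,
    )
    return new_volume["VolumeId"]
```

The original volume is left untouched; detaching or deleting it afterwards is a separate decision.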
- Volume Types
-
- Solid state drive (SSD) volumes
-
SSD-backed volumes are optimized for transactional workloads involving frequent read/write operations with small I/O size, where the dominant performance attribute is IOPS.
- General Purpose SSD (gp2 and gp3)
-
Good balance between price and performance. gp3 is the latest generation of general purpose volumes and is the lowest-cost SSD option.
Baseline I/O performance for gp2 is 3 IOPS for each GiB; gp3 provides a fixed baseline of 3,000 IOPS regardless of volume size.
- Provisioned IOPS SSD (io1 and io2)
-
Provide the highest performance. They are designed for I/O-intensive workloads such as databases and support the Multi-Attach feature.
A Provisioned IOPS SSD EBS volume provides up to 64,000 IOPS for each volume.
- Hard disk drive (HDD) volumes
-
HDD-backed volumes are optimized for large streaming workloads where the dominant performance attribute is throughput.
- Throughput Optimized HDD volumes (st1)
-
Low-cost HDD volumes for frequently accessed, throughput-intensive workloads. Slower than SSD.
- Cold HDD volumes (sc1)
-
used for infrequent access, where having the lowest cost is important.
- Magnetic
-
Previous generation volumes backed by magnetic drives. Used when data is accessed infrequently and performance is not a problem.
Magnetic storage does not support IOPS as a configurable parameter.
EBS volumes and EBS snapshots are charged per GB per Month.
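The 3 IOPS per GiB baseline for General Purpose SSD can be worked through with a small helper. This is a sketch for gp2 only; the floor of 100 IOPS and the ceiling of 16,000 IOPS are assumptions taken from the published gp2 limits, not from these notes, and the function name is hypothetical.

```python
GP2_IOPS_PER_GIB = 3    # baseline ratio stated in the notes
GP2_MIN_IOPS = 100      # assumption: gp2 floor for small volumes
GP2_MAX_IOPS = 16_000   # assumption: gp2 ceiling per volume


def gp2_baseline_iops(size_gib: int) -> int:
    """Return the baseline IOPS of a gp2 volume of the given size."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    scaled = GP2_IOPS_PER_GIB * size_gib
    # Clamp the size-proportional value between the floor and the ceiling.
    return max(GP2_MIN_IOPS, min(scaled, GP2_MAX_IOPS))
```

For example, a 500 GiB gp2 volume gets 1,500 baseline IOPS, while a 10 GiB volume is lifted to the 100 IOPS floor.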
Instance Store
Instance store provides temporary block level storage. It is located physically on disks attached to the host computer that your EC2 instance runs on.
Not all EC2 instance types support Instance Store.
Elastic File System (EFS)
Amazon Elastic File System (Amazon EFS) provides serverless, fully elastic file storage so that you can share file data without provisioning or managing storage capacity and performance.
It supports NFSv4.
EFS does not work with Windows based EC2 instances.
EFS can be shared across multiple EC2 instances.
EFS is VPC specific.
EFS is made available through mount targets. A mount target is deployed in a subnet and is essentially an IP address in that subnet.
- EFS file system types
-
- Regional
-
data is available even if one or more AZs in a Region are unavailable.
- One Zone
-
continuous availability within a single AZ
- EFS storage classes
-
- EFS Standard
-
uses solid state drive (SSD) storage for lowest latency. Good for frequently accessed files.
- EFS Infrequent Access
-
cost-optimized storage class for data that is accessed only a few times each quarter.
- EFS Archive
-
cost-optimized storage for data that is accessed a few times each year or less.
- EFS Performance
-
- General Purpose Performance Mode
-
Ideal for latency sensitive applications
- Max I/O Performance Mode
-
can scale to higher levels of aggregate throughput and operations per second
- Elastic Throughput Mode
-
automatically scales throughput performance up or down to meet the needs of the workload
- Provisioned Throughput Mode
-
level of throughput the file system can drive independent of the file system size
- Bursting Throughput Mode
-
Scales with the amount of storage in your file system and supports bursting to higher levels for up to 12 hours per day
As EFS is file storage, you cannot boot an operating system from it.
FSx
A fully managed service that provides high-performance file storage. It takes care of file servers and volumes, data replication, file server patching, hardware issues and backups.
- FSx Flavours
-
- FSx for Windows File Server
-
fully compatible with Windows file servers, based on Windows Server OS. Supports Server Message Block (SMB) protocol. Can integrate with Microsoft Active Directory.
- FSx for Lustre
-
optimized for high performance parallel file processing. It is built on top of the Lustre file system. It is good for ML and data analytic workloads. Supports only single AZ deployment (both SSD and HDD options are available).
- FSx for NetApp ONTAP
-
It is built on top of NetApp’s ONTAP file system. It is compatible with the NFS, SMB and iSCSI protocols.
- FSx for OpenZFS
-
a managed OpenZFS file system, compatible with NFS only.
S3 - Simple Storage Service
S3 is a service that provides object based storage. It is a global service, however a bucket is created within a region.
S3 cannot be used for file-based or block-based storage.
Objects created in S3 have a key, a value, a version ID and metadata.
S3 has a flat file structure, the folders are an illusion.
S3 is designed for high availability.
S3 bucket names must be unique globally across all AWS accounts.
S3 can handle an unlimited number of objects.
A bucket has no maximum size; the 5 TB limit is the long-standing maximum size of a single object.
Historically, 100 buckets per account were supported by default, with an increase to 1,000 available through a service limit request. AWS has since raised these quotas, so check Service Quotas for the current values.
Multipart upload can break an object into parts before uploading to S3. Multipart upload is recommended once the size goes over 100 MB.
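How an object is broken into parts can be sketched with a small planning function. This is an illustrative sketch, not an upload client: the function name `plan_parts` and the 100 MiB default part size are assumptions, while the 5 MiB minimum part size and the 10,000-part cap are the documented S3 multipart limits.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(object_size, part_size=100 * MIB):
    """Return (part_number, start_offset, length) tuples covering the object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Ceiling division: how many parts this part size would produce.
    part_count = -(-object_size // part_size)
    if part_count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")

    parts = []
    for index in range(part_count):
        offset = index * part_size
        length = min(part_size, object_size - offset)  # last part may be short
        parts.append((index + 1, offset, length))      # part numbers start at 1
    return parts
```

Each tuple describes one part that could be uploaded independently and in parallel; only the final part may be smaller than the chosen part size.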
S3 Transfer acceleration uses the Amazon Edge locations for faster uploads and downloads.
S3 Byte Range fetches can fetch a part of a file.
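A byte-range fetch is an ordinary GET request carrying an HTTP `Range` header. The sketch below only builds the header values for fetching an object in chunks; the function name `range_headers` is hypothetical, and issuing the actual requests is left to whichever client is in use.

```python
def range_headers(object_size, chunk_size):
    """Return HTTP Range header values that together cover the whole object."""
    if object_size <= 0 or chunk_size <= 0:
        raise ValueError("object_size and chunk_size must be positive")
    headers = []
    for start in range(0, object_size, chunk_size):
        # The end offset of an HTTP byte range is inclusive.
        end = min(start + chunk_size, object_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers
```

Fetching only the first range is a common way to read a file header without downloading the entire object.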
S3 Select and Glacier Select can filter data on the server side using SQL.
Only empty S3 buckets can be deleted.
S3 Batch Operations is used for bulk operations such as encrypting all unencrypted objects. S3 Inventory can be used to build and filter the object list.
- S3 Storage Classes
-
- S3 Standard
-
This is the default storage class and offers replication across at least 3 AZs. The customer is charged for the data stored and the egress fees.
- S3 Standard IA
-
This is for data that is accessed less frequently, but requires rapid access when needed. It has a retrieval fee, minimum duration charge of 30 days, minimum size charge of 128 KB.
- S3 One Zone IA
-
Lower cost, as data is stored in only one AZ. It has a retrieval fee, minimum duration charge of 30 days, minimum size charge of 128 KB. It should be used for objects that are easily recoverable.
- S3 Glacier Instant Retrieval
-
Low-cost option for rarely accessed data. Minimum duration charge of 90 days, minimum size charge of 128 KB.
- S3 Glacier Flexible Retrieval
-
Objects are not immediately available and are not publicly accessible. Minimum duration charge of 90 days; each object also incurs about 40 KB of billable metadata overhead. Retrieval options: Expedited, Standard and Bulk.
- S3 Glacier Deep Archive
-
Cheapest option; objects are not immediately available. Minimum duration charge of 180 days; each object also incurs about 40 KB of billable metadata overhead. Retrieval options: Standard and Bulk.
- S3 Intelligent-Tiering
-
Intelligently moves data to the most cost-effective tier. Extra monitoring cost is to be paid.
The request header used to select the storage class is x-amz-storage-class.
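The minimum duration charges listed for the storage classes can be captured in a small lookup. This is a sketch: the dictionary keys are the values accepted by the x-amz-storage-class header, the day counts are the ones given in these notes, and the function name `billable_days` is hypothetical. Intelligent-Tiering is left out because the notes give no minimum duration for it.

```python
# Minimum storage duration (in days) per storage class, as listed in the notes.
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,     # S3 Glacier Instant Retrieval
    "GLACIER": 90,        # S3 Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,  # S3 Glacier Deep Archive
}


def billable_days(storage_class, days_stored):
    """Days charged for an object deleted after `days_stored` days."""
    if days_stored < 0:
        raise ValueError("days_stored must not be negative")
    # Deleting early still incurs the minimum duration for the class.
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])
```

An object deleted from STANDARD_IA after 10 days is therefore still billed for 30 days of storage.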
Lifecycle rules can be created to automatically move objects or even to delete objects.
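A lifecycle rule that moves objects to cheaper classes and finally deletes them might look like the sketch below. The function name, the rule ID and the example prefix are hypothetical; the dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`. The 30-day check reflects the S3 requirement that objects stay at least 30 days before transitioning to Standard-IA.

```python
def build_lifecycle_config(prefix, ia_after_days, glacier_after_days, expire_after_days):
    """Build a lifecycle configuration: Standard-IA, then Glacier, then delete."""
    if ia_after_days < 30:
        raise ValueError("transition to STANDARD_IA needs at least 30 days")
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("day thresholds must be strictly increasing")
    return {
        "Rules": [
            {
                "ID": "tier-then-expire",          # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},      # apply only under this prefix
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},  # delete the object
            }
        ]
    }
```

For instance, `build_lifecycle_config("logs/", 30, 90, 365)` describes a rule for everything under `logs/`.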
S3 Analytics can be used for Storage Class Analysis.
S3 Versioning is used to maintain a history of objects and helps to recover deleted objects. Versioning is enabled at a bucket level, not at object level. Once versioning is enabled, it cannot be disabled, it can only be suspended. You are charged for all versions of your object.
MFA Delete requires multi-factor authentication to change the versioning state of a bucket or to permanently delete an object version. It can be enabled only through the CLI or API, not the console.
The three versioning states are unversioned (the default), versioning-enabled and versioning-suspended.
S3 Replication is used to create copies of objects. For replication, versioning must be enabled on both the source and the destination buckets. Replication is asynchronous.
There are two modes of replication:
-
Cross Region Replication (CRR)
-
Same Region Replication (SRR)
Once replication is enabled, only the new objects are replicated. For existing objects there is an option to enable it through S3 Batch Replication.
By default, delete markers are not replicated, but this can be enabled. Deletions of specific object versions (permanent deletes) are never replicated.
S3 ACLs, IAM and Resource Policies
By default, all buckets are private. Only the user who created the bucket and the root user have access to the bucket.
IAM Policy determines what a user can do. With IAM policies, you can only grant users within your own AWS account permission to access your Amazon S3 resources.
Resource Policy determines what can be done on a resource.
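For a user to access a bucket, no applicable policy may explicitly deny the request. Within the same account, an allow in either the IAM policy or the resource policy is sufficient; for cross-account access, both the user's IAM policy and the bucket's resource policy must allow it.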
Access Control Lists (ACLs) give read or write access on buckets or objects to groups of users. With ACLs, you can only grant other AWS accounts (not specific users) access to your Amazon S3 resources.
ACLs are a legacy mechanism and are disabled by default on new buckets. IAM policies and bucket policies are preferred.
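Because IAM policies only cover users in your own account, granting another account access is done with a resource (bucket) policy. The sketch below builds such a policy document as a Python dictionary; the function name, the `Sid` and the example identifiers are hypothetical, while the document layout follows the standard IAM policy grammar.

```python
import json


def cross_account_read_policy(bucket, account_id):
    """Bucket policy letting another AWS account read objects in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountRead",  # hypothetical statement ID
                "Effect": "Allow",
                # The other account as principal; that account must still
                # grant its own users permission through IAM policies.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Serialise to the JSON text that a bucket policy is stored as.
policy_json = json.dumps(cross_account_read_policy("example-bucket", "111122223333"))
```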
- Static Hosting
-
S3 provides the ability to host static websites and provides a URL for the website. Every request to the website is charged. For a custom domain, the bucket name should match the domain name.
- Pre-Signed URLs
-
A Pre-Signed URL grants time-limited access with the permissions of the user who signed it, so it can be given to people who do not have AWS accounts. A Pre-Signed URL must have an expiration time, with a maximum of 7 days.
- Access Points
-
Access Points simplify managing access for groups such as developers or managers. Each Access Point gets its own ARN and its own policy, and can be scoped to a group of resources. The bucket policy must also permit the access, typically by delegating access control to the Access Points. Access Points can be reached internally within a VPC through a Gateway or Interface endpoint; the VPC endpoint policy must allow access to both the target bucket and the Access Point.
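The 7-day limit on Pre-Signed URLs mentioned above can be enforced before asking the SDK to sign anything. This is a minimal sketch, assuming `s3` is a boto3 S3 client passed in by the caller; the function name `presign_download` is hypothetical.

```python
from datetime import timedelta

MAX_EXPIRY = timedelta(days=7)  # upper bound stated in the notes


def presign_download(s3, bucket, key, valid_for):
    """Return a pre-signed GET URL for `key`, valid for `valid_for`."""
    if not timedelta(0) < valid_for <= MAX_EXPIRY:
        raise ValueError("expiry must be positive and at most 7 days")
    # `s3` is assumed to be a boto3 S3 client; ExpiresIn is given in seconds.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=int(valid_for.total_seconds()),
    )
```

The URL carries the signer's permissions, so it stops working early if those credentials expire or are revoked.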
Requester Pays is an option to bill the requester for the request and data transfer costs.
Event Notifications deliver events such as object creation or deletion to SQS, SNS and Lambda functions. Another option is to use EventBridge.
S3 encryption can be applied on the client side or the server side.
Server side encryption has the following options:
-
Server side encryption with S3 Managed keys - default (SSE-S3)
-
Server side encryption with KMS Keys stored in AWS KMS (SSE-KMS)
-
Server side encryption with customer provided keys (SSE-C)
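On the wire, the three server-side options are selected through request headers. The sketch below builds those headers for each mode; the function name `sse_headers` and the mode labels are hypothetical, while the header names are the standard S3 ones. For SSE-C it assumes a 256-bit key supplied as raw bytes.

```python
import base64
import hashlib


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Return the request headers selecting a server-side encryption mode."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:
            # Without a key ID, S3 falls back to the AWS managed KMS key.
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        key_b64 = base64.b64encode(customer_key).decode("ascii")
        # S3 uses the MD5 digest only to verify the key arrived intact.
        md5_b64 = base64.b64encode(hashlib.md5(customer_key).digest()).decode("ascii")
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": key_b64,
            "x-amz-server-side-encryption-customer-key-MD5": md5_b64,
        }
    raise ValueError(f"unknown mode: {mode}")
```

With SSE-C the same key headers must be sent again on every read, since S3 does not store the key.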
Glacier Vault Lock and S3 Object Lock are used to prevent deletion of objects.
S3 Object Lambda can be used to change the object just before it is retrieved.
Sharding: S3 stores data in shards across multiple nodes. S3 manages the sharding itself, as well as the reassembly of large files that are uploaded in parts.
Storage Gateway
Storage Gateway is a hybrid cloud storage service. It allows your on-premises applications to integrate with and use cloud storage services.
It can be used as an extension of your on-premises storage.
It can also assist migrations into the cloud.
Other use cases include backups and disaster recovery.
The Storage Gateway is either a physical machine or a virtual machine that is provisioned in your on-premises environment.
- Storage Gateway flavours
-
The flavour of Storage Gateway to use depends on how the data is currently stored.
- Volume
-
for data stored on block storage. Uses the iSCSI protocol.
- Cached mode
-
the primary copy of all data is stored in S3. Only frequently accessed data is cached locally on-premises.
- Stored mode
-
mimics the behaviour of the network attached storage present on-premises. Data is stored locally on-premises and replicated asynchronously to S3.
- File
-
for files stored on file storage. Uses the NFS or SMB protocol.
- Amazon S3 File Gateway
-
uses the NFS or SMB protocol towards the on-premises application server; in the background the data is stored in S3.
- Amazon FSx File Gateway
-
native access to Amazon FSx for Windows File Server
- Tape
-
for backups written to magnetic tape. Presents a virtual tape library to existing backup software. Uses the iSCSI protocol.
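The choice between the flavours above comes down to how the data is stored today. The lookup below restates that mapping as a sketch; the names `GATEWAY_FLAVOURS` and `pick_gateway`, and the storage-kind labels, are hypothetical.

```python
# Storage kind -> gateway flavour and the protocols it speaks (from the notes).
GATEWAY_FLAVOURS = {
    "block": {"gateway": "Volume Gateway", "protocols": ("iSCSI",)},
    "file": {"gateway": "File Gateway", "protocols": ("NFS", "SMB")},
    "tape": {"gateway": "Tape Gateway", "protocols": ("iSCSI",)},
}


def pick_gateway(storage_kind):
    """Return the gateway flavour matching the current on-premises storage."""
    try:
        return GATEWAY_FLAVOURS[storage_kind.lower()]
    except KeyError:
        raise ValueError(f"unknown storage kind: {storage_kind}") from None
```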
Use cases:
-
Disaster recovery
-
Backup and restore
-
Tiered storage
-
On-premises cache and low-latency file access
The Storage Gateway has to be installed in the corporate data center. If you don’t have the hardware on-premises, you can order it from Amazon as the Storage Gateway Hardware Appliance.