Welcome to Day 62 of the #100DaysOfDevOps Challenge! Today, we will look at the features of Amazon S3.
To gain comprehensive knowledge of Amazon S3, we must have a clear understanding of its various features. So, without wasting time, let's dive in.
1. Storage Classes
Amazon S3 provides a range of storage classes that we can choose from based on the data access, resiliency, and cost requirements of our workloads. S3 storage classes are designed to address almost any use case, be it performance needs, data residency requirements, variable/unknown/infrequent access patterns, archival storage, or low cost. The storage classes offered by Amazon S3 are listed below:
★ S3 Standard
★ S3 Intelligent-Tiering
★ S3 Standard-Infrequent Access (S3 Standard-IA)
★ S3 One Zone-Infrequent Access (S3 One Zone-IA)
★ S3 Glacier
★ S3 Glacier Deep Archive
As customers, we can store data in whichever of the above storage classes best addresses our use case. To elaborate, for general purposes, we can use the S3 Standard class, which ensures low latency and high throughput for our stored data; it is also the default storage class. Similarly, we can use the S3 Intelligent-Tiering storage class if we want to optimize cost while retaining the performance benefits of the S3 Standard class. Likewise, for data that is accessed less frequently but requires rapid access when needed, we can use either S3 Standard-IA or S3 One Zone-IA. Moreover, for archival storage, we have two choices: S3 Glacier and S3 Glacier Deep Archive. In a nutshell, to choose a storage class, we must first have a clear idea of our use case and then pick the class that best addresses it.
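To make this concrete, here is a minimal boto3 sketch (the bucket, key, and file names are hypothetical) showing how we can specify a storage class while uploading an object:

```python
import boto3

s3 = boto3.client("s3")

# Upload to the (hypothetical) bucket "my-demo-bucket", storing the object
# in S3 Standard-IA instead of the default S3 Standard class.
with open("summary.csv", "rb") as f:
    s3.put_object(
        Bucket="my-demo-bucket",
        Key="reports/summary.csv",
        Body=f,
        StorageClass="STANDARD_IA",  # or "INTELLIGENT_TIERING", "ONEZONE_IA", "GLACIER", "DEEP_ARCHIVE"
    )
```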
2. Storage Management
Amazon S3 offers several storage management features that can be used to manage costs, satisfy regulatory requirements, reduce latency, and replicate data for compliance and security purposes. To gain clear insights, let's have a brief discussion of each feature.
★ S3 Lifecycle
We can configure an S3 Lifecycle policy to make sure that objects are stored cost-effectively throughout their lifecycle. We can impose a transition action on objects, causing them to move to another S3 storage class after a certain time interval, or an expiration action, causing Amazon S3 to delete them once they expire.
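As a rough illustration, the following boto3 sketch (the bucket name, prefix, and day counts are made up for this example) applies both a transition action and an expiration action to a bucket:

```python
import boto3

s3 = boto3.client("s3")

# Lifecycle rule for the (hypothetical) bucket "my-demo-bucket":
# objects under "logs/" move to Standard-IA after 30 days and are deleted after 365.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-demo-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```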
★ S3 Object Lock
S3 Object Lock enables us to store objects in S3 using a write-once-read-many (WORM) model. To be more precise, it helps us prevent stored objects from being deleted or overwritten, either for a fixed period of time or indefinitely. S3 Object Lock should be used whenever the objects to be stored in S3 require WORM storage.
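As a minimal sketch, assuming we are creating a brand-new bucket (Object Lock can only be enabled at bucket creation), the boto3 calls might look like this:

```python
import boto3

s3 = boto3.client("s3")

# Object Lock must be enabled when the (hypothetical) bucket is created.
s3.create_bucket(
    Bucket="my-worm-bucket",
    ObjectLockEnabledForBucket=True,
)

# Apply a default retention rule: new objects cannot be deleted or
# overwritten for 90 days (COMPLIANCE mode cannot be shortened by anyone).
s3.put_object_lock_configuration(
    Bucket="my-worm-bucket",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 90}},
    },
)
```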
★ S3 Replication
S3 Replication enables us to replicate objects, along with their metadata and tags, to Amazon S3 buckets in the same AWS Region as the source bucket or in different AWS Regions. Based on the nature of the replication, it can be categorized into two types: Cross-Region Replication and Same-Region Replication. Cross-Region Replication helps us minimize latency, meet compliance requirements, and increase operational efficiency, whereas Same-Region Replication helps us aggregate logs into a single bucket, configure live replication between production and test accounts, and abide by data sovereignty laws when required.
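A hedged boto3 sketch of a replication configuration is shown below; the bucket names, role ARN, and account ID are placeholders, and versioning must already be enabled on both buckets:

```python
import boto3

s3 = boto3.client("s3")

# Replicate every object from the (hypothetical) source bucket to the
# destination bucket. The IAM role must grant S3 replication permissions.
s3.put_bucket_replication(
    Bucket="my-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket"},
            }
        ],
    },
)
```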
★ S3 Batch Operations
S3 Batch Operations enables us to perform large-scale batch operations on Amazon S3 objects: we can run a single operation against billions of objects containing exabytes of data. S3 Batch Operations can be used through the AWS Management Console, AWS CLI, AWS SDKs, or REST API. We can use it to carry out various operations such as copying S3 objects, setting tags or access control lists (ACLs) on objects, initiating the restoration of objects from S3 Glacier Flexible Retrieval, invoking an AWS Lambda function to perform custom actions on objects, and so on. Moreover, Amazon S3 also tracks the progress of the operations, sends notifications, and generates and stores a detailed completion report of all actions, providing us with a fully managed, auditable, and serverless experience.
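For illustration, here is a hedged boto3 sketch of creating a Batch Operations job that tags every object listed in a CSV manifest; the account ID, ARNs, and manifest ETag are all placeholders:

```python
import boto3

s3control = boto3.client("s3control")

# Create a job that applies a tag to each object listed in the manifest.
s3control.create_job(
    AccountId="111122223333",
    ConfirmationRequired=False,
    Operation={
        "S3PutObjectTagging": {"TagSet": [{"Key": "project", "Value": "archive"}]}
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::my-manifest-bucket/manifest.csv",
            "ETag": "manifest-file-etag",  # placeholder: use the manifest's real ETag
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::my-report-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-reports",
        "ReportScope": "AllTasks",
    },
    Priority=10,
    RoleArn="arn:aws:iam::111122223333:role/batch-operations-role",
)
```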
3. Access Management
Amazon S3 provides several access management features for auditing and managing access to our S3 buckets and the objects stored in them. When we create S3 buckets and store objects in them, they are private by default, i.e., only the account that created them has access to them. Hence, to address use cases such as granting public access to buckets and objects, or granting access only to specific, authorized people, we can use the access management features to grant and audit permissions on our Amazon S3 resources. To gain more clarity, let's have a look at the access management features provided by Amazon S3:
★ S3 Block Public Access
The S3 Block Public Access feature allows account administrators and bucket owners to set up centralized controls that limit public access to S3 buckets and objects. By default, Block Public Access settings are turned on at both the account and bucket level.
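For example, a minimal boto3 sketch (bucket name hypothetical) that turns on all four Block Public Access settings for a bucket might look like this:

```python
import boto3

s3 = boto3.client("s3")

# Enable every Block Public Access setting on the (hypothetical) bucket.
s3.put_public_access_block(
    Bucket="my-demo-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```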
★ AWS Identity and Access Management (IAM)
We can use IAM to create IAM users for our AWS account and manage access to our Amazon S3 resources by granting different types of access to an IAM user or a group of IAM users, based on necessity.
★ Bucket Policies
To grant access permissions to our Amazon S3 buckets and the objects inside them, we can create and configure a bucket policy. A bucket policy is a resource-based policy written in JSON format. It enables us to allow or deny permissions for the objects stored in the S3 bucket, based on which incoming access requests are either allowed or denied.
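As a rough example, here is a boto3 sketch (the bucket name and prefix are hypothetical) that attaches a policy allowing anyone to read objects under a "public/" prefix; note that Block Public Access must permit public policies for this to take effect:

```python
import json

import boto3

s3 = boto3.client("s3")

# A minimal resource-based policy allowing public reads of one prefix.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowPublicReadOfPrefix",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-demo-bucket/public/*",
        }
    ],
}

s3.put_bucket_policy(Bucket="my-demo-bucket", Policy=json.dumps(policy))
```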
★ Access Control Lists (ACLs)
Access control lists allow us to grant basic read/write permissions on S3 buckets and objects to other AWS accounts. However, there are limits to managing permissions with ACLs, such as the inability to grant permissions to individual IAM users within an account, the inability to grant conditional permissions, the inability to deny permissions, and so on. Hence, AWS itself recommends the use of S3 resource-based policies or IAM policies for access control instead of ACLs.
★ S3 Object Ownership
S3 Object Ownership is a bucket-level setting that can be used to take ownership of every single object in our S3 bucket by disabling ACLs. By default, when another AWS account uploads an object to our S3 bucket, that other AWS account, not us, owns the object, has access to it, and can even grant other users access to it through ACLs. Because of this default behavior, access management becomes somewhat complex for the bucket owner. Fortunately, S3 Object Ownership comes to the rescue and lets us override the default behavior: by disabling ACLs, we own every object in our bucket, which ultimately simplifies access management for objects stored in our S3 buckets.
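A minimal boto3 sketch (bucket name hypothetical) for disabling ACLs via Object Ownership:

```python
import boto3

s3 = boto3.client("s3")

# Disable ACLs on the (hypothetical) bucket so the bucket owner
# automatically owns every object, regardless of who uploaded it.
s3.put_bucket_ownership_controls(
    Bucket="my-demo-bucket",
    OwnershipControls={"Rules": [{"ObjectOwnership": "BucketOwnerEnforced"}]},
)
```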
★ Access Analyzer for S3
Access Analyzer for S3 enables us to evaluate and monitor our S3 bucket access policies and thereby ensure that the policies imposed on the buckets provide only the intended access. Whenever an S3 bucket is configured to allow public access (access to anyone on the internet or to other AWS accounts), Access Analyzer alerts us, so that we can evaluate the bucket policies and take immediate action if required.
4. Storage Logging and Monitoring
Amazon S3 provides several storage logging and monitoring tools that can be used to monitor and regulate how our Amazon S3 resources are being used. While some of these tools carry out monitoring for us automatically, others require manual setup. To get a clear picture of the storage logging and monitoring features of Amazon S3, let's have a brief look at each tool:
★ Amazon CloudWatch metrics for Amazon S3
Amazon CloudWatch metrics for Amazon S3 include daily storage metrics for buckets, request metrics, replication metrics, and Amazon S3 Storage Lens metrics. These metrics can be watched over a specified period of time to collect insights and reports on the operational usage and cost of S3 resources, as well as to generate billing alerts when estimated charges reach a user-defined threshold. Moreover, the reports obtained from these metrics can be used to identify operational issues and enhance the performance of applications that use Amazon S3.
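To illustrate, here is a boto3 sketch (bucket name hypothetical) that fetches the daily BucketSizeBytes storage metric for the past week:

```python
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

# S3 publishes daily storage metrics to the AWS/S3 namespace;
# fetch one data point per day for the (hypothetical) bucket.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",
    Dimensions=[
        {"Name": "BucketName", "Value": "my-demo-bucket"},
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
    Period=86400,  # one data point per day
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], "bytes")
```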
★ Amazon CloudTrail Log Monitoring
Amazon S3 is integrated with AWS CloudTrail. CloudTrail records each action taken in Amazon S3 by a user, role, or AWS service in the form of log files and automatically delivers them to an Amazon S3 bucket of our choice. These log files contain detailed API tracking for S3 bucket-level and object-level operations, which can prove handy for security, auditing, governance, and compliance use cases. Additionally, when CloudTrail logs are integrated with CloudWatch Logs, we can monitor them in real time and trigger CloudWatch alarms, in the form of email notifications, when a specific user, role, or AWS service performs specific API activity on Amazon S3 resources.
★ Server Access Logging
Server access logs provide detailed records of all the requests made to Amazon S3 resources, which can be used for security and access audits. By default, Amazon S3 doesn't collect server access logs, so we must enable the feature ourselves for the desired S3 bucket, using the Amazon S3 console, the Amazon S3 API, the AWS Command Line Interface (AWS CLI), or the AWS SDKs. After enabling server access logging for a source bucket, the access logs are saved in a target bucket of our choice (the target bucket must be in the same AWS Region as the source bucket and owned by the same account).
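As an example, a boto3 sketch (bucket names hypothetical; the target bucket must already grant log delivery permissions) for enabling server access logging:

```python
import boto3

s3 = boto3.client("s3")

# Enable server access logging on the (hypothetical) source bucket,
# delivering logs to a target bucket in the same Region under "access-logs/".
s3.put_bucket_logging(
    Bucket="my-demo-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-demo-logs-bucket",
            "TargetPrefix": "access-logs/",
        }
    },
)
```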
★ AWS Trusted Advisor
AWS Trusted Advisor continuously monitors and analyzes our AWS resources and makes recommendations for cost optimization, performance enhancement, security improvement, fault tolerance, and service limits. In the case of Amazon S3, Trusted Advisor checks the logging configuration of buckets, identifies buckets with open access permissions or with versioning disabled, and then recommends actions that improve the monitoring, security, and fault tolerance of our S3 resources.
5. Data Processing
Amazon S3 provides several data processing features that can be used to transform data and trigger workflows that automate various processing activities. For a clearer understanding, let's discuss each feature in brief.
★ S3 Object Lambda
S3 Object Lambda gives us the ability to add our own code to Amazon S3 GET requests, modifying and processing the data before it is received by the requesting application. It utilizes AWS Lambda functions to automatically process the response of an S3 GET request. Since AWS Lambda is a serverless compute service, our custom code runs just fine without requiring management of underlying compute resources. To implement S3 Object Lambda for use cases that require transformation of response data, all we need to do is configure a Lambda function and attach it to an S3 Object Lambda Access Point; S3 does the rest, i.e., it automatically calls our Lambda function.
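As a rough sketch of what such a function might look like (the uppercase transformation is purely illustrative), an S3 Object Lambda handler receives a presigned URL for the original object and returns the transformed data via WriteGetObjectResponse:

```python
import urllib.request

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # S3 Object Lambda passes a presigned URL for the original object,
    # plus a route and token used to return the transformed response.
    ctx = event["getObjectContext"]
    original = urllib.request.urlopen(ctx["inputS3Url"]).read()

    # Hypothetical transformation: uppercase the object's text content.
    transformed = original.decode("utf-8").upper()

    s3.write_get_object_response(
        Body=transformed.encode("utf-8"),
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"statusCode": 200}
```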
★ Event notifications
Amazon S3 Event Notifications enables us to run workflows, send alerts or notification messages through Amazon Simple Notification Service (Amazon SNS) or Amazon Simple Queue Service (Amazon SQS), and perform actions in response to changes in the objects stored in S3. Using it, we can set triggers to perform actions such as transcoding media files on upload, processing data files, synchronizing S3 objects with other data sources, and so on. S3 event notifications are set at the bucket level, and we can configure them through the Amazon S3 console, the REST API, or the AWS SDKs.
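For instance, here is a hedged boto3 sketch (the bucket name and SNS topic ARN are placeholders, and the topic's policy must allow S3 to publish to it) that sends a notification whenever an object is created under a given prefix:

```python
import boto3

s3 = boto3.client("s3")

# Publish to a (hypothetical) SNS topic whenever an object is
# created under the "uploads/" prefix of the bucket.
s3.put_bucket_notification_configuration(
    Bucket="my-demo-bucket",
    NotificationConfiguration={
        "TopicConfigurations": [
            {
                "TopicArn": "arn:aws:sns:us-east-1:111122223333:object-created-topic",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": "uploads/"}]}
                },
            }
        ]
    },
)
```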
6. Storage Analytics and Insights
Amazon S3 offers several storage analytics and insights features that give us clear visibility into our S3 storage resources and how they are used, which ultimately enables us to better understand, analyze, and optimize our storage. To achieve a better understanding of these features, let's have a brief look at each one.
★ Amazon S3 Storage Lens
Amazon S3 Storage Lens can be used to acquire insights into Amazon S3 storage usage and activity through 29+ usage and activity metrics, at the organization, account, bucket, object, or even prefix level. We can view the information in the account snapshot on the Amazon S3 console home page, in interactive dashboards, or through a metrics export (in CSV or Parquet format). Apart from collecting our usage and activity information, it also analyzes the data and provides recommendations for optimizing costs and applying best practices for data protection. S3 Storage Lens can be used through the AWS Management Console, AWS CLI, AWS SDKs, or REST API.
★ Storage Class Analysis
Storage Class Analysis monitors the access frequency of S3 objects over a period of time, analyzes the results, and provides us with a report that can help us determine the appropriate time to transition less frequently accessed objects to a more cost-effective storage class. We can configure a Storage Class Analysis policy to monitor an entire bucket, a prefix, or an object tag. Moreover, the feature also includes a detailed daily analysis of our storage usage at the specified bucket, prefix, or tag level, which we can export to an S3 bucket.
★ Amazon S3 Inventory
Amazon S3 Inventory is one of the most useful analytics tools provided by Amazon S3 for managing our S3 storage. It can be used for auditing and reporting on the replication and encryption status of our S3 objects for business, compliance, and regulatory needs. Amazon S3 Inventory generates and stores comma-separated values (CSV), Apache Optimized Row Columnar (ORC), or Apache Parquet output files containing the list of objects and their corresponding metadata for an S3 bucket or a shared prefix, on a daily or weekly basis. These inventory list files can be queried with tools such as Amazon Athena to gain useful insights into our Amazon S3 storage, which can later be utilized for several use cases.
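As an illustration, here is a boto3 sketch (bucket names hypothetical) that sets up a weekly CSV inventory report:

```python
import boto3

s3 = boto3.client("s3")

# Produce a weekly CSV inventory of the (hypothetical) bucket, delivered
# to a destination bucket, including a few optional metadata fields.
s3.put_bucket_inventory_configuration(
    Bucket="my-demo-bucket",
    Id="weekly-inventory",
    InventoryConfiguration={
        "Id": "weekly-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Weekly"},
        "Destination": {
            "S3BucketDestination": {
                "Bucket": "arn:aws:s3:::my-inventory-bucket",
                "Format": "CSV",
                "Prefix": "inventory",
            }
        },
        "OptionalFields": ["Size", "StorageClass", "EncryptionStatus", "ReplicationStatus"],
    },
)
```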
If you have reached this portion of the article, you now have some good knowledge of Amazon S3 and its features. I hope you also remember that, at the beginning of this article, we planned to gain comprehensive knowledge of Amazon S3 in three steps. Since we have completed the first and second steps by learning about Amazon S3 and its features, let's move on to the third and final step and look at the working mechanism of Amazon S3.
Working Mechanism of Amazon S3
Amazon S3 is an object storage service fully managed by AWS. Hence, as users, all we have to do is upload and download our data as required. Behind the scenes, AWS takes care of several things, such as storing the data we upload, ensuring its availability, and delivering it to our customers when requested.
Amazon S3 stores the data we upload as objects within buckets. In simple words, buckets can be described as containers in which S3 stores the data as objects. Each bucket is standalone, which means that one bucket cannot contain another bucket. Instead, for logical organization of data, it can contain "prefixes" (analogous to folders on our local disk). Similarly, an object can be defined as the fundamental entity stored in Amazon S3. An object consists of object data, metadata, and a key that serves as its unique identifier within a bucket.
To sum up the working mechanism of Amazon S3: when users need to upload data to S3, they access S3 through the web service interface, create a bucket configured with features such as a bucket policy, lifecycle policies, and versioning, and finally upload the data to the created bucket, specifying the S3 storage class to be used. In response, Amazon S3 creates the bucket in the Region specified by the user with the desired features and stores the data in the bucket, ensuring its availability and consistency. Moreover, once a file is uploaded, Amazon S3 also provides us with a URL that can be used to access the uploaded file.
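To tie the whole flow together, here is a minimal end-to-end boto3 sketch (bucket and file names are hypothetical) that creates a bucket, uploads a file with a chosen storage class, and generates a presigned URL for accessing it:

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# 1. Create a (hypothetical) bucket; outside us-east-1 a
#    CreateBucketConfiguration with a LocationConstraint is required.
s3.create_bucket(Bucket="my-demo-bucket")

# 2. Upload a local file as an object, choosing a storage class.
s3.upload_file(
    "report.pdf",
    "my-demo-bucket",
    "documents/report.pdf",
    ExtraArgs={"StorageClass": "STANDARD"},
)

# 3. Generate a time-limited presigned URL for downloading the object.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-demo-bucket", "Key": "documents/report.pdf"},
    ExpiresIn=3600,  # valid for one hour
)
print(url)
```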