Amazon S3
                                     

Amazon S3 (Amazon Simple Storage Service) is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface. It uses the same scalable storage infrastructure that Amazon.com uses to run its global e-commerce network. Amazon S3 can store any type of object, which allows for uses such as storage for Internet applications, backup and recovery, disaster recovery, data archives, data lakes for analytics, and hybrid cloud storage. AWS launched Amazon S3 in the United States on March 14, 2006, and in Europe in November 2007.

                                     

1. Design

Although Amazon Web Services (AWS) does not publicly provide the details of S3's technical design, Amazon S3 manages data with an object storage architecture that aims to provide scalability, high availability, and low latency, with 99.999999999% durability and between 99.95% and 99.99% availability. There is no service-level agreement for durability.
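To put these percentages in perspective, the short calculation below converts them into an expected number of lost objects per year and a maximum yearly downtime. It is only a back-of-the-envelope sketch: it assumes the durability figure applies independently to each object over one year and that a year has 8,760 hours. The function names are illustrative.

```python
from decimal import Decimal

HOURS_PER_YEAR = Decimal(365 * 24)  # 8760 hours, ignoring leap years


def expected_annual_loss(num_objects: int, durability_percent: str) -> Decimal:
    """Expected number of objects lost per year at the given durability."""
    loss_fraction = (Decimal(100) - Decimal(durability_percent)) / Decimal(100)
    return Decimal(num_objects) * loss_fraction


def max_downtime_hours(availability_percent: str) -> Decimal:
    """Hours per year a service may be unavailable at the given availability."""
    down_fraction = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    return HOURS_PER_YEAR * down_fraction


if __name__ == "__main__":
    # Ten million objects at eleven nines of durability.
    print(expected_annual_loss(10_000_000, "99.999999999"))
    # Downtime budgets for the two availability figures quoted above.
    print(max_downtime_hours("99.95"), max_downtime_hours("99.99"))
```

Under these assumptions, storing ten million objects corresponds to an expected loss of 0.0001 objects per year, or roughly one object every 10,000 years, while 99.95% and 99.99% availability allow about 4.38 hours and about 53 minutes of downtime per year respectively.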

The basic storage units of Amazon S3 are objects, which are organized into buckets. Each object is identified by a unique, user-assigned key. Buckets can be managed using the console provided by Amazon S3, programmatically using the AWS SDK, or with the Amazon S3 REST application programming interface (API). Objects can be managed using the AWS SDK or the Amazon S3 REST API and can be up to five terabytes in size with two kilobytes of metadata. Additionally, objects can be downloaded using the HTTP GET interface and the BitTorrent protocol.

Requests are authorized using an access control list associated with each bucket. Buckets also support versioning, which is disabled by default. Because a bucket is typically the size of an entire file system mount in other systems, this access control scheme is very coarse-grained: unique access controls cannot be associated with individual files. Bucket names and keys are chosen so that objects are addressable using HTTP URLs:

  • http://bucket.s3-region.amazonaws.com/key
  • . region.amazonaws.com/ bucket / key (for requests using IPv4 or IPv6)
  • . region.amazonaws.com/ bucket / key
  • http://bucket.s3-accelerate.dualstack.amazonaws.com/key
  • http://bucket.s3-website.region.amazonaws.com/key (if static website hosting is enabled on the bucket)
  • http://bucket.s3-website-region.amazonaws.com/key (if static website hosting is enabled on the bucket)
  • http://bucket.s3.amazonaws.com/key
  • - region.amazonaws.com/ bucket / key
  • bucket / key (for a bucket created in the US East (N. Virginia) region)
  • http://bucket.s3.dualstack.region.amazonaws.com/key (for requests using IPv4 or IPv6)
  • http://bucket.s3-accelerate.amazonaws.com/key (the file transfer exits Amazon's network at the last possible moment to give the fastest possible transfer speed and lowest latency)
  • http://bucket/key (where bucket is a DNS CNAME record pointing to bucket.s3.amazonaws.com)
  • http://bucket.s3.region.amazonaws.com/key
  • bucket / key
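The addressing schemes in this list can be illustrated with a few small helpers that assemble a URL from a bucket name, a key, and an optional region. This is a minimal sketch, not an official client: the function names are hypothetical, only a few of the listed forms are covered, and the path-style helper assumes the commonly documented `s3.region.amazonaws.com/bucket/key` layout.

```python
from urllib.parse import quote


def _encode_key(key: str) -> str:
    # Percent-encode the key but keep "/" so key prefixes stay readable.
    return quote(key, safe="/")


def virtual_hosted_url(bucket: str, key: str, region: str = "", scheme: str = "http") -> str:
    """Bucket name in the hostname: bucket.s3[.region].amazonaws.com/key."""
    host = f"{bucket}.s3.{region}.amazonaws.com" if region else f"{bucket}.s3.amazonaws.com"
    return f"{scheme}://{host}/{_encode_key(key)}"


def dualstack_url(bucket: str, key: str, region: str, scheme: str = "http") -> str:
    """Dual-stack form for requests over IPv4 or IPv6."""
    return f"{scheme}://{bucket}.s3.dualstack.{region}.amazonaws.com/{_encode_key(key)}"


def website_url(bucket: str, key: str, region: str) -> str:
    """Static website endpoint (dash form); served over plain HTTP."""
    return f"http://{bucket}.s3-website-{region}.amazonaws.com/{_encode_key(key)}"


def path_style_url(bucket: str, key: str, region: str = "", scheme: str = "http") -> str:
    """Bucket name in the path (assumed layout): s3[.region].amazonaws.com/bucket/key."""
    host = f"s3.{region}.amazonaws.com" if region else "s3.amazonaws.com"
    return f"{scheme}://{host}/{bucket}/{_encode_key(key)}"


if __name__ == "__main__":
    print(virtual_hosted_url("my-bucket", "photos/cat 1.jpg", region="eu-west-1"))
    print(website_url("my-bucket", "index.html", "eu-west-1"))
```

In every form, the bucket and key together determine the object, and the region and endpoint variant only change which host serves the request.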

Amazon S3 can be used to replace significant existing static web-hosting infrastructure with objects that are accessible to HTTP clients. The AWS authentication mechanism allows the bucket owner to create an authenticated URL that is valid for a specified amount of time.
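The text does not say how such a time-limited URL is constructed. One publicly documented scheme is AWS Signature Version 4 query-string signing, and the sketch below follows that scheme as an assumption. The function name, the credentials, and the host are all illustrative, and a production system would normally rely on an AWS SDK rather than hand-rolled signing.

```python
import hashlib
import hmac
from urllib.parse import quote


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(host: str, key: str, region: str, access_key: str,
                    secret_key: str, amz_date: str, expires_seconds: int) -> str:
    """Build a GET URL whose query string carries a time-limited signature.

    amz_date is a UTC timestamp such as "20130524T000000Z".
    """
    datestamp = amz_date[:8]
    scope = f"{datestamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_seconds),
        "X-Amz-SignedHeaders": "host",
    }
    # Query parameters are percent-encoded and sorted by name.
    canonical_query = "&".join(
        f"{quote(name, safe='')}={quote(value, safe='')}"
        for name, value in sorted(params.items())
    )
    canonical_uri = "/" + quote(key, safe="/")
    canonical_request = "\n".join([
        "GET",
        canonical_uri,
        canonical_query,
        f"host:{host}",
        "",                  # blank line that terminates the header block
        "host",              # list of signed headers
        "UNSIGNED-PAYLOAD",  # the body is not signed for presigned URLs
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key by chaining HMACs over date, region and service.
    signing_key = ("AWS4" + secret_key).encode("utf-8")
    for part in (datestamp, region, "s3", "aws4_request"):
        signing_key = _hmac_sha256(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"


if __name__ == "__main__":
    # Placeholder credentials; the URL is valid for one hour after amz_date.
    print(presign_get_url("examplebucket.s3.amazonaws.com", "test.txt", "us-east-1",
                          "AKIDEXAMPLE", "secret-example", "20130524T000000Z", 3600))
```

Because the expiry time is part of the signed query string, a client cannot extend the validity period without invalidating the signature.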

Every item in a bucket can also be served as a BitTorrent feed. The Amazon S3 store can act as a seed host for a torrent, and any BitTorrent client can retrieve the file. This can drastically reduce the bandwidth cost of downloading popular objects. Although BitTorrent reduces bandwidth use, AWS does not provide native bandwidth limiting, so users have no access to automated cost control. This can lead to users on the free tier of Amazon S3, or small hobby users, amassing dramatic bills. AWS representatives have stated that a bandwidth-limiting feature was on the design table from 2006 to 2010, but as of 2011 the feature was no longer in development.

A bucket can be configured to save HTTP log information to a sibling bucket; this can be used in data mining operations.

There are various User Mode File System (FUSE)-based file systems for Unix-like operating systems (Linux, etc.), such as S3QL, that can be used to mount an S3 bucket as a file system. The semantics of the Amazon S3 file system are not those of a POSIX file system, so the file system may not behave entirely as expected.

                                     

1.1. Hosting websites

Amazon S3 provides the option to host static HTML websites with index document and error document support. A website hosted on S3 may designate a default page to display and another page to display for an invalid URL, such as one that produces a 404 error. These pages provide useful content to visitors whose URL contains a CNAME record hostname rather than a direct Amazon S3 bucket reference and does not contain a valid S3 object key, for example when a casual user first visits a URL that is a bare non-Amazon hostname.
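The index and error document behaviour described here can be modelled with a small lookup function. The sketch below is a simplified illustration rather than Amazon's actual routing logic: the function name, the default document names, and the use of an in-memory set of keys are all assumptions, and redirects and access checks are ignored.

```python
from typing import Optional, Set, Tuple


def resolve_website_request(keys: Set[str], path: str,
                            index_document: str = "index.html",
                            error_document: str = "error.html") -> Tuple[int, Optional[str]]:
    """Return (HTTP status, key to serve) for a request path.

    keys stands in for the set of object keys stored in the bucket.
    """
    key = path.lstrip("/")
    # A bare hostname or a "folder" path falls back to the index document.
    if key == "" or key.endswith("/"):
        key += index_document
    if key in keys:
        return 200, key
    # No such object: serve the custom error page if one has been uploaded.
    if error_document in keys:
        return 404, error_document
    return 404, None


if __name__ == "__main__":
    bucket_keys = {"index.html", "error.html", "docs/index.html"}
    for request_path in ("/", "/docs/", "/missing.html"):
        print(request_path, resolve_website_request(bucket_keys, request_path))
```

A visitor who arrives at the bare hostname is therefore served the index document, and a request for a key that does not exist receives the error document together with a 404 status.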

                                     

1.2. Amazon S3 logs

Amazon S3 allows users to enable or disable logging. If logging is enabled, the logs are stored in Amazon S3 buckets, where they can then be analyzed. These logs contain useful information such as:

  • Turnaround time
  • HTTP status codes
  • Date and time of access to requested content
  • HTTP request message
  • Protocol used

Logs can be analyzed and managed using third-party tools like S3Stat, Cloudlytics, Qloudstat, AWStats, and Splunk.
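As an illustration of how such a log could be analyzed without a third-party tool, the sketch below splits one access log line into the fields listed above. It assumes the space-delimited layout of S3 server access logs, with the timestamp in square brackets and some fields in double quotes. The field names and the sample line are made up for the example, and real log lines may carry additional trailing fields, which this parser ignores.

```python
import re
from datetime import datetime

# Leading fields of an access log record, in order (names are illustrative).
FIELDS = [
    "bucket_owner", "bucket", "time", "remote_ip", "requester", "request_id",
    "operation", "key", "request_uri", "http_status", "error_code",
    "bytes_sent", "object_size", "total_time_ms", "turnaround_time_ms",
    "referrer", "user_agent", "version_id",
]

# A token is a [bracketed] timestamp, a "quoted" string, or a run of non-spaces.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def _int_or_none(token: str):
    # S3 writes "-" when a value is not available.
    return None if token == "-" else int(token)


def parse_log_line(line: str) -> dict:
    """Parse one access log line into a dictionary of named fields."""
    record = dict(zip(FIELDS, TOKEN.findall(line)))
    record["time"] = datetime.strptime(record["time"], "[%d/%b/%Y:%H:%M:%S %z]")
    record["request_uri"] = record["request_uri"].strip('"')
    # The protocol is the last word of the request line, e.g. "HTTP/1.1".
    record["protocol"] = record["request_uri"].rsplit(" ", 1)[-1]
    record["http_status"] = _int_or_none(record["http_status"])
    record["turnaround_time_ms"] = _int_or_none(record["turnaround_time_ms"])
    return record


SAMPLE = (
    'ownerid examplebucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 requesterid '
    '3E57427F3EXAMPLE REST.GET.OBJECT photos/cat.jpg "GET /photos/cat.jpg HTTP/1.1" '
    '200 - 2048 2048 15 9 "-" "curl/8.0" -'
)

if __name__ == "__main__":
    parsed = parse_log_line(SAMPLE)
    print(parsed["time"], parsed["http_status"], parsed["protocol"],
          parsed["turnaround_time_ms"])
```

Once lines are parsed into dictionaries like this, they can be aggregated by status code, key, or time window with ordinary Python tooling.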

                                     

1.3. Amazon S3 tools

Amazon S3 provides an API for developers. The AWS console provides tools for managing and uploading files, but it is not capable of managing large buckets or editing files. Third-party websites such as S3edit.com and software such as Cloudberry Explorer, ForkLift, and WebDrive can edit files on Amazon S3.

                                     

2. Amazon S3 storage classes

Amazon S3 offers four storage classes with different levels of durability, availability, and performance.

  • Amazon S3 Standard is the default class.
  • Amazon S3 Standard-Infrequent Access (IA) is designed for less frequently accessed data. Typical use cases are backup and disaster recovery solutions.
  • Amazon Glacier is designed for long-term storage of data that is infrequently accessed and where retrieval latency of minutes or hours is acceptable.
  • Amazon S3 One Zone-Infrequent Access is designed for data that is not often needed but must be accessed rapidly when it is required. Data is stored in one zone, and if that zone is destroyed, all data is lost.
                                     

3. Notable users

  • The API has become a popular method to store objects. As a result, many applications have been built to natively support the Amazon S3 API, including applications that write data to Amazon S3 and to Amazon S3-compatible object stores.
  • Photo hosting service SmugMug has used Amazon S3 since April 2006. They experienced a number of initial outages and slowdowns, but after one year they described it as being "considerably more reliable than our own internal storage" and claimed to have saved almost $1 million in storage costs.
  • Netflix uses Amazon S3 as its system of record. Netflix implemented a tool, S3mper, to address the Amazon S3 limitations of eventual consistency. S3mper stores the filesystem metadata (filenames, directory structure, and permissions) in Amazon DynamoDB.
  • Mojang hosts Minecraft game updates and player skins on Amazon S3.
  • Tumblr, Formspring, and Pinterest host images on Amazon S3.
  • Amazon S3 was used by some enterprises as a long term archiving solution until Amazon Glacier was released in August 2012.
  • Bitcasa and Tahoe-LAFS-on-S3, among others, use Amazon S3 for online backup and synchronization services. In 2016, Dropbox stopped using Amazon S3 services and developed its own cloud server.
  • Swiftype's CEO has mentioned that the company uses Amazon S3.
  • Reddit is hosted on Amazon S3.


                                     

4. S3 API and competing services

The broad adoption of Amazon S3 and related tooling has given rise to competing services based on the S3 API. These services use the standard programming interface but are differentiated by their underlying technologies and supporting business models. A cloud storage standard, like electrical and networking standards, enables competing service providers to design their services and clients using different parts in different ways while still communicating with each other. Such a standard provides the following benefits:

  • Provide timely solutions for delivering functionality in response to demands of the marketplace.
  • Increase competition by providing a set of rules and a level playing field, encouraging market entry by smaller companies which might otherwise be precluded.
  • Allow economies of scale in implementation.
  • Encourage innovation by cloud storage and tool vendors and by developers, because they can focus on improving their own products and services instead of focusing on compatibility.

Examples of competing Amazon S3-compliant storage implementations:

  • Dell EMC Elastic Cloud Storage (ECS)
  • Pure Storage's FlashBlade
  • Connectria's Cloud Storage
  • S3Proxy (allows access to other storage backends via the S3 API)
  • IBM Bluemix object storage
  • Eucalyptus
  • Riak CS, which implements a subset of the S3 API including REST and ACLs on objects and buckets
  • NooBaa Hybrid Storage
  • OpenStack Swift
  • Cloudian HyperStore
  • DigitalOcean Spaces
  • DDN Web Object Scaler (WOS) for on-premise cloud storage
  • Minio Object Storage (released under Apache License v2)
  • CloudServer
  • Nimbula (acquired by Oracle)
  • Ceph with RADOS gateway
  • NetApp StorageGRID for on-premise clouds
  • ActiveScale (Western Digital)
  • Apache CloudStack
  • DreamHost DreamObjects
  • Scality RING
  • Rackspace's Cloud Files
  • IBM Cloud Object Storage (formerly Cleversafe) for on-premise object storage or on the IBM public cloud
  • OpenIO
  • Linode Object Storage


                                     

5. History

Amazon Web Services introduced Amazon S3 in 2006.

Amazon S3 was reported to store more than 2 trillion objects as of April 2013. This is up from 10 billion objects as of October 2007, 14 billion objects in January 2008, 29 billion objects in October 2008, 52 billion objects in March 2009, 64 billion objects in August 2009, and 102 billion objects in March 2010. In November 2017, AWS added default encryption capabilities at the bucket level.