
AWS S3

S3 – Simple Storage Service
Object based storage – flat files
S3 uses a universal namespace – bucket names must be globally unique
A successful upload returns HTTP 200
Format of bucket URLs:
http://s3-aws-region.amazonaws.com/bucket
(https://s3-us-east-1.amazonaws.com/bucketname)
Bucket site hosting URL:
http://examplebucket.s3-website-us-west-2.amazonaws.com/
Data consistency model for S3 – read-after-write consistency for PUTs of new objects: a new object can be read immediately. Overwrite PUTs and DELETEs are eventually consistent: updates and deletes take time to propagate, so a read shortly afterwards may still return the old data.
KEY NOTES:
API for S3 SSE – add the x-amz-server-side-encryption header to the request
Objects uploaded prior to versioning will have version ID as NULL.
Server-side encryption – each object is encrypted with a unique key employing strong encryption. As an additional safeguard, S3 encrypts the key itself with a master key that it regularly rotates.
You can use bucket policy to allow images/videos to be accessible by only your WEBSITE URL!
409 error means bucket already exists
S3 Buckets can contain both encrypted + non encrypted objects
To encrypt, use one of: server-side encryption with S3-managed keys (SSE-S3); server-side encryption with AWS KMS-managed keys (SSE-KMS), where KMS provides an audit trail; or server-side encryption with customer-provided keys (SSE-C), where you manage the keys and AWS manages the encryption.
Maximum of 100 buckets per account
Minimum size of an S3 object is 0 bytes
S3 request rate – if requests are a mix of GET, PUT, DELETE, or GET Bucket (list objects), choosing appropriate key names for your objects ensures better performance by providing low-latency access to the S3 index. For GET-intensive workloads, use Amazon CloudFront.
Introduce randomness at the beginning of your key name prefixes; this distributes the I/O load across more than one partition.
Example – add a hex hash prefix to the key name:
examplebucket/232a2013-26-05-15-00-00/cust1234234/photo1.jpg
examplebucket/7b542013-26-05-15-00-00/cust3857422/photo2.jpg
You can group objects by adding more prefixes in your key name
examplebucket/animations/232a-2013-26-05-15-00-00/cust1234234/animation1.obj
examplebucket/animations/7b54-2013-26-05-15-00-00/cust3857422/animation2.obj
examplebucket/videos/ba65-2013-26-05-15-00-00/cust8474937/video2.mpg
examplebucket/videos/8761-2013-26-05-15-00-00/cust1248473/video3.mpg
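
The hash-prefix scheme above can be sketched in a few lines of Python. This is only an illustration: the function name `hashed_key`, the choice of SHA-256, and the four-character prefix length are assumptions, not something S3 or these notes prescribe.

```python
import hashlib

def hashed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hex hash so that similar keys start with different characters."""
    # Hash choice is an assumption; any well-distributed hash would serve the same purpose.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"

for name in ("2013-26-05-15-00-00/cust1234234/photo1.jpg",
             "2013-26-05-15-00-00/cust3857422/photo2.jpg"):
    print(hashed_key(name))
```

Because the prefix is derived from the key itself, the same object name always maps to the same stored key, so it can be recomputed at read time.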

  • Secure, durable, highly scalable object store (0 bytes to 5 TB per object) with a universal namespace (bucket names must be unique regardless of region). It is an object-based key-value store; each object has a key, a value, a Version ID, metadata, and an ACL
  • The total volume of data and number of objects you can store are unlimited. Individual Amazon S3 objects can range in size from 0 bytes to 5 terabytes. The largest object that can be uploaded in a single PUT is 5 gigabytes. For objects larger than 100 megabytes, customers should consider using the Multipart Upload capability. In other words, a single PUT is capped at 5 GB, but parts uploaded with Multipart Upload can be assembled into one object of up to 5 TB.
  • You can use a Multipart Upload for objects from 5 MB to 5 TB in size (Exam question, scenario where more than 5GB file needs to be uploaded)
  • Object-based storage vs block-based storage (EBS)
  • Data is spread across multiple facilities; you can lose two facilities and still have access to your files
  • For PUTS of New Objects (Read after Write Consistency), For Overwrite PUTS and DELETE (Eventual Consistency)
  • http://docs.aws.amazon.com/general/latest/gr/awsservicehtml#limits_s3 ( Number of S3 bucket limit per account — 100)
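
As a rough illustration of the size limits above, the sketch below picks a part size and part count for a multipart upload. The limits used (parts of 5 MiB to 5 GiB, at most 10,000 parts, objects up to 5 TiB) reflect S3's documented multipart limits as I understand them; the function name `plan_parts` and the 100 MiB default part size are assumptions made for the example.

```python
from typing import Tuple

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART = 5 * MIB      # minimum size of every part except the last
MAX_PART = 5 * GIB      # maximum size of a single part
MAX_PARTS = 10_000      # maximum number of parts in one upload
MAX_OBJECT = 5 * TIB    # maximum size of one S3 object

def plan_parts(object_size: int, part_size: int = 100 * MIB) -> Tuple[int, int]:
    """Return (part_size, part_count) for uploading object_size bytes in parts."""
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    # Clamp the requested part size into the allowed range.
    part_size = min(max(part_size, MIN_PART), MAX_PART)
    # Grow the part size if the request would need more than 10,000 parts.
    part_size = max(part_size, -(-object_size // MAX_PARTS))
    part_count = -(-object_size // part_size)  # ceiling division
    return part_size, part_count

size, count = plan_parts(6 * GIB)
print(f"6 GiB object: {count} parts of up to {size // MIB} MiB")
```

A 6 GiB file exceeds the single-PUT limit, so it has to go through multipart upload; the helper only does the arithmetic and does not call S3.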

Storage Tiers / Classes

  • S3 Standard – Durability (11 9s), Availability (99.99%) – reliable, general-purpose class for just about everything
  • S3 IA (Infrequent Access) – Durability (11 9s), Availability (99.9%) – for data accessed roughly every 1 to 6 months (infrequently) but needing rapid access and low retrieval time (a few ms)
  • S3 RRS (Reduced Redundancy Storage) – Durability (99.99%), Availability (99.99%) – lower durability, for reproducible data that can easily be regenerated (e.g. thumbnails); cheapest of the S3 classes and less fault tolerant than the other two, since you are willing to lose the data
  • Glacier – for archival only (3 to 5 hours restore time)
  • S3 price – charged for Storage, number of requests, data transfer (tiered so more you use less charge)
  • All S3 bucket names must be lowercase
  • S3 for static website hosting (Static Website Hosting > Enable website hosting) – no dynamic
  • When you create a bucket, nothing in it is publicly accessible, and any object you add is private by default (requests get a 403). You must make the files public explicitly, even for public website hosting
  • every object inside the bucket can have different storage class (S3 standard, S3-IA, S3-RRS) and you can turn on server side encryption (AES – 256)
  • regular bucket link: https://s3-eu-west-1.amazonaws.com/saurabhtest <— https
  • bucket with Static website hosting: http://saurabhbitsite.s3-website-eu-west-1.amazonaws.com <— http (has to be for static hosting), you can turn it into SSL / https with cloudfront though
  • CORS – Cross-Origin Resource Sharing – lets resources in one bucket be used by content served from a different domain (for example, another bucket), avoiding the need for a proxy.
  • Versioning – once enabled, versioning cannot be disabled, only suspended; to turn it off completely, delete the bucket and recreate it. Each version of an object gets its own version ID
  • With versioning on, deleting an object only adds a delete marker; delete the delete marker and you get the file back. Every version is stored separately in the bucket, so versioning may be a poor choice on cost for large media files or for objects that are updated many times.
  • Versioning’s MFA Delete Capability can be used to provide additional layer of security.
  • Cross-Region Replication – requires versioning enabled on both the source and destination buckets. You need a separate source and destination bucket (create a new bucket; the source bucket will not show up in the destination drop-down). Existing objects are not replicated; only new objects are replicated across regions
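
To make the durability figures for the storage classes concrete, here is a small arithmetic sketch. It assumes durability is read as the yearly probability that a given object survives, so "11 9s" means an expected loss of one object in 10^11 per year; the function name `expected_annual_loss` is made up for the example.

```python
from fractions import Fraction

def expected_annual_loss(object_count: int, nines: int) -> Fraction:
    """Expected number of objects lost per year at a durability of `nines` nines."""
    # Exact fractions avoid floating-point noise in values like 1 - 0.99999999999.
    return Fraction(object_count, 10 ** nines)

standard = expected_annual_loss(10_000_000, 11)  # S3 Standard / IA: 99.999999999%
rrs = expected_annual_loss(10_000_000, 4)        # RRS: 99.99%

print(f"Standard: about one object lost every {1 / standard} years")
print(f"RRS: about {rrs} objects lost per year")
```

The gap between the two results is the reason RRS is only suggested for data that can be regenerated.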

Lifecycle management in S3

  • when versioning is disabled
    • Transition to S3 IA – minimum 30 days after creation; objects must be at least 128 KB
    • Archive to Glacier – minimum 1 day if IA is not checked; minimum 60 days if Transition to S3 IA is checked
    • Permanently Delete – minimum 2 days if IA is not checked and Glacier is set to 1 day; minimum 61 days if IA is set to 30 and Glacier to 60
  • when versioning is enabled you have lifecycle management options to take action on previous version as well as current version.
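
The minimum-day rules above can be written down as a small checker. This is a sketch under assumptions: it generalises the 30/60/61 example to "Glacier at least 30 days after IA" and "delete at least 1 day after the last transition", which is my reading of the notes rather than an official rule set, and `validate_lifecycle` is a hypothetical helper, not an AWS API.

```python
from typing import List, Optional

def validate_lifecycle(ia_days: Optional[int],
                       glacier_days: Optional[int],
                       expire_days: Optional[int]) -> List[str]:
    """Return a list of rule violations; an empty list means the rule looks valid."""
    problems = []
    if ia_days is not None and ia_days < 30:
        problems.append("transition to IA needs at least 30 days")
    if glacier_days is not None:
        # Assumption: Glacier must come at least 30 days after the IA transition.
        min_glacier = 1 if ia_days is None else ia_days + 30
        if glacier_days < min_glacier:
            problems.append(f"archive to Glacier needs at least {min_glacier} days")
    if expire_days is not None:
        # Deletion must come after the last configured transition.
        last = max(d for d in (ia_days, glacier_days, 0) if d is not None)
        if expire_days <= last:
            problems.append(f"permanent delete needs at least {last + 1} days")
    return problems

print(validate_lifecycle(30, 60, 61))   # the example from the notes
print(validate_lifecycle(30, 45, 61))   # Glacier too soon after IA
```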

Security and Encryption in S3

  • by default newly created buckets are private
  • Access control using Bucket Policies (entire bucket) and ACL(individual objects and folders)
  • Access logs – record all requests made to an S3 bucket; the logs can be written to another bucket, or to a bucket in another account

Encryption

  1. In Transit – SSL / TLS
  2. Data at rest

Server Side Encryption

  • SSE-S3 – server-side encryption with S3-managed keys (AES-256, handled for you by Amazon) – click on the object and encrypt
  • SSE-KMS – AWS Key Management Service managed keys – additional charges; provides an audit trail of key usage; Amazon manages the keys
  • SSE-C – server-side encryption with customer-provided keys – you manage the encryption keys

Client Side Encryption

  • you encrypt the data on client side and upload to s3
  • Every non-anonymous request to S3 must contain authentication information to establish the identity of the principal making the request. In REST, this is done by first putting the headers in a canonical format, then signing the headers using your AWS Secret Access Key.
  • You can insert a presigned url into a webpage to download private data directly from S3.
  • The object creation REST APIs (see Specifying Server-Side Encryption Using the REST API) provide a request header, x-amz-server-side-encryption that you can use to request server-side encryption.
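
As an illustration of the request headers involved, the sketch below builds the header set for each of the three server-side encryption options. The header names and the values `AES256` and `aws:kms` are the ones I understand the S3 REST API to use; the helper `sse_headers` and its parameters are hypothetical, and the code only constructs headers without sending any request.

```python
import base64
import hashlib
from typing import Dict, Optional

def sse_headers(mode: str,
                kms_key_id: Optional[str] = None,
                customer_key: Optional[bytes] = None) -> Dict[str, str]:
    """Build the encryption headers for a PUT request (nothing is sent)."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # optional: otherwise the account's default KMS key is used
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        key_md5 = hashlib.md5(customer_key).digest()  # integrity check of the key
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode("ascii"),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(key_md5).decode("ascii"),
        }
    raise ValueError(f"unknown mode: {mode}")

print(sse_headers("SSE-S3"))
```

With SSE-C the key travels with every request, which is why those requests should only be made over HTTPS.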

S3 Transfer Acceleration

  • Uses local edge locations to upload content to S3 – incurs extra cost
  • The further away you are, the more benefit you get (faster)

Storage Gateway
(1) Gateway stored volumes – the entire dataset is stored on site and asynchronously backed up to S3
(2) Gateway cached volumes – the most frequently used data is stored on site and the entire dataset is stored in S3
(3) Gateway Virtual Tape Library – used for backup if you don't want to use physical tapes; works with backup software such as NetBackup

Import Export
Import / Export Disk

  • Import to S3, EBS, Glacier
  • export from S3

Import / Export Snowball

  • Import to S3
  • Export from S3
  • S3 stores data in alphabetical / lexicographical order, so to spread the load across S3, key names should not be similar (optimizes performance)

An S3 bucket policy contains the following elements

  • Resources – buckets + objects – ARN (Amazon Resource Name)
  • Actions
  • Effect – effect will be allow or deny
  • Principal – the account or user allowed to access
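
To show how these four elements fit together, here is a sketch that builds a policy allowing objects to be fetched only when the request comes from your own website, as mentioned in the key notes. The bucket name, site URL, `Sid`, and the helper `referer_policy` are placeholders; the `aws:Referer` condition key and the `2012-10-17` policy version are the ones I understand AWS to document for this pattern.

```python
import json

def referer_policy(bucket: str, site: str) -> dict:
    """Return a bucket policy document restricting GETs to a given referring site."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowGetFromMySite",                # hypothetical label
            "Effect": "Allow",                          # Effect: Allow or Deny
            "Principal": "*",                           # Principal: who the rule applies to
            "Action": "s3:GetObject",                   # Action: the permitted operation
            "Resource": f"arn:aws:s3:::{bucket}/*",     # Resource: ARN of the objects
            "Condition": {"StringLike": {"aws:Referer": [f"{site}/*"]}},
        }],
    }

print(json.dumps(referer_policy("examplebucket", "http://www.example.com"), indent=2))
```

The Referer header is easy to forge, so this pattern discourages hot-linking rather than providing real access control.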

HTTP 400 status code – MissingSecurityHeader (the request is missing a required security header)

Multipart upload API allows you to stop & resume uploads
USE multipart upload for objects larger than 100MB
S3 API – put object, upload part, get bucket, get object, post, delete, list – see here for more https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html

S3 paid subscribers to download content: Generate pre-signed object URL
Restrict access with bucket policy and ACL on bucket
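
For reference, a pre-signed GET URL can be sketched with the standard library alone. This follows my understanding of AWS Signature Version 4 query-string signing; it is a teaching sketch, not a replacement for an AWS SDK. The function name `presign_get`, the virtual-hosted endpoint form `bucket.s3.region.amazonaws.com`, and all credentials shown are assumptions or placeholders.

```python
import hashlib
import hmac
from urllib.parse import quote

def _sign(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def presign_get(bucket: str, key: str, access_key: str, secret_key: str,
                region: str, amz_date: str, expires: int = 3600) -> str:
    """Build a SigV4 pre-signed GET URL; amz_date looks like 20130524T000000Z."""
    host = f"{bucket}.s3.{region}.amazonaws.com"   # assumed endpoint form
    datestamp = amz_date[:8]
    scope = f"{datestamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: URI-encoded and sorted by parameter name.
    query = "&".join(f"{quote(k, safe='')}={quote(v, safe='')}"
                     for k, v in sorted(params.items()))
    path = "/" + quote(key, safe="/")
    canonical_request = "\n".join(
        ["GET", path, query, f"host:{host}", "", "host", "UNSIGNED-PAYLOAD"])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = ("AWS4" + secret_key).encode("utf-8")
    for part in (datestamp, region, "s3", "aws4_request"):
        signing_key = _sign(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{path}?{query}&X-Amz-Signature={signature}"

# Placeholder credentials; never hard-code real keys.
print(presign_get("examplebucket", "videos/video2.mpg", "AKIDEXAMPLE",
                  "hypothetical-secret", "us-east-1", "20130524T000000Z"))
```

Anyone holding the URL can download the object until it expires, so a short `expires` value suits the paid-subscriber scenario.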

403 Forbidden in API = InvalidAccessKeyId
For maximum protection of preserved versions, use MFA Delete

 
