Scalable Media Hosting with Amazon S3
Published on November 26, 2007
Imagine you have a small web site with big potential. You’re currently using a reasonably priced web host that provides good value for the amount of traffic you normally receive. Perhaps you’ve even gone one step further, and are hosting your site on a dedicated server. But now your site has caught the attention of the blogosphere, and you’re about to get much more traffic than your current web hosting setup can handle.
What are you going to do?
Knowing how to scale your web site can mean the difference between watching your idea take off and watching it take a dive. One common technique for scaling a web site is to use a different server to host media files like images, videos, and audio. This distributes the traffic and bandwidth load between hosts, allowing the primary web server to focus on delivering web pages and server-side processing rather than serving up 5 MB audio files (or even 100 MB videos).
If you don’t want to set up, configure, and maintain a few extra servers just for hosting your media files, you can instead use Amazon S3. Amazon S3 is storage for the internet; it gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites.
This article walks through the steps necessary for hosting media files for your web site using Amazon S3. We’re going to use a domain we’ve already registered, webscalecomputing.info, and set up a new sub-domain, media.webscalecomputing.info, that will host the images, videos, and audio files on Amazon S3.
(While we won’t go into any programming details for using Amazon S3, you’ll need to have a basic understanding of web networking and DNS to read this article—or you’ll need to be able to translate the concepts to your own hosting provider.)
More on Amazon S3
Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web.
You can often improve your web site’s performance by moving your media files from your main web server to their own box. This could be as simple as creating a sub-domain that points to a host that serves your media files. Of course, you still have to worry about the typical heavy lifting for any type of hosting, such as:
- How much traffic will this setup accommodate? What happens if I get more traffic than it can handle?
- What happens if the host goes down?
- How do I back up the files so they’re not lost?
- How much am I paying for idle capacity?
Amazon S3 provides answers to those questions, without the need for worrying about the pesky details of, well, implementing them.
The web services interface is simple enough that you can retrieve data using a URL, so it’s well-suited for basic web hosting tasks, like serving up media files.
The pricing for Amazon S3 is on a pay-as-you-go basis, so there is no minimum fee. This means you don’t have to invest in a large amount of hosting infrastructure or services in order to ensure that your web site handles the occasional traffic spike.
Use the AWS Simple Monthly Calculator to estimate your monthly bill.
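To make the pay-as-you-go idea concrete, here is a minimal sketch of the arithmetic behind such an estimate. The function name, the three cost components, and every rate in the example call are assumptions chosen for illustration; they are not actual AWS prices, so check the current price list before relying on any number.

```python
def estimate_monthly_cost(storage_gb, transfer_out_gb, get_requests,
                          storage_rate, transfer_rate, rate_per_10k_gets):
    """Return an estimated monthly charge in dollars.

    All rates are supplied by the caller; none of them are real AWS prices.
    """
    storage = storage_gb * storage_rate
    transfer = transfer_out_gb * transfer_rate
    requests = (get_requests / 10_000) * rate_per_10k_gets
    return round(storage + transfer + requests, 2)


# Hypothetical usage and rates, chosen only to show the arithmetic.
example = estimate_monthly_cost(
    storage_gb=10, transfer_out_gb=100, get_requests=1_000_000,
    storage_rate=0.15, transfer_rate=0.18, rate_per_10k_gets=0.01,
)
print(f"Estimated monthly charge: ${example:.2f}")
```

Because every term scales with usage, a month with no stored data, no transfer, and no requests comes out to zero, which is the "no minimum fee" property described above.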
Amazon S3 in Action
Blue Origin is one small company with a big idea that successfully scaled its web site using Amazon S3. On January 2, 2007, the company posted information and videos on its web site about a test launch of a new vertical-takeoff, vertical-landing vehicle. Within a day, the news was covered by both Slashdot and Boing Boing, sending a tremendous amount of traffic to the site. With its media files stored in Amazon S3, the site scaled instantly to handle 3.5 million requests and 758 GB of bandwidth in a single day.
Had the company hosted the web site entirely on one of its internal servers, the traffic on January 4th would have overwhelmed its system capacity. Had it used a basic hosting package from a popular provider, it would have overwhelmed that service or, even worse, exceeded the maximum allowed bandwidth for the month and incurred massive overage fees.
Blue Origin’s total charge for Amazon S3 in January? Just over $300.
Time for a look at how to host your media files on Amazon S3.
Sign up for Amazon S3
If you haven’t already, sign up for Amazon S3 at http://aws.amazon.com/s3. After signing up, you’ll have two access identifiers needed for uploading your media files:
- an Access Key ID
- a Secret Access Key
The Access Key ID is a public identifier, like a user name, that specifies a particular Amazon S3 account. The Secret Access Key is the private identifier, like a password, that ensures you’re the one making a request.
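The following sketch illustrates the idea behind the two identifiers: the Access Key ID travels with the request, while the Secret Access Key is only used locally to compute a signature. It uses an HMAC-SHA1 signature in the style of Amazon S3's older request-signing scheme; the credentials, the request description string, and the function names are placeholders, and current tools and newer signing schemes handle these details for you.

```python
import base64
import hashlib
import hmac


def sign(secret_access_key: str, string_to_sign: str) -> str:
    """Return a Base64-encoded HMAC-SHA1 signature of string_to_sign."""
    digest = hmac.new(
        secret_access_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key_id: str, secret_access_key: str,
                         string_to_sign: str) -> str:
    """Build a header value that carries the public ID and the signature only."""
    return f"AWS {access_key_id}:{sign(secret_access_key, string_to_sign)}"


# Placeholder credentials and request description, for illustration only.
ACCESS_KEY_ID = "EXAMPLEACCESSKEYID"
SECRET_ACCESS_KEY = "example-secret-access-key"
request_description = (
    "PUT\n\nimage/jpeg\nMon, 26 Nov 2007 12:00:00 GMT\n"
    "/media.webscalecomputing.info/jeff-at-web20.jpg"
)

header = authorization_header(ACCESS_KEY_ID, SECRET_ACCESS_KEY, request_description)
print(header)
```

The secret never appears in the header. The service, which also knows the secret, can recompute the signature and compare it with the one sent.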
Upload Your Media Files
Without going into too many details, Amazon S3 uses the concepts of buckets and objects to store data. A bucket organizes a collection of objects, just as a folder might contain a list of files.
There are many tools available for working with Amazon S3 without having to write a software application. For this article, we’ll be using a plug-in for the Firefox browser, called S3Fox. You can also use one of the many code samples and tools available through the Amazon S3 Resource Center, or use a product built on Amazon S3 in the Solutions Catalog.
First, create a bucket in your Amazon S3 account that corresponds to the domain you’ll use to host your media files. For our web site, we’ll create a bucket called “media.webscalecomputing.info”.
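Important: Use lower-case letters only to name buckets that will be used in DNS redirects. This requirement follows from the way DNS handles names: host names are not case-sensitive and are normally written in lower case, so a bucket name containing upper-case letters may not match the host name in a request.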
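A quick check of a candidate bucket name might look like the sketch below. The function name is hypothetical, and the exact character set and the 3 to 63 character length bound are assumptions based on Amazon S3's general naming guidance rather than anything stated in this article.

```python
import re

# Lower-case letters, digits, dots, and hyphens; the name must start and end
# with a letter or digit. The length bound (3 to 63) is an assumption.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_dns_friendly_bucket_name(name: str) -> bool:
    """Return True if name looks usable as both a bucket name and a host name."""
    return _BUCKET_NAME.fullmatch(name) is not None and ".." not in name


print(is_dns_friendly_bucket_name("media.webscalecomputing.info"))
print(is_dns_friendly_bucket_name("Media.WebScaleComputing.info"))
```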
Why use this specific bucket name? Amazon S3 has a virtual hosting feature: when a request arrives addressed to a host name of your own, such as media.webscalecomputing.info, Amazon S3 looks for a bucket with that same name and serves content from it. We’ll talk more about this feature in the next section when we configure our domain.
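As a rough mental model of virtual hosting, the sketch below maps the host name of an incoming request to a bucket name. It is a simplified illustration, not Amazon S3's actual implementation; the function name and the handling of ports are assumptions.

```python
S3_ENDPOINT = "s3.amazonaws.com"


def bucket_from_host(host_header: str):
    """Return the bucket a request's host name refers to, or None.

    None means the host is the bare S3 endpoint, in which case the bucket
    would have to be named elsewhere in the request.
    """
    host = host_header.split(":")[0].lower()  # drop any port, ignore case
    if host == S3_ENDPOINT:
        return None
    suffix = "." + S3_ENDPOINT
    if host.endswith(suffix):
        # <bucket_name>.s3.amazonaws.com
        return host[: -len(suffix)]
    # Any other host name is treated as the bucket name itself.
    return host


print(bucket_from_host("media.webscalecomputing.info"))
print(bucket_from_host("media.webscalecomputing.info.s3.amazonaws.com"))
```

Both calls resolve to the same bucket, which is why the bucket has to be named after the sub-domain.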
Next, add your media files to the new bucket in Amazon S3. Using the Firefox plug-in, it’s as simple as selecting the files on your local system, then clicking the transfer button.
Amazon S3 has a rich set of access privileges for both buckets and objects, so make sure that permissions are set on both the bucket and your objects to allow everyone access. The Firefox plug-in we’re using sets this for us using a dialog box.
All the media files are now accessible through a URL pointing to Amazon S3. The basic URL syntax for S3 is
http://<bucket_name>.s3.amazonaws.com/<object_name>, so each file we uploaded can be reached at a URL of that form.
The simplest way to use Amazon S3 for media hosting is to update our web pages to point to these files. For example:
<img src="http://media.webscalecomputing.info.s3.amazonaws.com/jeff-at-web20.jpg" />
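The URL syntax described above can be sketched as a small helper. The function name is hypothetical, and percent-encoding the object name is an added precaution for names that contain spaces or other special characters.

```python
from urllib.parse import quote


def s3_url(bucket_name: str, object_name: str) -> str:
    """Build http://<bucket_name>.s3.amazonaws.com/<object_name>."""
    # quote() leaves "/" intact, so object names with path-like prefixes work.
    return f"http://{bucket_name}.s3.amazonaws.com/{quote(object_name)}"


print(s3_url("media.webscalecomputing.info", "jeff-at-web20.jpg"))
# http://media.webscalecomputing.info.s3.amazonaws.com/jeff-at-web20.jpg
```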
But when people download our files, we want the files to appear to come from our own domain, not from s3.amazonaws.com. We’ll now set up our domain hosting so that the files are available through a URL under http://media.webscalecomputing.info/.
Setting up Your Domain
Since we already host our web site on www.webscalecomputing.info, we now want to create a sub-domain that points to the files located in Amazon S3. This is done by creating a CNAME entry with our hosting provider.
Most popular web hosting companies will let you create a new CNAME record for your domain. For our hosting company, creating a new CNAME record consisted of logging into our account, then navigating through a few DNS configuration pages until we reached one that allowed us to create a CNAME record.
To create the CNAME record, we specify an alias, “media”, and the domain it points to, “s3.amazonaws.com”.
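Hosting providers usually expose this as a web form, but the resulting record is equivalent to one line in a standard DNS zone file. The sketch below formats such a line; the function name and the TTL of 3600 seconds are assumptions for illustration.

```python
def cname_record(alias: str, target: str, ttl: int = 3600) -> str:
    """Format a zone-file CNAME line: <alias> <ttl> IN CNAME <target>."""
    # A trailing dot marks the target as a fully qualified name.
    if not target.endswith("."):
        target += "."
    return f"{alias} {ttl} IN CNAME {target}"


print(cname_record("media", "s3.amazonaws.com"))
# media 3600 IN CNAME s3.amazonaws.com.
```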
With the CNAME record in place, the media files are available through URLs under http://media.webscalecomputing.info/.
Our web page can now reference the media files.
<img src="http://media.webscalecomputing.info/jeff-at-web20.jpg" />
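If your pages already contain the longer s3.amazonaws.com URLs, a small helper like the following could convert them to the custom-domain form. It is a sketch that assumes the bucket is named after the sub-domain, as in this article; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

S3_SUFFIX = ".s3.amazonaws.com"


def to_custom_domain(url: str) -> str:
    """Rewrite http://<bucket>.s3.amazonaws.com/<object> to http://<bucket>/<object>."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.endswith(S3_SUFFIX):
        host = host[: -len(S3_SUFFIX)]
    # URLs on any other host pass through unchanged.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(to_custom_domain(
    "http://media.webscalecomputing.info.s3.amazonaws.com/jeff-at-web20.jpg"
))
# http://media.webscalecomputing.info/jeff-at-web20.jpg
```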
Automatically Copying Files to Amazon S3
There are more ways you can use Amazon S3, including automatically copying files to Amazon S3. The Resource Center in the AWS Developer Connection web site has technical documentation, code samples, and other resources you can use to learn more about Amazon S3 and build your own applications that use the service. As always, the exact tutorial to read depends on the language you’re using.
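Whatever library you choose, an automatic copy usually starts by deciding which local files to send and what to call them in the bucket. The sketch below covers only that planning step using the Python standard library; the function name, the list of media extensions, and the choice to mirror the folder layout in the object names are all assumptions, and the upload call itself is left to the Amazon S3 library or tool you use.

```python
import mimetypes
import os

# Hypothetical set of extensions treated as media files.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".mp3", ".mp4", ".mov"}


def plan_uploads(local_dir: str):
    """Return a sorted list of (local_path, object_name, content_type) tuples."""
    plan = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            if os.path.splitext(name)[1].lower() not in MEDIA_EXTENSIONS:
                continue  # skip non-media files
            local_path = os.path.join(root, name)
            # Object names use forward slashes regardless of the local OS.
            object_name = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
            plan.append((local_path, object_name, content_type))
    return sorted(plan, key=lambda item: item[1])
```

Each tuple in the result gives the upload step what it needs: the file to read, the object name to store it under, and the content type to send with it so browsers handle the file correctly.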
Learning More About Amazon S3
Why not host the entire web site in Amazon S3 and just use a domain provider to set up the appropriate CNAME records? Although it’s certainly possible, you probably want to have a web server running, to perform server-side processing of your scripts, or to access a database. Amazon S3 is a storage solution, so it does not perform any server-side processing (although check out Amazon EC2 for information on scalable, virtual computing).
Of course, you don’t have to have a web site to use Amazon S3. Like Jeremy Zawodny, you can just as easily use Amazon S3 to back up your home computer. (I pay just over $1 a month to back up my important files.)
Learn more about Amazon S3:
- Amazon S3 web page
- Developer Connection web site
- Resource Center for Amazon S3
- Developer Forums
- Solutions Built on Amazon S3
Amazon Web Services is the effort of many minds, but the two here are Craig Noeldner and Mike Culver.