This is Article #3 of a 4-part series. For a good primer, check out the first two articles listed below. Otherwise, jump right in!
In my last post, I discussed the three techniques used to improve asset load speed. In this post, I will discuss how to combine the use of GZipping and a Content Delivery Network (CDN) for the fastest possible page loads.
Everyone's favorite CDN these days is Amazon's CloudFront service, which serves files directly from Amazon's scalable "Simple Storage Service", Amazon S3. It is very easy to work with, has widespread support in Ruby gems and plugins (and countless other libraries), and is very inexpensive with its pay-as-you-go billing.
However, there is one large pitfall to using Amazon S3 + CloudFront: neither S3 nor CloudFront supports detecting GZip support and encoding accordingly. It would seem that we now need to decide whether we'll do without GZipping or do without a CDN. Not so! There is another way.
Amazon S3 and CloudFront servers do not detect whether incoming requests accept GZip encoding, so they cannot GZip and serve components on the fly. What we can do is compress each component ourselves ahead of time and upload both the compressed and the uncompressed version. Then, it's simply a matter of figuring out whether we should link to the compressed or the uncompressed components when the user visits the page.
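One way to have both versions on hand is to gzip each component before uploading it. The sketch below is only an illustration in Python, not the tooling used for this series; the function name `write_gzip_copy` is made up, and the upload step itself is left out.

```python
import gzip
import shutil
from pathlib import Path


def write_gzip_copy(source: Path) -> Path:
    """Write a gzipped copy of `source` next to it and return the new path."""
    # application.css -> application.css.gz (hypothetical naming convention)
    target = source.with_name(source.name + ".gz")
    with source.open("rb") as src, gzip.open(target, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Both files would then be uploaded side by side. The `.gz` copy most likely needs a `Content-Encoding: gzip` header and the original content type set on it, so that browsers decode it as a stylesheet or script rather than offering it as a download.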
This solution is similar to the last, except that it attacks the problem one step earlier in the workflow. So, let's take a step back. Instead of linking to the asset through our own server, this time, we'll revert to linking directly to CloudFront:
<link href="http://xxxxxx.cloudfront.net/stylesheets/application.css" media="screen" rel="stylesheet" type="text/css" />
However, this time we'll have our application (whether it be Ruby, PHP, Python, or whatever) detect if the request header accepts GZip encoding, and rewrite the asset tag accordingly.
<link href="http://xxxxxx.cloudfront.net/stylesheets/application.css.gz" media="screen" rel="stylesheet" type="text/css" />
I won't go into detail about how to actually accomplish this, because the truth is, this won't work either.
This will only work as long as your code is run dynamically every time a user loads the page. That means, once you implement this strategy, you no longer have the option to cache the page. Ever.
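To see why, here is roughly what the per-request rewrite involves. This is a minimal sketch in Python under a few assumptions: the function names are hypothetical, the host is the placeholder used in the link tags above, and the header parsing is simplified (it ignores the `*` wildcard).

```python
CDN_HOST = "http://xxxxxx.cloudfront.net"  # placeholder host from the link tags above


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if an Accept-Encoding header value allows gzip."""
    for item in accept_encoding.split(","):
        coding, _, params = item.strip().partition(";")
        if coding.strip().lower() != "gzip":
            continue
        # "gzip;q=0" explicitly refuses gzip; any positive q value accepts it.
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def asset_url(path: str, accept_encoding: str) -> str:
    """Build the CloudFront URL, adding ".gz" for gzip-capable clients."""
    suffix = ".gz" if accepts_gzip(accept_encoding) else ""
    return f"{CDN_HOST}{path}{suffix}"
```

Because `asset_url` takes the request's `Accept-Encoding` value as an argument, the HTML it produces differs from one visitor to the next, which is exactly what rules out serving a single cached copy of the page.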
Sure, you could probably come up with some system that creates two versions of each cached page (one with gzipped links and one without), but that will add a lot of complexity to your server setup and filesystem, and it's just too much trouble. So, let's move on to another solution.
Now this first solution may seem clever, but let's see if you can figure out why it won't work. The idea here is that rather than link to a stylesheet, for example, on CloudFront like this:
<link href="http://xxxxxx.cloudfront.net/stylesheets/application.css" media="screen" rel="stylesheet" type="text/css" />
...we'll instead link to our own server, which will read the request and redirect to either the compressed or the uncompressed stylesheet on CloudFront as appropriate.
<link href="http://compressed.yourdomain.com/stylesheets/application.css" media="screen" rel="stylesheet" type="text/css" />
And then the Apache configuration for the compressed.yourdomain.com virtual host would look like this:
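<VirtualHost *:80>
  ServerName compressed.yourdomain.com
  DocumentRoot /home/user/yourapp/current/public
  RewriteEngine On
  # Client accepts gzip and the file exists locally: redirect to the .gz copy.
  RewriteCond %{HTTP:Accept-Encoding} gzip
  RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -s
  RewriteRule ^(.+) http://xxxxxx.cloudfront.net$1.gz [R=302,L]
  # Otherwise, redirect to the uncompressed copy.
  RewriteRule ^(.+) http://xxxxxx.cloudfront.net$1 [R=302,L]
</VirtualHost>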
Remember from the last article that one of the added benefits of off-loading your assets to a CDN is that your server no longer has to listen for and respond to asset requests. This solution rescinds that benefit: even though the asset download still takes place between the CDN server and the user's browser, the initial request must still go through your server to be resolved to the CDN.
Furthermore, each component request now requires two DNS lookups instead of one, which adds to the request latency (though the request latency is small compared to the download time in the request/download cycle).
But the real reason this won't work well is that it disrupts the magic that makes CDNs fast. A CDN is beneficial primarily because it serves files faster by choosing, for each request, the server (or "service node") that is closest to the user. CDNs are able to estimate the closest server in the CDN using a variety of techniques, including reactive probing, proactive probing, and connection monitoring. (See Content Networking Techniques for more info.)
By inserting your server (acting as a proxy) into the request cycle between the user's computer and the CDN, you may cause the CDN to choose a sub-optimal service node for the delivery of content directly to the user. If the CDN probes the network from the request side, it will most likely choose the edge node location closest to your server rather than to the user's computer, completely negating the benefit of using the CDN in the first place.
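A toy model makes the effect easier to see. The Python sketch below assumes the CDN simply picks the edge node with the lowest round-trip time to whoever it sees making the request. Every node name and latency figure in it is made up for illustration and is not a measurement.

```python
# Hypothetical round-trip times in milliseconds from each requester to each
# edge node. Every number and node name here is made up for illustration.
RTT_MS = {
    "user_in_portugal": {"edge_europe": 20, "edge_us_midwest": 140},
    "app_server_in_st_louis": {"edge_europe": 130, "edge_us_midwest": 10},
}


def closest_edge(requester: str) -> str:
    """Pick the edge node with the lowest round-trip time to the requester."""
    rtts = RTT_MS[requester]
    return min(rtts, key=rtts.get)


# The user asks the CDN directly: the node near the user wins.
direct = closest_edge("user_in_portugal")

# The app server asks on the user's behalf: the node near the server wins,
# and the user then downloads from a node that is far away from them.
proxied = closest_edge("app_server_in_st_louis")
```

In this toy setup the direct request resolves to the European node, while the proxied request resolves to the node near the application server, which is the sub-optimal choice described above.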
To illustrate this point, consider the typical request/download cycle for a javascript file served from your application's server:
Below is a simplified diagram of the typical request/response cycle for a javascript file when using a CDN to serve the component.
This final diagram depicts the request/response cycle when delivering components through the CDN with your application server acting as a proxy (so that your app server can read the request and tell the CDN whether to serve the unzipped or the zipped component).
Notice in the diagram above, the CDN should have chosen the service node closest to the User, so that the javascript file would have less distance to travel and would thus download the fastest. Instead, it chose the node closer to the application server that proxied the request to the CDN.
The graph below compares download times for the user from my server (located in St. Louis, Missouri), from a server in Amazon CloudFront's CDN, and from CloudFront with my server acting as a proxy. I performed this comparison from my own computer here in Ann Arbor, MI, while my buddy, Dave Leal, downloaded the file from his computer in Portugal.
At this point, it may seem like we can choose between two alternatives:
In our last post, we saw that Gzipping our components can compress them down to ~25% of their original size, which means they'll transfer 4X faster. And in this post, we see that serving components from Amazon CloudFront can transfer component files ~2X faster*.
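The arithmetic behind those two figures can be written out as a rough back-of-the-envelope sketch in Python. It assumes transfer time scales linearly with the number of bytes sent and ignores request latency; the function name is made up for this example.

```python
def relative_transfer_time(size_ratio: float, speedup: float) -> float:
    """Transfer time relative to an uncompressed file from our own server.

    size_ratio: bytes sent as a fraction of the original file size.
    speedup: how many times faster bytes move compared to our own server.
    """
    return size_ratio / speedup


# GZipped (~25% of original size), served from our own server.
gzip_from_origin = relative_transfer_time(size_ratio=0.25, speedup=1.0)  # 0.25

# Uncompressed, served from the CDN (~2X faster transfer).
plain_from_cdn = relative_transfer_time(size_ratio=1.0, speedup=2.0)  # 0.5

# Both at once, which S3 + CloudFront will not do for us.
both_combined = relative_transfer_time(size_ratio=0.25, speedup=2.0)  # 0.125
```

Under these assumptions, a GZipped file from our own server takes about a quarter of the baseline time, while the uncompressed file from the CDN takes about half.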
Ideally we'd be able to do both (and some other, more expensive Content Delivery Networks will actually allow you to). But if we must choose, compressible file types gain much more from being served compressed than from being served uncompressed from a CDN edge location. So, we will serve compressible file types (stylesheets, javascripts, and static HTML files) from our own server, GZipped.
However, images are already compressed by their image encoding, so GZipping them on our server barely changes their file size. So, we may as well allow images to benefit from the 2X speed improvement by serving them straight from our Amazon CloudFront CDN.
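The resulting split can be sketched as a small routing helper. This is an illustration in Python rather than the Rails implementation: `ORIGIN_HOST` and the function name are hypothetical, the CDN host is the placeholder used earlier, and the extension list is only an example.

```python
from pathlib import PurePosixPath

CDN_HOST = "http://xxxxxx.cloudfront.net"  # placeholder host used earlier
ORIGIN_HOST = "http://www.yourdomain.com"  # hypothetical host for our own server

# File types that shrink noticeably when GZipped (an illustrative list).
COMPRESSIBLE = {".css", ".js", ".html"}


def component_url(path: str) -> str:
    """Send compressible files to our own server and everything else to the CDN."""
    extension = PurePosixPath(path).suffix.lower()
    host = ORIGIN_HOST if extension in COMPRESSIBLE else CDN_HOST
    return f"{host}{path}"
```

With this helper, `component_url("/stylesheets/application.css")` points at our own server, which GZips the response, while `component_url("/images/logo.png")` points at CloudFront.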
Using this solution for hosting/serving components, we've been able to reduce page load time by 75% on several of our sites.
If you have a Ruby on Rails application, implementing this solution is easy, and won't take you more than an hour or so. Stay tuned for Part 4: Caching, Zipping, and (Amazon CloudFront) CDN For A Rails App.