I have been getting up to speed using the Scaling Rails screencasts. Episode 11, which covers advanced HTTP caching (using reverse proxy caches such as Varnish, Squid, etc.), recommends only considering a reverse proxy cache once you have already exhausted the options of page, action and fragment caching in your Rails application (in addition to memcached etc., but that is not relevant here).

What I can't quite understand is how an HTTP reverse proxy cache can offer a performance boost for an application that already uses page caching. To simplify matters, let's assume I'm talking about a single host here.

This is my understanding of how both techniques work (maybe I'm wrong):

  • With page caching, the Rails process is hit initially and then generates a static HTML file that is served directly by the web server for subsequent requests, as long as the cache for that request is valid. Once the cache has expired, Rails is hit again and the static file is regenerated with the up-to-date content, ready for the next request

  • With an HTTP reverse proxy cache, the Rails process is hit when the proxy needs to determine whether the content is stale or not. This is done using various HTTP headers such as ETag, Last-Modified, etc. If the content is fresh, Rails responds to the proxy with an HTTP 304 Not Modified and the proxy serves its cached content to the browser, possibly responding with its own HTTP 304. If the content is stale, Rails serves the up-to-date content to the proxy, which caches it and then serves it to the browser
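The conditional-GET handshake described in the second bullet can be sketched in plain Ruby, without Rails. This is just an illustration of the mechanism; the `respond` helper and the MD5-based ETag scheme are invented for the example, not an actual Rails API:

```ruby
require "digest"

# Simulate the origin server's side of a conditional GET.
# Returns [status, headers, body]. An ETag is derived from the content;
# if the client (here, the proxy) presents a matching If-None-Match,
# the origin answers 304 with no body and the proxy reuses its cached copy.
def respond(request_headers, content)
  etag = %("#{Digest::MD5.hexdigest(content)}")
  if request_headers["If-None-Match"] == etag
    [304, { "ETag" => etag }, nil]         # content unchanged: no body sent
  else
    [200, { "ETag" => etag }, content]     # first request or stale: full body
  end
end

page = "<h1>Hello</h1>"
status, headers, _body = respond({}, page)                         # => 200
status2, = respond({ "If-None-Match" => headers["ETag"] }, page)   # => 304
```

Note that even the 304 path still involves a round trip to the Rails process; only the response body is saved, which is the crux of the question above.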

If my understanding is correct, then doesn't page caching result in fewer hits to the Rails process? There isn't all that back and forth to determine whether the content is stale, meaning better performance than reverse proxy caching. Why might you use both approaches in conjunction?

You're right.

The only real reason to consider it is if your Apache sets Expires headers. In this configuration, the proxy can take some of the load off Apache.
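For example, an Apache configuration along these lines would stamp page-cached files with an expiry so the proxy can serve them without touching Apache at all (mod_expires is assumed to be enabled, and the directory path is illustrative):

```apache
# Hypothetical: let Apache stamp Expires headers on Rails page-cached files,
# so a proxy in front can serve them for 5 minutes without hitting Apache
<Directory "/var/www/app/public/cache">
  ExpiresActive On
  ExpiresDefault "access plus 5 minutes"
</Directory>
```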

Having said this, Apache serving static files versus a proxy cache is virtually an irrelevancy in the Rails world. Both are astronomically fast.

The benefits you would get would be for the non-page-cacheable stuff.

I prefer using proxy caching over page caching (à la Heroku), but that's just me, and a digression.

A good proxy cache implementation (e.g., Squid, Traffic Server) is massively more scalable than Apache when using the prefork MPM. If you are using the worker MPM, Apache is OK, but a proxy will still be much more scalable at high loads (hundreds of thousands of requests per second).

Varnish, for example, has a feature where simultaneous requests to the same URL (which isn't in the cache) are queued and only a single/first request actually hits the back-end. That can prevent some nasty dog-pile cases that are nearly impossible to work around in a traditional page caching scenario.
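The request-coalescing behavior can be approximated in a few lines of Ruby. This is a rough sketch of the idea only (the class and cache shape are invented, not Varnish's implementation): concurrent misses on the same key are serialized, so only the first caller runs the expensive back-end call and everyone else reuses its result.

```ruby
# Sketch of Varnish-style request coalescing: on a cache miss, only the
# first thread per key regenerates the value; the rest block on the same
# lock and then find the value already stored.
class CoalescingCache
  def initialize
    @store = {}
    @guard = Mutex.new                          # protects the per-key lock table
    @locks = Hash.new { |h, k| h[k] = Mutex.new }
  end

  def fetch(key)
    lock = @guard.synchronize { @locks[key] }
    lock.synchronize do
      @store[key] ||= yield                     # only one thread per key hits the back-end
    end
  end
end

backend_hits = 0
cache = CoalescingCache.new
threads = 10.times.map do
  Thread.new do
    cache.fetch("/slow/page") { sleep 0.05; backend_hits += 1; "<html>page</html>" }
  end
end
threads.each(&:join)
# backend_hits is 1: the other nine requests waited instead of dog-piling
```

With plain page caching there is no such gate: ten simultaneous requests arriving just after expiry would all regenerate the page.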

Using a reverse proxy in a setup with only one application server seems a bit of overkill IMO. In a configuration with more than one application server, a reverse proxy (e.g. Varnish, etc.) is the best way to do page caching.

Consider a setup with 2 application servers:

  • User 'Bob' (routed to node 'A') posts a new message; the page gets expired and recreated on node 'A'.

  • User 'Cindy' (routed to node 'B') requests the page where the new message from 'Bob' should appear, but she can't see the new message, because the page on node 'B' wasn't expired and recreated.

This concurrency problem can be solved with a reverse proxy.