Use Varnish and Nginx to follow, hide and cache 301 / 302 redirects

Hello!

Varnish is a robust, stable, open source caching solution that has been employed in many different high-traffic environments with a significant amount of success.

One of the things that we have come across, specifically in environments such as Amazon Web Services, is that websites tend to spread their web stack across multiple services. For example, static media such as JS, CSS and image files may be hosted on Amazon S3 storage. This requires either implementing additional CNAMEs for your domain (i.e. static.yourdomain.com) that point to the S3 URL, or having your CMS redirect requests for static media to the S3 backend.

Remember that with S3 you have to generate the static files and copy them over, so these URLs may need to be generated and maintained by the CMS, often with a redirect (301 or 302) that rewrites the URL to the S3 backend destination.

When Varnish is caching a website and it comes across a request that the backend response (“beresp”) rewrites into a 301/302 redirect, Varnish will typically just cache the 301/302 redirect itself, saving the minuscule amount of processing power needed to process the request and send the rewrite. Some may argue that the saving is negligible!

Wouldn’t it be nice to actually cache the content after the redirect happens? There are two ways one could go about doing this.

Cache 301 / 302 Redirects with Varnish 4

In simpler setups, Varnish can process the rewrite/redirect itself: change the URL and issue a return(restart) to restart the request with the new URL. This means that Varnish processes the rewrite and re-requests the new URL so that the final content can ultimately be cached. So in vcl_deliver you can add the following :
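A minimal sketch of this in Varnish 4 VCL — the Location-header parsing here is an illustrative assumption, not a drop-in config, and you may want a restart guard for misbehaving backends:

```vcl
sub vcl_deliver {
    # If the backend answered with a redirect, follow it internally:
    # rewrite the host and URL from the Location header and restart
    # the request so the final content is fetched and cached.
    if ((resp.status == 301 || resp.status == 302) && resp.http.Location) {
        set req.http.host = regsub(resp.http.Location, "^https?://([^/]+).*$", "\1");
        set req.url = regsub(resp.http.Location, "^https?://[^/]+(.*)$", "\1");
        return (restart);
    }
}
```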

The above should work for you if, let’s say, you are using Varnish in front of a simple Apache/Nginx server that is then processing the request. If Varnish is sending traffic to another proxy (i.e. Nginx + proxy_pass), then the above directive may not work for you. One reason to proxy traffic from Varnish to another proxy like Nginx is a scenario where you want to do some fancy redirection of traffic plus DNS resolution. Still confused?

Let’s say Varnish is at the edge of a network, caching a complicated website. Requests to Varnish need to go to another load balancer (i.e. an Amazon ELB). ELB endpoints are never going to be a static IP address, and Varnish (as of v4) cannot do DNS resolution of hostnames on a per-request basis, so you would need to proxy the request to Nginx, which would handle the reverse proxy over to the ELB, which would then load balance the backend fetch to the CMS.

If your scenario sounds more like the aforementioned one, then you could try following the 301/302 redirect with Nginx instead of Varnish.

Cache 301 / 302 Redirects with Nginx

Nginx and Varnish seem to go hand in hand. They’re great together! In this scenario you are using Varnish as your edge cache and sending all backend requests to an Nginx proxy_pass directive. In order to follow a redirect before any response is sent back to Varnish (and ultimately the end-user), you can tell Nginx to save the redirect location and proxy the request there itself, returning the final response to Varnish so it can simply cache the content!
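A sketch of such an Nginx configuration — the hostname backend-server.com, the listen port and the resolver address are placeholder assumptions (a variable proxy_pass requires a resolver to be defined):

```nginx
server {
    listen 8080;
    server_name backend-server.com;

    # Needed because proxy_pass below uses a variable.
    resolver 8.8.8.8;

    location / {
        proxy_pass http://backend-server.com;
        # Intercept 3xx responses instead of returning them to Varnish.
        proxy_intercept_errors on;
        error_page 301 302 307 = @handle_redirects;
    }

    location @handle_redirects {
        # Save the Location header the backend sent,
        # then fetch that URL directly as if it were the backend.
        set $saved_redirect_location '$upstream_http_location';
        proxy_pass $saved_redirect_location;
    }
}
```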

You can see that the proxy_pass directive is configured normally. In the event of any 301, 302 or 307, the request is processed by the @handle_redirects location directive, which simply proxy passes to $saved_redirect_location as if it were the backend server! This means that even if the proxy_pass location is not in your Varnish configuration as a valid hostname (i.e. a random S3 URL), Varnish will still cache it, thinking it is backend-server.com.

Hopefully this will help someone out there!

Force SSL for your site with Varnish and Nginx

Hello!

For those of you who depend on Varnish to offer robust caching and scaling potential to your web stack, hearing about Google’s prioritization (albeit arguably small, for now) of sites that force SSL may give you pause about how to implement it.

Varnish currently doesn’t have the ability to handle SSL certificates and encrypt requests as such. It may never actually have this ability, because its focus is caching content, and it does a very good job of that, I might add.

So if Varnish can’t handle the SSL traffic directly, how would you go about implementing this with Nginx?

Well, Nginx has the ability to proxy traffic. This is one of the many reasons why some admins choose to pair Varnish with Nginx. Nginx can do reverse proxying and header manipulation out of the box, without custom modules or configuration. Combine that with the lightweight, production-tested scalability of Nginx over Apache and the reasons are simple. We’re not interested in that debate here, just a simple implementation.

Nginx Configuration

With Nginx, you will need to add an SSL listener to handle the SSL traffic. You then assign your certificate. The actual traffic is then proxied to the (already set up) non-HTTPS listener (Varnish).
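Something along these lines — the domain, certificate paths and the assumption that Varnish listens on 127.0.0.1:80 are placeholders; note the X-Forwarded-Proto header near the end of the block:

```nginx
server {
    listen 443 ssl;
    server_name yourdomain.com;

    ssl_certificate     /etc/nginx/ssl/yourdomain.com.crt;
    ssl_certificate_key /etc/nginx/ssl/yourdomain.com.key;

    location / {
        # Hand the decrypted traffic to Varnish on the plain-HTTP side.
        proxy_pass http://127.0.0.1:80;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # Mark the request as already-HTTPS so Varnish does not
        # redirect it back here, causing an infinite loop.
        proxy_set_header X-Forwarded-Proto https;
    }
}
```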

The one thing to note before going further is the second-last line of the configuration. It is important because it allows you to avoid an infinite redirect loop: a request is proxied to Varnish, Varnish redirects non-SSL to SSL, and the request comes back to Nginx for another proxy pass. You’ll notice that pretty quickly, because your site will ultimately go down 🙁

What nginx is doing is defining a custom HTTP header and assigning a value of “https” to it :
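That is, the single proxy_set_header line inside the SSL listener’s location block:

```nginx
proxy_set_header X-Forwarded-Proto https;
```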

The rest of the Nginx configuration can remain the same (the configuration to which Varnish ultimately forwards requests in order to cache them).

Varnish

What you’re going to need in your Varnish configuration is a minor adjustment :
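A sketch in Varnish 3-style VCL — the hostname yourdomain.com is a placeholder, and the 750 status is just an arbitrary internal code picked up by vcl_error below:

```vcl
sub vcl_recv {
    # If Nginx did not mark the request as already-HTTPS,
    # bounce it to the SSL listener, keeping the original URI.
    if (req.http.X-Forwarded-Proto !~ "(?i)https") {
        set req.http.X-Redir-Url = "https://yourdomain.com" + req.url;
        error 750 req.http.X-Redir-Url;
    }
}
```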

What the above snippet does is simply check whether the header “X-Forwarded-Proto” (which Nginx just set) exists and whether its value equals (case-insensitively) “https”. If the header is not present or does not match, it triggers a redirect to force the SSL connection, which is handled by the Nginx SSL proxy configuration above. It’s also important to note that we are not just doing a clean-break redirect; we are still appending the originating request URI in order to make the transition smooth and avoid breaking any previously ranked links/URLs.

The last thing to note is the error 750 handler that processes the redirect in Varnish :
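In Varnish 3-style VCL the handler could look like this (the redirect URL arrives as the error “reason” string set in vcl_recv):

```vcl
sub vcl_error {
    if (obj.status == 750) {
        # obj.response carries the URL passed to "error 750 ...".
        set obj.http.Location = obj.response;
        set obj.status = 302;
        return (deliver);
    }
}
```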

You can see that we’re using a 302 temporary redirect instead of a permanent 301 redirect. This is your decision, though browsers tend to be stubborn in their own internal caching of 301 redirects, so 302 is good for testing.

After restarting Varnish and Nginx you should be able to quickly confirm that no non-SSL traffic is allowed anymore. Not only can you enjoy the (marginal) SEO “bump”, but you are also contributing to the HTTPS Everywhere movement, which is an important cause!

Web based system to purge multiple Varnish cache servers

Hello!

We have been working with Varnish for quite a while, and there is already quite a lot of documentation out there on the different methods of purging the cache remotely: cURL, the Varnish admin toolset and other related methods.

We deal with Varnish in the Amazon cloud as well as on dedicated servers. In many cases Varnish sits in a pool of servers in the web stack, in front of web services such as Nginx and Apache. Purging specific cached URLs can be cumbersome when you’re dealing with multiple cache servers.

Depending on the CMS you are using, there are modules/plugins available that offer the ability to purge Varnish caches straight from the CMS, such as the Drupal Purge module.

We have decided to put out a secure, web-accessible method for purging Varnish cached objects across multiple Varnish servers. As always, take the word “secure” with a grain of salt. The recommended way to publish, on Apache or Nginx, a web-accessible page that gives the end-user the ability to request that cached pages be purged would be to take these fundamentals into consideration :

– Make the web-accessible page available only to specific source IPs or subnets
– Make the web-accessible page password protected with strong passwords and non-standard usernames
– Make the web-accessible page fully available via SSL encryption

On the Varnish configuration side of things, with security still in mind, you would have to set up the following items in your config :

ACL

Set up an access control list in Varnish that only allows specific source IPs to send the PURGE request. Here is an example of one :
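For example (the subnet here is a placeholder for your own management network):

```vcl
acl purge {
    "localhost";
    "127.0.0.1";
    # Internal management subnet (placeholder).
    "10.10.10.0"/24;
}
```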

vcl_recv / vcl_hit / vcl_miss / vcl_pass

This is self-explanatory (I hope). Obviously you would integrate the following logic into your existing Varnish configuration.
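A sketch of the standard PURGE handling pattern in Varnish 3-style VCL, using the ACL defined above:

```vcl
sub vcl_recv {
    if (req.request == "PURGE") {
        # Only IPs in the purge ACL may purge.
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged (not in cache).";
    }
}

sub vcl_pass {
    if (req.request == "PURGE") {
        error 502 "PURGE on a passed object.";
    }
}
```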

The code itself is available on our GitHub Project page. Feel free to contribute and add any additional functionality.

It is important to note that what differentiates our solution from the existing ones out there is that our script manipulates the host headers of the cURL request in order to submit the same hostname/URL request across the array of Varnish servers. That way the identical request can be received by multiple Varnish servers without any local hosts file editing or anything like that.

There is lots of room for input sanity checks, better input logic and other options to perhaps integrate with varnish more intuitively. Remember this is a starting point, but hopefully it is useful for you!

Varnish Caching with Joomla

Hello There!

One of the exciting new technologies to come out in the last few years is a tremendously efficient and dynamic caching system called Varnish (see : http://www.varnish-cache.org).

We have been employing the use of Varnish for high traffic websites for the purposes of user experience improvements as well as for redundancy and load balancing purposes.

Varnish can do it all – complex load balancing and polling based on many different weighting methodologies for failover, as well as holding on to a “stale” cache in the event of a backend web server outage, or perhaps for geographic redundancy (holding a stale cache in a secondary data center).

One of the challenges we have faced in the many different implementations of Varnish into web stacks is dealing with dynamic and user-session (i.e. “logged in”) content.

If the Internet were full of only static (see 1995) HTML files, Varnish would work beautifully out of the box. Unfortunately the web is a complicated mess of session-based authentication, POSTs, GETs and query strings, among other things.

One of our recent accomplishments was getting the Joomla 1.5 content management system to work with Varnish 2.1.

The biggest challenge with Joomla is that it creates a session cookie for all users. This means a session is created and established for any guest visiting the site, and if they decide to log in, that same session is used to establish a logged-in session through authentication. This is an apparent effort to deter or avoid session hijacking.

The problem with this is that Varnish ends up caching all the logged-in session content as well as the anonymous front page content.

I spent a significant amount of time fine-tuning my VCL (Varnish Configuration Language) to play nice with Joomla. Unfortunately it became apparent that some minor modifications to the Joomla code were necessary in order for it to communicate properly with Varnish.

Step 1 : Move the login form off the front page

I realize this might be a hard decision. I can’t offer an alternative. If you have an integrated login form on the front page of your site, and you wish to cache that page with Varnish, you will likely have to choose one or the other. It would probably be ideal to replace that login form with a button that brings the user to a secondary page off the main page.

For the sake of argument, let’s call our site “example.com”; the login page URL within Joomla should look like the following :

http://www.example.com/index.php?option=com_user&view=login

Take note of the login URI in this string.

The reason we need the login form on a secondary page is that we need an almost “sandboxed” section of the site where the anonymous session cookie can be established and passed through the authentication process to become a logged-in session. We will tell Varnish to essentially ignore this page.

Step 2 : Modify Joomla to send HTTP headers for user/guest sessions

This isn’t that hard. In the Joomla code, there is a section where it defines the HTTP headers it sends to the browser for cache variables such as expiry times and whatnot. I’m going to assume you have turned off the built-in Joomla caching system.

What you need to do is tell Joomla to send a special HTTP header whose value is either True or False depending on whether the user is logged in. This is useful information: it will allow Varnish to avoid caching any logged-in content such as “Welcome back, USERNAME” after the user is passed back to the front page from logging in.

In my joomla installation, I modified the following file :

The parent folder being the public_html / root folder of your Joomla installation. In this file, find the line that determines whether the Joomla caching system is disabled :

After this line, you will see about 5 HTTP header declarations (Expires, Last-Modified, Cache-Control, Cache-Control again and Pragma). Above those declarations, add the following 6 lines of code :
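A sketch using the Joomla 1.5 framework API (JFactory::getUser() and JResponse::setHeader()); the header name X-Logged-In matches what the Varnish configuration later checks:

```php
// Expose the login state to Varnish via a custom header.
$user =& JFactory::getUser();
if ($user->get('guest') == 1) {
    // Anonymous visitor: page is safe to cache.
    JResponse::setHeader('X-Logged-In', 'False', true);
} else {
    // Authenticated user: page must not be cached.
    JResponse::setHeader('X-Logged-In', 'True', true);
}
```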

If you read the above code, it’s fairly straightforward. I check whether the user is a guest (aka anonymous) or not. If they are logged in, I send an HTTP header called “X-Logged-In” with a value of “True”. If the user is not logged in, it is set to “False”.

Pretty easy, right?

This will allow Varnish to avoid caching a logged-in user’s page.

Step 3 : Configure Varnish

This is the part that took the most time during this entire process. Mind you, patching the Joomla code and whatnot took some time as well, but this part took a lot of experimentation and long hours examining session cookies and host headers.

What I will do is break down the generalized configuration directives into two groups : VCL_RECV and VCL_FETCH.

VCL_RECV

In here, I set a bunch of IF-statement directives to tell Varnish what it should look up in the cache, what it should pipe to the backend and what it should pass. This could probably be optimized and improved upon, but it works for me :
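A minimal sketch in Varnish 2.1-era syntax, assuming a stock Joomla URL layout (the com_user match and the static-asset list are illustrative; real rules will vary per site):

```vcl
sub vcl_recv {
    # Form submissions and other non-idempotent requests go
    # straight through untouched.
    if (req.request != "GET" && req.request != "HEAD") {
        return (pipe);
    }

    # Leave the login page alone so the session cookie can be
    # established and carried through authentication.
    if (req.url ~ "option=com_user") {
        return (pass);
    }

    # Static assets are safe to cache regardless of cookies.
    if (req.url ~ "\.(css|js|png|gif|jpg|jpeg|ico|swf)$") {
        remove req.http.Cookie;
        return (lookup);
    }

    return (lookup);
}
```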

VCL_FETCH

The fetch section is a little bit easier; I only have about 5 directives. The first one is the most important one to look at: it “unsets” the cookie on every page of the site EXCEPT the login page. This allows Varnish to properly establish the logged-in session. The subsequent rules determine what to deliver and what to pass based on URI or HTTP header checks :
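In Varnish 2.1-era syntax, a sketch of those rules might look like this (again assuming the com_user login URL and the X-Logged-In header added in Step 2):

```vcl
sub vcl_fetch {
    # Strip the session cookie everywhere except the login page,
    # so anonymous pages can be cached.
    if (req.url !~ "option=com_user") {
        remove beresp.http.Set-Cookie;
    }

    # Never cache a page rendered for a logged-in user.
    if (beresp.http.X-Logged-In == "True") {
        return (pass);
    }

    return (deliver);
}
```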

That’s it! I just saved you many sleepless nights (I hope!). Hopefully your headers will look something like this after you implement Varnish in front of Joomla :
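Something along these lines for an anonymous, cached page (values are illustrative):

```
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
X-Logged-In: False
Via: 1.1 varnish
Age: 57
```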

UPDATE : 12/08/2011

I realize I made a mistake and have corrected this post. In vcl_fetch, I had the following :

Well, I realize I should be unsetting the response cookie, not the set cookie. For some reason, the above (erroneous) directive works only right after you log in. If you start clicking around the site, your logged-in session disappears. I suspect this is because either Joomla or Varnish mistakenly unsets a logged-in session.

This is the correct entry (I have fixed it in my original post as well) :

After making the above change, I can log in and browse the site and my session stays intact. Mind you, the Joomla site I am testing with is definitely not a vanilla Joomla installation.

I’d love to hear from anyone who has accomplished the above scenario either way!