Nginx, Varnish, HAProxy, and Thin/Lighttpd

Over the last few days, I’ve been playing with Ruby on Rails again and came across Thin, a small, yet stable web server which will serve applications written in Ruby.

This is a small tutorial on how to get Nginx, Varnish, and HAProxy working together with Thin (for dynamic pages) and Lighttpd (for static pages).

I took this route because many of the sources I read report that separating static and dynamic content improves performance significantly.

Nginx

Nginx is a lightweight, high-performance web server and reverse proxy. It can also be used as an email proxy, although this is not an area I have explored. I will be using Nginx as the front-end server for my Rails applications.

I installed Nginx using the RHEL binary package available from EPEL.

Configuration of Nginx is very simple. I have kept mine minimal, so Nginx does little more than proxy requests to Varnish. My current configuration file consists of the following:

user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] $request "$status" $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"';

    sendfile on;
    tcp_nopush on;
    tcp_nodelay off;

    keepalive_timeout 5;

    # This section enables gzip compression.
    gzip on;
    gzip_comp_level 2;
    gzip_proxied any;
    # text/html is always compressed, so it is left out of gzip_types to avoid a "duplicate MIME type" warning.
    gzip_types text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript;

    # Here you can define the addresses on which varnish will listen. You can place multiple servers here, and nginx will load balance between them.
    upstream cache_servers {
      server localhost:6081 max_fails=3 fail_timeout=30s;
    }

    # This is the default virtual host.
    server {
        listen 80 default;
        access_log /var/log/nginx/access.log main;
        error_log /var/log/nginx/error.log;
        charset utf-8;

        # This is optional. It serves up a 1x1 blank gif image from RAM.
        location = /1x1.gif {
          empty_gif;
        }

        # This is the actual part which will proxy all connections to varnish.
        location / {
          proxy_pass http://cache_servers/;
          proxy_redirect http://cache_servers/ http://$host:$server_port/;

          proxy_set_header Host $host;
          proxy_set_header X-Real-IP $remote_addr;
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
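
To make the X-Forwarded-For line in the proxy block more concrete, here is a minimal Python sketch of what $proxy_add_x_forwarded_for evaluates to: the client's existing X-Forwarded-For header with $remote_addr appended, or just $remote_addr when the header is absent. The function name and the sample addresses are made up for illustration; this only models the behaviour and is not Nginx code.

```python
from typing import Optional


def proxy_add_x_forwarded_for(existing: Optional[str], remote_addr: str) -> str:
    """Model of nginx's $proxy_add_x_forwarded_for variable (illustrative only)."""
    if existing:
        # A proxy in front of us already set the header: append our peer address.
        return f"{existing}, {remote_addr}"
    # No header from the client: the variable is just the peer address.
    return remote_addr


if __name__ == "__main__":
    print(proxy_add_x_forwarded_for(None, "203.0.113.7"))
    print(proxy_add_x_forwarded_for("198.51.100.4", "203.0.113.7"))
```

Without these headers, the application would only ever see the address of the proxy directly in front of it.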

Varnish

Varnish is a high performance caching server. We can use Varnish to cache content which will not be changed often.

I installed Varnish using the RHEL binary package available from EPEL as well. Initially, I only needed to edit /etc/sysconfig/varnish and configure the addresses on which Varnish will listen.

DAEMON_OPTS="-a localhost:6081 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -u varnish -g varnish \
             -s file,/var/lib/varnish/varnish_storage.bin,10G"

This will make Varnish listen on port 6081 for normal HTTP traffic, and on port 6082 for administration.

Next, you must edit /etc/varnish/default.vcl to actually cache data. My current configuration is as follows:

backend thin {
  .host = "127.0.0.1";
  .port = "8080";
}

backend lighttpd {
  .host = "127.0.0.1";
  .port = "8081";
}

sub vcl_recv {
    if (req.url ~ "^/static/") {
        set req.backend = lighttpd;
    } else {
        set req.backend = thin;
    }

    # Allow purging of cache using shift + reload
    if (req.http.Cache-Control ~ "no-cache") {
        purge_url(req.url);
    }

    # Unset any cookies and authorization data for static links and icons, and fetch from cache
    if (req.request == "GET" && req.url ~ "^/static/" || req.request == "GET" && req.url ~ "^/icons/") {
        unset req.http.cookie;
        unset req.http.Authorization;
        lookup;
    }

    # Look for images and other static assets (CSS, JavaScript, Flash) in the cache
    if (req.url ~ "\.(png|gif|jpg|ico|jpeg|swf|css|js)$") {
        unset req.http.cookie;
        lookup;
    }

    # Do not cache any POST'ed data
    if (req.request == "POST") {
        pass;
    }

    # Do not cache any non-standard requests
    if (req.request != "GET" && req.request != "HEAD" &&
        req.request != "PUT" && req.request != "POST" &&
        req.request != "TRACE" && req.request != "OPTIONS" &&
        req.request != "DELETE") {
        pass;
    }

    # Do not cache data which has an authorization header
    if (req.http.Authorization) {
        pass;
    }

    lookup;
}

sub vcl_fetch {
    # Remove cookies and cache static content for 12 hours
    if (req.request == "GET" && req.url ~ "^/static/" || req.request == "GET" && req.url ~ "^/icons/") {
        unset obj.http.Set-Cookie;
        set obj.ttl = 12h;
        deliver;
    }

    # Remove cookies and cache images and other static assets for 12 hours
    if (req.url ~ "\.(png|gif|jpg|ico|jpeg|swf|css|js)$") {
        unset obj.http.set-cookie;
        set obj.ttl = 12h;
        deliver;
    }

    # Do not cache redirects or errors (any status of 300 or above)
    if (obj.status >= 300) {
        pass;
    }

    # Do not cache content which varnish has marked uncachable
    if (!obj.cacheable) {
        pass;
    }

    # Do not cache content which has a cookie set
    if (obj.http.Set-Cookie) {
        pass;
    }

    # Do not cache content with cache control headers set
    if(obj.http.Pragma ~ "no-cache" || obj.http.Cache-Control ~ "no-cache" || obj.http.Cache-Control ~ "private") {
        pass;
    }

    if (obj.http.Cache-Control ~ "max-age") {
        unset obj.http.Set-Cookie;
        deliver;
    }

    pass;
}
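
The order of the rules matters in this VCL, because the first matching rule ends the subroutine. As a reading aid, here is a minimal Python sketch that mirrors the decisions of vcl_recv and vcl_fetch. The function and parameter names are my own, the purge step and the header unsetting are left out, and it is only a model of the logic, not something Varnish runs.

```python
import re

STATIC_OR_ICONS = re.compile(r"^/(static|icons)/")
ASSET_EXTENSION = re.compile(r"\.(png|gif|jpg|ico|jpeg|swf|css|js)$")
KNOWN_METHODS = {"GET", "HEAD", "PUT", "POST", "TRACE", "OPTIONS", "DELETE"}


def recv(method, url, authorization=None):
    """Mirror of vcl_recv: returns (backend, action)."""
    backend = "lighttpd" if url.startswith("/static/") else "thin"
    if method == "GET" and STATIC_OR_ICONS.search(url):
        return backend, "lookup"
    if ASSET_EXTENSION.search(url):
        return backend, "lookup"
    if method == "POST":
        return backend, "pass"
    if method not in KNOWN_METHODS:
        return backend, "pass"
    if authorization:
        return backend, "pass"
    return backend, "lookup"


def fetch(method, url, status=200, cacheable=True, set_cookie=None,
          cache_control="", pragma=""):
    """Mirror of vcl_fetch: returns (action, ttl_hours).

    ttl_hours is None when the VCL does not set a TTL itself.
    """
    if method == "GET" and STATIC_OR_ICONS.search(url):
        return "deliver", 12
    if ASSET_EXTENSION.search(url):
        return "deliver", 12
    if status >= 300 or not cacheable or set_cookie:
        return "pass", None
    if ("no-cache" in pragma or "no-cache" in cache_control
            or "private" in cache_control):
        return "pass", None
    if "max-age" in cache_control:
        return "deliver", None
    return "pass", None
```

Read this way, one property of the configuration is easy to see: a dynamic page from Thin is only cached when the application sends a Cache-Control header containing max-age. Everything else falls through to pass.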

HAProxy

HAProxy is a high performance TCP/HTTP load balancer. It can be used to load balance almost any type of TCP connection, although I have only used it with HTTP connections.

We will be using HAProxy to balance connections over multiple thin instances.

HAProxy is also available in EPEL. My HAProxy configuration is as follows:

global
  daemon
  log 127.0.0.1 local0
  maxconn 4096
  nbproc 1
  chroot /var/lib/haproxy
  user haproxy
  group haproxy

defaults
  mode http
  clitimeout 60000
  srvtimeout 30000
  timeout connect 4000

  option httpclose
  option abortonclose
  option httpchk
  option forwardfor

  balance roundrobin

  stats enable
  stats refresh 5s
  stats auth admin:123abc789xyz

listen thin 127.0.0.1:8080
  # Each server gets a unique name so it can be told apart in the logs and on the stats page.
  server thin0 10.10.10.2:2010 weight 1 minconn 3 maxconn 6 check inter 20000
  server thin1 10.10.10.2:2011 weight 1 minconn 3 maxconn 6 check inter 20000
  server thin2 10.10.10.2:2012 weight 1 minconn 3 maxconn 6 check inter 20000
  server thin3 10.10.10.2:2013 weight 1 minconn 3 maxconn 6 check inter 20000
  server thin4 10.10.10.2:2014 weight 1 minconn 3 maxconn 6 check inter 20000
  server thin5 10.10.10.2:2015 weight 1 minconn 3 maxconn 6 check inter 20000
  server thin6 10.10.10.2:2016 weight 1 minconn 3 maxconn 6 check inter 20000
  server thin7 10.10.10.2:2017 weight 1 minconn 3 maxconn 6 check inter 20000
  server thin8 10.10.10.2:2018 weight 1 minconn 3 maxconn 6 check inter 20000
  server thin9 10.10.10.2:2019 weight 1 minconn 3 maxconn 6 check inter 20000
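
With equal weights, "balance roundrobin" amounts to handing requests to the ten Thin instances in rotation. The short Python sketch below illustrates that rotation and the connection ceiling implied by "maxconn 6". It is a simplification under stated assumptions: it ignores health checks and the dynamic limit that minconn introduces, and the function name is hypothetical.

```python
from collections import Counter
from itertools import cycle, islice

THIN_HOST = "10.10.10.2"
THIN_PORTS = range(2010, 2020)   # the ten Thin instances in the listen section
MAXCONN_PER_SERVER = 6           # from "maxconn 6" on each server line


def round_robin(requests):
    """Return the backend address chosen for each of `requests` requests."""
    backends = [f"{THIN_HOST}:{port}" for port in THIN_PORTS]
    return list(islice(cycle(backends), requests))


if __name__ == "__main__":
    hits = Counter(round_robin(25))
    for backend, count in sorted(hits.items()):
        print(backend, count)
    # Upper bound on connections HAProxy will open to Thin at once.
    print("connection ceiling:", len(THIN_PORTS) * MAXCONN_PER_SERVER)
```

Requests beyond that ceiling are queued by HAProxy instead of being sent to an already busy Thin instance.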

Thin

My Thin server is actually run on a separate Gentoo box. I installed Thin using the package in Portage.

To configure Thin, I used the following command:

thin config -C /etc/thin/config-name.yml -c /srv/myapp --servers 10 -e production -p 2010

This configures Thin to start 10 servers, listening on ports 2010 to 2019. If you want an init script for Thin, so you can start it at boot, run

thin install

This will create the init script, which you can set to start at boot using the normal method (rc-update add thin default or chkconfig thin on).
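
The ports Thin listens on have to match the server lines in the HAProxy configuration. As a small sketch of that relationship, the Python below derives the port list from the -p and --servers values and prints matching HAProxy server lines. The helper names and the thin0 to thin9 server names are illustrative choices, not something Thin or HAProxy generates.

```python
def thin_ports(first_port, servers):
    """Ports used by `thin config -p first_port --servers servers`."""
    return [first_port + i for i in range(servers)]


def haproxy_server_lines(host, first_port, servers):
    """Build one HAProxy `server` line per Thin instance (names are illustrative)."""
    return [
        f"  server thin{i} {host}:{port} weight 1 minconn 3 maxconn 6 check inter 20000"
        for i, port in enumerate(thin_ports(first_port, servers))
    ]


if __name__ == "__main__":
    print("\n".join(haproxy_server_lines("10.10.10.2", 2010, 10)))
```

If you later change the number of Thin servers, regenerating the lines this way keeps the two configurations in step.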

You should now be able to access your rails app through http://nginx.servers.ip.address

Next, we must configure the static webserver.

Lighttpd

I decided to go with Lighttpd as it is a fast, stable and lightweight webserver which will do the job perfectly with little configuration.

You could also use nginx as the static server instead of using lighttpd, but I decided to separate it.

I used the EPEL package for Lighttpd and found that most of the default configuration was as I wanted it to be. The only thing I needed to change was the port and address the server was listening on:

server.port = 8081
server.bind = "127.0.0.1"

And that's pretty much it! Now you just have to put any static content into /var/www/lighttpd/ (the default location that the Lighttpd package in EPEL is configured to use) and reference static files using “/static/document_path_of_file”. For example, if I put an image called “bg.png” into /var/www/lighttpd/images/, I can access it at http://servers_hostname/static/images/bg.png.
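
The URL-to-file mapping described here can be sketched in a few lines of Python. This assumes the /static/ prefix is stripped before Lighttpd resolves the file (for example with an alias or rewrite rule, which is not part of the configuration shown in this post). Nothing in the Nginx or Varnish configuration rewrites the URL, so with a plain document-root mapping the file would need to live under /var/www/lighttpd/static/ instead. The function name is hypothetical.

```python
import posixpath

DOCUMENT_ROOT = "/var/www/lighttpd"   # default for the EPEL package, per the text
STATIC_PREFIX = "/static/"


def static_file_for(url_path):
    """Map a /static/... URL path to a file under DOCUMENT_ROOT, or None."""
    if not url_path.startswith(STATIC_PREFIX):
        return None  # not a static URL; Varnish sends these to Thin
    relative = posixpath.normpath(url_path[len(STATIC_PREFIX):])
    if relative.startswith("..") or relative.startswith("/"):
        return None  # refuse paths that would escape the document root
    return posixpath.join(DOCUMENT_ROOT, relative)


if __name__ == "__main__":
    print(static_file_for("/static/images/bg.png"))
```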

I have not really done any performance tests on how well this works, and there are probably many things which I could have done better. This is the first time I have attempted any HTTP performance tuning, so I am always looking for feedback or tips on how to make this better. Please do contact me if you have any suggestions! :)

  1. Thanks, the configs are helpful. What was the reasoning behind using both lighttpd and nginx? Seems like the setup is needlessly complex, can you explain your reasoning behind it? Thanks.

  2. Well, there isn’t a real reason behind it.

    I was originally going to just use Nginx, but I had most of my static content on a different server and I didn’t want to move it over to the new one at that time, so I just stuck with using Nginx to proxy connections to lighttpd.

    I’ll probably move the data over some time soon though, I don’t think there is any need in having both running.

    • geoffrey
    • March 1st, 2010

    Hey,

    That was very useful. But in your default.vcl Varnish config file, you wrote this:
    backend thin {
    .host = "127.0.0.1";
    .port = "8080";
    }

    Which server is thin? Is it your Varnish server?
    In my configuration, I actually have one Squid set up in reverse proxy mode, one nginx for the static files (css, jpg, text...), and one Apache for dynamic content.
    So my question is: thin = my nginx server?

    Sorry for my English, I’m a French guy :p
    thanks

  3. Thin is actually the name of the Ruby HTTP server I use to serve my Rails applications.

    In my Varnish configuration, the backend definition is actually pointing to HAProxy, which in turn load-balances connections between my Thin instances (Nginx -> Varnish -> HAProxy -> multiple Thin instances).

    If you only have one nginx server which is serving static content, you could just point Varnish directly at that server rather than at HAProxy.

    For example if you have your nginx server running on 10.2.1.2:80, you could do :
    backend static {
    .host = "10.2.1.2";
    .port = "80";
    }

    and use “static” instead of “thin” everywhere else in your varnish configuration.

    I hope that helps.

    • geoffrey
    • March 2nd, 2010

    First thanks for your quick answer.
    And in my configuration I had already done that, and it works fine.
    Thanks for the great tutorial, and for your time, I appreciate it
