Nginx, Varnish, HAProxy, and Thin/Lighttpd

Over the last few days, I have been playing with Ruby on Rails again and came across Thin, a small yet stable web server for applications written in Ruby.

This is a small tutorial on how to get Nginx, Varnish, and HAProxy working together with Thin (for dynamic pages) and Lighttpd (for static pages).

I decided to take this route because, from reading in many places, I found that separating static and dynamic content can improve performance significantly.

Nginx

Nginx is a lightweight, high performance web server and reverse proxy. It can also be used as an email proxy, although this is not an area I have explored. I will be using Nginx as the front-end server for serving my rails applications.

I installed Nginx using the RHEL binary package available from EPEL.
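For anyone following along, the install itself is a one-liner once EPEL is enabled. A sketch for a RHEL-style system (the package name is assumed from EPEL; adjust for your distribution):

```shell
# Install Nginx from the EPEL repository and enable it at boot.
yum install nginx
chkconfig nginx on
```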

Configuration of Nginx is very simple, and I have kept mine minimal. My current configuration file consists of the following:

user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"';

    sendfile on;
    tcp_nopush on;
    tcp_nodelay off;

    keepalive_timeout 5;

    # This section enables gzip compression.
    gzip on;
    gzip_comp_level 2;
    gzip_proxied any;
    gzip_types text/plain text/html text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript;

    # Here you can define the addresses on which Varnish is listening. You can place multiple servers here, and Nginx will load balance between them.
    upstream cache_servers {
        server localhost:6081 max_fails=3 fail_timeout=30s;
    }

    # This is the default virtual host.
    server {
        listen 80 default;
        access_log /var/log/nginx/access.log main;
        error_log /var/log/nginx/error.log;
        charset utf-8;

        # This is optional. It serves up a 1x1 blank gif image from RAM.
        location = /1x1.gif {
            empty_gif;
        }

        # This is the actual part which proxies all connections to Varnish.
        location / {
            proxy_pass http://cache_servers/;
            proxy_redirect http://cache_servers/ http://$host:$server_port/;

            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
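After changing the configuration, Nginx can check the file for syntax errors before you reload, which saves you from taking the front end down with a typo. A quick sketch:

```shell
# Validate the configuration, then reload the running server.
nginx -t
service nginx reload
```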

Varnish

Varnish is a high performance caching server. We can use Varnish to cache content which does not change often.

I installed Varnish using the RHEL binary package available from EPEL as well. Initially, I only needed to edit /etc/sysconfig/varnish and configure the address on which Varnish will listen.

DAEMON_OPTS="-a localhost:6081 \
-T localhost:6082 \
-f /etc/varnish/default.vcl \
-u varnish -g varnish \
-s file,/var/lib/varnish/varnish_storage.bin,10G"

This will make Varnish listen on port 6081 for normal HTTP traffic, and port 6082 for administration.
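After editing the sysconfig file, restart Varnish and confirm it is bound to the right port. A sketch (the netstat flags may vary slightly by distribution):

```shell
# Restart Varnish with the new daemon options and verify the listening socket.
service varnish restart
netstat -ln | grep 6081
```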

Next, you must edit /etc/varnish/default.vcl to actually cache data. My current configuration is as follows:

backend thin {
    .host = "127.0.0.1";
    .port = "8080";
}

backend lighttpd {
    .host = "127.0.0.1";
    .port = "8081";
}

sub vcl_recv {
    if (req.url ~ "^/static/") {
        set req.backend = lighttpd;
    } else {
        set req.backend = thin;
    }

    # Allow purging of the cache using shift + reload
    if (req.http.Cache-Control ~ "no-cache") {
        purge_url(req.url);
    }

    # Unset any cookies and authorization data for static links and icons, and fetch from the cache
    if ((req.request == "GET" && req.url ~ "^/static/") || (req.request == "GET" && req.url ~ "^/icons/")) {
        unset req.http.Cookie;
        unset req.http.Authorization;
        lookup;
    }

    # Look for images in the cache
    if (req.url ~ "\.(png|gif|jpg|ico|jpeg|swf|css|js)$") {
        unset req.http.Cookie;
        lookup;
    }

    # Do not cache any POSTed data
    if (req.request == "POST") {
        pass;
    }

    # Do not cache any non-standard requests
    if (req.request != "GET" && req.request != "HEAD" &&
        req.request != "PUT" && req.request != "POST" &&
        req.request != "TRACE" && req.request != "OPTIONS" &&
        req.request != "DELETE") {
        pass;
    }

    # Do not cache data which has an Authorization header
    if (req.http.Authorization) {
        pass;
    }

    lookup;
}

sub vcl_fetch {
    # Remove cookies and cache static content for 12 hours
    if ((req.request == "GET" && req.url ~ "^/static/") || (req.request == "GET" && req.url ~ "^/icons/")) {
        unset obj.http.Set-Cookie;
        set obj.ttl = 12h;
        deliver;
    }

    # Remove cookies and cache images for 12 hours
    if (req.url ~ "\.(png|gif|jpg|ico|jpeg|swf|css|js)$") {
        unset obj.http.Set-Cookie;
        set obj.ttl = 12h;
        deliver;
    }

    # Do not cache anything that does not return a status in the 200s
    if (obj.status >= 300) {
        pass;
    }

    # Do not cache content which Varnish has marked uncacheable
    if (!obj.cacheable) {
        pass;
    }

    # Do not cache content which has a cookie set
    if (obj.http.Set-Cookie) {
        pass;
    }

    # Do not cache content with cache control headers set
    if (obj.http.Pragma ~ "no-cache" || obj.http.Cache-Control ~ "no-cache" || obj.http.Cache-Control ~ "private") {
        pass;
    }

    if (obj.http.Cache-Control ~ "max-age") {
        unset obj.http.Set-Cookie;
        deliver;
    }

    pass;
}
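A new VCL file can also be compiled and activated without restarting Varnish, through the admin port configured above (6082). A sketch, where "myconf" is just an arbitrary name for the loaded configuration:

```shell
# Compile the edited VCL under a new name, then switch traffic over to it.
varnishadm -T localhost:6082 vcl.load myconf /etc/varnish/default.vcl
varnishadm -T localhost:6082 vcl.use myconf
```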

HAProxy

HAProxy is a high performance TCP/HTTP load balancer. It can be used to load balance almost any type of TCP connection, although I have only used it with HTTP connections.

We will be using HAProxy to balance connections over multiple thin instances.

HAProxy is also available in EPEL. My HAProxy configuration is as follows:

global
    daemon
    log 127.0.0.1 local0
    maxconn 4096
    nbproc 1
    chroot /var/lib/haproxy
    user haproxy
    group haproxy

defaults
    mode http
    clitimeout 60000
    srvtimeout 30000
    timeout connect 4000

    option httpclose
    option abortonclose
    option httpchk
    option forwardfor

    balance roundrobin

    stats enable
    stats refresh 5s
    stats auth admin:123abc789xyz

listen thin 127.0.0.1:8080
    server thin1 10.10.10.2:2010 weight 1 minconn 3 maxconn 6 check inter 20000
    server thin2 10.10.10.2:2011 weight 1 minconn 3 maxconn 6 check inter 20000
    server thin3 10.10.10.2:2012 weight 1 minconn 3 maxconn 6 check inter 20000
    server thin4 10.10.10.2:2013 weight 1 minconn 3 maxconn 6 check inter 20000
    server thin5 10.10.10.2:2014 weight 1 minconn 3 maxconn 6 check inter 20000
    server thin6 10.10.10.2:2015 weight 1 minconn 3 maxconn 6 check inter 20000
    server thin7 10.10.10.2:2016 weight 1 minconn 3 maxconn 6 check inter 20000
    server thin8 10.10.10.2:2017 weight 1 minconn 3 maxconn 6 check inter 20000
    server thin9 10.10.10.2:2018 weight 1 minconn 3 maxconn 6 check inter 20000
    server thin10 10.10.10.2:2019 weight 1 minconn 3 maxconn 6 check inter 20000
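HAProxy can validate a configuration file before you start it, which is worth doing after any edit. A sketch for a RHEL-style system:

```shell
# Check the configuration for errors, then start the daemon.
haproxy -c -f /etc/haproxy/haproxy.cfg
service haproxy start
```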

Thin

My Thin server is actually run on a separate Gentoo box. I installed Thin using the package in Portage.

To configure Thin, I used the following command:

thin config -C /etc/thin/config-name.yml -c /srv/myapp --servers 10 -e production -p 2010

This configures Thin to start 10 servers, listening on ports 2010 to 2019. If you want an init script for Thin, so you can start it at boot, run

thin init

This will create the init script, and you can set it to start up at boot using the normal method (rc-update add thin default or chkconfig thin on).
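With the configuration file in place, the instances can also be started directly from the command line. A sketch (the file name matches the one created with the `thin config` command above):

```shell
# Start all ten Thin instances described in the config file.
thin -C /etc/thin/config-name.yml start
```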

You should now be able to reach your Rails app through http://nginx.servers.ip.address

Next, we must configure the static webserver.

Lighttpd

I decided to go with Lighttpd as it is a fast, stable and lightweight webserver which will do the job perfectly with little configuration.

You could also use Nginx as the static server instead of Lighttpd, but I decided to separate them.

I decided to use the package from EPEL for Lighttpd, and found that most of the default configuration was as I wanted it to be. The only thing I needed to change was the port and address the server was listening on:

server.port = 8081
server.bind = "127.0.0.1"
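Lighttpd can also test a configuration file before restarting. A sketch (the config path is assumed from the EPEL package defaults):

```shell
# Check the edited configuration, then restart Lighttpd.
lighttpd -t -f /etc/lighttpd/lighttpd.conf
service lighttpd restart
```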

And that’s pretty much it! Now you just have to dump any static content into /var/www/lighttpd/ (the default location that the Lighttpd package in EPEL is configured to use) and reference any static links using “/static/document_path_of_file”. For example, if I put an image called “bg.png” into /var/www/lighttpd/images/, I can reach it at http://servers_hostname/static/images/bg.png.
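A quick way to check that the whole Nginx -> Varnish -> Lighttpd chain is serving that file is to request it with curl (the hostname here is a placeholder):

```shell
# Fetch only the headers; a 200 response means the static path works end to end.
curl -I http://servers_hostname/static/images/bg.png
```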

I have not really done any performance tests on how well this works, and there are probably many things which I could have done better. This is the first time I have made any attempt at HTTP performance tuning, so I am always looking for feedback or tips on how to make this better. Please do contact me if you have any suggestions! 🙂

11 thoughts on “Nginx, Varnish, HAProxy, and Thin/Lighttpd”

  1. Thanks, the configs are helpful. What was the reasoning behind using both lighttpd and nginx? Seems like the setup is needlessly complex, can you explain your reasoning behind it? Thanks.

  2. Well, there isn’t a real reason behind it.

    I was originally going to just use Nginx, but I had most of my static content on a different server and I didn’t want to move it over to the new one at that time, so I just stuck with using Nginx to proxy connections to lighttpd.

    I’ll probably move the data over some time soon though; I don’t think there is any real need to have both running.

  3. Hey,

    That was very useful. But in your default.vcl varnish cfg file, you wrote this:
    backend thin {
    .host = "127.0.0.1";
    .port = "8080";
    }

    Which server is thin? Is it your varnish server?
    On my configuration, I actually have one squid set in reverse proxy mode, one nginx for the static files (css, jpg, text…), and one apache for dynamic content.
    So my question is: thin = my nginx server?

    Sorry for my english i’m a french guy :p
    thanks

  4. Thin is actually the name of the Ruby HTTP server I use to serve my Rails applications.

    In my varnish configuration, the backend definition is actually pointing to HAProxy, which in turn load-balances connections between my Thin instances (Nginx -> Varnish -> HAProxy -> multiple Thin instances).

    If you only have one nginx server which is serving static content, you could just point varnish at that server directly rather than at HAProxy.

    For example, if you have your nginx server running on 10.2.1.2:80, you could do:
    backend static {
    .host = "10.2.1.2";
    .port = "80";
    }

    and use “static” instead of “thin” everywhere else in your varnish configuration.

    I hope that helps.

  5. First, thanks for your quick answer.
    And in my configuration I had already done that, and it works fine.
    Thanks for the great tutorial, and for your time; I appreciate it.

    1. Hello DHK,

      It’s strange that your varnish server got such a low score compared to the others. I used your ab.c script to benchmark my server (this is over LAN, not over the internet), and I’m getting 10018 requests per second. I’m sure this value could be increased too, since I have some things which could affect the benchmark.

      Of course I can’t compare my result against yours since we are using different hardware configurations, but I feel as though there must be some bottleneck somewhere with your Varnish configuration considering it got such a low score compared to the others.

      (Fyi, I’m not using the varnish VCL I have shown in this post anymore).

      G-WAN does sound very interesting though, I will take a look at it. 🙂

  6. @M. Hamzah Khan

    > strange that varnish server got such a low score

    If you read the comments of this article, Poul-Henning Kamp (the guy who wrote Varnish) has double-checked these options, and despite his suggested modifications (and others made by his team), results remained the same:

    http://nbonvin.wordpress.com/2011/03/24/serving-small-static-files-which-server-to-use/

    Others, like Nginx, also suggested modifications, and the new test made them faster (from 57,000 to 80,000 RPS – better than Lighttpd but only half as fast as G-WAN).

    By the way, in your test Varnish was 3 times slower than on the published benchmark (made on a small laptop).

    Varnish seems to be much less efficient than Web servers, making its value as a “Web accelerator” questionable.

  7. Hey guys, it has been a few years. What are your impressions/conclusions now? I have Lighttpd running a WordPress site on a Raspberry Pi, just installed Varnish but I am having problems having normal HTTP traffic connecting. I assume I need to have Lighttpd listening on both 80 and 8080 but not sure how to do that. Or whether CloudFlare is affected somehow. After reading your comments about Lighttpd working faster than Varnish I am wondering whether to bother with it at all. My photography-related server mostly handles largish jpeg files (200KB). Any comments or suggestions? Thanks.

    1. Hi,

      Well this article is pretty old. I switched to Apache Traffic Server, purely because it was easier to manage. I haven’t tested performance of Varnish vs Traffic Server, but it’s working well enough for me.

      I assume I need to have Lighttpd listening on both 80 and 8080 but not sure how to do that.

      If you are using just Varnish and Lighttpd for your setup (I assume this is the case since you are running on a Raspberry Pi), then you probably want Varnish listening on port 80, and Lighttpd on 8080 or any other port, but this will be listening on localhost only as only Varnish will be talking to it directly.
