
Using Varnish Cache with ASP.NET Websites

Some time ago a friend introduced me to Varnish Cache, an excellent caching HTTP reverse proxy. In simpler terms, it's a piece of software between the browser and the back-end server that intercepts (and potentially caches) every request made to the server. It runs on Linux, but that should not stop ASP.NET developers from using it. You'll need some basic Linux skills, but with Google's help it's easily doable.

Recently, I moved a few sites to use it and there's a very noticeable performance improvement. Along the way, I learned some things which I'm sharing here. If you're designing a new site and you expect some traffic (more than a few thousand hits/day), you should definitely design it with the intention of supporting a caching HTTP reverse proxy. However, in most cases we have to maintain or improve existing sites, and in these cases you have to dig deeper into Varnish and HTTP protocol details to get things working. Some things to keep in mind, especially if you're using ASP.NET on the back end, are:

  1. You can use a single Varnish instance to serve multiple sites. For this to work, first you define all the back-ends in your VCL file (see the sketch after this list for the backend definitions). Next, in the vcl_recv subroutine, you set the correct back-end depending on the request (typically the host), with something along these lines:

    if (req.http.host == "example1.com") {
      set req.backend_hint = backend1;
    } elsif (req.http.host == "example2.com") {
      set req.backend_hint = backend2;
    }
  2. In anything but very simple configurations, you have to learn a bit of the VCL language and how to use varnishlog. VCL is pretty straightforward for a developer, but varnishlog is your friend when debugging it. It shows all the details of requests and responses going through Varnish, and at first it may look difficult to trace the details. However, if you combine it with std.log() and grep you get useful info. To use std.log() you have to import the std module with an import statement at the top of your VCL file. Anything you log using std.log() will go to varnishlog along with everything else that's being logged by default. To distinguish your logs, you can use a special string at the beginning, for example
    std.log("AT DEBUG - Request for...")
     and then use grep to get only your debug messages
    varnishlog | grep "AT DEBUG"

    Another helpful tip is to debug only requests coming from your machine. For this you can test client.ip inside vcl_recv, like:
    if (client.ip == "185.158.1.35") {
      set req.http.x-at-debug = "1";
      std.log("AT DEBUG - recv URL: " + req.url + ". Cookies: '" + req.http.Cookie + "'");
    }

    At the same time, I'm setting a new header (x-at-debug), so I can test against it in other routines (for example vcl_backend_response).
  3. One of the biggest issues you're going to face with existing websites is the handling of user-specific content. Most websites have some section where users can log in and access additional content, manage their profile, add items to a shopping cart, etc. You certainly don't want the shopping cart of one user to mix up with another user's. Without Varnish, we take session state as a given and use it to manage any kind of user-specific content. However, behind the scenes session state is made possible by cookies, and Varnish and cookies are not best friends! To understand this, you have to read a bit more about hashing and how Varnish stores the cache internally. By default, Varnish will generate the key based on the host/URL combination, so cookies are ignored. While it's relatively easy to include cookies in the hash, that's a bad idea. In a simple request to one of the websites I tested, the request Cookie header contains the following, among others:
    _gat=1; __utmt=1; __asc=2990a40415733b8f022836a9f7f; __auc=e3955e251562dd1302a38b507f9; _ga=GA1.2.1395830573.1469647499; __utma=1.1395830573.1469647499.1474034735.1474041546.90; __utmb=1.4.9.1474041549617; __utmc=1; __utmz=1.1473756384.78.8.utmcsr=facebook.com|utmccn=(referral)|utmcmd=referral|utmcct=/
    As you can see, these are third-party cookies from Google Analytics, Facebook, etc. Chances are the users will have similar cookies. If you hash all of them, you end up with different versions for each user and Varnish will be pretty much useless! The solution is to include cookies in the hash, but to be very careful about which cookies you use. The place to do this is vcl_recv, and you'll have to do some work with cookies. The approach I have used is to have a "special" cookie any time a user is logged on (for example "__user=1"). Inside vcl_recv I test for this cookie and, if found, return pass, meaning do not cache the request:
    if (req.http.Cookie ~ "__user=1") {
      # AT: Add extra header so we do not strip any cookie in backend response
      set req.http.x-keep-cookies = "1";
      return (pass);
    }
    Also, as you can see in the comment, I add an additional header "x-keep-cookies", so that I can do the same test in vcl_backend_response:
    if (bereq.http.x-keep-cookies == "1") {
      # AT: We should pass the response back to client as it is
      return (deliver);
    }

     For all other users, I strip all cookies using unset req.http.Cookie in vcl_recv and unset beresp.http.set-cookie in vcl_backend_response (a sketch of this is included after the list). This is all fine, except if you're using session state for users who are not logged in. The next item tackles this issue.

  4. Typically, in a large website session state is used heavily, not only for logged-in users but for all users. For example, you can keep shopping cart items in the session before the user has logged in, or you may keep a flag indicating whether a user has already been shown a special offer on their first visit to the site. Again, this works fine without Varnish, but it will break with the above configuration. The reason is that the ASP.NET session cookie (the same applies to PHP and other technologies) will get removed and users will get a new ASP.NET session every time. To work around this issue, you should not use session state for users that are not logged in. Instead, you have to rely on cookies to track the same things. To make this easier, you should name all cookies that you want to keep in Varnish with the same prefix, for example "__at", and add logic in your VCL file to keep these cookies both in the request and the response. In vcl_recv, you can do the trick with some regular expressions:
    set req.http.Cookie = ";" + req.http.Cookie;
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
    # Keep only cookies whose name starts with __at
    set req.http.Cookie = regsuball(req.http.Cookie, ";(__at[^=]*)=", "; \1=");
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

    Things get more complicated in vcl_backend_response. When you set a cookie from your backend (using Response.Cookies.Add() in ASP.NET), it is translated to a Set-Cookie HTTP header. If you set a few cookies you'll have several Set-Cookie headers, and Varnish will give you only the first one when you access beresp.http.set-cookie (a flaw in my opinion!). I spent a lot of time on this issue until I figured it out. The solution is to use Varnish Modules, specifically the header module. After you import the module, it's easy to remove all Set-Cookie headers except those starting with "__at", with something like:
    header.remove(beresp.http.set-cookie,"^(?!(__at.*=))");
  5. Another heavily used method in ASP.NET is Response.Redirect(). It's a very simple method that redirects the user to a new page. Behind the scenes it's translated into a 302 HTTP response. However, if there is some logic behind it, for example redirecting users coming from a mobile device to a mobile site, it will clash with Varnish. The reason is that even a 302 response will get cached by Varnish, and the next user, coming from a desktop, may get redirected to the mobile version. There are two options to solve this:
     1. Do not cache 302 responses; perhaps cache only 200 responses (see the sketch after this list).
     2. Do not use Response.Redirect() with logic behind it. In the above scenario it is better to redirect the user on the client side (by the way, there's an excellent JS library here). I would prefer this second option.
  6. For performance reasons, Varnish does not support HTTPS, so you can only cache content served over HTTP. So what about an e-commerce site that has both public and secure content? If you have the secure content on a separate sub-domain, it's easy to use Varnish for the public part and leave the rest unchanged. However, if that's not possible, there's an alternative:
     1. Set Varnish to listen only for HTTP (port 80).
     2. Install nginx and use it as a reverse proxy that listens only for HTTPS (port 443). Configure the SSL certificates in the config file using the ssl_certificate and ssl_certificate_key directives, and use the proxy_pass directive to pass requests to the backend server. The config file for the site will look something like:
      server {
          listen 443 ssl default_server;
          listen [::]:443 ssl default_server;
          ssl_certificate           /etc/ssl/certs/example.crt;
          ssl_certificate_key       /etc/ssl/private/example-keyfile.key;

          ssl on;
          ssl_session_cache  builtin:1000  shared:SSL:10m;
          ssl_protocols  TLSv1 TLSv1.1 TLSv1.2;
          ssl_ciphers HIGH:!aNULL:!eNULL:!EXPORT:!CAMELLIA:!DES:!MD5:!PSK:!RC4;
          ssl_prefer_server_ciphers on;

          server_name www.example.com;

          location / {
              proxy_set_header        Host $host;
              proxy_set_header        X-Real-IP $remote_addr;
              proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
              proxy_set_header        X-Forwarded-Proto $scheme;

              # Fix the "It appears that your reverse proxy set up is broken" error.
              proxy_pass              https://backend.example.com;
              proxy_read_timeout      90;

              proxy_redirect          https://backend.example.com https://www.example.com;
          }
      }
      
      
      If you have set a password for your key file (as you should), you have to save it to a file and use the "ssl_password_file /etc/keys/global.pass" directive so nginx can read it.
  7. Once Varnish is running, you'll have to check its status and maintain it. Some useful recommendations are:

     1. Check how Varnish is performing with varnishstat. The two most important indicators are cache_hit and cache_miss. You want as many hits as possible, so if your numbers don't look good, check the config file.
     2. Another similar tool is varnishhist, which shows a histogram of Varnish requests. Hits are shown with "|", misses with "#". The more "|" you have, and the further to the left they are, the better.
     3. To manage Varnish, there's the varnishadm tool. It has a few commands, but the one I use most is checking the backend status: run "varnishadm backend.list" to see the status of all your backends.
     4. If you're used to checking IIS log files, the equivalent is the varnishncsa tool. It logs all requests, and you can even customize it to include hit/miss details, so you can check which URLs are getting misses and may need caching.
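
Referring back to item 1, here is a minimal sketch of the backend definitions that the host-based selection relies on; the names, hosts and ports are placeholders that you would replace with your own servers:

    vcl 4.0;

    # One backend per site served by this Varnish instance (placeholder addresses)
    backend backend1 {
        .host = "10.0.0.10";
        .port = "8080";
    }

    backend backend2 {
        .host = "10.0.0.20";
        .port = "8080";
    }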
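
For item 3, the cookie stripping for anonymous visitors can look roughly like this (before the refinements of item 4); it relies on the x-keep-cookies marker set earlier for logged-in users:

    sub vcl_recv {
        if (!req.http.x-keep-cookies) {
            # Anonymous visitor: drop all request cookies so the request can be cached
            unset req.http.Cookie;
        }
    }

    sub vcl_backend_response {
        if (!bereq.http.x-keep-cookies) {
            # Do not let the backend set cookies on responses that will be cached
            unset beresp.http.set-cookie;
        }
    }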
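
And for the first option in item 5, one way to keep redirects out of the cache is to mark 301/302 backend responses as uncacheable, roughly like this:

    sub vcl_backend_response {
        if (beresp.status == 301 || beresp.status == 302) {
            # Redirects often depend on the requesting client, so do not cache them
            set beresp.uncacheable = true;
            set beresp.ttl = 120s;
            return (deliver);
        }
    }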

To summarize, Varnish is an excellent and very fast tool. I would highly recommend it to anyone developing or managing public websites.

As a final note, Mattias Geniar's templates are very useful for getting started and helped me a lot. You should check them out.

Migrating to Azure - Quirks & Tips

Recently I have moved a dozen websites and web apps to Azure. Some are small apps used by a few users with a few database tables, while others are public sites visited by tens of thousands of visitors every day, with big databases (tens of GB). I've learned quite a bit during this process, and below are some of the things to take into account if you're going to do something similar:

  • Carefully check disk performance and the maximum IOPS supported by each virtual machine. If you have an I/O-intensive system, you'll need premium disks. Standard disk performance is poor, especially on basic machines (max 300 IOPS).
  • Bandwidth is expensive. 1 TB of outgoing traffic will cost you around 80 euros, so if you have a public website you have to use a CDN for most resources (especially images). Otherwise you're going to pay a lot just for the bandwidth.
  • There's no snapshot functionality for VMs. You have to manage backups manually (using Windows Server Backup, for example). Microsoft should make this a priority.
  • For databases, the Basic and Standard options are useless except for very small apps. If you have more than 4-5 databases, look at Elastic Pools: they have better performance and are cheaper.

The process of moving the websites themselves is pretty standard, while moving the databases is another story. The first thing you'll find out is that SQL Server on Azure does not support the restore functionality! So how do you move a database to Azure? If you check the official Microsoft documentation, your options are:

  1. The SSMS Migration Wizard, which works for small databases, as pointed out by Microsoft itself.
  2. Export/import through the BACPAC format, which is cumbersome and works only for small and medium databases. You have to export from SSMS, upload the file to blob storage (standard, not premium) and then import it from there.
  3. A combination of BACPAC (for the schema) and BCP (for the data), which gets complicated.

Fortunately, I didn't have to go through any of them. I have used the xSQL tools for database comparison and synchronization for a few years now, and they are a perfect option for migrating your databases to Azure. The process is straightforward:

  1. Create an empty database in Azure.
  2. Use xSQL Schema Compare to compare the schema of your existing database with the new, empty database in Azure. In the comparison options you can fine-tune the details; for example, I do not synchronize users and logins, because I prefer to check the security settings manually. After the comparison it will generate a synchronization script that you can execute directly against Azure, and your new database will have the same schema as the existing one.
  3. Use xSQL Data Compare to compare the data. Since both databases have the same schema, it will map all tables correctly (the exception is tables without a primary key, which you shouldn't have! Still, if for some reason you do, you can use custom keys to synchronize them as well) and generate the synchronization script. If the database is large, the script will be large as well, but the tool takes care of executing it properly. I had some databases in the range of a few GB and it worked very well.

In addition to working well, this approach has another very important benefit. If you're moving medium or big websites, it's unlikely that the migration will be done in one step. Most likely it will take weeks or months to complete, and you have to keep the databases synchronized throughout. You just have to run Schema Compare first to move any schema changes, and then Data Compare to synchronize the data. If you expect to perform these steps many times, you can even use the command-line versions to automate them and sync everything with a single command.

Overall, Azure still has some way to go, but it's already a pretty solid platform. Some things, such as the lack of database restore, are surprising, but fortunately there are good alternatives out there.

Disclaimer: I've known the guys working at xSQL for a long time, but the above is a post from a happy customer. They have really great products for a very good price.