To many Drupal developers, Varnish is a black magic. When it works, we marvel at its speed; when it’s broken, we break out in cold sweat. Recently, a client of ours needed to figure out why their homepage was not being cached by Varnish, and I learned good lessons debugging the issue. I hope you can learn from me and avoid my mistakes in the future!

Digging in

First, the stack is Varnish 3 in front of Drupal 7 on Apache httpd, and I have SSH access to test and production environments.Their Varnish config (“VCL”) is not-too-crazy, so I started by adding debugging statements to it on the test site. My hunch was that a cookie got passed from the backend, causing Varnish to register a pass (a cache miss, in Varnish parlance). That backend response is passed through Varnish’s vcl_fetch() method, so I dumped the cookie(s) being set:

sub vcl_fetch {
...
    std.log(beresp.http.Set-Cookie);
    ...
}

Next, I used varnishlog to show the dumped values, and filter the results by tag to avoid “drinking from the firehose”:

$ varnishlog -c -i VCL_Log

Headers, anyone?

This showed me that, on the homepage, no cookies were being set. There goes my theory! Next, I knew that the time-to-live (TTL) for a backend response can control whether the page gets cached. Per Varnish docs, the TTL may be set from “Cache-Control” header. Indeed, the response header for homepage did instruct Varnish not to cache the response:

Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0

Hmm, on non-homepage responses, I got a different value:

Cache-Control: public, max-age=86400

Finally, something helpful after hours of hair-pulling! The next step was to determine why “no-cache” directive was set on homepage, but not others. Maybe Varnish isn’t the culprit?

Assuming Drupal was setting the cache-control header, I used Devel module’s dd() to dump headers sent to drupal_add_http_header(). At first, I was confused because all pages sent the “no-cache” cache-control header. To make it even more confusing, however, I next saw that cached pages later replaced that header with “public, max-age=86400”.

Found it!

A co-worker pointed me towards drupal_page_is_cacheable(), which modules can call to set the cacheability of a specific page. For example, Drupal core’s drupal_set_message() disables page-caching when a message (notice, error, etc.) is set. However, no callers of drupal_page_is_cacheable() seemed to be the offender. 

Further down the stack, I found drupal_page_footer(). This one checks the global page-caching option: variable_get(‘cache’) (I verified that variable is enabled at the core performance settings page). The strangest part of this whole debacle is that check returns FALSE for homepage, but TRUE on other pages.

Great. Just great. Something’s changing the global $conf array. I grepped for anything changing “cache” key in $conf or $GLOBALS:

$ egrep -rn '\$GLOBALS|\$conf' sites/all/modules | grep "'cache'"

That implicated Honeypot module. Turns out that when an anonymous user visits a page, and that page has a form which is protected by Honeypot’s time-based protection, page caching for that page is disabled. The homepage did have a form using that protection. To fix it, I implemented hook_honeypot_form_protections_alter() to disable time-based protections on homepage. 

Don’t get me wrong -- I love that module, and its functionality in this regard makes sense, but this was a pain to track down! 

Lessons learned

  • Don’t assume Varnish is the problem.
  • Learn to love cache-control header.
  • Avoid changing global $conf at all costs.

Resources

Additional Resources

Top Drupal 7 Modules: Winter 2014 Edition | Mediacurrent Blog Post
Building Multilingual Websites in Drupal 7 | Mediacurrent Blog Post
Dynamic Data in Quick Tabs | Mediacurrent Blog Post