Wednesday, September 16, 2009

Caching user-supplied images in the browser

We have been engaged in a big project for the last few months. Despite the good performance scores, one Google PageSpeed warning kept nagging at us: "The following resources are missing a cache expiration".
It's all about speed

PageSpeed was complaining about images uploaded by the user. Resources referenced from CSS are conveniently handled by Loom, but user-provided files were not setting their cache headers appropriately.

The following is, to the best of my knowledge, the same algorithm used by Blogger and WordPress to avoid delivering the same image file over and over again. The theory has already been covered here.

I will be using Loom in this example, but you can reproduce this using any web framework.

Choosing your Strong Cache Validator

Cache Validators are introduced in the HTTP 1.1 spec. Here we will use a Strong Cache Validator to indicate that a resource should never expire. Include both the validator and the resource ID in the URL, like "/resources/{id}/{cacheValue}". Because the URL changes whenever the file changes, the browser can safely cache each URL forever.

The most well-known candidates are:
  • MD5: The file checksum can be calculated when the file is saved or when the application starts up. This is how Loom (and others) set the cache headers for CSS and JavaScript files.
  • Version numbers: the @Version field of a persistent JPA entity changes on every update, so it identifies each revision of the file.
In this case we will be using PersistentFile, which includes both fields. After trying both, we found that using @Version yields much shorter URLs ("/resources/151/2" instead of "/resources/151/3b4efe6405d4fb1ada4a081f4dbef6a9").

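import java.util.Date;
import javax.persistence.Column;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Version;

public class PersistentFile {

    /** the primary key value */
    @Id @GeneratedValue private Integer id;

    /** the MD5 hash of this file, as 32 hex characters */
    @Column(length=32) private String MD5;

    /** the revision number, incremented by JPA on every update */
    @Version private Integer version;

    /** the last update timestamp */
    private Date lastModified;

    /** ... etc ... */
}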
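To make the difference between the two candidates concrete, here is a minimal sketch that computes an MD5 checksum and builds both kinds of URL. The class and method names (ValidatorUrls, md5Hex, resourceUrl) are made up for this illustration and are not part of Loom.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ValidatorUrls {

    /** Returns the MD5 checksum of the content as 32 lowercase hex characters. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // not expected on a standard JDK; treated here as a configuration error
            throw new IllegalStateException(e);
        }
    }

    /** Builds "/resources/{id}/{cacheValue}" for either kind of validator. */
    static String resourceUrl(int id, Object cacheValue) {
        return "/resources/" + id + "/" + cacheValue;
    }

    public static void main(String[] args) {
        // placeholder bytes standing in for an uploaded image
        byte[] image = "pretend these are image bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(resourceUrl(151, 2));             // version-based URL
        System.out.println(resourceUrl(151, md5Hex(image))); // checksum-based URL
    }
}
```

Both URLs change when the file changes; the version-based one is simply shorter.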


The version field is used as the validator: when the URL carries the current version, the "Expires" and "Cache-Control" headers are set to expire ten years in the future. As long as the user browses normally by clicking links, these resources should never be fetched again. The expiration is ignored, however, if the user clicks the "Refresh" button, which is where Weak Cache Validation comes into play.
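As a rough sketch of what those far-future headers can look like, the snippet below computes both values with plain java.time. This is not Loom's code: the class name FarFutureHeaders, the method headersFor and the "public" directive are assumptions made for the example.

```java
import java.time.Duration;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class FarFutureHeaders {

    /** HTTP dates look like "Wed, 16 Sep 2009 00:00:00 GMT". */
    static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    /** Builds headers that expire ten years after the given UTC time. */
    static Map<String, String> headersFor(ZonedDateTime nowUtc) {
        ZonedDateTime expires = nowUtc.plusYears(10);
        long maxAgeSeconds = Duration.between(nowUtc, expires).getSeconds();

        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Expires", HTTP_DATE.format(expires));
        // "public" is an assumption: it lets shared caches keep the image too
        headers.put("Cache-Control", "public, max-age=" + maxAgeSeconds);
        return headers;
    }

    public static void main(String[] args) {
        headersFor(ZonedDateTime.now(ZoneOffset.UTC))
                .forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

Sending both headers covers older clients that only understand "Expires" as well as newer ones that prefer "Cache-Control".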

Weak Cache Validator

With Weak Cache Validation the browser must ask for the resource, giving the server a chance to answer with a 304 (not modified) or 200 (OK) response.

For this purpose we will use the "Last-Modified" header. The first request by the browser (the empty cache experience or any "Shift + Refresh" click) is answered with a full 200 response that includes the Last-Modified header.

On a browser refresh, the browser sends that value back in the If-Modified-Since header, and the server returns a 304 response if the file has not changed since then.
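The check behind that 304 can be sketched in a few lines. This is an illustration under my own assumptions, not the Loom implementation: ConditionalGet, lastModifiedHeader and statusFor are hypothetical names, and the comparison treats "not newer than the header" as unchanged.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

class ConditionalGet {

    static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    /** Formats the stored timestamp as a Last-Modified header value. */
    static String lastModifiedHeader(Instant lastModified) {
        return HTTP_DATE.format(lastModified.atOffset(ZoneOffset.UTC));
    }

    /** Returns 304 if the file is not newer than If-Modified-Since, otherwise 200. */
    static int statusFor(Instant lastModified, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200; // empty cache or forced reload: send the whole file
        }
        try {
            Instant since = LocalDateTime.parse(ifModifiedSince, HTTP_DATE)
                    .toInstant(ZoneOffset.UTC);
            // HTTP dates have one-second resolution, so drop the milliseconds first
            Instant stored = lastModified.truncatedTo(ChronoUnit.SECONDS);
            return stored.isAfter(since) ? 200 : 304;
        } catch (DateTimeParseException e) {
            return 200; // unreadable header: fall back to a full response
        }
    }
}
```

Truncating to whole seconds matters because a timestamp stored with milliseconds would otherwise always look newer than the value the browser sends back.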

The big picture

On a sample visit to the homepage, image traffic went down from this:

59 image requests = 544 KB (492 KB from cache)

To this:

0 image requests = 0 KB

Saving bandwidth is great, but keep in mind that we are doing it to reduce response time. In this case, the response time dropped from 2-3 seconds to 0.3 seconds just by introducing this change.


public class FileAction extends AbstractAction {

    @Autowired private FileManager fileManager;

    /** the id of the file */
    private Integer id;

    /** the version number of the resource */
    private Integer version;

    /** the retrieved file */
    private PersistentFile pfile;

    /**
     * This method will be invoked before getFile(). If the current request includes an If-Modified-Since
     * header that matches the lastModified attribute of the file, a 304 (NOT MODIFIED) response will be
     * automatically sent, and getFile() will not be invoked.
     */
    public CacheControl getCacheControl() {
        pfile = fileManager.find(id);
        CacheControl c = new CacheControl();
        if (pfile.getVersion().equals(version)) {
            // the requested version is the current one: the far-future expiration
            // is configured here (those calls are missing from this listing)
        }
        return c;
    }

    @GET @Path("/{id}/{version}")
    public Resolution getFile() {
        if (pfile.getVersion().equals(version)) {
            return send(pfile);
        } else {
            // if the version number differs, send a 301 redirect to the current version
            return redirect(FileAction.class, "getFile")
                .addParameter("id", pfile.getId())
                .addParameter("version", pfile.getVersion());
        }
    }
}

CacheControl is the class that does all the thinking. It sets the HTTP headers and decides whether to return a 304 or 200 response.
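For readers using another framework, the order of the decisions can be sketched as a single function. This is a hypothetical stand-in, not Loom's CacheControl: FileRequestFlow and respond are invented names, timestamps are simplified to epoch seconds, and the result is a plain string instead of a real response.

```java
class FileRequestFlow {

    /**
     * Decides how to answer a request for "/resources/{id}/{requestedVersion}".
     *
     * @param ifModifiedSince epoch seconds from the request header, or null if absent
     * @param lastModified    epoch seconds of the last update of the stored file
     */
    static String respond(int id, int requestedVersion, int currentVersion,
                          Long ifModifiedSince, long lastModified) {
        // 1. Weak validation runs first: an unchanged file needs no body at all
        if (ifModifiedSince != null && lastModified <= ifModifiedSince) {
            return "304";
        }
        // 2. A stale version number is redirected to the current URL
        if (requestedVersion != currentVersion) {
            return "301 Location: /resources/" + id + "/" + currentVersion;
        }
        // 3. Otherwise the file is sent, together with the far-future headers
        return "200";
    }

    public static void main(String[] args) {
        System.out.println(respond(151, 1, 2, null, 1000L));
    }
}
```

The 304 check comes before the version check to mirror the listing above, where getCacheControl() is invoked before getFile().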

We will be in London on Sept 23rd

We are quite excited that Loom has been selected as a finalist for the ACCESS-IT awards in the Web 2.0 category. We will be in London on September 22 and 23, which will be my first trip there. Should be fun!


  1. Congratulations on the nomination, finalists and surely winners....

    Damn, I love seeing us Spaniards making a mark around the world.

    Nacho and the rest of the team, you have my full support.

    Have a great time in London.


  2. Ufff, I'm really on top of the dates, aren't I..... hehehe

    I'm running a bit behind on my feed reading.... hehehe

    Anyway, you know it comes with affection and all my support.

    And the hugs are sincere.

  3. Thanks all the same, Jorge ^_^

    It's not so much that you are behind on your feeds as that I lost track of writing a while ago. In the end it was a really interesting event and I wanted to write about it, but I got tangled up in an equal-parts mix of Ajax and Cloud Computing in two projects at once.

    Aaaargh. January will be better.

