Domain Sharding revisited | High Performance Web Sites

September 5, 2013 10:41 am

With the adoption of SPDY and progress on HTTP 2.0, I hear some people referring to domain sharding as a performance anti-pattern. I disagree. Sharding resources across multiple domains is a major performance win for many websites. However, there is room for debate. Domain sharding isn’t appropriate for everyone. It may hurt performance if done incorrectly. And its utility might be short-lived. Is it worth sharding domains? Let’s take a look.

Compelling Data

The HTTP Archive has a field called “maxDomainReqs”. The explanation requires a few sentences: Websites request resources from various domains. The average website today accesses 16 different domains as shown in the chart below. That number has risen from 12.5 a year ago. That’s not surprising given the rise in third-party content (ads, widgets, analytics).

The HTTP Archive counts the number of requests made on each domain. The domain with the most requests is the “max domain” and the number of requests on that domain is the “maxDomainReqs”. The average maxDomainReqs value has risen from 47 to 50 over the past year. That’s not a huge increase, but the fact that the average number of requests on one domain is so high is startling.
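To make the definition concrete, here is a minimal sketch of how a maxDomainReqs value could be computed from a list of request URLs. This is not the HTTP Archive's actual code; the function name `max_domain_reqs` and the sample URLs are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def max_domain_reqs(urls):
    """Return (hostname, request_count) for the hostname with the most requests."""
    counts = Counter(urlsplit(u).hostname for u in urls)
    # most_common(1) yields the single busiest hostname and its request count.
    return counts.most_common(1)[0]


# Hypothetical page load: four requests on one hostname, one each on two others.
page_requests = [
    "http://www.example.com/",
    "http://www.example.com/main.css",
    "http://www.example.com/app.js",
    "http://www.example.com/logo.png",
    "http://ads.example.net/banner.js",
    "http://stats.example.org/ping.gif",
]
domain, count = max_domain_reqs(page_requests)
```

For this sample page the “max domain” is www.example.com and its maxDomainReqs is 4. The sketch assumes a non-empty list of absolute URLs.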

50 is the average maxDomainReqs across the world’s top 300K URLs. But averages don’t tell the whole story. Using the HTTP Archive data in BigQuery and bigqueri.es, both created by Ilya Grigorik, it’s easy to find percentile values for maxDomainReqs: the 50th percentile is 39, the 90th percentile is 97, and the 95th percentile is 127 requests on a single domain.

This data shows that a majority of websites have 39 or more resources being downloaded from a single domain. Most browsers open six connections per hostname. If we evenly distribute these 39 requests across the connections, each connection must do 6+ sequential requests. Response times per request vary widely, but I use 500 ms as an optimistic estimate. If we use 500 ms as the typical response time, this introduces a 3000 ms long pole in the response time tent. In reality, requests are assigned to whatever connection is available, and 500 ms might not be the typical response time for your requests. But given the six-connections-per-hostname limit, 39 requests on one domain is a lot.
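The back-of-the-envelope math above can be sketched in a few lines. This is a simplification under the same assumptions as the text (requests spread evenly, a fixed 500 ms per response, no pipelining); the function name `long_pole_ms` is made up for illustration.

```python
import math


def long_pole_ms(requests, connections=6, response_ms=500):
    """Estimate time spent on one hostname when requests queue per connection.

    Returns (lower_bound_ms, busiest_connection_ms).
    """
    # Every connection handles at least floor(requests / connections) requests.
    lower = (requests // connections) * response_ms
    # The busiest connection handles ceil(requests / connections) requests.
    busiest = math.ceil(requests / connections) * response_ms
    return lower, busiest


estimate = long_pole_ms(39)  # (3000, 3500)
```

The 3000 ms figure is the lower bound: 39 requests don't divide evenly by six, so some connections carry a seventh request and finish at 3500 ms. Assuming two sharded hostnames with six connections each, `long_pole_ms(39, connections=12)` gives (1500, 2000).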

Wrong sharding

There are costs to domain sharding. You’ll have to modify your website to actually do the sharding. This is likely a one-time cost; the infrastructure only has to be set up once. In terms of performance the biggest cost is the extra DNS lookup for each new domain. Another performance cost is the overhead of establishing each TCP connection and ramping up its congestion window size.
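As a rough idea of what that website modification might look like, here is a minimal sketch that rewrites a resource path to a sharded URL. The hostnames and the function name `shard_url` are hypothetical, and hashing the path is just one possible approach. The point of hashing is that a given resource always lands on the same hostname, so a browser that has already cached it doesn't fetch it again under a different URL.

```python
import zlib

# Hypothetical shard hostnames, used only for illustration.
SHARDS = ["static1.example.com", "static2.example.com"]


def shard_url(path, shards=SHARDS):
    """Map a resource path to the same shard hostname on every call."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    index = zlib.crc32(path.encode("utf-8")) % len(shards)
    return "http://" + shards[index] + path


logo_url = shard_url("/img/logo.png")
```

Calling `shard_url("/img/logo.png")` repeatedly returns the same URL each time, on one of the two hostnames.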

Despite these costs, domain sharding has great benefit for websites that need it and do it correctly. That first part is important – it doesn’t make sense to do domain sharding if your website has a low “maxDomainReqs” value. For example, if the maximum number of resources downloaded on a single domain is 6, then you shouldn’t deploy domain sharding. With only 6 requests on a single domain, most browsers are able to download all 6 in parallel. On the other hand, if you have 39 requests on a single domain, sharding is probably a good choice. So where’s the cutoff between 6 and 39? I don’t have data to answer this, but I would say 20 is a good cutoff. Other aspects of the page affect this decision. For example, if your page has a lot of other requests, then those 20 resources might not be the long pole in the tent.
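Expressed as code, that rule of thumb is a one-line check. The cutoff of 20 is my guess from above, not a measured value, and the function name `should_shard` is made up for illustration.

```python
def should_shard(max_domain_reqs, cutoff=20):
    """Rule of thumb: consider sharding only when one hostname serves many requests."""
    # The default cutoff is a guess, not a measured threshold; tune it per site.
    return max_domain_reqs >= cutoff


few = should_shard(6)    # False: six requests already download in parallel
many = should_shard(39)  # True: the median site in the HTTP Archive data
```

This deliberately ignores the other aspects of the page mentioned above, such as whether those requests are actually the long pole.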

The success of domain sharding can be undermined if it’s done incorrectly. It’s important to keep these guidelines in mind.

These and other issues are explained in more detail in Chapter 11 of Even Faster Web Sites.

Short term hack?

Perhaps the strongest argument against domain sharding is that it’s unnecessary in the world of SPDY (as well as HTTP 2.0). In fact, domain sharding probably hurts performance under SPDY. SPDY supports concurrent requests (send all the request headers early) as well as request prioritization. Sharding across multiple domains diminishes these benefits. SPDY is supported by Chrome, Firefox, Opera, and IE 11. If your traffic is dominated by those browsers, you might want to skip domain sharding. On the other hand, IE 6 and 7 are still somewhat popular and only support 2 connections per hostname, so domain sharding is an even bigger win in those browsers.

A middle ground is to alter domain sharding depending on the client: 1 domain for browsers that support SPDY, 2 domains for non-SPDY modern browsers, 3-4 domains for IE 6-7. This makes domain sharding harder to deploy. It also lowers the cache hit rate on intermediate proxies.
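A minimal sketch of that middle ground follows. How you detect SPDY support or IE 6-7 (user agent sniffing, for example) is left out; the function name and parameters are hypothetical, and the default of 3 domains for old IE is just the low end of the 3-4 range.

```python
def shard_count(supports_spdy, is_ie6_or_7=False, legacy_ie_shards=3):
    """Pick how many domains to shard across for a given client."""
    if supports_spdy:
        # SPDY multiplexes requests on one connection, so don't shard.
        return 1
    if is_ie6_or_7:
        # IE 6-7 allow only 2 connections per hostname; 3-4 domains are suggested.
        return legacy_ie_shards
    # Non-SPDY modern browsers.
    return 2


spdy_client = shard_count(supports_spdy=True)
old_ie_client = shard_count(supports_spdy=False, is_ie6_or_7=True)
modern_client = shard_count(supports_spdy=False)
```

Since the same resource now has different URLs for different clients, this is where the lower cache hit rate on intermediate proxies comes from.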

There’s no need for domain sharding in the world of HTTP 2.0 across all popular browsers. Until then, there’s no silver bullet answer. But if you’re one of the websites with 39+ resources on a single hostname, domain sharding is worth exploring.