There are reasons in favor of each of these approaches:
|Package per page|| |
|Separate JS files|| || |
|Hybrid|| || |
Performance got worse.
On investigation we found out there were two reasons for this:
The sections below explore these issues in more detail.
|Size||Packages||Individual files||Pct difference|
|uncompressed JS||2,421,176 bytes||2,282,839 bytes||-5.7%|
|compressed JS||646,806 bytes||662,754 bytes||+2.5%|
|number of files||28 files||296 files||(+957%)|
On reflection, this is no surprise: zlib uses a sliding window of previously seen text to guide its compression, so it does much better on big files than on small ones. In particular, it will almost always compress 100 1K files worse (in aggregate) than the single 100K file you get by concatenating them all together.
(More details: at a high level, zlib compresses like this: it goes through a document, and for every sequence of text it's looking at, it checks whether that text has occurred previously in the document. If so, it replaces that sequence of text with a (space-efficient) pointer to that previous occurrence. It stands to reason that the further along in the document it goes, the more "previous text" there is for finding a potential match, and thus the more opportunities for compression.
This discussion omits some details, like the limited size of the sliding window, that do not affect the overall conclusion. For more details on zlib, and the LZ77 algorithm it implements, see Wikipedia.)
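To see this effect outside of our build system, here is a minimal sketch using Python's standard `zlib` module. The "files" are made-up, JS-like snippets generated from a template (the `make_files` helper and its contents are purely illustrative, not our real sources), so the exact numbers mean nothing; only the direction of the difference matters.

```python
import zlib


def make_files(count):
    """Build `count` small, similar-looking JS-like sources (made-up content)."""
    template = (
        "define('module{n}', ['react', 'underscore'], function(React, _) {{\n"
        "    var Component{n} = React.createClass({{\n"
        "        render: function() {{ return React.DOM.div(null, 'item {n}'); }}\n"
        "    }});\n"
        "    return Component{n};\n"
        "}});\n"
    )
    return [template.format(n=i).encode("utf-8") for i in range(count)]


files = make_files(100)

# Compress each file on its own, as happens when every file is a separate response.
separate_total = sum(len(zlib.compress(f)) for f in files)

# Compress the concatenation, as happens when the files are shipped as one package.
combined_total = len(zlib.compress(b"".join(files)))

print("sum of individually compressed sizes:", separate_total)
print("compressed size of the concatenation:", combined_total)
```

Because these synthetic files are nearly identical, the gap here is far larger than the 2.5% we measured on real code, where files share less text with each other.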
zlib actually has a mechanism built in for improving compression in the case of many small files: you can specify a "preset dictionary", which is just a big string of bytes. Basically, when compressing with a preset dictionary, you can replace text with a pointer either to earlier in the document or into the preset dictionary. With a preset dictionary, early parts of the document have more opportunities to find a good pointer match.
This takes time, of course, and really only works well if it's supported at the protocol layer. That said, for cases like this it would be a significant net win overall. But it likely wouldn't be easy to augment the HTTP/2 spec to allow for something like this in a safe way!
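As a rough illustration of what a preset dictionary buys you, the sketch below uses the `zdict` argument that Python's `zlib.compressobj` and `zlib.decompressobj` accept. The dictionary and the small file are invented for the example, and the `deflate` and `inflate` helpers are hypothetical names; this is not something we deployed. Note that both sides have to hold the same dictionary bytes, which is exactly why protocol-level support matters.

```python
import zlib

# Hypothetical preset dictionary: boilerplate we expect most small files to share.
preset = (
    b"define('module', ['react', 'underscore'], function(React, _) {\n"
    b"    var Component = React.createClass({\n"
    b"        render: function() { return React.DOM.div(null, 'item'); }\n"
    b"    });\n"
    b"    return Component;\n"
    b"});\n"
)

# A made-up small file that resembles the boilerplate above.
small_file = (
    b"define('module42', ['react', 'underscore'], function(React, _) {\n"
    b"    var Component42 = React.createClass({\n"
    b"        render: function() { return React.DOM.div(null, 'item 42'); }\n"
    b"    });\n"
    b"    return Component42;\n"
    b"});\n"
)


def deflate(data, zdict=None):
    """Compress `data`, optionally priming zlib with a preset dictionary."""
    if zdict is None:
        compressor = zlib.compressobj()
    else:
        compressor = zlib.compressobj(zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def inflate(data, zdict):
    """Decompress `data`; the receiver must supply the same dictionary bytes."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(data) + decompressor.flush()


plain = deflate(small_file)
with_dict = deflate(small_file, preset)

print("without preset dictionary:", len(plain), "bytes")
print("with preset dictionary:   ", len(with_dict), "bytes")
```

The dictionary lets even the first bytes of a small file point back at earlier "text", which is what a standalone small file otherwise lacks.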
However, by analyzing HAR files we could see the effect plainly:
These tests were done on a recent version of Chrome. It's possible other browsers would behave differently. And the test was emulating a super-fast FiOS connection; you can see that all the time is taken in the green part of the bars (time to first byte) and not the blue part (time to download the full file).
Furthermore, reloading the page produced HAR files that looked substantially different each time. But the end result was the same: a page that had much higher latency than when using packages.
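If you want to check the same split in your own HAR files without eyeballing a waterfall, something like the following sketch works. It relies only on fields defined by the HAR 1.2 format (`log.entries[].request.url` and the `wait` and `receive` values under `timings`, in milliseconds); the `summarize_har` and `load_har` helpers and the sample data are hypothetical and are not the tooling we used.

```python
import json


def summarize_har(har, suffix=".js"):
    """Total the 'wait' (time to first byte) and 'receive' (download) timings
    for every request whose URL path ends with `suffix`."""
    wait_ms = 0.0
    receive_ms = 0.0
    count = 0
    for entry in har["log"]["entries"]:
        url = entry["request"]["url"].split("?", 1)[0]
        if not url.endswith(suffix):
            continue
        timings = entry["timings"]
        # HAR uses -1 for timings that do not apply; treat those as zero.
        wait_ms += max(timings.get("wait", 0), 0)
        receive_ms += max(timings.get("receive", 0), 0)
        count += 1
    return {"requests": count, "wait_ms": wait_ms, "receive_ms": receive_ms}


def load_har(path):
    """Read a HAR file exported from the browser's network panel."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Tiny made-up HAR fragment, just to show the shape of the data.
sample = {
    "log": {
        "entries": [
            {"request": {"url": "https://example.com/a.js"},
             "timings": {"wait": 120, "receive": 3}},
            {"request": {"url": "https://example.com/b.js?v=2"},
             "timings": {"wait": 250, "receive": 5}},
            {"request": {"url": "https://example.com/site.css"},
             "timings": {"wait": 40, "receive": 2}},
        ]
    }
}

summary = summarize_har(sample)
print(summary)
```

Requests overlap in time, so these totals overstate wall-clock latency; they are only useful for comparing how much of the time goes to waiting versus downloading.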
When we stuck with a relatively small number of packages, the waterfall was consistent and reliable (and much shorter!):
The time to first byte is still longer than we would like, for reasons we are not entirely sure of, but it's much more consistent than in the individual-source-file case!
We are wondering that too. The page in question is the Khan Academy homepage for logged-in users, and it's acquired a lot of, um, functionality over the years.