CSS Testing with PhantomCSS, PhantomJS, CasperJS and Grunt

A new and exciting area of front-end development is regression testing. I know, I know… testing is exciting? Well, in a field that has had no formal testing practices and is constantly plagued by unexpected regressions, the opportunity to write effective tests is incredibly welcome!


Now we’re not talking Test-Driven Development or anything, but one form of CSS testing, called Visual Regression Testing, allows us to make visual comparisons between the correct (baseline) versions of our site and versions in development or about to be deployed (new). The process is as simple as taking a screenshot of each page and comparing the pixels to find differences (diff)… Okay, it does get a bit more complicated, but that’s where it gets fun.

At Redhat.com we’ve been developing a robust end-to-end solution for Visual Regression Testing, and I wanted to take a little time to lay out the tools and processes that we are using. It’s still a work in progress, so on one hand I want to share this so others may benefit, and on the other, as we continue to hone this system, any feedback is extremely welcome!

The Testing Tools

We start things off by picking our tools. We decided on PhantomCSS, which is itself a powerful combination of three different tools:

  1. PhantomJS is a headless WebKit browser that allows you to quickly render web pages and, most importantly, take screenshots of them.
  2. CasperJS is a navigation and scripting tool that allows us to interact with the page rendered by PhantomJS. We are able to move the mouse, perform clicks, enter text into fields, and even run JavaScript directly in the DOM.
  3. ResembleJS is a comparison engine that can compare two images and determine if there are any pixel differences between them.

We also wanted to automate the entire process, so we pulled PhantomCSS into Grunt and set up a few custom Grunt commands to test all, or just part, of our test suite.

If you want to follow along, you can find all of this code on GitHub.

Getting Grunt Set Up

Now, before you run off and download the first Grunt PhantomCSS plugin you find on Google, I have to warn you that it is awfully stale. Sadly, someone grabbed the prime namespace and then just totally disappeared… not even a tweet for the past two years. This has led to a few people taking it upon themselves to carry on with the codebase, merging in existing pull requests and keeping things current. One of the better forks is maintained by Anselm Hannemann and can be found here.

First you’ll need to import Anselm’s Grunt PhantomCSS project into your Grunt project. He doesn’t have an npm namespace picked out yet, so we’ll just be pulling it in directly from GitHub:
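Something like the following should do it; the exact repository path is an assumption, so point it at whichever fork you end up using:

    $ npm install git+https://github.com/anselmh/grunt-phantomcss.git --save-dev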

With that installed, we need to do the typical Grunt things, like loading the task in the Gruntfile.js:
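A minimal sketch of what that looks like, assuming the fork still registers its task under the grunt-phantomcss name:

    module.exports = function (grunt) {
      // Load the PhantomCSS task we just pulled in from GitHub
      grunt.loadNpmTasks('grunt-phantomcss');
    };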

Then we set a few options for PhantomCSS, also in the Gruntfile.js. Most of these are just the defaults:
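Here’s a rough sketch of that configuration; the folder paths, viewport size and target name are placeholders, so adjust them to fit your own project:

    grunt.initConfig({
      phantomcss: {
        desktop: {
          options: {
            // Where the baseline screenshots live
            screenshots: 'test/visual/baselines',
            // Where new screenshots and diff/fail images get written
            results: 'test/visual/results',
            viewportSize: [1280, 800]
          },
          // The CasperJS test files to run
          src: ['test/visual/**/*.js']
        }
      }
    });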

Our Test File

Next comes our phantomcss.js test file; this is where CasperJS kicks in. PhantomCSS is going to spin up a PhantomJS web browser, but it is up to CasperJS to navigate to a web page and perform all of the various actions needed.

We decided the best place to test our components would be inside of our styleguide. It shared the same CSS as our live site, and it was a consistent target that we could count on not to change from day to day. So we start off by having Casper navigate us to that URL.

After starting Casper up at the correct page, we use JavaScript method chaining to string together a list of all the screenshots we need to take. First we target the .cta-link and take a screenshot. We aptly call it “cta-link”, which will be its base file name in the baselines folder.
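A sketch of what that looks like (the styleguide URL here is a placeholder):

    casper.start('http://mysite.com/styleguide/buttons');  // placeholder styleguide URL

    casper.then(function () {
      // Baseline screenshot of the call-to-action link
      phantomcss.screenshot('.cta-link', 'cta-link');
    });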

[Screenshot: the cta-link baseline image]

Next we need to test our button to make sure it behaves like we’d expect when we hover over it. We can use CasperJS to actually move the cursor inside of PhantomJS so that when we take our next screenshot, and call it cta-link-hover, we get the following:
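Chained onto the same test, that hover step might look something like this:

    casper.then(function () {
      // Move the mouse over the button so the :hover styles kick in
      this.mouse.move('.cta-link');
      // Capture the hover state as its own baseline
      phantomcss.screenshot('.cta-link', 'cta-link-hover');
    });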

[Screenshot: the cta-link-hover baseline image]

Making A Comparison

With those baselines in place we are now able to run the test over and over again. Normally, images created by the new tests will be identical to the baseline images and everything will pass. But if something were to change… say someone accidentally added the following to their CSS while they were working on some other feature:
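For example, a stray override along these lines (an illustrative rule, not the exact one) would do it:

    .cta-link {
      text-transform: lowercase;
    }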

The next time we ran our comparison tests we’d get the following:

[Screenshot: the baseline, new, and fail (diff) images for the cta-link]

As expected, the change from uppercase to lowercase created a failure. Not only was the text different, but the button ended up being shorter. The 3rd “fail” image shows us which pixels were different between the two images.

Running The Entire Suite

After doing this for each component (or feature) we want to test in our styleguide, we can run $ grunt phantomcss and it will do the following:

  1. Spin up a PhantomJS browser
  2. Use CasperJS to navigate to a page in our styleguide
  3. Take a screenshot of a single component on that page
  4. Interact with the page: click the mobile nav, hover over links, fill out a form, submit the form, etc. (see the sketch after this list)
  5. Take screenshots of every one of those states
  6. Compare all of those screenshots with the baseline images we captured and committed when the component was created
  7. Report whether all images are the same (PASS!) or an image has changed (FAIL!)
  8. Repeat this process for every component and layout in our library.
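Steps 2 through 5 all live in our CasperJS test files. As a rough sketch (the page URL, selectors and form fields below are hypothetical), those interactions look something like this:

    casper.thenOpen('http://mysite.com/styleguide/forms');  // placeholder styleguide page

    casper.then(function () {
      // Open the mobile navigation and capture it
      this.click('.mobile-nav-toggle');
      phantomcss.screenshot('.mobile-nav', 'mobile-nav-open');
    });

    casper.then(function () {
      // Fill out and submit a form
      this.fill('form.contact', { email: 'test@example.com' }, true);
    });

    casper.then(function () {
      // Capture whatever confirmation state the form leaves behind
      phantomcss.screenshot('.form-confirmation', 'contact-form-submitted');
    });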

What Do We Do With Failing Tests?

Obviously if your feature branch is concerned with changing the physical appearance of a component (adjusting font size or background color etc…) you are going to get failing tests. The point is that you should ONLY be getting failing tests on the component you were working on. If you are trying to update the cta-link and you get failing tests on your cta-link AND your pagination component, one of two things happened:

  1. You changed something you shouldn’t have. Maybe your changes were too global, or you fat fingered some keystrokes in the wrong partial. Either way, find out what changed on the pagination component, and fix it.
  2. On the other hand, you might determine that the changes you made to the cta-link SHOULD have affected the pagination too. Perhaps they share the same button mixin, and for brand consistency they should be using the same button styles. At this point you’d need to head back to the story owner/designer/person-who-makes-decisions-about-these-things, ask them if they meant for these changes to apply to both components, and act accordingly.

[Screenshot: failing test results]

Regardless of the reason you had some false positives, you will still be left with a ‘failing’ test because the old baseline is no longer correct. In this case, just delete the old baselines and commit the new ones. If this new look is the brand-approved one, then these new baselines need to be committed with your feature branch code so that once your code is merged in, it doesn’t cause failures when others run the test.


The magic of this approach is that at any given time, every single component in your entire system has a “gold standard” image that the current state can be compared to. This also means that this test suite can be run at any time, in any branch, and should always pass without a single failure. So, as we do… we push this to the cloud.

To the cloud!

At Red Hat we already have a large suite of Behavior Tests that are run automatically by Jenkins CI before any code gets merged into master or deployed to production. These tests ensure that our application is working: we can create nodes, edit them, display them, and so on and so forth. Since these tests are performed by Jenkins, we have a single, consistent environment to run the tests. No more “but it worked on my machine”. These environments mirror those used in production, so we rarely have surprises once our passing code goes live.

Using a similar methodology, we plan on using Jenkins to run our suite of Visual Regression Tests before any code is merged or pushed to production. This means that a developer’s merge request must pass all PhantomCSS tests before the code is accepted. Now, if you recall, there are two ways that developers can deal with failing tests: they can either fix the code that is causing the problem, or, if it was an expected change, they can commit a new baseline for the changed component. So as long as you are doing sufficient merge reviews, and keeping an eye out for unexpected new baselines, there is no way for code to sneak through that breaks the visual integrity of the design system.

Making it Our Own

I started to work with Anselm’s code at the beginning of the Red Hat project and found that it met 90% of our needs, but it was that last 10% that I really needed to make our workflow come together. So, as any self-respecting developer does, I forked it and started in on some modifications to make it fit our specific implementation. Let me walk you through some of those changes, which live in my branch labeled alt-runner.

Place Baselines in Component Folder

One thing important to us was good encapsulation. We put everything in the component folder… I mean everything!

Because of this,  we also really wanted our baseline images to stay inside of the component folder. This would make it easier to find the baselines for each component, and when a merge request contains new baselines, the images are in the same folder as the code change that made them necessary.

The default behavior of Grunt PhantomCSS is to place all of the baselines, for every test in our system, into the same folder. And then when we ran our regression tests, all of those images were placed into a single results folder.  With dozens of different components in our system, each with up to a dozen different tests, this process just didn’t scale. So one of the key changes I made in the alt-runner was to put baseline images into a folder called baseline right next to each individual test file.
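With that change in place, a component folder ends up looking something like this (the file names are hypothetical, just to show the shape):

    cta-link/
      cta-link.scss
      cta-link.html
      tests/
        cta-link.js          (the CasperJS/PhantomCSS test file)
        baseline/
          cta-link.png
          cta-link-hover.png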

Run Each Component Test Suite Individually

Secondly, I changed the test behavior to test each component individually instead of all together. Instead of running all 100+ tests and telling me whether the build passed or failed, I now get a pass/fail for every component.

[Screenshot: per-component test output with every component passing]

OR

[Screenshot: per-component test output with one failing component]

Test Portability

The last change I made is that I wanted my tests to be more portable. Instead of a single test file, we had broken our tests up into dozens of different test files that Grunt pulled in when it ran the task. The original implementation required that the first test file start with casper.start('http://mysite.com/page1') and all subsequent files start with casper.thenOpen('http://mysite.com/page2'). This became problematic because the order in which Grunt chose to run these files was alphabetical. So as soon as I added a test starting with a letter earlier in the alphabet than my current starting test, my test suite broke!

The fix was relatively easy: I just needed to call casper.start as soon as Grunt initiates the task, and then all of the tests can start with casper.thenOpen without any problems.
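With that in place, each test file is order-independent. The runner calls casper.start() once up front, and an individual test just looks something like this (URL and selector are placeholders):

    // pagination.js: no casper.start() here, so this file can run in any order
    casper.thenOpen('http://mysite.com/styleguide/pagination');

    casper.then(function () {
      phantomcss.screenshot('.pagination', 'pagination');
    });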

Where to next?

We’re in a bit of a holding pattern until we get full Jenkins integration set up, and I’m also anxious for PhantomJS 2.0 to make it down the line. The 1.x version is sadly a bit out of date with modern browsers and is missing important features like flexbox, which makes the tests less valuable than they could be. So for now we are just writing our tests, making sure we have sufficient coverage, and working out the kinks in the npm module and our workflow.

So if you feel like diving in, please feel free to kick the tires a bit and toss any issues or pull requests at the alt-runner branch. The branch is a bit raw and needs a lot of documentation, so any help is appreciated! I’m hoping to clean it up as much as I can and either get a pull request ready for Anselm’s fork or publish it as a separate npm module.

Visual Regression Testing is a really hot area right now. New tools are being built all the time, but the real challenge is finding ways to work those tools into our current workflows. So go out, experiment, try them all, and write about it! I’d love to read about your own experiences.
