Introduction

You may have heard the term frames per second (fps), and that 60 fps is a really good target for everything animated. But most console games run at 30 fps, and motion pictures generally run at 24 fps, so why should you go all the way to 60 fps?

Frames… per second?

The early days of filmmaking

A production scene from the 1950 Hollywood film Julius Caesar starring Charlton Heston. Via Wikipedia.

When the first filmmakers started to record motion pictures, many discoveries weren’t made scientifically, but by trial and error. The first cameras and projectors were hand controlled, and in the early days analog film was very expensive – so expensive that when directors recorded motion on camera, they used the lowest acceptable frame rate for portrayed motion in order to conserve film. That threshold usually hovered at around 16 fps to 24 fps.

When sound was added to the physical film (as an audio track next to the film) and played back along with the video at the same pace, hand-controlled playback suddenly became a problem. It turns out that humans can deal with a variable frame rate, but not with a variable sound rate (where both tempo and pitch change), so filmmakers had to settle on a steady rate for both. That rate was 24 fps and, almost a hundred years later, it remains the standard for motion pictures. (In television, frame rates had to be modified slightly due to the way CRT TVs sync with the AC power frequency.)

The human eye vs. frames

But if 24 fps is barely acceptable for motion pictures, then what is the optimal frame rate? This is a trick question, as there is none.

Motion perception is the process of inferring the speed and direction of elements in a scene based on visual, vestibular and proprioceptive inputs. Although this process appears straightforward to most observers, it has proven to be a difficult problem from a computational perspective, and extraordinarily difficult to explain in terms of neural processing. – Wikipedia

The eye is not a camera. It doesn’t capture motion as a series of frames; it perceives a continuous stream of information rather than a set of discrete images. Why, then, do frames work at all?

Two important phenomena explain why we see motion when looking at a rapid succession of still images: persistence of vision and the phi phenomenon.

Most filmmakers think persistence of vision is the sole reason, which isn’t true. Persistence of vision (widely observed, though not scientifically proven) is the phenomenon by which an afterimage seems to persist on the retina for approximately 40 milliseconds (ms). It explains why we don’t see black flicker in movie theaters or (usually) on CRTs.

The Phi Phenomenon in action. Notice that even though nothing is moving, it still feels that way?

The phi phenomenon, on the other hand, is the true reason we perceive motion when being shown individual images. It’s the optical illusion of perceiving continuous motion between separate objects viewed rapidly in succession.

Our brain is very good at helping us fake it – not perfect, but good enough. Using a series of still frames to simulate motion creates perceptual artifacts, depending largely on the frame rate. So no frame rate will ever be optimal, but we can get pretty close.

Common frame rates, from poor to perfect

To get a better idea of the absolute scale of frame rate quality, here’s an overview chart. Keep in mind that because the eye is complex and doesn’t see individual frames, none of this is hard science, merely observations by various people over time.

10-12 fps: The absolute minimum for motion portrayal. Anything below is recognized as individual images.
< 16 fps: Causes visible stutter and headaches for many viewers.
24 fps: The minimum tolerable rate for perceiving motion; cost efficient.
30 fps: Much better than 24 fps, but not lifelike. The standard for NTSC video, due to the AC signal frequency.
48 fps: Good, but not good enough to feel entirely real (even though Thomas Edison thought so). Also see this article.
60 fps: The sweet spot; most people won’t perceive much smoother images above 60 fps.
∞ fps: To date, science hasn’t proven or observed a theoretical upper limit.

Note: Even though 60 fps is observed to be a good number for smooth animations, that’s not all there is to a great picture. Contrast and sharpness can still be improved beyond that number. As an example of how sensitive our eyes are to changes in brightness, there have been scientific studies showing that test subjects can perceive a white frame between a thousand black frames. If you want to go deeper, here are a few resources.

Demo time: How does 24 fps compare to 60 fps?

Thanks to Marc Tönsing for creating this fantastic comparison.

HFR: Rewiring your brain with the help of a Hobbit

“The Hobbit” was the first popular motion picture shot at twice the standard frame rate, 48 fps, called high frame rate or HFR. Unfortunately not everyone was happy about the new look. There were multiple reasons for this, but the biggest one by far was the so-called soap opera effect.

Most people’s brains have been trained to assume that 24 fps equals expensive movie production, while 50-60 interlaced half frames per second (the look of broadcast TV) remind us of cheap TV productions and destroy the “film look”. A similar effect occurs when you enable motion interpolation on your TV for 24p (progressive) material, which many viewers dislike, even though modern algorithms are usually pretty good at rendering smooth motion without the artifacts that naysayers commonly cite when dismissing the feature.

Even though higher frame rates are measurably better (by making motion less jerky and fighting motion blur), there’s no easy answer on how to make them feel better. It requires retraining your brain, and while some viewers reported that everything was fine after ten minutes into “The Hobbit”, others have sworn off HFR entirely.

Cameras vs. CGI: The story of motion blur

But if 24 fps is supposedly barely tolerable, why have you never walked out of a cinema, complaining that the picture was too choppy? It turns out that video cameras have a natural feature – or bug, depending on your definition – that CGI (including CSS animation!) is missing: motion blur.

Once you see it, the lack of motion blur in video games and in software is painfully obvious. Dmitri Shuralyov has created a nifty WebGL demo that simulates motion blur. Move your mouse around quickly to see the effect.

Motion blur, as defined at Wikipedia, is

…the apparent streaking of rapidly moving objects in a still image or a sequence of images such as a movie or animation. It results when the image being recorded changes during the recording of a single frame, either due to rapid movement or long exposure.

In this case, a picture says it better.

Images courtesy of Evans & Sutherland Computer Corporation, Salt Lake City, UT. Used by permission. All rights reserved.

Motion blur “cheats” by portraying a lot of motion in a single frame at the expense of detail. This is the reason a movie displayed at 24 fps looks relatively acceptable compared to a video game displayed at 24 fps.

But how is motion blur created in the first place? In the words of E&S, pioneers at using 60 fps for their mega dome screens:

When you shoot film at 24 fps, the camera only sees and records a portion of the motion in front of the lens, and the shutter closes between each exposure, allowing the camera to reposition the film for the next frame. This means that the shutter is closed between frames as long as it is open. With fast motion and action in front of the camera, the frame rate is actually too slow to keep up, so the imagery ends up being blurred in each frame (because of the exposure time).

Here’s a simplified graphic explaining the process.

Classical movie cameras use a so-called rotary disc shutter to do the job of capturing motion blur. By rotating the disc, you open the shutter for a controlled amount of time at a certain angle and, depending on that angle, you change the exposure time on the film. If the exposure time is short, less motion is captured on the frame, resulting in less motion blur; if the exposure time is long, more motion is captured on the frame, resulting in more motion blur.

The rotary disc shutter in action. Via Wikipedia.
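
To make the shutter angle relationship concrete, here’s a minimal sketch (not tied to any particular camera) of how shutter angle and frame rate translate into exposure time, and therefore into how much motion gets smeared into each frame:

```typescript
// Exposure time per frame for a rotary disc shutter:
// the shutter is open for (angle / 360) of each frame interval.
function exposureTimeMs(shutterAngleDegrees: number, fps: number): number {
  const frameTimeMs = 1000 / fps;
  return frameTimeMs * (shutterAngleDegrees / 360);
}

// The classic 180° shutter at 24 fps exposes each frame for ~20.8 ms,
// half of the ~41.7 ms frame interval: the "film look" most viewers expect.
console.log(exposureTimeMs(180, 24)); // ≈ 20.83
// A tight 45° shutter at 24 fps exposes for only ~5.2 ms: crisper, more staccato motion.
console.log(exposureTimeMs(45, 24)); // ≈ 5.21
```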

If motion blur is such a good thing, why would a movie maker want to get rid of it? Well, by adding motion blur, you lose detail; by getting rid of it, you lose smoothness. So when directors want to shoot scenes that require a lot of detail, such as explosions with tiny particles flying through the air or complicated action scenes, they often choose a tight shutter that reduces blur and creates a crisp, stop motion-like look.

Motion Blur capture visualized. Via Wikipedia.

So why don’t we just add it?

Sadly, even though motion blur would make lower frame rates in games and on web sites much more acceptable, adding it is often simply too expensive. To recreate perfect motion blur, you would need to capture four times the number of frames of an object in motion, and then do temporal filtering or anti-aliasing (there is a great explanation by Hugo Elias here). If making a 24 fps source more acceptable requires you to render at 96 fps, you might as well just bump up the frame rate in the first place, so it’s often not an option for live content. Exceptions are video games that know the trajectory of moving objects in advance and can approximate motion blur, as well as declarative animation systems such as CSS Animations, and of course CGI films like Pixar’s.
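
To make the cost argument concrete, here is a rough sketch of the brute-force approach described above: render several subframes per displayed frame and average them, a crude form of temporal filtering. renderSubframe is a hypothetical stand-in for whatever actually draws your scene at a given point in time.

```typescript
// Hypothetical renderer: returns RGBA pixel data for the scene at time t (seconds).
declare function renderSubframe(t: number, width: number, height: number): Uint8ClampedArray;

// Approximate motion blur for one displayed frame by averaging N subframes
// spread across the frame's exposure window. At 24 fps with 4 subframes,
// you are effectively paying for ~96 renders per second.
function renderBlurredFrame(
  frameStart: number, frameDuration: number,
  subframes: number, width: number, height: number
): Uint8ClampedArray {
  const accum = new Float32Array(width * height * 4);
  for (let i = 0; i < subframes; i++) {
    const t = frameStart + (frameDuration * i) / subframes;
    const sub = renderSubframe(t, width, height);
    for (let p = 0; p < accum.length; p++) accum[p] += sub[p];
  }
  const out = new Uint8ClampedArray(accum.length);
  for (let p = 0; p < accum.length; p++) out[p] = accum[p] / subframes;
  return out;
}
```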

60 Hz != 60 fps: Refresh rates and why they matter

Note: Hertz (Hz) is usually used when talking about refresh rates, while frames per second (fps) is an established term for frame-based animation. To not confuse the two, we’ll use Hz for refresh rates and fps for frame rates.

If you’ve ever wondered why Blu-Ray playback looks so poor on your laptop, it’s often because the refresh rate is not evenly divisible by the frame rate (DVDs, on the other hand, are converted before they arrive in your drive). Yes, the refresh rate and the frame rate are not the same thing. Per Wikipedia, “[…] the refresh rate includes the repeated drawing of identical frames, while frame rate measures how often a video source can feed an entire frame of new data to a display.” So the frame rate describes the number of new frames shown per second, while the refresh rate describes the number of times per second the image on screen is refreshed or updated.

In the best case, refresh rate and frame rate are in perfect sync, but there are good reasons to run the refresh rate at two or three times the frame rate in certain scenarios, depending on the display or projection system being used.

A new problem with every display

Movie projectors

Many people think that movie projectors work by rolling a film past a light source. But if that were the case, we would only see a continuous blurry image. Instead, as with capturing the film in the first place, a shutter is used to project separate frames. After a frame is shown, the shutter is closed and no light is let through while the film moves, after which the shutter is opened to show the next frame, and the process repeats.

A movie projector’s shutter in action. From Wikipedia.

That isn’t the whole picture, though. Sure, this process would show you a movie, but the flicker caused by the screen being black half the time would drive you crazy. This “black out” between the frames is what would destroy the illusion. To compensate for this issue, movie projectors actually close the shutter twice or even three times during a single frame.

Of course, this seems completely counter-intuitive: why would adding more flicker feel like less flicker? The answer is that it shortens each individual “black out” period, which has a disproportionate effect on the visual system. The flicker fusion threshold (closely related to persistence of vision) describes how much darkness we can tolerate before noticing it: at roughly 45 Hz, the “black out” periods need to stay below about 60% of the frame time, which is why the double shutter method works for movies, while above 60 Hz the “black out” period can take up over 90% of the frame time (which displays like CRTs rely on). The full concept is subtly more complicated, but as a rule of thumb, these are the thresholds to respect to prevent the perception of flicker.
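
As a very rough sketch of that rule of thumb (the thresholds come from the numbers quoted above; real flicker perception varies with brightness, viewing angle and the individual viewer):

```typescript
// Rough rule-of-thumb check based on the thresholds quoted above:
// around 45 Hz, the dark ("black out") portion of each cycle should stay
// below ~60% of the frame time; from ~60 Hz up, even ~90% dark time is tolerable.
function likelyFlickerFree(flashesPerSecond: number, blackoutFraction: number): boolean {
  if (flashesPerSecond >= 60) return blackoutFraction <= 0.9;
  if (flashesPerSecond >= 45) return blackoutFraction <= 0.6;
  return false; // below ~45 Hz, most viewers will notice flicker
}

// A 24 fps projector with a double shutter flashes 48 times per second:
console.log(likelyFlickerFree(48, 0.5)); // true -- which is why double shuttering works
```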

Flickery CRTs

CRT monitors and TVs work by shooting electrons onto a fluorescent screen containing low persistence phosphor. How low is the persistence? So low that you never actually see a full image! Instead, while the electron scan is running, the lit-up phosphor loses its intensity in less than 50 microseconds – that’s 0.05 milliseconds! By comparison, a full frame on your Android or iPhone is shown for 16.67ms.

The refresh scan, captured at a 1/3000 second exposure. From Wikipedia.

So the whole reason that CRTs work in the first place is persistence of vision. Because of the long black phase between light samples, CRTs are often perceived to flicker – especially in PAL, which operates at 50 Hz, versus NTSC, which operates at 60 Hz, right where the flicker fusion threshold kicks in.

To make matters even more complicated, the eye doesn’t perceive flicker equally in every corner. In fact, peripheral vision, though much blurrier than direct vision, is more sensitive to brightness and has a significantly faster response time. This was likely very useful in the caveman days for detecting wild animals leaping out from the side to eat you, but it causes plenty of headaches when watching movies on a CRT up close or from an odd angle.

Blurry LCDs

Liquid Crystal Displays (LCDs), categorized as a sample-and-hold type display, are pretty amazing because they don’t have any black phases in the first place. The current image just stays up until the display is given a new image.

Let me repeat that: There is no refresh-induced flicker with LCDs, no matter how low the refresh rate.

But now you’re thinking, “Wait – I’ve been shopping for TVs recently and every manufacturer promoted the hell out of a better refresh rate!” And while a large part of it is surely marketing, higher refresh rates with LCDs do solve a problem – just not the one you’re thinking of.

Eye induced motion blur

LCD manufacturers implement higher and higher refresh rates because of display or eye-induced motion blur. That’s right; not only can a camera record motion blur, but your eyes can as well! Before explaining how this happens, here are two mind blowing demos that help you experience the effect (click the image).

In this first experiment, focusing your vision onto the unmoving flying alien at the top allows you to clearly see the white lines, while focusing on the moving alien at the bottom magically makes the white lines disappear. In the words of the Blur Busters website, “Your eye tracking causes the vertical lines in each refresh to be blurred into thicker lines, filling the black gaps. Short-persistence displays (such as CRT or LightBoost) eliminate this motion blur, so this motion test looks different on those displays.”

In fact, the effect of our eyes tracking certain objects can’t ever be fully prevented, and is often such a big problem with movies and film productions that there are people whose whole job is to predict what the eye will be tracking in a scene and to make sure that there is nothing else to disturb it.

In the second experiment, the folks at Blur Busters try to recreate the effect of an LCD display vs. short-persistence displays by simply inserting black frames between display frames and, amazingly, it works.
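
If you want to experiment with the idea yourself, here’s a toy sketch of black frame insertion on a canvas, in the spirit of the Blur Busters demo. drawFrame is a hypothetical function that renders your moving content; note that this halves the effective content rate and reintroduces flicker at low refresh rates, so it’s an illustration rather than a production technique.

```typescript
// Toy black frame insertion: alternate real frames with black frames to
// emulate a short-persistence display on a sample-and-hold LCD.
declare function drawFrame(ctx: CanvasRenderingContext2D, t: number): void; // hypothetical renderer

function startBlackFrameInsertion(canvas: HTMLCanvasElement) {
  const ctx = canvas.getContext('2d')!;
  let showContent = true;
  function tick(t: number) {
    if (showContent) {
      drawFrame(ctx, t);
    } else {
      ctx.fillStyle = 'black';
      ctx.fillRect(0, 0, canvas.width, canvas.height);
    }
    showContent = !showContent;
    requestAnimationFrame(tick);
  }
  requestAnimationFrame(tick);
}
```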

As illustrated earlier, motion blur can either be a blessing or a curse – it sacrifices sharpness for smoothness, and the blurring added by your eyes is never desirable. So why is motion blur such a big issue with LCDs compared to CRTs that do not have such issues? Here’s an explanation of what happens if a frame that has been captured in a short amount of time is held on screen longer than expected.

The following quote from Dave Marsh’s great MSDN article about temporal rate conversion is amazingly accurate and timely for a sixteen-year-old article:

When a pixel is addressed, it is loaded with a value and stays at that light output value until it is next addressed. From an image portrayal point of view, this is the wrong thing to do. The sample of the original scene is only valid for an instant in time. After that instant, the objects in the scene will have moved to different places. It is not valid to try to hold the images of the objects at a fixed position until the next sample comes along that portrays the object as having instantly jumped to a completely different place.

And, his conclusion:

Your eye tracking will be trying to smoothly follow the movement of the object of interest and the display will be holding it in a fixed position for the whole frame. The result will inevitably be a blurred image of the moving object.

Yikes! So what you want to do is flash a sample onto the retina and then let your eye, in combination with your brain, do the motion interpolation.

Extra: So how much does our brain interpolate, really?

Nobody knows for sure, but it’s clear that there are plenty of areas where the brain helps to construct the final image we perceive. Take this wicked blind spot test as an example: it turns out there’s a blind spot right where the optic nerve head sits on the retina, a spot that should appear empty but gets filled in with interpolated information by our brain.

Frames and screen refreshes are not mix and match!

As mentioned earlier, there are problems when the refresh rate and frame rate are not in sync; that is, when the refresh rate is not evenly divisible by the frame rate.

Problem: Screen tearing

What happens when your movie or app begins to draw a new frame to the screen, and the screen is in the middle of a refresh? It literally tears the frame apart (see it on video).

Here’s what happens behind the scenes. Your CPU/GPU does some processing to compose a frame, then submits it to a buffer that waits for the monitor to trigger a refresh through the driver stack. The monitor then reads the pending frame and starts to display it (you need double buffering here so that there is always one image being presented while the next one is being composited). Tearing happens when the graphics card swaps in the next frame while the monitor is still scanning out the current one from top to bottom. The result is that the part of the screen above the tear line shows the previous frame (the rows that were already scanned out before the swap), while the part below shows the new frame.
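
Here’s a toy model of that failure mode (purely illustrative; frames are reduced to rows of letters): if the swap lands while the display is partway through its top-to-bottom scanout, the rows above the tear line come from the old frame and the rows below from the new one.

```typescript
// Toy model of tearing: the display scans rows top to bottom from the front
// buffer; if the buffer swap lands mid-scan, rows scanned before the swap show
// the old frame and the rest show the new one.
function composeTornFrame(oldFrame: string[], newFrame: string[], swapAtRow: number): string[] {
  return oldFrame.map((row, y) => (y < swapAtRow ? row : newFrame[y]));
}

const previous = ['A', 'A', 'A', 'A'];
const next = ['B', 'B', 'B', 'B'];
console.log(composeTornFrame(previous, next, 2)); // ['A', 'A', 'B', 'B'] -- a visible tear at row 2
```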

Note: To be precise, screen tearing can occur even when both refresh rate and frame rate match! They need to match both phase and frequency.

That’s clearly not what we want. Fortunately, there is a solution!

Solution: Vsync

Screen tearing can be eliminated through Vsync, short for vertical synchronization. It’s a feature of either hardware or software that ensures tearing doesn’t happen – that your software can only deliver a new frame once the previous refresh is complete. Vsync throttles the buffer swap in the process above so that it only ever happens between refreshes, meaning the image being presented never changes in the middle of the screen.

Thus, if the new frame isn’t ready to be drawn in the next screen refresh, the screen simply recycles the previous frame and draws it again. This, unfortunately, leads to the next problem.

New problem: Jitter

Even though our frames are now at least not torn, the playback is still far from smooth. This time, the reason is an issue that is so problematic that every industry has been giving it new names: judder, jitter, stutter, jank, or hitching. Let’s settle on “jitter”.

Jitter happens when an animation is played back at a different rate than the one it was captured at (or meant to be played at), or when the playback rate is unsteady rather than fixed (most content is recorded at a fixed rate). Unfortunately, this is exactly what happens when trying to display, for example, 24 fps on a screen with 60 refreshes per second. Because 60 cannot be evenly divided by 24, every other frame has to be shown for an extra refresh (when no more advanced conversion is used), disrupting smooth effects such as camera pans.
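
You can see the uneven cadence directly. Here’s a quick sketch that counts how many refreshes each source frame occupies when no advanced conversion is used:

```typescript
// How many refreshes does each source frame occupy on a given display?
function refreshesPerFrame(fps: number, refreshHz: number, frames: number): number[] {
  const counts: number[] = [];
  for (let i = 0; i < frames; i++) {
    // Refresh index at which frame i becomes visible vs. when frame i+1 takes over.
    const start = Math.ceil((i * refreshHz) / fps);
    const end = Math.ceil(((i + 1) * refreshHz) / fps);
    counts.push(end - start);
  }
  return counts;
}

console.log(refreshesPerFrame(24, 60, 8)); // [3, 2, 3, 2, 3, 2, 3, 2] -- uneven, hence jitter
console.log(refreshesPerFrame(30, 60, 8)); // [2, 2, 2, 2, 2, 2, 2, 2] -- even, hence smooth
```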

In games and websites with lots of animation, this is even more apparent. Many can’t keep their animation at a constant, divisible frame rate. Instead, they have high variability due to reasons such as separate graphic layers running independently of each other, processing user input, and so on. This might shock you, but an animation that is capped at 30 fps looks much, much better than the same animation varying between 40 fps and 50 fps.

You don’t have to believe me; experience it with your own eyes. Here’s a powerful microstutter demo.

Fighting jitter

During conversion: Telecine

Telecine describes the process of converting motion picture film to video. Expensive professional converters such as those used by TV stations do this mostly through a process called motion vector steering that can create very convincing new fill frames, but two other methods are still common.

Speed up

When trying to convert from 24 fps to a PAL signal at 25 fps (e.g., TV or video in the UK), a common practice is to simply speed up the original video by about 4%: the film is played back at 25 fps, so the runtime shrinks by 1/25th. So if you’ve ever wondered why “Ghostbusters” in Europe is a couple of minutes shorter, that’s why. While this method often works surprisingly well for the video, it’s terrible for the audio. How bad can a 4% speed-up realistically be without an additional pitch correction, you ask? Almost a half-tone bad.
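
To put numbers on it (simple arithmetic, nothing more):

```typescript
// Playing 24 fps film at PAL's 25 fps speeds everything up by 25/24.
const speedup = 25 / 24;
console.log((speedup - 1) * 100);     // ≈ 4.17 (% faster)

// A 120-minute film loses about 4.8 minutes of runtime:
console.log(120 - 120 / speedup);     // ≈ 4.8

// Without pitch correction, audio shifts up by 12·log2(25/24) semitones:
console.log(12 * Math.log2(speedup)); // ≈ 0.71 -- the "almost a half-tone" mentioned above
```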

Take this real example of a major fail. When Warner released the extended Blu-Ray collection of Lord of the Rings in Germany, they reused an already PAL-corrected sound master for the German dub, which had been sped up for 25 fps and then pitched down to correct the change. But because Blu-Ray is 24 fps, they had to convert it back, so they slowed it down again. Such a two-fold conversion is a bad idea anyway, as it is a lossy process, but even worse, when slowing the audio down again to match the Blu-Ray video, they forgot to change the pitch back, so every actor in the movie suddenly sounded super depressing, speaking a half-tone lower. Yes, this actually happened, and yes, there was fan outrage, lots of tears, lots of bad copies, and lots of money wasted on a large media recall.

The moral of the story: speed change is not a great idea.

Pulldown

Converting movie material to NTSC, the US standard for television, isn’t as simple as speeding up the movie, because changing 24 fps to 29.97 fps would mean a 24.875% speed-up. Unless you really love chipmunks, this may not be the best option.

Instead, a process called 3:2 pulldown was invented (among others), which became the most popular way of conversion. It’s the process of taking 4 original frames and converting them to 10 interlaced half-frames, or 5 full frames. Here’s a picture describing the process.

3:2 Pulldown in action. From Wikipedia.

On an interlaced screen (i.e. a CRT), the video fields (the middle column of the picture) are shown one after another; each field is interlaced and so is made up of only every second row of pixels. The original frame, A, is split into two half frames that are both shown on screen. The next frame, B, is also split, but its odd video field is shown twice, so it’s distributed across 3 half frames; in total, we arrive at 10 half frames for the 4 original full frames.

This works fairly well when portrayed on an interlaced screen (such as a CRT TV) at roughly 60 video fields per second (practically half frames, with every odd or even row blank), as you never see two of these fields together at once. But it can look terrible on displays that don’t support half frames and must composite them back into 30 full frames, as in the far-right row of the picture. This is because every 3rd and 4th frame is stitched together from two different original frames, resulting in what I call a “Frankenframe”. This looks especially bad with fast motion, when the difference between the two original frames is significant.
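
Here’s the cadence in miniature: a simplified sketch that ignores field-order details but shows where the Frankenframes come from.

```typescript
// 3:2 pulldown: four film frames (A, B, C, D) become ten interlaced fields,
// emitted in a 2-3-2-3 cadence (the exact ordering varies by convention).
type Field = { frame: string; parity: 'odd' | 'even' };

function pulldown32(filmFrames: string[]): Field[] {
  const cadence = [2, 3]; // fields emitted per film frame: 2, 3, 2, 3, ...
  const fields: Field[] = [];
  filmFrames.forEach((frame, i) => {
    for (let f = 0; f < cadence[i % 2]; f++) {
      fields.push({ frame, parity: fields.length % 2 === 0 ? 'odd' : 'even' });
    }
  });
  return fields;
}

const fields = pulldown32(['A', 'B', 'C', 'D']);
console.log(fields.map(f => f.frame).join(' ')); // A A B B B C C D D D
// Weaving consecutive field pairs back into full frames yields
// AA, BB, BC, CD, DD -- the 3rd and 4th are the mixed "Frankenframes".
```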

So pulldown sounds nifty, but it isn’t a general solution either. Then what is? Is there really no holy grail? It turns out there is, and the solution is deceptively simple!

During display: G-Sync, Freesync and capping

Much better than trying to work around a fixed refresh rate is, of course, a variable refresh rate that is always in sync with the frame rate, and that’s exactly what Nvidia’s G-Sync technology and AMD’s Freesync do. G-Sync is a module built into monitors that allows them to synchronize to the output of the GPU instead of synchronizing the GPU to the monitor, while Freesync achieves the same without a module. It’s truly groundbreaking and eliminates the need for telecine, and it makes anything with a variable frame rate, such as games and web animations, look so much smoother.

Unfortunately, both G-Sync and Freesync are still fairly new technologies and not yet widely deployed on consumer devices, so if you’re a developer doing animations on websites or in apps and can’t afford the full 60 fps, your best bet is to cap the frame rate at a value that divides evenly into the refresh rate – in almost every case, that cap is 30 fps.
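
On the web, a common way to do that is to let requestAnimationFrame fire at the display’s refresh rate and simply skip the refreshes you don’t need. A minimal sketch, where drawFrame is a hypothetical function that renders your animation:

```typescript
// Cap an animation loop to an even divisor of the refresh rate (e.g. 30 fps
// on a 60 Hz display) so every frame is shown for the same number of refreshes.
declare function drawFrame(t: number): void; // hypothetical renderer

function runCapped(targetFps: number) {
  const frameBudget = 1000 / targetFps;
  let last = performance.now();
  function tick(t: number) {
    requestAnimationFrame(tick);
    if (t - last < frameBudget - 1) return; // not our refresh yet (1 ms tolerance for timer noise)
    last += frameBudget;                    // advance by a fixed step to keep the cadence even
    if (t - last > frameBudget) last = t;   // if we fell far behind (e.g. a hidden tab), resync
    drawFrame(t);
  }
  requestAnimationFrame(tick);
}

runCapped(30);
```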

Conclusion & actionable follow-ups

So how do we achieve a decent balance among our desired effects – minimal motion blur, minimal flickering, constant frame rates, great portrayal of motion, and great compatibility with all displays – without taxing the screen and GPUs too much? Yes, super high frame rates could reduce motion blur further, but at a great cost. The answer is clear and, after reading this article, you should know what it is: 60 fps.

Now that you are wiser, go do your best at running all of your animated content at 60 fps.

a) If you’re a web developer

Head over to jankfree.org, where members of the Chrome team are collecting the best resources on how to get all of your apps and animations silky smooth. If you only have time for one article, make it Paul Lewis’s excellent runtime performance checklist.

b) If you’re an Android developer

Check out Best Practices for Performance in our official Android Training pages, where we summarize the most important factors, bottlenecks, and optimization tricks for you.

c) If you work in the film industry

Record all of your content at 60 fps or, even better, at 120 fps, so you can scale down to 60 fps, 30 fps and 24 fps when needed (sadly, to also support PAL’s 50 fps and 25 fps, you’ll need to drive it up to 600 fps). Display all your content at 60 fps and don’t apologize for the soap opera effect. This revolution will take time, but it will work.
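
Those odd-looking master rates fall out of simple arithmetic: a source frame rate can only be cleanly decimated to target rates that divide it evenly, so the master needs to be the least common multiple of every rate you want to serve. A quick sketch:

```typescript
// A master frame rate can be cleanly downsampled to any rate that divides it
// evenly, so the master must be the least common multiple of all target rates.
const gcd = (a: number, b: number): number => (b === 0 ? a : gcd(b, a % b));
const lcm = (a: number, b: number): number => (a * b) / gcd(a, b);

console.log([24, 30, 60].reduce(lcm));         // 120 -- covers the film/NTSC family
console.log([24, 25, 30, 50, 60].reduce(lcm)); // 600 -- add PAL's 25/50 and you need 600
```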

d) For everyone else

Demand 60 fps whenever you see moving images on the screen, and when someone asks why, direct them to this article.

Important: If this article influenced you and your business decisions in a positive way, I would love to hear from you.

Let’s all work together for a silky smooth future!