How it works: the camera autofocus

There are many autofocus mechanisms, but they fall roughly into two types: phase-detection autofocus and contrast-based autofocus. Most autofocus systems, and even some manual-focus aids, follow the basic principles of one type or the other.

Phase detection autofocus

This is the autofocus system used in SLR/DSLR ("professional") cameras. Manual-focus aids such as the rangefinder (Leica), the split image (SLRs, Leica) and microprisms (SLRs) share the same fundamentals: all of them rely on the same optical properties.

The fundamentals begin with the pinhole camera, where the image is always in focus, whatever the subject distance. Sharpness depends only on the pinhole diameter: the smaller the hole, the sharper the image, until the diffraction limit is reached.

Figure 1: Pinhole camera
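
The trade-off between hole size and diffraction can be put into rough numbers. The sketch below is only an approximation under stated assumptions: the geometric blur is taken to be about the hole diameter, the diffraction blur is taken to be the Airy disk diameter (2.44 x wavelength x depth / diameter), and the two are simply added. The function names, the 50 mm pinhole-to-film depth and the 550 nm wavelength are illustrative choices, not values from the text.

```python
import math

def pinhole_blur_mm(diameter_mm, depth_mm=50.0, wavelength_mm=550e-6):
    """Rough blur-spot size (mm) for a pinhole of the given diameter.

    Assumed model: geometric blur ~ hole diameter (distant subject),
    diffraction blur ~ Airy disk diameter. Adding them is a crude estimate.
    """
    geometric = diameter_mm
    diffraction = 2.44 * wavelength_mm * depth_mm / diameter_mm
    return geometric + diffraction

def best_diameter_mm(depth_mm=50.0, wavelength_mm=550e-6):
    """Diameter that minimises the sum above (both terms become equal)."""
    return math.sqrt(2.44 * wavelength_mm * depth_mm)

for d in (1.0, 0.5, 0.26, 0.1, 0.05):
    print(f"hole {d:5.2f} mm -> blur about {pinhole_blur_mm(d):.3f} mm")
print(f"best diameter about {best_diameter_mm():.2f} mm")
```

Under these assumptions, shrinking the hole helps only down to a fraction of a millimetre; below that, diffraction makes the image softer again.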

Now imagine two pinhole cameras side by side. Both will take the same picture if they are "looking" in the same direction and the subject is at "infinity", that is, the camera-to-subject distance is much greater than the camera-to-camera distance.

Figure 2: Two pinhole cameras seeing a distant scene

But if the subject is relatively near, each pinhole camera will take a different picture, since the light rays from the subject that reach the two pinholes are no longer parallel: they arrive at each pinhole at a different angle.

Figure 3: Two pinhole cameras seeing a nearby scene
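
The difference between the two pictures can be estimated with similar triangles. This is a minimal sketch, assuming two identical pinhole cameras with parallel axes; `disparity_mm` and the example numbers (100 mm between cameras, 50 mm from pinhole to film) are hypothetical.

```python
def disparity_mm(baseline_mm, depth_mm, distance_mm):
    """Shift of the subject between the two pictures, measured on the film.

    baseline_mm: distance between the two pinholes
    depth_mm:    distance from each pinhole to its film
    distance_mm: distance from the cameras to the subject
    """
    return depth_mm * baseline_mm / distance_mm

for distance_m in (1, 10, 1000):
    shift = disparity_mm(100.0, 50.0, distance_m * 1000.0)
    print(f"subject at {distance_m:4d} m -> shift of {shift:.4f} mm")
```

The shift shrinks in inverse proportion to the distance, which is why a subject at "infinity" looks identical in both cameras.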

If we replace the pinhole with a lens, and the lens is correctly focused, we get the same picture that a pinhole would produce, only sharper and with a shorter exposure time.

Figure 4: Focused camera

On the other hand, an unfocused lens does not take a sharp picture; it will generate a blurred image of the subject.

Figure 5: Unfocused camera

Actually, every small part of the lens does generate a sharp image, but each of these images is slightly offset (left to right, or top to bottom), and the sum of all of them is blurry.
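
This "blur is a sum of offset sharp images" idea is easy to reproduce on a one-dimensional strip of pixels. The sketch below is only an illustration: `shifted` and `defocused` are hypothetical helpers, and real defocus does not weight every part of the lens equally as this plain average does.

```python
def shifted(strip, offset):
    """Copy of strip moved by `offset` pixels, repeating the edge values."""
    n = len(strip)
    return [strip[min(max(i - offset, 0), n - 1)] for i in range(n)]

def defocused(strip, max_offset):
    """Average of sharp copies shifted by -max_offset..+max_offset pixels."""
    copies = [shifted(strip, k) for k in range(-max_offset, max_offset + 1)]
    return [sum(c[i] for c in copies) / len(copies) for i in range(len(strip))]

sharp = [0] * 8 + [100] * 8          # a hard dark-to-bright edge

print(defocused(sharp, 0))                       # no offsets: still a hard edge
print([round(v) for v in defocused(sharp, 2)])   # offsets up to 2 px: a soft ramp
```

Every copy that goes into the average is perfectly sharp; only their sum is soft.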

If we want to recover a sharp image from an unfocused lens, we can again use a pinhole, putting it behind the lens. We can even use more than one pinhole.

Figure 6: Unfocused camera, with two pinholes behind the lens

In the diagram above, the two pinholes behind the unfocused lens project two sharp images of the subject onto the sensor or film. The images may overlap; the further out of focus the lens is, the larger the separation between them.

When the lens is correctly focused, two or more pinholes will not generate separate images. There is still a single projected image, and the pinholes only reduce the amount of light that reaches it.

Figure 7: Focused camera with two pinholes behind the lens

This is the basic principle of phase detection autofocus: compare images generated by different parts of a lens. If the images are offset, the lens is not in focus. When images are exactly the same, the lens is in focus.
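
To get a feeling for the numbers, the separation between the two images can be estimated with the thin-lens equation. This is a simplified sketch, not a description of any real camera: it assumes an ideal thin lens, a subject on the optical axis, and two small apertures placed at the lens plane itself (rather than behind it), each `aperture_offset_mm` away from the axis. All names and values are hypothetical.

```python
def image_distance_mm(focal_mm, subject_mm):
    """Thin-lens equation: distance behind the lens where the sharp image forms."""
    return 1.0 / (1.0 / focal_mm - 1.0 / subject_mm)

def separation_mm(focal_mm, subject_mm, sensor_mm, aperture_offset_mm):
    """Signed separation of the two images on a sensor at sensor_mm from the lens.

    Rays through both apertures cross at the image distance v, so on a sensor
    at distance d they are 2 * a * (1 - d / v) apart. The sign tells on which
    side of the sensor the sharp image lies.
    """
    v = image_distance_mm(focal_mm, subject_mm)
    return 2.0 * aperture_offset_mm * (1.0 - sensor_mm / v)

v = image_distance_mm(50.0, 2000.0)              # 50 mm lens, subject at 2 m
for sensor in (50.0, v, 52.5):
    print(f"sensor at {sensor:.3f} mm -> separation "
          f"{separation_mm(50.0, 2000.0, sensor, 10.0):+.3f} mm")
```

In this model the separation is zero only when the sensor sits exactly where the image forms, and its sign changes depending on whether the sensor is in front of or behind that point.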

That's also how the microprism and split-image focusing aids found in older SLRs work. The viewfinder's ground glass is shaped so that each microprism, or each half of the split-image circle, sees only the image coming from one part of the lens.

The smooth area of the ground glass "sees" the image from the whole lens, so it shows a blurry image when the lens is not in focus. Remember that a blurry image is the sum of many sharp images, one from every small part of the lens, each slightly offset.

The Leica rangefinder works similarly, but the Leica system has an actual second lens, separate from the main viewfinder lens and some inches to the right. The idea is the same (make the two images overlap and the focus is correct), but this system is more precise for manual focus, since the separation between viewfinder and rangefinder is much larger than the diameter of a single lens.
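
Why a larger separation helps can be shown with plain triangulation. The sketch below assumes an idealised rangefinder that measures the angle between the two lines of sight with a small fixed error; the baselines (20 mm and 50 mm) and the angle error are hypothetical values chosen for illustration, not Leica specifications.

```python
import math

def estimated_distance_mm(baseline_mm, true_distance_mm, angle_error_rad):
    """Distance reported by triangulation when the angle is slightly off."""
    true_angle = math.atan(baseline_mm / true_distance_mm)
    return baseline_mm / math.tan(true_angle - angle_error_rad)

for baseline in (20.0, 50.0):
    reported = estimated_distance_mm(baseline, 3000.0, 0.0001)
    print(f"baseline {baseline:.0f} mm: subject at 3000 mm "
          f"reported at {reported:.0f} mm")
```

With the same angular error, the longer baseline gives the smaller distance error.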

Human vision also works, in part, as a rangefinder: since we have two eyes, our visual system can estimate the distance of the subjects in view.

Now, let's see how autofocus itself works. The principle is the same as described so far: gather the images generated by opposite sides of the lens, using some kind of aperture, like a pinhole or a prism. The autofocus sensor does not analyse the whole image; it handles just a one-dimensional "strip" of it.

Figure 8: One-dimensional image strips, as seen by autofocus sensors

When the lens is focused, the image strips seen by the two sensors will be the same. If the lens is unfocused, the strips will be different, but not completely different: one is an offset copy of the other.

Figure 9: Image strips, and the phase difference caused by lack of focus

Once the two strips have been captured, the camera's computer analyses them and finds the correlation between them. In the example above, we can see that the strips are partially equal, but offset. The easiest way to spot this is by looking at the colors. Indeed, some DSLRs use color-sensitive (RGB) sensors to help find this offset.
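
One simple way to find the offset between two strips is to try every candidate shift and keep the one where the strips differ the least. This is a minimal sketch of that idea using the mean absolute difference; `find_offset` is a hypothetical helper, and the source does not say which matching method real cameras use.

```python
def find_offset(strip_a, strip_b, max_shift):
    """Return the shift s (in pixels) for which strip_b[i + s] best matches strip_a[i].

    max_shift must be smaller than the strip length so the strips always overlap.
    """
    best_shift, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        diffs = [abs(strip_a[i] - strip_b[i + s])
                 for i in range(len(strip_a))
                 if 0 <= i + s < len(strip_b)]
        cost = sum(diffs) / len(diffs)      # mean difference over the overlap
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift

strip_a = [10, 10, 10, 80, 200, 80, 10, 10, 10, 40, 10, 10]
strip_b = [10] * 3 + strip_a[:-3]           # same detail, moved 3 pixels

print(find_offset(strip_a, strip_b, 5))     # strips offset: lens out of focus
print(find_offset(strip_a, strip_a, 5))     # identical strips: lens in focus
```

A strip without any detail would match equally well at every shift, so this search only gives a meaningful answer when the subject has some texture.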

Once the offset, or phase, of the strips is determined, all that's left is to adjust the lens focus in proportion, and the focus should be correct on the first try. This is what makes phase detection so fast: the method tells the camera both in which direction and by how much the focus needs to be adjusted.
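
As a rough illustration of going from offset to correction, assume an ideal thin lens with two small apertures at the lens plane, each a distance a from the axis. Rays through both apertures meet at the image distance v, so on a sensor at distance d the two images are 2a(1 - d/v) apart. Solving for v gives the correction directly. The function below is a sketch under exactly these assumptions, with hypothetical names and values; a real camera would presumably rely on calibration data for the lens rather than on this idealised formula.

```python
def focus_correction_mm(separation_mm, sensor_mm, aperture_offset_mm):
    """Signed distance from the sensor to the plane where the image is sharp.

    Inverts separation = 2 * a * (1 - d / v) for the image distance v,
    then returns v - d.
    """
    v = sensor_mm / (1.0 - separation_mm / (2.0 * aperture_offset_mm))
    return v - sensor_mm

# Sensor 50 mm behind the lens, apertures 10 mm off axis (illustrative values).
print(f"{focus_correction_mm(0.5, 50.0, 10.0):+.3f} mm")   # images 0.5 mm apart
print(f"{focus_correction_mm(0.0, 50.0, 10.0):+.3f} mm")   # images coincide
```

A single measurement yields both the size and the sign of the correction, with no trial and error.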

For the autofocus to work, the image must have some sort of texture or detail, so the sampled strips are actually different when the lens is out of focus.

The simplest autofocus implementations sample only vertical strips of the image. Almost all DSLRs have at least one cross-type focus point that analyzes both vertical and horizontal strips, increasing the chances of achieving focus on subjects that lack texture in one dimension.
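
The benefit of sampling in both directions shows up with a subject made of horizontal stripes, such as window blinds. In the sketch below, `detail` is a hypothetical measure (the sum of differences between neighbouring samples) and the 8x8 image is made up for illustration.

```python
def detail(strip):
    """Sum of absolute differences between neighbouring samples of a strip."""
    return sum(abs(b - a) for a, b in zip(strip, strip[1:]))

# Hypothetical subject: horizontal stripes, two pixels tall each.
image = [[0] * 8 if (y // 2) % 2 == 0 else [255] * 8 for y in range(8)]

horizontal_strip = image[4]                    # one row of the image
vertical_strip = [row[4] for row in image]     # one column of the image

print("horizontal strip detail:", detail(horizontal_strip))
print("vertical strip detail:  ", detail(vertical_strip))
```

The horizontal strip is perfectly flat and gives the sensor nothing to compare, while the vertical strip crosses every stripe.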

Contrast-based autofocus

Contrast-based autofocus is the poor cousin. The algorithm is the simplest possible: move the lens focus back and forth until the image contrast is at its highest. It is a trial-and-error process, since the camera does not know a priori what the maximum contrast of the subject is.

Figure 10: Contrast difference between focused and unfocused scene

Also, it is not possible to determine the focus direction by looking at a blurry image; the focus may be either too far or too near. The camera must try one direction, and reverse it if the contrast drops or doesn't improve.

Figure 11: Scene contrast detail, one part in focus, one part unfocused
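
The try-and-reverse search can be sketched as a simple hill climb. Everything here is a stand-in: `contrast_autofocus` is a hypothetical routine, `fake_contrast` simulates a lens whose contrast peaks at an arbitrary position of 3.3, and halving the step on each reversal is just one reasonable choice among many.

```python
def contrast_autofocus(measure_contrast, start, step=1.0, min_step=0.01):
    """Move the focus while contrast improves; reverse with a smaller step otherwise."""
    position = start
    best = measure_contrast(position)
    direction = +1                       # arbitrary first guess
    moves = 0
    while step >= min_step:
        candidate = position + direction * step
        value = measure_contrast(candidate)
        moves += 1
        if value > best:                 # sharper: keep going the same way
            position, best = candidate, value
        else:                            # not sharper: turn around, finer steps
            direction = -direction
            step /= 2
    return position, moves

def fake_contrast(position, true_focus=3.3):
    """Simulated measurement: highest when the lens is at true_focus."""
    return 1000.0 / (1.0 + (position - true_focus) ** 2)

for start in (0.0, 6.0):
    position, moves = contrast_autofocus(fake_contrast, start)
    print(f"start {start}: settled at {position:.3f} after {moves} lens moves")
```

Each loop iteration stands for a physical lens movement plus a new contrast measurement, which is why this approach is slower than a single phase measurement.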

The simplest contrast metric is the sum of all absolute brightness differences between adjacent pixels. The grand total is a unitless measurement of image contrast, and means nothing by itself. It is necessary to tweak the focus a bit, and recalculate the contrast. If the new contrast is higher, the image is (probably) sharper, meaning that focus has been moved in the right direction. Rinse and repeat until a maximum is found.
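
The metric described above fits in a few lines. This is a minimal sketch, assuming a grayscale image stored as a list of rows; `contrast` and both test images are hypothetical. One caveat: for a single isolated edge this particular sum stays the same however smeared the edge is, so the example uses fine stripes, where blur clearly lowers the total. Summing squared differences is one common alternative.

```python
def contrast(image):
    """Sum of absolute brightness differences between adjacent pixels."""
    total = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if x + 1 < len(row):
                total += abs(row[x + 1] - value)         # right-hand neighbour
            if y + 1 < len(image):
                total += abs(image[y + 1][x] - value)    # neighbour below
    return total

sharp = [[0, 255, 0, 255, 0, 255]] * 4       # fine stripes, in focus
soft = [[85, 170, 85, 170, 85, 170]] * 4     # same stripes, smeared

print("sharp:", contrast(sharp))
print("soft: ", contrast(soft))
```

Neither number means anything alone; only the comparison between two focus positions tells the camera which way to go.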

Given the limitations of contrast autofocus, cameras (even cell phones) are incorporating phase-detection autofocus. Such cameras use both systems together: phase detection for the first approximation and contrast for the final tweak. Phase autofocus implemented on the main image sensor might not be as precise as the dedicated sensor found in SLRs, but it does not have to be, since the contrast-based autofocus is there to help.

Contrast-based autofocus may be the poor cousin, but it has its share of advantages over phase detection. It is cheaper to implement and works well with high-aperture lenses. More importantly, it is not affected by lens defects. Since the phase autofocus sensor can see only a strip of the image, it can be fooled by certain aberrations (and every lens has some kind of aberration). This is one reason (among others) why a DSLR may have focus issues with one particular lens while it works perfectly with other lenses.

Other systems

Other mechanisms analogous to radar, such as ultrasound or laser, can measure the distance from camera to subject, and can therefore be employed in autofocus. Such methods don't see much use nowadays. Ultrasound was employed at the beginning of the 1980s. Such systems have the potential advantage of working in pitch darkness, and with subjects completely devoid of texture.
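
For an echo-based system, the distance follows from the round-trip time of the pulse. This is a minimal sketch, assuming the camera simply times a single echo; `distance_m` is a hypothetical helper and the timings are made-up examples.

```python
SPEED_OF_SOUND_M_S = 343.0            # in air at about 20 degrees Celsius
SPEED_OF_LIGHT_M_S = 299_792_458.0

def distance_m(round_trip_s, speed_m_s):
    """Distance to the subject: the pulse travels there and back."""
    return speed_m_s * round_trip_s / 2.0

print(f"ultrasound echo after 20 ms: {distance_m(0.020, SPEED_OF_SOUND_M_S):.2f} m")
print(f"light echo after 20 ns:      {distance_m(20e-9, SPEED_OF_LIGHT_M_S):.2f} m")
```

The comparison shows how different the timing requirements are: milliseconds for sound, nanoseconds for light, over roughly the same distance.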