Seeing With OpenCV, Part 4: Face Recognition With Eigenface
by Robin Hewitt
This article originally appeared in
SERVO Magazine, April 2007.
Reprinted by permission of T & L Publications, Inc.
Last month's article
introduced Camshift, OpenCV's built-in face tracker. This month and next, this series concludes by showing you how to use OpenCV's implementation of eigenface for face recognition.
Face recognition is the process of putting a name to a face. Once you've detected a face, face recognition means figuring out whose face it is. You won't see security level recognition from eigenface. It works well enough, however, to make a fun enhancement to a hobbyist robotics project.
This month's article gives a detailed explanation of how eigenface works and the theory behind it. Next month's article will conclude this topic by taking you through the programming steps to implement eigenface.
What is Eigenface?
Eigenface is a simple face recognition algorithm that's easy to implement. It's the first face-recognition method that computer vision students learn, and it's a standard, workhorse method in the computer vision field. Turk and Pentland published the paper that describes their Eigenface method in 1991
(Reference 3, below).
Citeseer lists 223 citations for this paper - an average of 16 citations per year since publication!
The steps used in eigenface are also used in many advanced methods. In fact, if you're interested in learning computer vision fundamentals, I recommend you learn about and implement eigenface, even if you don't plan to incorporate face recognition into a project! One reason eigenface is so important is that the basic principles behind it - PCA and distance-based matching - appear over and over in numerous computer vision and machine learning applications.
Here's how recognition works: given example face images for each of several people, plus an unknown face image to recognize,
- Compute a "distance" between the new image and each of the example faces
- Select the example image that's closest to the new one as the most likely known person
- If the distance to that face image is below a threshold, "recognize" the image as that person; otherwise, classify the face as an "unknown" person
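The matching steps above can be sketched in a few lines of Python. This is an illustrative nearest-neighbor matcher, not OpenCV's implementation; the names, toy 2D feature vectors, and threshold value are all made up for the example, and it assumes each face has already been reduced to a feature vector:

```python
import math

def euclidean_distance(a, b):
    # Point-to-point (Euclidean) distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(unknown, examples, threshold):
    """Return the closest known person's name, or "unknown" if
    no example face is close enough.

    examples: dict mapping a person's name to that person's feature vector.
    """
    best_name, best_dist = None, float("inf")
    for name, vector in examples.items():
        d = euclidean_distance(unknown, vector)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist < threshold else "unknown"

# Toy 2D feature vectors, purely for illustration
examples = {"alice": [1.0, 2.0], "bob": [8.0, 9.0]}
print(recognize([1.1, 2.2], examples, threshold=3.0))    # close to alice
print(recognize([50.0, 50.0], examples, threshold=3.0))  # too far: unknown
```

The same logic works unchanged whether the vectors have two dimensions or thousands; only the cost of computing each distance grows.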
How "Far Apart" Are These Images?
Distance, in the original eigenface paper, is measured as the point-to-point distance. This is also called Euclidean distance. In two dimensions (2D), the Euclidean distance between points P1 and P2 is

d12 = sqrt(dx^2 + dy^2),

where dx = x2 - x1, and dy = y2 - y1.

In 3D, it's sqrt(dx^2 + dy^2 + dz^2).
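As a quick sanity check, both formulas can be evaluated directly. The point coordinates here are arbitrary, chosen to give round answers:

```python
import math

x1, y1 = 1.0, 2.0
x2, y2 = 4.0, 6.0
dx, dy = x2 - x1, y2 - y1

# 2D: d12 = sqrt(dx^2 + dy^2)
d_2d = math.sqrt(dx**2 + dy**2)
print(d_2d)  # 5.0 (a 3-4-5 right triangle)

# 3D: add a z term under the square root
z1, z2 = 0.0, 12.0
dz = z2 - z1
d_3d = math.sqrt(dx**2 + dy**2 + dz**2)
print(d_3d)  # 13.0 (a 5-12-13 triangle)
```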
Figure 1 shows Euclidean distance in 2D.
Figure 1. Euclidean distance, d12, for two points in two dimensions.
In a 2D plot, such as Figure 1, the dimensions are the X and Y axes. To get 3D, throw in a Z axis. But what are the dimensions for a face image? The simple answer is that eigenface considers each pixel location to be a separate dimension. But there's a catch...
The catch is that we're first going to do something called dimensionality reduction. Before explaining what that is, let's look at why we need it.
Noise Times 2,500 Is A Lot Of Noise
By computing distance between face images, we've replaced 2,500 differences between pixel values (one per pixel of a 50 x 50 image) with a single value. The question we want to consider is, "What effect does noise have on this value?"
Let's define noise as anything, other than an identity difference, that affects pixel brightness. No two images are exactly identical, and small, incidental influences cause changes in pixel brightness. If each one of these 2,500 pixels contributes even a small amount of noise, the sheer number of noisy pixels means the total noise level will be very high.
Amidst all these noise contributions, whatever information is useful for identifying individual faces is presumably contributing some sort of consistent signal. But with 2,500 pixels each adding some amount of noise to the answer, that small signal's hard to find and harder to measure.
Very often, the information of interest has a much lower dimensionality than the number of measurements. In the case of an image, each pixel's value is a measurement. Most likely, we can (somehow) represent the information that would allow us to distinguish between faces from different individuals with a much smaller number of parameters than 2,500. Maybe that number's 100; maybe it's 12. We don't claim to know in advance what it is, only that it's probably much smaller than the number of pixels.
If this assumption's correct, summing all the squared pixel differences would create a noise contribution that's extremely high compared to the useful information. One goal of dimensionality reduction is to tone down the noise level, so the important information can come through.
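A minimal sketch of what dimensionality reduction looks like in practice, using PCA via NumPy. The "images" here are made-up 4-pixel vectors standing in for 2,500-pixel faces, and keeping k = 2 components is an arbitrary choice for illustration:

```python
import numpy as np

# Four tiny "face images", each flattened to a 4-pixel vector
faces = np.array([
    [1.0, 2.0, 1.0, 0.0],
    [2.0, 3.0, 1.5, 0.5],
    [0.0, 1.0, 0.5, 0.0],
    [3.0, 4.0, 2.0, 1.0],
])

# Center the data on the mean face
mean = faces.mean(axis=0)
centered = faces - mean

# SVD of the centered data gives the principal components
# (the directions of greatest variance) as the rows of vt
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Keep only the top k components: each face is now described by
# k numbers instead of 4 (or, for real images, 2,500) pixel values
k = 2
projected = centered @ vt[:k].T
print(projected.shape)  # (4, 2)
```

Distances computed between these low-dimensional projections discard the directions that carry mostly noise, which is exactly the effect the paragraph above is after.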