“Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments”, 2008-09-16:
Most face databases have been created under controlled conditions to facilitate the study of specific parameters on the face recognition problem. These parameters include such variables as position, pose, lighting, background, camera quality, and gender. While there are many applications for face recognition technology in which one can control the parameters of image acquisition, there are also many applications in which the practitioner has little or no control over such parameters.
This database, Labeled Faces in the Wild (LFW), is provided as an aid in studying the latter, unconstrained, recognition problem. The database contains labeled face photographs spanning the range of conditions typically encountered in everyday life. The database exhibits “natural” variability in factors such as pose, lighting, race, accessories, occlusions, and background.
In addition to describing the details of the database, we provide specific experimental paradigms for which the database is suitable. This is done in an effort to make research performed with the database as consistent and comparable as possible. We provide baseline results, including results of a state-of-the-art face recognition system combined with a face alignment system.
To facilitate experimentation on the database, we provide several parallel databases, including an aligned version.
…The database contains 13,233 target face images.
Some images contain more than one face, but it is the face that contains the central pixel of the image which is considered the defining face for the image. Faces other than the target face should be ignored as “background”.
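The central-pixel rule can be sketched in a few lines. This is a minimal illustration, not code from the database authors: the `(left, top, width, height)` box format, the function name `target_face`, and the choice of `(width // 2, height // 2)` as the "central pixel" of an even-sized image are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical box format: (left, top, width, height), in pixels.
Box = Tuple[int, int, int, int]


def target_face(
    boxes: List[Box], image_width: int = 250, image_height: int = 250
) -> Optional[Box]:
    """Return the first box that contains the central pixel, or None."""
    # Assumption: the central pixel is (width // 2, height // 2).
    cx, cy = image_width // 2, image_height // 2
    for box in boxes:
        left, top, width, height = box
        if left <= cx < left + width and top <= cy < top + height:
            return box
    return None


# Two detections in one 250x250 image; only the second covers the center.
boxes = [(10, 20, 60, 60), (80, 70, 100, 100)]
print(target_face(boxes))  # (80, 70, 100, 100)
```

Any box the function does not return would be treated as "background" under the rule described above.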
The name of the person pictured in the center of the image is given.
Each person is given a unique name (“George W. Bush” is the current US president while “George H. W. Bush” is the previous US president), so no name should correspond to more than one person, and each individual should appear under no more than one name (unless there are unknown errors in the database).
The database contains images of 5,749 different individuals.
Of these, 1,680 people have two or more images in the database. The remaining 4,069 people have just a single image in the database.
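The reported counts are internally consistent: 1,680 + 4,069 = 5,749. As a hedged sketch, the same breakdown could be recomputed from a list holding one name label per image; the function name `summarize` and the toy labels below are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter
from typing import Dict, List


def summarize(names: List[str]) -> Dict[str, int]:
    """Count images, distinct people, and people with >= 2 or exactly 1 image."""
    per_person = Counter(names)  # name -> number of images
    multi = sum(1 for n in per_person.values() if n >= 2)
    single = sum(1 for n in per_person.values() if n == 1)
    return {
        "images": len(names),
        "people": len(per_person),
        "multi": multi,
        "single": single,
    }


# Toy labels (hypothetical), one entry per image.
toy = ["A", "A", "B", "C", "C", "C"]
print(summarize(toy))  # {'images': 6, 'people': 3, 'multi': 2, 'single': 1}

# Consistency check on the figures reported in the text.
assert 1680 + 4069 == 5749
```

Run over the real labels, `multi + single` should equal `people`, matching the figures quoted above.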
The images are available as 250×250 pixel JPEG images.
Most images are in color, although a few are grayscale only.
All of the images are the result of detections by the Viola-Jones face detector, but have been rescaled and cropped to a fixed size (see §6 for details).
After running the Viola-Jones detector on a large database of images, false positive face detections were manually eliminated, along with images for which the name of the individual could not be identified.