Uploaded Test files
This commit is contained in:
parent
f584ad9d97
commit
2e81cb7d99
16627 changed files with 2065359 additions and 102444 deletions
126
venv/Lib/site-packages/sklearn/datasets/descr/lfw.rst
Normal file
@@ -0,0 +1,126 @@
.. _labeled_faces_in_the_wild_dataset:

The Labeled Faces in the Wild face recognition dataset
------------------------------------------------------

This dataset is a collection of JPEG pictures of famous people collected
over the internet. All details are available on the official website:

    http://vis-www.cs.umass.edu/lfw/

Each picture is centered on a single face. The typical task is called
Face Verification: given a pair of pictures, a binary classifier
must predict whether the two images show the same person.

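As a minimal sketch of the verification decision, assume each picture has already been reduced to a feature vector and that "same person" is decided by thresholding the Euclidean distance between the two vectors. The function names, the toy vectors and the ``threshold`` value are hypothetical and are not part of ``scikit-learn``:

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_person(features_a, features_b, threshold=1.0):
    """Binary decision: True when the two vectors are closer than threshold.

    The threshold is an illustrative assumption; in practice it would be
    tuned on a development set.
    """
    return euclidean_distance(features_a, features_b) < threshold


# Toy feature vectors standing in for two pairs of face pictures.
print(same_person([0.1, 0.9], [0.2, 0.8]))
print(same_person([0.1, 0.9], [0.9, 0.1]))
```

A real verifier would learn both the features and the decision rule from labeled pairs. The sketch only shows the shape of the problem: two inputs, one boolean output.
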
An alternative task, Face Recognition or Face Identification, is:
given the picture of the face of an unknown person, identify the name
of the person by referring to a gallery of previously seen pictures of
identified persons.

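The identification task can be illustrated with a nearest-neighbor lookup in a gallery. This is only a sketch under the assumption that every face is represented by a small feature vector. The ``identify`` helper, the gallery names and the vectors are hypothetical:

```python
def identify(query, gallery):
    """Return the gallery name whose feature vector is closest to query."""

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(gallery, key=lambda name: squared_distance(query, gallery[name]))


# Hypothetical gallery: one reference vector per identified person.
gallery = {"person_a": [0.0, 1.0], "person_b": [1.0, 0.0]}
print(identify([0.9, 0.2], gallery))
```

Unlike verification, the output here is one of several names, which is why the task is treated as multi-class classification.
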
Both Face Verification and Face Recognition are tasks that are typically
performed on the output of a model trained to perform Face Detection. The
most popular model for Face Detection is called Viola-Jones and is
implemented in the OpenCV library. The LFW faces were extracted by this
face detector from various websites.

**Data Set Characteristics:**

    =================   =======================
    Classes                                5749
    Samples total                         13233
    Dimensionality                         5828
    Features            real, between 0 and 255
    =================   =======================

Usage
~~~~~

``scikit-learn`` provides two loaders that automatically download and cache
the dataset, parse the metadata files, decode the JPEG files and convert the
interesting slices into memmapped numpy arrays. The dataset is larger than
200 MB. The first load typically takes more than a couple of minutes to
fully decode the relevant part of the JPEG files into numpy arrays. Once the
dataset has been loaded, subsequent loads take less than 200 ms by using a
memmapped version memoized on disk in the ``~/scikit_learn_data/lfw_home/``
folder using ``joblib``.

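The caching behaviour described above follows a general "decode once, reuse from disk" pattern. The sketch below illustrates that pattern with the standard library only. It is not how ``joblib`` or the loaders are implemented, and ``load_with_cache``, ``slow_loader`` and the file name are hypothetical:

```python
import os
import pickle
import tempfile


def load_with_cache(cache_path, slow_loader):
    """Return cached data if present, otherwise compute and store it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = slow_loader()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data


calls = []


def slow_loader():
    """Stand-in for the expensive JPEG decoding step."""
    calls.append(1)
    return [[0.0, 1.0], [2.0, 3.0]]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "faces.pkl")
    first = load_with_cache(path, slow_loader)   # computes and writes the cache
    second = load_with_cache(path, slow_loader)  # reads the cache from disk

print(first == second, len(calls))
```

The second call never invokes the slow loader, which is the reason repeated loads are much faster than the first one.
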
The first loader is used for the Face Identification task: a multi-class
classification task (hence supervised learning)::

  >>> from sklearn.datasets import fetch_lfw_people
  >>> lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

  >>> for name in lfw_people.target_names:
  ...     print(name)
  ...
  Ariel Sharon
  Colin Powell
  Donald Rumsfeld
  George W Bush
  Gerhard Schroeder
  Hugo Chavez
  Tony Blair

The default slice is a rectangular shape around the face, removing
most of the background::

  >>> lfw_people.data.dtype
  dtype('float32')

  >>> lfw_people.data.shape
  (1288, 1850)

  >>> lfw_people.images.shape
  (1288, 50, 37)

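The relationship between ``images.shape`` and ``data.shape`` can be checked by hand. The sketch below assumes that the default slice keeps a 125 x 94 pixel crop and that the resized size is obtained by truncating ``resize * size`` to an integer. Both are assumptions about the loader's defaults rather than facts stated in this document, and ``resized_shape`` is a hypothetical helper:

```python
def resized_shape(height, width, resize):
    """Image size after scaling, truncating to whole pixels."""
    return int(resize * height), int(resize * width)


# Assumed default crop of 125 x 94 pixels, scaled by resize=0.4.
h, w = resized_shape(125, 94, 0.4)

# Each image is flattened into one row of the data array.
n_features = h * w
print((h, w), n_features)
```

Under these assumptions the result matches the shapes shown above: 50 x 37 images flattened into 1850 features per sample.
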
Each of the ``1288`` faces is assigned to a single person id in the ``target``
array::

  >>> lfw_people.target.shape
  (1288,)

  >>> list(lfw_people.target[:10])
  [5, 6, 3, 1, 0, 1, 3, 4, 3, 0]

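Each person id is an index into ``target_names``. The following sketch replays the values printed above with plain Python lists, so it runs without downloading the dataset. With the real loader the same lookup would use ``lfw_people.target_names`` and ``lfw_people.target``:

```python
# Values copied from the outputs shown above.
target_names = [
    "Ariel Sharon",
    "Colin Powell",
    "Donald Rumsfeld",
    "George W Bush",
    "Gerhard Schroeder",
    "Hugo Chavez",
    "Tony Blair",
]
target = [5, 6, 3, 1, 0, 1, 3, 4, 3, 0]

# Translate each integer id into the corresponding person name.
names = [target_names[i] for i in target]
print(names[:3])
```
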
The second loader is typically used for the face verification task: each sample
is a pair of two pictures that may or may not belong to the same person::

  >>> from sklearn.datasets import fetch_lfw_pairs
  >>> lfw_pairs_train = fetch_lfw_pairs(subset='train')

  >>> list(lfw_pairs_train.target_names)
  ['Different persons', 'Same person']

  >>> lfw_pairs_train.pairs.shape
  (2200, 2, 62, 47)

  >>> lfw_pairs_train.data.shape
  (2200, 5828)

  >>> lfw_pairs_train.target.shape
  (2200,)

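The ``data`` array appears to be the ``pairs`` array with everything after the first axis flattened into one row per sample. This reading is inferred from the shapes above rather than stated by the loader, and it can be checked with a short calculation:

```python
import math

# Shape of lfw_pairs_train.pairs as shown above:
# (number of pairs, pictures per pair, height, width).
pairs_shape = (2200, 2, 62, 47)

# Flatten everything except the first axis into one feature row.
n_pairs = pairs_shape[0]
data_shape = (n_pairs, math.prod(pairs_shape[1:]))
print(data_shape)
```

This also explains the ``Dimensionality`` value of 5828 in the characteristics table: two 62 x 47 pictures per sample.
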
For both the :func:`sklearn.datasets.fetch_lfw_people` and
:func:`sklearn.datasets.fetch_lfw_pairs` functions, it is
possible to get an additional dimension with the RGB color channels by
passing ``color=True``. In that case, the shape of
``lfw_pairs_train.pairs`` will be ``(2200, 2, 62, 47, 3)``.

The :func:`sklearn.datasets.fetch_lfw_pairs` dataset is subdivided into
3 subsets: the development ``train`` set, the development ``test`` set and
an evaluation ``10_folds`` set meant to compute performance metrics using a
10-fold cross-validation scheme.

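For readers unfamiliar with 10-fold cross-validation, the sketch below generates train and test index lists for each fold. It is a generic illustration that splits contiguous index ranges. It makes no claim about how the ``10_folds`` subset orders or groups its samples, and ``k_fold_indices`` is a hypothetical helper:

```python
def k_fold_indices(n_samples, n_folds=10):
    """Yield (train, test) index lists, one pair per fold."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Toy example with 20 samples: every sample is tested exactly once.
folds = list(k_fold_indices(20, n_folds=10))
print(len(folds), folds[0][1], len(folds[0][0]))
```

A model is trained on each ``train`` list and scored on the matching ``test`` list. The ten scores are then averaged to give the reported metric.
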
.. topic:: References:

    * `Labeled Faces in the Wild: A Database for Studying Face Recognition
      in Unconstrained Environments.
      <http://vis-www.cs.umass.edu/lfw/lfw.pdf>`_
      Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller.
      University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.


Examples
~~~~~~~~

:ref:`sphx_glr_auto_examples_applications_plot_face_recognition.py`