Appendix B: Original Foreign-Language Text
Robust Face Recognition via Sparse Representation
Abstract
We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models, and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by ℓ1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as Eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses a certain threshold, predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly, by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm, and corroborate the above claims.
Keywords: Face Recognition, Feature Extraction, Occlusion and Corruption
Ⅰ INTRODUCTION
Parsimony has a rich history as a guiding principle for inference. One of its most celebrated instantiations, the principle of minimum description length in model selection, stipulates that within a hierarchy of model classes, the model that yields the most compact representation should be preferred for decision-making tasks such as classification. A related, but simpler, measure of parsimony in high-dimensional data processing seeks models that depend on only a few of the observations, selecting a small subset of features for classification or visualization. Such sparse feature selection methods are, in a sense, dual to the support vector machine (SVM) approach, which instead selects a small subset of relevant training examples to characterize the decision boundary between classes. While these works comprise only a small fraction of the literature on parsimony for inference, they do serve to illustrate a common theme: all of them use parsimony as a principle for choosing a limited subset of features or models from the training data, rather than directly using the data for representing or classifying an input (test) signal.
The role of parsimony in human perception has also been strongly supported by studies of human vision. Investigators have recently revealed that in both low-level and mid-level human vision, many neurons in the visual pathway are selective for a variety of specific stimuli, such as color, texture, orientation, scale, and even view-tuned object images. Considering these neurons to form an overcomplete dictionary of base signal elements at each visual stage, the firing of the neurons with respect to a given input image is typically highly sparse.
In the statistical signal processing community, the algorithmic problem of computing sparse linear representations with respect to an overcomplete dictionary of base elements or signal atoms has seen a recent surge of interest. Much of this excitement centers around the discovery that whenever the optimal representation is sufficiently sparse, it can be efficiently computed by convex optimization, even though this problem can be extremely difficult in the general case. The resulting optimization problem, similar to the Lasso in statistics, penalizes the ℓ1-norm of the coefficients in the linear combination, rather than directly penalizing the number of nonzero coefficients (i.e., the ℓ0-norm).
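For reference, the two problems contrasted in this paragraph are commonly written as below. This is a sketch under assumed notation, since the excerpt itself does not define symbols: $A$ stands for the dictionary whose columns are the base elements, $y$ for the signal to be represented, $x$ for the coefficient vector, and $\mu > 0$ for a hypothetical penalty weight in the Lasso-style form.

```latex
\begin{align*}
(\ell^0):\quad & \hat{x}_0 = \arg\min_x \|x\|_0 \quad \text{subject to} \quad Ax = y, \\
(\ell^1):\quad & \hat{x}_1 = \arg\min_x \|x\|_1 \quad \text{subject to} \quad Ax = y, \\
\text{(Lasso form)}:\quad & \hat{x}_\mu = \arg\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \mu \|x\|_1 .
\end{align*}
```

Here $\|x\|_0$ counts the nonzero entries of $x$ and $\|x\|_1 = \sum_i |x_i|$. The first problem is combinatorial, whereas the second and third are convex.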
The original goal of these works was not inference or classification per se, but rather representation and compression of signals, potentially using lower sampling rates than the Shannon-Nyquist bound. Algorithm performance was therefore measured in terms of sparsity of the representation and fidelity to the original signals. Furthermore, individual base elements in the dictionary were not assumed to have any particular semantic meaning; they are typically chosen from standard bases (e.g., Fourier, Wavelet, Curvelet, Gabor), or even generated from random matrices. Nevertheless, the sparsest representation is naturally discriminative: amongst all subsets of base vectors, it selects the subset which most compactly expresses the input signal and rejects all other possible but less compact representations.
In this paper, we exploit the discriminative nature of sparse representation to perform classification. Instead of using the generic dictionaries discussed above, we represent the test sample in an overcomplete dictionary whose base elements are the training samples themselves. If sufficient training samples are available from each class, it will be possible to represent the test samples as a linear combination of just those training samples from the same class. This representation is naturally sparse, involving only a small fraction of the overall training database. We argue that in many problems of interest, it is actually the sparsest linear representation of the test sample i
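Translation of the Foreign-Language Text
A Study of an Automatic Face Recognition System for Office Door Access Control
Abstract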
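Security has become a very important issue in public and private institutions, where various security systems have been proposed and developed for critical processes such as person identification, verification, or recognition, especially for building access control, suspect identification by the police, driver's licenses, and so on. Face recognition has been an active research area since the late 1980s, with numerous applications, and has become one of the important elements in security system development. This paper focuses on the study and development of an automatic face recognition system with a potential application in office door access control. The eigenface technique, based on principal component analysis (PCA), and an artificial neural network have been applied in the system. The study analyzes the effect of three main factors in face recognition, namely illumination, distance, and the subject's head orientation, on a face recognition system developed specifically for office door access control. The experimental results show that the developed system achieves a good face recognition rate of 80% at camera-to-subject distances of 40 cm to 60 cm, and that the subject's head orientation angle must lie within -20 to +20 degrees.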
Keywords: Face Recognition, Neural Network, Eigenface, Principal Component Analysis, Access Control
Ⅰ INTRODUCTION
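Face recognition (FR) has received wide attention over the past two decades, and many researchers have studied its various aspects. The reasons are, first, the wide range of commercial and security applications to explore and, second, the availability of feasible computer technology for developing and implementing applications that demand high computing power. FR has in fact become an important issue in many applications, such as access control, security systems, credit card verification, and criminal identification. For example, the ability to model a particular face and distinguish it from other face models makes it possible to improve the identification or recognition of a person. An FR system alone has limitations, however, because it requires a highly cooperative subject to present the face in front of the system. In practice, the ability to detect a face in the first place, as opposed to recognizing it, can be very important, because face detection is regarded as the first step of an automatic face recognition system: without a detected face region, face recognition cannot proceed. Face recognition itself is a very high-level computer vision task that involves many early vision techniques.
The first step of face recognition is to extract relevant features from the face image. The natural question is how well facial features can be quantified. If such quantification is possible, a computer should be able to recognize a face given a set of features. Three main research groups have proposed three different approaches to the face recognition problem. The largest group deals with the facial features that humans use to recognize individual faces. The second group performs face recognition based on feature vectors extracted from profile silhouettes, while the third uses feature vectors extracted from frontal views of the face.
Most face recognition algorithms fall into one of two main approaches: feature-based and image-based. Feature-based methods explore a set of geometric features, such as the distance between the eyes or the size of the eyes, and use these measurements to represent a given face. These features are computed using simple correlation filters with expected templates. Such methods are somewhat invariant to changes in illumination and can partially compensate for changes in camera position, but they are sensitive to aging and facial expression. It is also unclear which features are important for classification, an area that needs more mathematical study. Some basic mathematical results in the literature attempt to address these questions but have not yet been fully exploited for face recognition. In this paper, an automatic face recognition system is designed for a door access control application. The developed FR system is based on the well-known eigenface technique, which is derived from principal component analysis (PCA).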
Ⅱ PRINCIPAL COMPONENT ANALYSIS FOR FACE RECOGNITION
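A popular technique for feature selection and dimensionality reduction in FR is principal component analysis (PCA). This technique has been used in [3] [4] [5]. PCA is a standard decorrelation technique; applying it yields an orthogonal projection basis that leads directly to dimensionality reduction and possibly to feature selection. Let $X \in \mathbb{R}^N$ be a random vector representing an image, where $N$ is the dimensionality of the image space. The vector is formed by concatenating the rows or columns of the image, which may be normalized to have unit norm. The covariance matrix of $X$ is defined as follows:
$\Sigma_X = E\{[X - E(X)][X - E(X)]^t\}$, (1)
where $E(\cdot)$ is the expectation operator, $t$ denotes transposition, and $\Sigma_X \in \mathbb{R}^{N \times N}$. The PCA of the random vector $X$ factorizes the covariance matrix $\Sigma_X$ into the following form:
$\Sigma_X = \Phi \Lambda \Phi^t$, with $\Phi = [\phi_1\ \phi_2\ \cdots\ \phi_N]$ and $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$, (2)
where $\Phi \in \mathbb{R}^{N \times N}$ is the orthonormal eigenvector matrix and $\Lambda \in \mathbb{R}^{N \times N}$ is the diagonal eigenvalue matrix, whose diagonal elements are in decreasing order, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$. Here $\phi_1, \phi_2, \ldots, \phi_N$ and $\lambda_1, \lambda_2, \ldots, \lambda_N$ are the eigenvectors and eigenvalues of $\Sigma_X$, respectively.
An important property of PCA is decorrelation: the components of the transformed vector $X' = \Phi^t X$ are uncorrelated, because the covariance matrix of $X'$ is diagonal, $\Sigma_{X'} = \Lambda$, and its diagonal elements are the variances of the corresponding components. Another property of PCA is its optimal signal reconstruction, in the sense of minimum mean square error, when only a subset of the principal components is used to represent the original signal. Following this property, a direct application of PCA is dimensionality reduction:
$Y = P^t X$, (3)
where $P = [\phi_1\ \phi_2\ \cdots\ \phi_m]$, $m < N$, and $P \in \mathbb{R}^{N \times m}$.
The lower-dimensional vector $Y \in \mathbb{R}^m$ captures the most expressive features of the original data $X$. Because PCA derives the projection axes from the observed variations using all the training samples, it generalizes well for image reconstruction when tested on novel images not seen during training. The disadvantage of PCA is that it does not distinguish the different roles of within-class and between-class variations and treats them equally. This leads to poor testing performance when the distributions of the face classes are separated not by the difference of their means but by the difference of their covariances. High variance by itself does not necessarily lead to good discrimination ability unless the corresponding distribution is multimodal and the modes correspond to the classes to be discriminated. One should also be aware that, because PCA encodes only second-order statistics, it lacks phase and therefore locality information. Building on the PCA technique, Turk and Pentland [4] developed a well-known near-real-time face recognition system called Eigenfaces, in which the eigenfaces are the eigenvectors associated with the dominant eigenvalues of the face covariance matrix. The eigenfaces define a feature space, or "face space", that drastically reduces the dimensionality of the original space, and face detection and recognition are then carried out in the reduced space.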
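The covariance estimate, eigendecomposition, and projection described in this section can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names `pca_basis` and `project`, the toy data, and the choice to subtract the sample mean before projecting (common in eigenface practice, though the projection formula in the text is written without it) are all assumptions made here.

```python
import numpy as np

def pca_basis(samples, m):
    """Estimate the covariance of the samples and return (mean, P, eigenvalues).

    samples: array of shape (num_samples, N), one flattened image per row.
    P holds the m eigenvectors with the largest eigenvalues as its columns.
    """
    X = np.asarray(samples, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    cov = centered.T @ centered / X.shape[0]   # sample estimate of the covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to decreasing eigenvalues
    return mean, eigvecs[:, order[:m]], eigvals[order]

def project(x, mean, P):
    """Reduce one vector to m dimensions (assumption: the mean is removed first)."""
    return P.T @ (np.asarray(x, dtype=float) - mean)

# Toy data standing in for flattened images: 50 samples in a 6-dimensional space.
rng = np.random.default_rng(0)
toy_images = rng.normal(size=(50, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])

mean, P, eigvals = pca_basis(toy_images, m=2)
Y = np.array([project(x, mean, P) for x in toy_images])
cov_Y = Y.T @ Y / len(Y)                       # covariance of the reduced vectors
```

In this toy run, `cov_Y` comes out diagonal with the two largest eigenvalues on its diagonal, which is the decorrelation property stated above. For real images N is large, so forming the N x N covariance directly is costly; this sketch ignores that concern.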
Ⅲ METHODOLOGY AND SYSTEM DEVELOPMENT
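The automatic face recognition system was developed in the two stages typical of face recognition, namely a training stage and an evaluation stage. In the first stage, training images of a specific number of face candidates are captured. Features are extracted from intensity images of frontal human faces using principal component analysis. The system then learns the extracted features and stores them in its database. In the second stage, the system recognizes new faces in an unsupervised manner, which is easy to implement using an artificial neural network. The general framework of the system is shown in Figure 1.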
Figure 1: General framework of the FR system
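The two stages described above might be sketched as follows. This is an illustrative sketch only: the paper uses an artificial neural network for the recognition stage, whereas this example swaps in a plain nearest-neighbour match in the PCA feature space to stay short. The class name `EigenfaceRecognizer`, the `reject_distance` threshold for reporting an unknown face, and the synthetic 16-value "images" are all hypothetical.

```python
import numpy as np

class EigenfaceRecognizer:
    """Two-stage sketch: train() stores PCA features, identify() matches a new face."""

    def __init__(self, num_components, reject_distance):
        self.num_components = num_components
        self.reject_distance = reject_distance    # hypothetical "unknown face" threshold

    def train(self, images, labels):
        X = np.asarray(images, dtype=float)        # one flattened image per row
        self.mean = X.mean(axis=0)
        centered = X - self.mean
        # Rows of vt are the principal directions of the centered training set.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        self.basis = vt[: self.num_components].T   # shape (N, m)
        self.features = centered @ self.basis      # stored feature database
        self.labels = list(labels)

    def identify(self, image):
        y = (np.asarray(image, dtype=float) - self.mean) @ self.basis
        distances = np.linalg.norm(self.features - y, axis=1)
        best = int(np.argmin(distances))
        if distances[best] > self.reject_distance:
            return "unknown"
        return self.labels[best]

# Synthetic stand-ins for two enrolled persons, five noisy images each.
rng = np.random.default_rng(0)
face_a = np.linspace(0.0, 1.0, 16)
face_b = face_a[::-1].copy()
train_images = [face_a + 0.01 * rng.normal(size=16) for _ in range(5)]
train_images += [face_b + 0.01 * rng.normal(size=16) for _ in range(5)]
train_labels = ["A"] * 5 + ["B"] * 5

recognizer = EigenfaceRecognizer(num_components=2, reject_distance=0.5)
recognizer.train(train_images, train_labels)
result_a = recognizer.identify(face_a)
result_b = recognizer.identify(face_b)
result_other = recognizer.identify(np.full(16, 0.5))   # resembles neither person
```

The rejection threshold is what lets the sketch report an unknown face instead of always returning the closest enrolled person; a suitable value would have to be tuned on real data.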
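The graphical user interface (GUI) and the artificial neural network of the system were developed on the Microsoft Visual C++ and Visual Basic 2008 platforms. Two types of face database were used to train the system. The first type comprises captured and cropped face images of the persons to be recognized. Each person has ten images, with the frontal position varied by rotating the head up to twenty degrees to the left and right of the direction perpendicular to the camera. These images can be captured in advance with a camera, cropped, then trained into the system and saved in the system's face database. The second type of face database consists of frontal face images captured ad hoc using the system's own camera. The number and characteristics of these captured images are similar to those of the first type. The number of face training images can be increased later to observe the performance of the FR system. The second type of face database has the advantage that it can be created online. Figure 2 shows a snapshot of the GUI of the FR system.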
Figure 2: System interface
Ⅳ EXPERIMENTAL FRAMEWORK AND RESULTS
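This project studies the suitability of an automatic face recognition system designed for door access control. The experimental work focuses only on the effects of illumination, the distance to the subject's face, and the rotation angle of the subject's face. The illumination intensity was varied, and the variation was measured using a digital lux meter. The unit of measurement is lux, and the light source is located at the camera of the FR system itself. For the second factor, the subject's distance, the unit is centimeters, and the measurement represents the distance from the FR camera to the front of the subject's face. Considering these two factors, it is logical to assume that the greater the distance, the lower the intensity obtained. A first experiment was therefore conducted to study the relationship between them; the results are shown in Figure 3. The figure confirms the assumption: the greater the distance, the lower the light intensity.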
Figure 3: Relationship between illuminance and distance
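The third factor considered in this paper is the orientation angle of the subject's face toward the system camera. Taking the position in which the subject faces the camera head-on, perpendicular to it, as zero degrees, any head rotation toward the right of the face is regarded as a positive angle, and any rotation toward the left as a negative angle. The setup of this experiment is illustrated in Figure 4.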
Figure 4: Schematic of the orientation of the subject's head
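With these three factors in mind, the system was tested on five different persons whose frontal face images had been captured and stored in the face database. The face databases of the first two persons were created using the first database type, and those of the other three persons using the second type. The tests were carried out while varying the subject's distance and head rotation angle. The results are shown in Table 1.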
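Table 1: Overall recognition performance of the system under test, as the face distance and the rotated head position of the subject vary relative to the camera. "x" denotes a recognized face and "-" denotes an unknown face.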
Columns -40 to 40 give the head angle in degrees.
Distance (cm) | -40 | -30 | -20 | -10 |  0  |  10 |  20 |  30 |  40 | Recognition rate |  %
10            |  -  |  -  |  -  |  -  |  -  |  -  |  -  |  -  |  -  | 0/9              | 0.0
20            |  -  |  -  |  -  |  x  |  x  |  x  |  -  |  -  |  -  | 3/9              | 33.3
30            |  -  |  -  |  -  |  x  |  x  |  x  |  -  |  -  |  -  | 3/9              | 33.3
40            |  -  |  x  |  x  |  x  |  x  |  x  |  x  |  x  |  -  | 7/9              | 77.7
50            |  -  |  -  |  x  |  x  |  x  |  x  |  x  |  -  |  -  | 5/9              | 55.5
60            |  -  |  -  |  x  |  x  |  x  |  x  |  x  |  -  |  -  | 5/9              | 55.5
70            |  -  |  -  |  -  |  x  |  x  |  x  |  x  |  -  |  -  | 4/9              | 44.4
80            |  -  |  -  |  -  |  x  |
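The per-distance recognition rates in Table 1 can be recomputed from the "x" and "-" marks, as in the short sketch below. It covers only the complete rows (10 cm to 70 cm), because the 80 cm row is cut off in this excerpt. The names `ANGLES`, `ROWS`, `recognition_rate`, and `recognized_angle_range` are illustrative and do not come from the paper.

```python
# Head angles (degrees) in column order, and the marks of each complete row.
ANGLES = [-40, -30, -20, -10, 0, 10, 20, 30, 40]
ROWS = {
    10: "---------",
    20: "---xxx---",
    30: "---xxx---",
    40: "-xxxxxxx-",
    50: "--xxxxx--",
    60: "--xxxxx--",
    70: "---xxxx--",
}

def recognition_rate(marks):
    """Return (recognized count, number of angles tested, percentage)."""
    hits = marks.count("x")
    return hits, len(marks), 100.0 * hits / len(marks)

def recognized_angle_range(marks):
    """Return (smallest, largest) recognized angle, or None if nothing was recognized."""
    angles = [angle for angle, mark in zip(ANGLES, marks) if mark == "x"]
    return (min(angles), max(angles)) if angles else None

summary = {distance: recognition_rate(marks) for distance, marks in ROWS.items()}
best_distance = max(summary, key=lambda distance: summary[distance][0])
```

Computed this way, 7/9 is about 77.8% and 5/9 about 55.6%, so the percentages printed in the table (77.7, 55.5, 44.4) appear to be truncated rather than rounded.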