Rama Chellappa - Selected Publications
1. “Face recognition: A literature survey” (with W. Zhao, P.J. Phillips and A. Rosenfeld), ACM Computing Surveys, vol. 35, 2003 (Google Scholar citations: 9315).
As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system. This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.
2. “Human and machine recognition of faces: A survey” (with C.L. Wilson and S. Sirohey), Proceedings of the IEEE, vol. 83, 1995 (Google Scholar citations: 4333).
This paper presents a critical survey of existing literature on human and machine recognition of faces. Machine recognition of faces has several applications, ranging from static matching of controlled photographs as in mug shot matching and credit card verification to surveillance video images. Such applications have different constraints in terms of complexity of processing requirements and thus present a wide range of different technical challenges. Over the last 20 years researchers in psychophysics, neural sciences and engineering, image processing analysis and computer vision have investigated a number of issues related to face recognition by humans and machines. Ongoing research activities have been given a renewed emphasis over the last five years. Existing techniques and systems have been tested on different sets of images of varying complexities. But very little synergism exists between studies in psychophysics and the engineering literature. Most importantly, there exist no evaluation or benchmarking studies using large databases with the image quality that arises in commercial and law enforcement applications. This paper first presents different applications of face recognition in commercial and law enforcement sectors. This is followed by a brief overview of the literature on face recognition in the psychophysics community. We then present a detailed overview of more than 20 years of research done in the engineering community. Techniques for segmentation/location of the face, feature extraction and recognition are reviewed. Global transform and feature based methods using statistical, structural and neural classifiers are summarized.
3. “Machine recognition of human activities: A survey” (with P. Turaga, V.S. Subrahmanian and O. Udrea), IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, 2008 (Google Scholar citations: 1744).
A rapid proliferation of video cameras in all walks of life has resulted in a tremendous explosion of video content. Several applications such as content-based video annotation and retrieval, highlight extraction and video summarization require recognition of the activities occurring in the video. The analysis of human activities in videos is an area with increasingly important consequences from security and surveillance to entertainment and personal archiving. Several challenges at various levels of processing (robustness against errors in low-level processing, view and rate-invariant representations at midlevel processing, and semantic representation of human activities at higher-level processing) make this problem hard to solve. This review paper presents a comprehensive survey of efforts in the past couple of decades to address the problems of representation, recognition, and learning of human activities from video and related applications.
4. “Soft-NMS - improving object detection with one line of code” (with N. Bodla, B. Singh and L.S. Davis), Proceedings of the IEEE International Conference on Computer Vision, 2017 (Google Scholar citations: 1466).
Non-maximum suppression is an integral part of the object detection pipeline. First, it sorts all detection boxes on the basis of their scores. The detection box M with the maximum score is selected and all other detection boxes with a significant overlap (using a pre-defined threshold) with M are suppressed. This process is recursively applied on the remaining boxes. As per the design of the algorithm, if an object lies within the predefined overlap threshold, it leads to a miss. To this end, we propose Soft-NMS, an algorithm which decays the detection scores of all other objects as a continuous function of their overlap with M. Hence, no object is eliminated in this process. Soft-NMS obtains consistent improvements for the COCO-style mAP metric on standard datasets like PASCAL VOC 2007 (1.7% for both R-FCN and Faster-RCNN) and MS-COCO (1.3% for R-FCN and 1.1% for Faster-RCNN) by just changing the NMS algorithm without any additional hyper-parameters. Using Deformable-RFCN, Soft-NMS improves the state of the art in object detection from 39.8% to 40.9% with a single model. Further, the computational complexity of Soft-NMS is the same as traditional NMS and hence it can be efficiently implemented. Since Soft-NMS does not require any extra training and is simple to implement, it can be easily integrated into any object detection pipeline. Code for Soft-NMS is publicly available on GitHub.
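The score-decay idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the function names `iou` and `soft_nms`, the `[x1, y1, x2, y2]` box layout, the Gaussian decay `exp(-IoU^2 / sigma)` as the continuous function of overlap, and the default value of `sigma` are all choices made for this example.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, each given as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, sigma=0.5):
    """Return (selection order, decayed scores); no box is discarded.

    sigma is an assumed parameter of the illustrative Gaussian decay.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = list(range(len(scores)))
    order = []
    while remaining:
        # Select the box M with the current maximum score.
        m = max(remaining, key=lambda i: scores[i])
        order.append(m)
        remaining.remove(m)
        if remaining:
            idx = np.array(remaining)
            overlaps = iou(boxes[m], boxes[idx])
            # Decay (rather than zero out) the scores of boxes overlapping M.
            scores[idx] *= np.exp(-(overlaps ** 2) / sigma)
    return order, scores
```

In this sketch a box that overlaps heavily with M keeps a reduced score instead of being removed, so a downstream score threshold decides whether it survives.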
5. “Discriminant analysis for recognition of human face images” (with K. Etemad), JOSA A, vol. 14, 1997 (Google Scholar citations: 1458).
The discrimination power of various human facial features is studied and a new scheme for automatic face recognition (AFR) is proposed. The first part of the paper focuses on the linear discriminant analysis (LDA) of different aspects of human faces in the spatial as well as in the wavelet domain. This analysis allows objective evaluation of the significance of visual information in different parts (features) of the face for identifying the human subject. The LDA of faces also provides us with a small set of features that carry the most relevant information for classification purposes. The features are obtained through eigenvector analysis of scatter matrices with the objective of maximizing between-class variations and minimizing within-class variations. The result is an efficient projection-based feature-extraction and classification scheme for AFR. Each projection creates a decision axis with a certain level of discrimination power or reliability. Soft decisions made based on each of the projections are combined, and probabilistic or evidential approaches to multisource data analysis are used to provide more reliable recognition results. For a medium-sized database of human faces, excellent classification accuracy is achieved with the use of very-low-dimensional feature vectors. Moreover, the method used is general and is applicable to many other image-recognition tasks.
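The eigenvector analysis of scatter matrices mentioned in this abstract is commonly written as below. This is the standard textbook form of LDA, given here as an assumption about notation rather than the paper's exact formulation: $x_i$ are feature vectors, $\mathcal{X}_c$ is the set of samples in class $c$ with $n_c$ members and mean $\mu_c$, and $\mu$ is the overall mean.

```latex
\begin{align}
S_w &= \sum_{c=1}^{C} \sum_{x_i \in \mathcal{X}_c} (x_i - \mu_c)(x_i - \mu_c)^{\top},
&
S_b &= \sum_{c=1}^{C} n_c \, (\mu_c - \mu)(\mu_c - \mu)^{\top}, \\
W^{*} &= \arg\max_{W} \frac{\lvert W^{\top} S_b W \rvert}{\lvert W^{\top} S_w W \rvert},
&
S_b \, w_k &= \lambda_k \, S_w \, w_k .
\end{align}
```

The columns $w_k$ of $W^{*}$ are the generalized eigenvectors with the largest eigenvalues $\lambda_k$; each one defines a projection (decision axis) along which between-class variation is large relative to within-class variation.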
6. “A method for enforcing integrability in shape from shading algorithms” (with R.T. Frankot), IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, 1988 (Google Scholar citations: 1327).
An approach for enforcing integrability, a particular implementation of the approach, an example of its application to extending an existing shape-from-shading algorithm, and experimental results showing the improvement that results from enforcing integrability are presented. A possibly nonintegrable estimate of surface slopes is represented by a finite set of basis functions, and integrability is enforced by calculating the orthogonal projection onto a vector subspace spanning the set of integrable slopes. The integrability projection constraint was applied to extending an iterative shape-from-shading algorithm of M.J. Brooks and B.K.P. Horn (1985). Experimental results show that the extended algorithm converges faster and with less error than the original version. Good surface reconstructions were obtained with and without known boundary conditions and for fairly complicated surfaces.
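The projection onto integrable slopes can be illustrated with a Fourier basis. The sketch below assumes a periodic surface sampled on a regular grid with unit pixel spacing, with `p` and `q` denoting the estimated slopes along x (columns) and y (rows); the function name `frankot_chellappa` and these conventions are assumptions made for the example, not details taken from the abstract.

```python
import numpy as np

def frankot_chellappa(p, q):
    """Least-squares integrable surface from slope estimates p = dz/dx, q = dz/dy.

    Assumes periodic boundaries and unit pixel spacing. The result has zero
    mean, because absolute height cannot be recovered from slopes alone.
    """
    rows, cols = p.shape
    # Angular frequencies (radians per pixel) along x (columns) and y (rows).
    wx = 2 * np.pi * np.fft.fftfreq(cols)
    wy = 2 * np.pi * np.fft.fftfreq(rows)
    WX, WY = np.meshgrid(wx, wy)  # both have shape (rows, cols)

    P = np.fft.fft2(p)
    Q = np.fft.fft2(q)

    denom = WX ** 2 + WY ** 2
    denom[0, 0] = 1.0  # avoid division by zero at the DC term

    # Orthogonal projection of (P, Q) onto the integrable subspace.
    Z = (-1j * WX * P - 1j * WY * Q) / denom
    Z[0, 0] = 0.0  # fix the unknown mean height to zero
    return np.real(np.fft.ifft2(Z))
```

When the input slopes are already integrable, the projection leaves them unchanged and the surface is recovered up to its mean; when they are noisy, the result is the closest integrable surface in the least-squares sense.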
7. “HyperFace: A deep multi-task learning framework for face detection, landmark localization” (with R. Ranjan and V.M. Patel), IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, 2017 (Google Scholar citations: 1271).
This paper presents an algorithm for simultaneous face detection, landmark localization, pose estimation and gender recognition using deep convolutional neural networks (CNN). The proposed method, called HyperFace, fuses the intermediate layers of a deep CNN using a separate CNN followed by a multi-task learning algorithm that operates on the fused features. It exploits the synergy among the tasks, which boosts their individual performance. Additionally, we propose two variants of HyperFace: (1) HyperFace-ResNet that builds on the ResNet-101 model and achieves significant improvement in performance, and (2) Fast-HyperFace that uses a high recall fast face detector for generating region proposals to improve the speed of the algorithm. Extensive experiments show that the proposed models are able to capture both global and local information in faces and perform significantly better than many competitive algorithms for each of these four tasks.
8. “Domain adaptation for object recognition: An unsupervised approach” (with R. Gopalan and R. Li), Proceedings of the IEEE International Conference on Computer Vision, 2011 (Google Scholar citations: 1198).
Adapting the classifier trained on a source domain to recognize instances from a new target domain is an important problem that has received considerable attention. This paper presents one of the first studies on unsupervised domain adaptation in the context of object recognition, where we have labeled data only from the source domain (and therefore do not have correspondences between object categories across domains). Motivated by incremental learning, we create intermediate representations of data between the two domains by viewing the generative subspaces (of same dimension) created from these domains as points on the Grassmann manifold, and sampling points along the geodesic between them to obtain subspaces that provide a meaningful description of the underlying domain shift. We then obtain the projections of labeled source domain data onto these subspaces, from which a discriminative classifier is learnt to classify projected data from the target domain. We discuss extensions of our approach for semi-supervised adaptation, and for cases with multiple source and target domains, and report competitive results on standard datasets.
9. “Discriminant analysis of principal components for face recognition” (with W. Zhao, A. Krishnaswamy, D.L. Swets and J. Weng), Chapter in Face Recognition (Springer), 1998 (Google Scholar citations: 1154).
This chapter describes a face recognition method based on PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis). The method consists of two steps: the first step projects the face image from the original vector space to a face subspace via PCA, and the second step uses LDA to obtain a linear classifier. The basic idea of combining PCA and LDA is to improve the generalization capability of LDA when only a few samples per class are available. Using the FERET dataset, a significant improvement is demonstrated when principal components rather than original images are fed to the LDA classifier. The hybrid classifier using PCA and LDA provides a useful framework for other image recognition tasks as well.
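The two-step structure described here can be sketched in a few lines of NumPy. This is an illustrative sketch, not the chapter's implementation: the function names `fit_pca_lda` and `predict`, the `n_pca` parameter, and the nearest-centroid decision rule in the final subspace are assumptions made for the example.

```python
import numpy as np

def fit_pca_lda(X, y, n_pca):
    """Fit PCA followed by LDA; return (mean, projection, classes, centroids).

    n_pca should not exceed (number of samples - number of classes), so that
    the within-class scatter matrix in the PCA subspace is invertible.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean = X.mean(axis=0)
    Xc = X - mean

    # Step 1: PCA. Rows of Vt are principal directions sorted by variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[:n_pca].T
    Z = Xc @ W_pca

    # Step 2: LDA in the PCA subspace, via within/between-class scatter.
    classes = np.unique(y)
    mu = Z.mean(axis=0)
    Sw = np.zeros((n_pca, n_pca))
    Sb = np.zeros((n_pca, n_pca))
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        Sb += len(Zc) * np.outer(mc - mu, mc - mu)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    top = np.argsort(-evals.real)[: len(classes) - 1]

    # Combined linear map from the original space to the LDA subspace.
    W = W_pca @ evecs[:, top].real
    centroids = np.stack([(X[y == c] - mean).mean(axis=0) @ W for c in classes])
    return mean, W, classes, centroids

def predict(model, X):
    """Assign each sample to the class with the nearest projected centroid."""
    mean, W, classes, centroids = model
    F = (np.asarray(X, dtype=float) - mean) @ W
    dists = np.linalg.norm(F[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]
```

Reducing to `n_pca` dimensions first is what keeps the within-class scatter matrix well conditioned when there are far fewer samples than pixels, which is the small-sample situation the chapter targets.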
10. “Entropy rate super-pixel segmentation” (with M.Y. Liu, O. Tuzel and S. Ramalingam), CVPR 2011 (Google Scholar citations: 1031).
A new objective function for super-pixel segmentation is proposed in this paper. This objective function consists of two components: entropy rate of a random walk on a graph and a balancing term. The entropy rate favors formation of compact and homogeneous clusters, while the balancing function encourages clusters with similar sizes. We present a novel graph construction for images and show that this construction induces a matroid - a combinatorial structure that generalizes the concept of linear independence in vector spaces. The segmentation is then given by the graph topology that maximizes the objective function under the matroid constraint. By exploiting submodular and monotonic properties of the objective function, we develop an efficient greedy algorithm. Furthermore, we prove an approximation bound of ½ for the optimality of the solution. Extensive experiments on the Berkeley segmentation benchmark show that the proposed algorithm outperforms the state of the art in all the standard evaluation metrics.