
Tracking human faces in infrared video: A review of recent advances and future directions



A facial recognition system[1] is a technology capable of matching a human face from a digital image or a video frame against a database of faces. Such a system is typically employed to authenticate users through ID verification services, and works by pinpointing and measuring facial features from a given image.[2]


Automated facial recognition was pioneered in the 1960s by Woody Bledsoe, Helen Chan Wolf, and Charles Bisson, whose work focused on teaching computers to recognize human faces.[10] Their early facial recognition project was dubbed "man-machine" because a human first needed to establish the coordinates of facial features in a photograph before they could be used by a computer for recognition. Using a graphics tablet, a human would pinpoint the coordinates of facial features, such as the pupil centers, the inside and outside corners of the eyes, and the widow's peak in the hairline. The coordinates were used to calculate 20 individual distances, including the width of the mouth and of the eyes. A human could process about 40 pictures an hour, building a database of these computed distances. A computer would then automatically compare the distances for each photograph, calculate the difference between the distances, and return the closest records as a possible match.[10]
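
As an illustration, here is a minimal Python sketch of the distance-vector matching at the heart of that workflow. The landmark names, the database layout, and the helper functions are hypothetical stand-ins for illustration, not a reconstruction of Bledsoe's actual program.

```python
import numpy as np

# Hypothetical landmark set in the spirit of the 1960s "man-machine" workflow:
# a human entered (x, y) coordinates for features such as the pupil centers,
# and the computer derived pairwise distances from them.
LANDMARKS = ["left_pupil", "right_pupil", "left_eye_outer", "right_eye_outer",
             "mouth_left", "mouth_right"]

def feature_distances(coords):
    """Compute all pairwise Euclidean distances between landmark points."""
    pts = np.array([coords[name] for name in LANDMARKS], dtype=float)
    dists = []
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            dists.append(np.linalg.norm(pts[i] - pts[j]))
    return np.array(dists)

def closest_records(query, database, k=3):
    """Return the k database entries whose distance vectors best match the query.
    database maps a record name to its landmark coordinate dict."""
    q = feature_distances(query)
    scored = [(np.linalg.norm(q - feature_distances(rec)), name)
              for name, rec in database.items()]
    return sorted(scored)[:k]
```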







While humans can recognize faces without much effort,[30] facial recognition is a challenging pattern recognition problem in computing. Facial recognition systems attempt to identify a human face, which is three-dimensional and changes in appearance with lighting and facial expression, based on its two-dimensional image. To accomplish this computational task, facial recognition systems perform four steps. First, face detection is used to segment the face from the image background. In the second step, the segmented face image is aligned to account for face pose, image size, and photographic properties such as illumination and grayscale. The purpose of the alignment process is to enable the accurate localization of facial features in the third step, facial feature extraction. Features such as the eyes, nose, and mouth are pinpointed and measured in the image to represent the face. The resulting feature vector of the face is then, in the fourth step, matched against a database of faces.[31]
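
The sketch below outlines these four steps, using OpenCV's bundled Haar cascade for detection. The alignment, feature extraction, and matching stages are deliberately simplified placeholders (cropping plus histogram equalization, raw pixel vectors, and nearest-neighbour lookup), assumed here for illustration rather than what a production system would use.

```python
import cv2
import numpy as np

# Step 1 uses OpenCV's stock frontal-face Haar cascade.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def recognize(image_bgr, database):
    """database maps a person's name to an enrolled, unit-length feature vector."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)

    # Step 1: detection -- segment face regions from the background.
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    results = []
    for (x, y, w, h) in faces:
        # Step 2: alignment -- here just cropping, resizing, and
        # normalizing illumination via histogram equalization.
        face = cv2.equalizeHist(cv2.resize(gray[y:y+h, x:x+w], (128, 128)))

        # Step 3: feature extraction -- a normalized raw pixel vector as a
        # placeholder for learned or handcrafted facial features.
        vec = face.astype(np.float32).ravel()
        vec /= (np.linalg.norm(vec) + 1e-8)

        # Step 4: matching -- nearest neighbour against the enrolled vectors.
        best = min(database.items(),
                   key=lambda kv: np.linalg.norm(kv[1] - vec))
        results.append((best[0], (x, y, w, h)))
    return results
```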


To enable human identification at a distance (HID), low-resolution images of faces are enhanced using face hallucination. In CCTV imagery faces are often very small. Because facial recognition algorithms that identify and plot facial features require high-resolution images, resolution enhancement techniques have been developed to enable facial recognition systems to work with imagery captured in environments with a low signal-to-noise ratio. Face hallucination algorithms, applied to images before they are submitted to the facial recognition system, use example-based machine learning with pixel substitution or nearest-neighbour distribution indexes that may also incorporate demographic and age-related facial characteristics. Use of face hallucination techniques improves the performance of high-resolution facial recognition algorithms and may be used to overcome the inherent limitations of super-resolution algorithms. Face hallucination techniques are also used to pre-treat imagery where faces are disguised. Here the disguise, such as sunglasses, is removed and the face hallucination algorithm is applied to the image. Such face hallucination algorithms need to be trained on similar face images with and without the disguise. To fill in the area uncovered by removing the disguise, face hallucination algorithms need to correctly map the entire state of the face, which may not be possible due to the momentary facial expression captured in the low-resolution image.[40]
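
A minimal sketch of the example-based idea, assuming a pre-built database of paired low/high-resolution training patches: each low-resolution patch in the input is replaced by the high-resolution patch paired with its nearest low-resolution training example. The function name and patch layout are illustrative assumptions, not the specific algorithm cited above.

```python
import numpy as np

def hallucinate(lr_img, lr_patches, hr_patches, patch=4, scale=4):
    """Example-based face hallucination sketch via patch substitution.
    lr_img:     2D low-resolution face image.
    lr_patches: (N, patch*patch) flattened low-res training patches.
    hr_patches: (N, (patch*scale)**2) corresponding high-res patches."""
    H, W = lr_img.shape
    out = np.zeros((H * scale, W * scale), dtype=float)
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            q = lr_img[i:i+patch, j:j+patch].ravel()
            # Nearest-neighbour lookup in the example database.
            idx = np.argmin(((lr_patches - q) ** 2).sum(axis=1))
            hp = hr_patches[idx].reshape(patch * scale, patch * scale)
            # Substitute the paired high-resolution patch.
            out[i*scale:(i+patch)*scale, j*scale:(j+patch)*scale] = hp
    return out
```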


Three-dimensional face recognition uses 3D sensors to capture information about the shape of a face. This information is then used to identify distinctive features on the surface of a face, such as the contour of the eye sockets, nose, and chin.[41] One advantage of 3D face recognition is that it is not affected by changes in lighting like other techniques. It can also identify a face from a range of viewing angles, including a profile view.[41][33] Three-dimensional data points from a face vastly improve the precision of face recognition. Three-dimensional face recognition research is enabled by the development of sophisticated sensors that project structured light onto the face.[42] 3D matching techniques are sensitive to expressions, so researchers at Technion applied tools from metric geometry to treat expressions as isometries.[43] A new method of capturing 3D images of faces uses three tracking cameras pointing at different angles: one camera points at the front of the subject, a second at the side, and a third at an angle. These cameras work together to track a subject's face in real time and to detect and recognize it.[44]
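
As a toy illustration of comparing 3D landmark sets, the sketch below removes translation, scale, and rotation with an orthogonal Procrustes alignment before measuring the residual shape difference. Expression-invariant methods such as the Technion isometry approach are considerably more sophisticated than this rigid comparison.

```python
import numpy as np

def procrustes_distance(A, B):
    """Compare two 3D landmark sets (N x 3, same landmark order) after
    removing translation, scale, and rotation -- a toy stand-in for
    rigid 3D face matching (reflections are not excluded here)."""
    # Remove translation and scale.
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)
    # Optimal rotation via SVD (orthogonal Procrustes problem).
    U, _, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt
    # Residual shape difference after alignment.
    return np.linalg.norm(A @ R - B)
```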


DeepFace is a deep learning facial recognition system created by a research group at Facebook. It identifies human faces in digital images. It employs a nine-layer neural net with over 120 million connection weights, and was trained on four million images uploaded by Facebook users.[54][55] The system is said to be 97% accurate, compared to 85% for the FBI's Next Generation Identification system.[56]
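
DeepFace's actual architecture is described in the cited work; the sketch below shows only the generic idea behind such systems, verification by embedding similarity, with a deliberately small PyTorch network and a cosine threshold that are illustrative assumptions, not Facebook's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FaceEmbedder(nn.Module):
    """Tiny convolutional net mapping aligned grayscale face crops to
    unit-length embedding vectors (a generic sketch, NOT DeepFace)."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4))
        self.head = nn.Linear(64 * 4 * 4, dim)

    def forward(self, x):
        z = self.head(self.features(x).flatten(1))
        return F.normalize(z, dim=1)  # unit vectors -> cosine similarity

def same_person(model, a, b, threshold=0.7):
    """Declare a match when the embeddings' cosine similarity is high.
    The threshold is an illustrative assumption; real systems tune it."""
    with torch.no_grad():
        za, zb = model(a), model(b)
    return (za * zb).sum(dim=1) > threshold
```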


In 2006, the performance of the latest face recognition algorithms was evaluated in the Face Recognition Grand Challenge (FRGC). High-resolution face images, 3-D face scans, and iris images were used in the tests. The results indicated that the new algorithms are 10 times more accurate than the face recognition algorithms of 2002 and 100 times more accurate than those of 1995. Some of the algorithms were able to outperform human participants in recognizing faces and could uniquely identify identical twins.[41][155]


In the 18th and 19th centuries, the belief that facial expressions revealed the moral worth or true inner state of a human was widespread, and physiognomy was a respected science in the Western world. From the early 19th century onwards, photography was used in the physiognomic analysis of facial features and facial expression to detect insanity and dementia.[220] In the 1960s and 1970s the study of human emotions and their expressions was reinvented by psychologists, who tried to define a normal range of emotional responses to events.[221] Research on automated emotion recognition has since the 1970s focused on facial expressions and speech, which are regarded as the two most important ways in which humans communicate emotions to other humans. In the 1970s the Facial Action Coding System (FACS) categorization for the physical expression of emotions was established.[222] Its developer, Paul Ekman, maintains that there are six emotions that are universal to all human beings and that these can be coded in facial expressions.[223] Research into automatic emotion-specific expression recognition has in the past decades focused on frontal-view images of human faces.[224]


We propose a novel tracking method that uses a network of independent particle filter trackers whose interactions are modeled using coalitional game theory. Our tracking method is general; it maintains pixel-level accuracy and can negotiate surface deformations and occlusions. We tested our method on a substantial video set featuring nontrivial motion from over 40 objects in both the infrared and visual spectra. The coalitional tracker demonstrated fault-tolerant behavior that far exceeds the performance of single particle filter trackers. Our method represents a shift from the typical tracking paradigms and may find application in demanding imaging problems across the electromagnetic spectrum.
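
The coalitional machinery itself is beyond a short sketch, but its building block, an independent particle filter tracker, can be illustrated as follows. The random-walk motion model, the measure likelihood callback, and the parameters are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def particle_filter_step(particles, weights, measure, motion_std=3.0):
    """One step of a basic bootstrap particle filter over 2D positions.
    particles: (n, 2) array; weights: (n,) array summing to 1 (start uniform).
    measure(p) -> likelihood of the target being at position p.
    The coalitional tracker couples many such independent filters;
    this shows only the single-tracker building block."""
    n = len(particles)
    # Resample proportionally to the current weights.
    idx = np.random.choice(n, size=n, p=weights)
    particles = particles[idx]
    # Predict: diffuse particles with a random-walk motion model.
    particles = particles + np.random.normal(0, motion_std, particles.shape)
    # Update: reweight by the measurement likelihood at each particle.
    weights = np.array([measure(p) for p in particles]) + 1e-12
    weights /= weights.sum()
    # Point estimate: the weighted mean of the particle cloud.
    estimate = (particles * weights[:, None]).sum(axis=0)
    return particles, weights, estimate
```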


Previous work on infrared-based facial analysis and ROI tracking primarily explored the use of standard machine learning techniques.[33][34][35][36] These models allow optimal landmark detection in some cases but need further improvement, as they rely on data attributes (features) that, in the case of IR facial images, lack the detail present in visible-spectrum images. It is therefore necessary to combine features from visible and thermal images for facial analysis. We employed this idea in our second approach, incorporating two landmark estimation models in a cascade framework, which proved successful.
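
A schematic of such a two-stage cascade might look like the following, where coarse_model and refine_model are hypothetical learned estimators and the patch-level fusion is an illustrative assumption rather than the exact method used in the paper.

```python
import numpy as np

def cascade_landmarks(ir_img, vis_img, coarse_model, refine_model):
    """Hypothetical two-stage cascade: a first model proposes landmark
    locations from the IR image, and a second model refines each proposal
    using appearance pooled from both spectra around it.
    (Assumes registered images and proposals >= 8 px from the border.)"""
    # Stage 1: coarse landmark estimates from the IR image alone.
    coarse = coarse_model.predict(ir_img)          # (N, 2) pixel coords
    refined = []
    for (x, y) in coarse.astype(int):
        # Fuse local appearance from both modalities around the proposal.
        ir_patch = ir_img[y-8:y+8, x-8:x+8].ravel()
        vis_patch = vis_img[y-8:y+8, x-8:x+8].ravel()
        feat = np.concatenate([ir_patch, vis_patch])
        # Stage 2: the refinement model outputs a (dx, dy) correction.
        dx, dy = refine_model.predict(feat)
        refined.append((x + dx, y + dy))
    return np.array(refined)
```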


Heart rate (HR) is extremely valuable in the study of complex behaviours and their physiological correlates in non-human primates. However, collecting this information is often challenging, involving either invasive implants or tedious behavioural training. In the present study, we implement a Eulerian video magnification (EVM) heart-tracking method in the macaque monkey, combined with a wavelet transform. This is based on a measure of image-to-image fluctuations in skin reflectance due to changes in blood influx. We show a strong temporal coherence and amplitude match between EVM-based heart tracking and ground-truth ECG, from both color (RGB) and infrared (IR) videos, in anesthetized macaques, to a level comparable to what can be achieved in humans. We further show that this method allows us to identify consistent HR changes following the presentation of conspecific emotional voices or faces. EVM is used to extract HR in humans but has never been applied to non-human primates. Video photoplethysmography allows extraction of awake macaques' HR from RGB videos. In contrast, our method allows extraction of awake macaques' HR from both RGB and IR videos and is particularly resilient to the head motion that can be observed in awake behaving monkeys. Overall, we believe that this method can be generalized as a tool to track the HR of the awake behaving monkey, for ethological, behavioural, neuroscience or welfare purposes.
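
A simplified sketch of the underlying photoplethysmography idea: average a skin region per frame, band-pass the resulting temporal signal around plausible macaque heart rates, and read off the dominant spectral peak. The band limits (roughly 120 to 240 bpm) and filter order are assumptions, and this omits the authors' Eulerian magnification and wavelet steps.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def heart_rate_from_video(frames, fps, lo_bpm=120, hi_bpm=240):
    """Estimate heart rate (bpm) from a stack of video frames (T, H, W)
    covering a skin ROI. Assumes fps is well above 8 so the pass band
    sits below the Nyquist frequency."""
    # Spatially average the ROI in each frame to get one temporal signal.
    signal = frames.reshape(len(frames), -1).mean(axis=1)
    signal = signal - signal.mean()
    # Band-pass to the physiologically plausible frequency band (Hz).
    b, a = butter(3, [lo_bpm / 60, hi_bpm / 60], btype="band", fs=fps)
    filtered = filtfilt(b, a, signal)
    # Dominant spectral peak -> beats per minute.
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(len(filtered), d=1 / fps)
    return 60 * freqs[np.argmax(spectrum)]
```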


Tracking variations in autonomic responses has proven to be invaluable in the study of complex behaviours and their physiological correlates in non-human primates.[1] These include tracking changes in pupil diameter,[2][3][4][5] skin conductance,[6] social blinks,[7] blink rates,[8][9] nose temperature[10] and heart rate (HR).[11][12] The HR measure is of particular relevance in diverse cognitive contexts. For example, it has been shown that HR increases when monkeys watch videos with high affective content[13] and during learning.[14][15] In spite of this, very few methods currently allow easy, reliable and non-invasive tracking of HR in awake, behaving, untrained monkeys. The aim of the present study is to fill this methodological gap.

