Repo for my research project on real-time face recognition and emotion classification using a camera.
PIBIC: Programa Institucional de Bolsas de Iniciação Científica - Research Project at Universidade Federal do ABC (UFABC), in Brazil. Official paper number 01/2019, voluntary. This project was developed alongside the research PIE633-2020 from UFABC: "Análise de requisitos linguístico-computacionais em interfaces presenciais homem-máquina".
Supervisor: Dr. João Henrique Ranhel Ribeiro.
Note that this project was developed in 2019-2020, so it may no longer be reproducible today using the same method.
The full report, in Portuguese, can be found in the repo. This is just a simplified explanation.
The objective was to automate real-time facial expression recognition and integrate it with the video and audio annotation tool ELAN. Through this integration, the project aimed to establish real-time emotion classification through multimodal interaction of video and audio.
A large part of this research focused on studying and testing face recognition and emotion classification methods, such as YOLO (You Only Look Once), CNN (Convolutional Neural Network), LBP (Local Binary Patterns) and HOG (Histogram of Oriented Gradients). Afterwards, it was decided that the ideal method was the Intel(R) Distribution of OpenVINO toolkit.
This toolkit used Haar Feature-Based Cascade and Histogram of Oriented Gradients descriptors to classify images in an optimized manner. The application Interactive Face Detection C++ Demo allowed us to identify human faces, as well as the head orientation, facial landmarks, biological sex and emotion of each face, with the emotion classification trained on the AffectNet dataset.
This application worked well for our purposes, although it only classifies 5 emotions (neutral, happy, sad, surprised, angry) instead of the 7 universal emotions defined by Paul Ekman.
By running this toolkit on each frame from the input camera, it was possible to detect faces and classify their emotion in real-time.
By altering the output of the application, it was possible to print the emotion detected in each frame as text and, afterwards, to print only the frames in which the emotion changed. The notation used was "frame_number,new_emotion", where the 5 emotions were coded as n (neutral), h (happy), t (sad), s (surprised) and a (angry), plus 0 for an unidentified emotion.
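The change-only notation can be illustrated with a minimal Python sketch. This is not the project's code (the actual change was made inside the C++ demo); the function name `emotion_changes`, the full-word input labels, and frame numbering starting at 0 are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Optional

# Single-letter codes from the notation described above.
EMOTION_CODES = {
    "neutral": "n",
    "happy": "h",
    "sad": "t",
    "surprised": "s",
    "angry": "a",
}
UNIDENTIFIED = "0"


def emotion_changes(labels: Iterable[Optional[str]]) -> Iterator[str]:
    """Yield "frame_number,new_emotion" whenever the emotion code changes.

    `labels` holds one label per frame; None or any unknown label is
    treated as unidentified. Frame numbers start at 0 (an assumption).
    """
    previous = None
    for frame_number, label in enumerate(labels):
        code = EMOTION_CODES.get(label, UNIDENTIFIED)
        if code != previous:
            yield f"{frame_number},{code}"
            previous = code


# Hypothetical per-frame labels, made up for this example.
frames = ["neutral", "neutral", "happy", "happy", None, "sad"]
print(list(emotion_changes(frames)))  # ['0,n', '2,h', '4,0', '5,t']
```

Emitting only the changes keeps the output short and maps naturally onto annotation segments, since each line marks where one segment ends and the next begins.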
This notation could be treated as a track in ELAN, allowing a frame-by-frame comparison between the emotion classification obtained in this project and the classification made by a specialist viewing the same video.
This comparison showed that the automated emotion classification was close to what a human would consider. In this way, it was possible to develop an automated process for identifying emotion in real time and saving it as a track in ELAN for video-audio emotion analysis (part of the other project related to this one).
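To be used as a track, each change line has to become a time interval. The sketch below, in Python, converts "frame_number,new_emotion" lines into start/end times in milliseconds and formats them as tab-delimited text. The frame rate, the function names and the column layout are assumptions; it also assumes the file is loaded through ELAN's delimited-text import, where the columns are mapped by hand. The report in the repo describes the procedure actually used.

```python
def changes_to_intervals(lines, total_frames, fps=30.0):
    """Convert "frame,code" change lines into (start_ms, end_ms, code) rows.

    Each emotion lasts until the next change; the last one lasts until
    `total_frames`. `fps` is an assumed constant frame rate.
    """
    events = []
    for line in lines:
        frame, code = line.strip().split(",")
        events.append((int(frame), code))

    rows = []
    for i, (frame, code) in enumerate(events):
        next_frame = events[i + 1][0] if i + 1 < len(events) else total_frames
        start_ms = round(frame * 1000 / fps)
        end_ms = round(next_frame * 1000 / fps)
        rows.append((start_ms, end_ms, code))
    return rows


def to_tab_delimited(rows):
    """Format rows as "start<TAB>end<TAB>code", one annotation per line."""
    return "\n".join(f"{start}\t{end}\t{code}" for start, end, code in rows)


# Hypothetical change lines for a 6-frame clip recorded at 25 fps.
rows = changes_to_intervals(["0,n", "2,h", "4,0", "5,t"], total_frames=6, fps=25.0)
print(to_tab_delimited(rows))
```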
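One simple way to quantify such a comparison is the fraction of frames on which the two classifications agree. The Python sketch below is only an illustration: this simplified explanation does not state which measure was used, and the function names and sample data are made up.

```python
def expand_changes(lines, total_frames):
    """Turn "frame,code" change lines into one emotion code per frame."""
    events = []
    for line in lines:
        frame, code = line.strip().split(",")
        events.append((int(frame), code))

    # Frames before the first change line stay unidentified ("0").
    labels = ["0"] * total_frames
    for i, (frame, code) in enumerate(events):
        end = events[i + 1][0] if i + 1 < len(events) else total_frames
        for f in range(frame, min(end, total_frames)):
            labels[f] = code
    return labels


def frame_agreement(automatic, specialist, total_frames):
    """Return the fraction of frames where both tracks have the same code."""
    if total_frames == 0:
        return 0.0
    a = expand_changes(automatic, total_frames)
    b = expand_changes(specialist, total_frames)
    matches = sum(x == y for x, y in zip(a, b))
    return matches / total_frames


# Hypothetical tracks for a 6-frame clip (not data from the project).
automatic = ["0,n", "2,h", "4,0", "5,t"]
specialist = ["0,n", "3,h", "5,t"]
print(frame_agreement(automatic, specialist, total_frames=6))
```

Raw frame agreement ignores agreement expected by chance, so a chance-corrected statistic may be more appropriate for a formal evaluation.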

