Exploration of optimal gaze patterns in eye tracking studies using a deep learning approach and its application to automatic image retrieval and object detection

A VIIHM Sponsored Academic Exchange

Roy (Ruixuan) Wang, School of Engineering & Physical Sciences, Heriot-Watt University

I visited the Visual Attention and Perception Lab in the School of Psychology, University of Lincoln, on 30th and 31st January 2015, and had insightful discussions with the lab's director, Prof Kun Guo, over the two days. The aim of this short exchange visit was to explore the potential application of machine learning approaches to visual attention research in psychology, particularly the exploration of optimal gaze patterns in eye-tracking studies. From this visit, I not only gained an understanding of the general methodology, data format and analysis protocol used in eye-tracking research but, more importantly, identified two interdisciplinary research problems to which machine learning approaches could potentially be applied.

The first research problem is to predict whether a subject correctly or incorrectly recognizes a specific facial emotion from that subject's eye-tracking scan-path on the facial expression image. This problem could be formulated as a binary classification task, where the input is the scan-path data, consisting of a sequence of eye-fixation positions and durations, and the output is whether the subject's facial emotion recognition was correct. The challenge is how to represent the scan-path information effectively when both the number of eye fixations and the temporal order of fixation positions vary from one scan-path to another. Once the scan-path representation issue is resolved, standard classification techniques, such as support vector machines (SVMs) or neural network models, can be used directly to train a classifier that predicts the correctness of the subject's facial emotion recognition.
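As an illustration of the representation issue only, the sketch below maps a scan-path with any number of fixations to a fixed-length vector that a standard classifier could accept. It is not the representation discussed during the visit: the function name scanpath_to_feature, the 4 x 4 grid, and the assumption that fixation coordinates are normalised to the range [0, 1) are all hypothetical choices. This simple histogram also discards the temporal order of fixations, which is exactly the part of the challenge it leaves unsolved.

```python
from typing import List, Tuple

# A fixation is (x, y, duration). ASSUMPTION: x and y are normalised so that
# the image spans [0, 1) in both directions; duration is in any positive unit.
Fixation = Tuple[float, float, float]


def scanpath_to_feature(scanpath: List[Fixation], grid: int = 4) -> List[float]:
    """Duration-weighted fixation histogram over a grid x grid partition.

    The output always has grid * grid entries, however many fixations the
    scan-path contains. Temporal order is NOT preserved.
    """
    feature = [0.0] * (grid * grid)
    total = sum(duration for _, _, duration in scanpath)
    if total <= 0:
        return feature  # empty scan-path: all-zero feature
    for x, y, duration in scanpath:
        # Clamp to the valid cell range so out-of-image fixations do not crash.
        col = max(0, min(int(x * grid), grid - 1))
        row = max(0, min(int(y * grid), grid - 1))
        feature[row * grid + col] += duration / total
    return feature


# Two hypothetical scan-paths of different lengths (made-up values).
short_path = [(0.30, 0.35, 200.0), (0.70, 0.35, 300.0)]
long_path = [(0.30, 0.35, 150.0), (0.50, 0.55, 100.0),
             (0.50, 0.80, 250.0), (0.70, 0.35, 100.0)]

feature_short = scanpath_to_feature(short_path)
feature_long = scanpath_to_feature(long_path)
print(len(feature_short), len(feature_long))
```

Both vectors have the same length, so they could be stacked into a training matrix for an SVM or a neural network. A representation that also encodes fixation order would need a different design, for example a sequence model.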

The second research problem is to find the common or optimal gaze pattern across participants for a type of visual task such as facial emotion recognition. It has been hypothesized that there might exist a common, optimal gaze pattern for recognizing facial emotion, such that participants whose scan-paths are more similar to the optimal gaze pattern would be more likely to recognize facial expressions successfully. This problem may be related to clustering and metric learning in machine learning. As with the first problem, the challenge is to design or learn an appropriate feature representation of scan-paths, such that scan-paths associated with correct facial emotion recognition lie close to each other in the feature space, while scan-paths associated with incorrect recognition lie further away from them. Once such a scan-path feature is well designed or automatically learned, the common, optimal gaze pattern(s) could be obtained by searching for one (or more) scan-path pattern whose feature is equivalent to the average of all (or part of) the scan-paths with correct facial emotion recognition in the feature space.
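The final search step can be sketched as follows, assuming each scan-path has already been mapped to a fixed-length feature vector by some representation that is not specified here. The sketch averages the features of the correctly recognized scan-paths and returns the observed scan-path nearest to that average. The function names, the use of Euclidean distance, and the toy feature values are assumptions made for illustration, not results from the study.

```python
import math
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (ASSUMPTION: a learned metric could replace this)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def closest_to_mean(features: Sequence[Sequence[float]],
                    correct: Sequence[bool]) -> int:
    """Index of the correct scan-path whose feature is nearest to the
    average feature of all correct scan-paths."""
    correct_idx = [i for i, ok in enumerate(correct) if ok]
    if not correct_idx:
        raise ValueError("no scan-paths with correct recognition")
    centre = mean_vector([features[i] for i in correct_idx])
    return min(correct_idx, key=lambda i: euclidean(features[i], centre))


# Hypothetical 3-dimensional features for four scan-paths (made-up values).
features = [
    [0.5, 0.5, 0.0],
    [0.6, 0.4, 0.0],
    [0.4, 0.6, 0.0],
    [0.0, 0.1, 0.9],
]
correct = [True, True, True, False]

best = closest_to_mean(features, correct)
print("representative scan-path index:", best)
```

Choosing an observed scan-path rather than the raw average keeps the result interpretable as a real gaze sequence. Averaging over a subset of the correct scan-paths, for example one cluster at a time, would give more than one candidate pattern.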

We planned first to carry out a proof-of-concept study on the first research problem, and then to prepare a grant proposal if it produced positive results. However, this plan was substantially disrupted soon after the visit by my career move from academia to industry.