Context is Crucial in Identifying Emotions

Edited by Maedbh King
Mandy Chen grew up in Zhongshan in southern China and attended Peking University in Beijing, where she studied psychology. She is starting her fourth year in the psychology graduate program at UC Berkeley, where she studies cognitive neuroscience. She is supervised by Prof. David Whitney.

Mandy Chen is a fourth-year graduate student working under the supervision of Prof. David Whitney.

What drew you to study emotion perception?

I have always been interested in using quantitative methods to study the mechanisms of the human mind. I was drawn to emotion perception because it is crucial for everyday social interaction, and it is a fascinating topic with many gaps left to fill. For example, one major advance in the study of object recognition is that we have taught computer vision models to recognize objects with amazing accuracy. Recognizing emotion, however, is a far more complex and challenging problem, and there have not been many successful models. I am hoping to bring some of the rigorous experimental techniques I have learned in vision science (e.g., psychophysics) to bear on this social science topic.

What has your research uncovered about facial expressions?

When it comes to reading a person’s state of mind, is it enough to just look at their facial expressions? It is intuitive to say yes, and this has been the main direction of research for decades. However, my studies show that visual context, meaning the background scene and the ongoing action, is both necessary and sufficient for accurate and rapid emotion recognition. We blurred out the faces and bodies of characters in muted video clips. Even though the characters were effectively invisible, hundreds of participants were able to read their emotions accurately by extracting information from the visual context. We further showed that the context provides a substantial and unique contribution beyond the information carried by the face and body. My research reveals that emotion recognition is, at its heart, as much a matter of context as of faces.
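To make the blurring manipulation concrete, here is a minimal sketch of how faces and bodies might be masked out of video frames. It assumes OpenCV (cv2) and hand-labeled bounding boxes; the file name, box coordinates, and blur settings are placeholders, not the exact pipeline used in the study.

```python
# A rough sketch, assuming OpenCV; the bounding boxes and clip name are placeholders.
import cv2

def blur_characters(frame, boxes, ksize=(51, 51)):
    """Gaussian-blur each (x, y, w, h) region so that only the surrounding
    visual context stays informative."""
    out = frame.copy()
    for (x, y, w, h) in boxes:
        roi = out[y:y + h, x:x + w]
        out[y:y + h, x:x + w] = cv2.GaussianBlur(roi, ksize, 0)
    return out

cap = cv2.VideoCapture("clip.mp4")  # placeholder clip
while True:
    ok, frame = cap.read()
    if not ok:
        break
    masked = blur_characters(frame, boxes=[(100, 50, 120, 300)])
    # ...present `masked` to participants or write it to an output video
cap.release()
```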

How will your findings have real-world implications?

Currently, companies are developing machine learning models to recognize emotions, but they typically train these models on cropped faces, so the models can only read emotions from faces. My research shows that looking at faces alone does not reveal emotions very accurately, and models should take the context into account as well. The method I developed could be used to quantify the contribution of facial expression versus visual context in any video of any scenario. My findings can lead us to understand in which scenarios visual context matters most and which mechanisms the brain employs to perform this inference.
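As a toy illustration of what quantifying those contributions could look like (not the published method), one simple approach is to compare how much variance in ground-truth emotion ratings is explained by face-only ratings versus face-plus-context ratings. The data below are simulated; in practice these would be frame-by-frame ratings collected from separate groups of observers.

```python
# Toy illustration: incremental variance explained by adding context to face ratings.
# All arrays are simulated stand-ins for frame-by-frame observer ratings.
import numpy as np

rng = np.random.default_rng(0)
n_frames = 500
context = rng.normal(size=n_frames)   # context-only ratings
face = rng.normal(size=n_frames)      # face-only ratings
truth = 0.6 * context + 0.3 * face + rng.normal(scale=0.5, size=n_frames)

def r_squared(X, y):
    """Variance in y explained by a least-squares fit on predictors X."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_face = r_squared(face[:, None], truth)
r2_both = r_squared(np.column_stack([face, context]), truth)
print(f"face alone explains {r2_face:.2f} of the variance")
print(f"face plus context explains {r2_both:.2f}")
print(f"unique contribution of context is roughly {r2_both - r2_face:.2f}")
```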

In addition, current measures of emotional intelligence typically rely on decontextualized, oversimplified face stimuli. My findings suggest that tests of emotional intelligence will need to be revised to incorporate the separate but important issue of context. A person may be able to recognize emotions in static photos of faces yet still fail to read a displayed emotion accurately if they do not successfully incorporate the context. My method could eventually be used to evaluate how people with disorders such as autism and schizophrenia recognize emotions in real time and help with their diagnoses.

Finally, what advances would you like to see in your field in the next two decades?

High-level cognitive functions such as emotion inference have long been considered too difficult a problem for computer vision, and very few artificial intelligence algorithms have succeeded in imitating them. I would like to see advances in our understanding of the mechanisms underlying these high-level cognitive functions and, hopefully, AI systems that can approach human abilities.