The Power of Pictures in the 2016 US Election
Analyzing Photos of Trump and Clinton in Media Coverage of the 2016 Election
Newspapers exercise editorial judgement in numerous ways, including selecting which topics to cover, which people to feature, which stories to highlight, and how to tell a story. Our Media Cloud project helps researchers analyze these decisions, and the content that results, with a suite of web-based tools that dig into attention, language, representation, and influence of media online, all based on the HTML content of the webpages we collect. Until now we haven't focused on the images embedded in those webpages. But analyzing images is important - photos in media have a huge impact on the public's perception of a story. #IfTheyGunnedMeDown was a recent example of the public critiquing the images chosen for news stories, and of the power editors wield in picking which photos accompany a story.
This blog post shares some initial research done by Rahul Bhargava and Daniel Kornhauser during 2016 into using photos as a way to assess media depictions of newsworthy individuals. Specifically, we collected and analyzed photos of Donald Trump and Hillary Clinton during the 2016 US election campaign. We collected stories from a handful of sources using Media Cloud, narrowed them to just the stories that mentioned either candidate, filtered the images in those stories down to a set of candidate photos worth checking for faces, used a Microsoft web service to scan those photos for either person's face, and then analyzed the emotion ratings the algorithm delivered for each. We acknowledge a huge caveat about algorithmic performance on facial recognition and emotion detection tasks, but decided to experiment with this analysis anyway. Our key findings:
- The media pictured Trump more often than Clinton, matching the greater reporting coverage that he received in text.
- Photos of Clinton were more often classified as neutral, while photos of Trump were classified across a wider range of emotions.
We believe these findings echo and amplify a larger narrative that played out during the 2016 US election around media portraying Clinton as the wonky old-school candidate and Trump as the cavalier outsider.
Collecting Images
We used Media Cloud to collect the URLs of stories about Clinton or Trump from 6 news sources (BuzzFeed, the Guardian, the New York Times, Politico, the Wall Street Journal, and the Washington Post). This short list was selected as a testing set based on advice from veteran journalist (and colleague) Matt Carroll. We collected stories from the Republican convention until our work started (from July 18th until October 18th of 2016). Out of a total of 114,077 stories from those media sources, 17,516 mentioned either candidate (15%).
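For those curious about the mechanics, here is a rough sketch of that collection step. It assumes the mediacloud Python client's storyList and publish_date_query calls; the API key, media IDs, and query text are placeholders rather than our exact setup.

```python
# Rough sketch of the story-collection step (illustrative, not our exact code).
import datetime
import mediacloud.api  # the Media Cloud Python client

MC_API_KEY = "YOUR_MEDIA_CLOUD_API_KEY"    # placeholder
mc = mediacloud.api.MediaCloud(MC_API_KEY)

# Placeholder media_id values for the six sources; the real IDs differ.
media_clause = "media_id:(1 2 3 4 5 6)"
query = '("Hillary Clinton" OR "Donald Trump") AND ' + media_clause
date_filter = mc.publish_date_query(datetime.date(2016, 7, 18),
                                    datetime.date(2016, 10, 18))

# Page through every matching story and keep its URL.
story_urls = []
last_id = 0
while True:
    page = mc.storyList(query, date_filter,
                        last_processed_stories_id=last_id, rows=500)
    if not page:
        break
    story_urls.extend(s["url"] for s in page)
    last_id = page[-1]["processed_stories_id"]

print(len(story_urls), "story URLs collected")
```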
Of course, not all of these stories have photos of either candidate. Extracting images from all these stories was easy with simple HTML parsing. However, that identified 219,367 images that we needed to check! Processing that many images in a reasonable amount of time, and at a reasonable cost, was a challenge, so we decided to weed out images that were unlikely to actually include any faces. We developed a heuristic approach to determine which photos were viable candidate images worth checking for faces. Our pipeline looked like this:
- Remove images with invalid URLs. Any image with a URL that started with "http*" was kept. This left us with 90% of the URLs (198,414).
- Filter out repeated URLs. A significant number of the images were branding and graphic design related to the look of the source websites. To remove these, we filtered out images used more than 11 times in the dataset (more than 1,000 unique URLs fell into this category). This left us with 22% of the URLs (47,783).
- Remove invalid image types. We found from a qualitative assessment that files ending with .gif, .svg, or .png were not images of people. Filtering those out left us with 21% of the URLs (45,721).
- Keep just the unique URLs. This left us with 13% of the URLs (27,854).
- Remove images that were not available online at the URL indicated at the time of our study (late 2016). This left us with 13% of the URLs (27,578).
- Filter out small images. Based on guidelines from the third-party services we were considering, we judged that faces in small images were unlikely to be detected reliably. This left us with just 11% of the original URLs (24,279).
After this reduction, we downloaded them all and created a corpus of 24,279 candidate images to check for faces.
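To make those heuristics concrete, here is a simplified sketch of the extraction and URL-level filtering, using BeautifulSoup to pull img tags out of each story's HTML. The thresholds mirror the list above; the helper names are ours, and the availability and image-size checks happened later, at download time.

```python
from collections import Counter
from bs4 import BeautifulSoup

def extract_image_urls(html):
    """Pull every <img> src attribute out of one story's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.find_all("img") if img.get("src")]

def filter_candidate_urls(all_urls, max_repeats=11):
    """Apply the URL-level heuristics described in the list above."""
    # 1. Keep only URLs that start with "http".
    urls = [u for u in all_urls if u.startswith("http")]
    # 2. Drop heavily repeated URLs (site branding, logos, design elements).
    counts = Counter(urls)
    urls = [u for u in urls if counts[u] <= max_repeats]
    # 3. Drop file types that, in our spot checks, were rarely photos of people.
    urls = [u for u in urls
            if not u.lower().split("?")[0].endswith((".gif", ".svg", ".png"))]
    # 4. Keep each remaining URL once.
    return sorted(set(urls))
```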
Downloading the images was a significant technical challenge. After multiple tests we settled on parallel Python code that, running 8 threads, took roughly 6 hours to fetch everything. Any attempt to scale image processing of news media online would need to devote significant effort to optimizing this gathering process.
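Our downloader had more error handling and bookkeeping than this, but a minimal sketch of the parallel fetch - requests plus a ThreadPoolExecutor with 8 workers - looks roughly like:

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
import requests

def download_image(url, out_dir="images", timeout=30):
    """Fetch one image; return its local path, or None if the fetch failed."""
    try:
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
    except requests.RequestException:
        return None  # dead link, timeout, server error, etc.
    name = hashlib.md5(url.encode("utf-8")).hexdigest() + ".jpg"
    path = os.path.join(out_dir, name)
    with open(path, "wb") as f:
        f.write(resp.content)
    return path

def download_all(urls, workers=8):
    """Download every candidate image in parallel, skipping failures."""
    os.makedirs("images", exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [p for p in pool.map(download_image, urls) if p]
```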
Identifying Faces
Facial recognition has become a relatively common computational task, despite performance and accuracy limitations. At the time we did this work, in late 2016, there were a number of market solutions for analyzing faces and detecting emotions from them. After tinkering around a bit with various options available as web services, we ended up using Microsoft's solution.
The Microsoft Computer Vision services allowed two solutions - using their "celebrity recognition" or using their Face API to train a model to recognize each individual. Initial explorations found that the celebrity recognition did not work that well. For instance, in July 2016 Clinton's official Senate photograph was only identified as a "woman". (As of June 2018, the Face API correctly identifies the same Senate photograph as "Hillary Clinton".) So we decided to train our own model with the Face API. This also simplified the identification, since the model would only need to distinguish two faces, reducing the difficulty of the problem. Utilizing their API proved to be another technical challenge, as at the time there was no Python 3 library that made it simple to use. In fact we communicated with their development team a number of times to try and find the best solution. In the end we had to write far more code than we expected.
To use their Face API we created a training dataset of photos of Trump and Clinton by performing a Google News image search for each candidate and combining some of those results with a diverse set of manually picked images from our collected articles. The final training set had 45 images of Clinton and 47 images of Trump.
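With no convenient Python 3 client available at the time, we ended up talking to the service over plain HTTP. The sketch below shows the general shape of that training flow against the Face API's v1.0 REST endpoints - create a person group, add a person for each candidate, attach the training photos, then kick off training. The endpoint host, group name, and folder layout here are illustrative, not our actual values.

```python
import glob
import requests

FACE_KEY = "YOUR_FACE_API_KEY"                                  # placeholder
BASE = "https://westus.api.cognitive.microsoft.com/face/v1.0"   # region varies
GROUP = "candidates-2016"                                       # illustrative id
HEADERS = {"Ocp-Apim-Subscription-Key": FACE_KEY}

# 1. Create a person group to hold both candidates.
requests.put(f"{BASE}/persongroups/{GROUP}",
             headers={**HEADERS, "Content-Type": "application/json"},
             json={"name": "2016 candidates"}).raise_for_status()

def add_person_with_faces(name, image_paths):
    """Create a person in the group and attach each training photo to them."""
    resp = requests.post(f"{BASE}/persongroups/{GROUP}/persons",
                         headers={**HEADERS, "Content-Type": "application/json"},
                         json={"name": name})
    resp.raise_for_status()
    person_id = resp.json()["personId"]
    for path in image_paths:
        with open(path, "rb") as f:
            requests.post(
                f"{BASE}/persongroups/{GROUP}/persons/{person_id}/persistedFaces",
                headers={**HEADERS, "Content-Type": "application/octet-stream"},
                data=f.read()).raise_for_status()
    return person_id

# 2. Add one person per candidate, with their training images (paths illustrative).
clinton_id = add_person_with_faces("Hillary Clinton", glob.glob("train/clinton/*.jpg"))
trump_id = add_person_with_faces("Donald Trump", glob.glob("train/trump/*.jpg"))

# 3. Train the person group on the faces we just added.
requests.post(f"{BASE}/persongroups/{GROUP}/train",
              headers=HEADERS).raise_for_status()
```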
After running all 24,279 candidate images through the Microsoft service, we ended up with 703 faces identified as Clinton and 1,003 faces identified as Trump. The total cost of this process was roughly $50, which includes a significant amount of testing and re-running of the data (see their pricing details for more info). Not every image was easy to classify; one challenging example was collages of multiple photos of one or both individuals.
Breaking it down by source, we find that in all but one source Trump was depicted more often. This matches the chart above about total mentions of the candidates - Trump had more. It also matches our popular understanding that Trump dominated the news cycle throughout the election season.
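Classifying each downloaded image then boiled down to two calls per image: detect to find any faces, and identify to match those faces against the trained person group. Reusing the BASE, HEADERS, and GROUP placeholders from the sketch above (and leaving out the retry and rate-limit handling a real run needs):

```python
def classify_image(path, threshold=0.5):
    """Return the person IDs (Clinton or Trump) identified in one image."""
    with open(path, "rb") as f:
        detect = requests.post(
            f"{BASE}/detect",
            headers={**HEADERS, "Content-Type": "application/octet-stream"},
            data=f.read())
    detect.raise_for_status()
    # The identify call accepts at most 10 face IDs per request.
    face_ids = [face["faceId"] for face in detect.json()][:10]
    if not face_ids:
        return []

    identify = requests.post(
        f"{BASE}/identify",
        headers={**HEADERS, "Content-Type": "application/json"},
        json={"personGroupId": GROUP, "faceIds": face_ids})
    identify.raise_for_status()

    matches = []
    for result in identify.json():
        for candidate in result.get("candidates", []):
            if candidate["confidence"] >= threshold:
                matches.append(candidate["personId"])
    return matches
```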
[Six donut charts, one per source: BuzzFeed, Politico, the Guardian, the Wall Street Journal, the New York Times, and the Washington Post]
The number of images of Trump and Clinton in our dataset, broken down by media source. In these donut charts Trump is in red, while Clinton is in blue (reflecting the standard colors associated with political parties in the US).
Emotions
One of the features Microsoft's algorithm offers is a confidence score for emotions (see their website for details). They bill this as a "preview", which to us implied that they were still testing it but believed it worked well enough to offer as a service. We decided to use it to assess media depictions of the candidates. The eight emotions they attempt to detect are anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise. Based on our (biased) understanding of popular perception of the candidates, we hypothesized that images of Clinton would be serious, while images of Trump would be angry.
These algorithms deliver confidence scores with each assessment of an image. Here are a few examples of the strongest emotion classifications.
Pulling all those scores together into a simple visual is challenging, so we created the box-whisker plots below to explore the results for each candidate.
As the chart above demonstrates, photos of Clinton were usually classified as neutral or happy. This suggests that media depictions of Clinton were serious or friendly in tone, or that she was simply less demonstrative in her overall facial expressions. We argue this reinforced the perception of her as a boring, wonky candidate.
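For reference, one way to pull those scores with the current Face API is to ask the detect call to return emotion attributes; the preview service available in 2016 was shaped somewhat differently, so treat this as illustrative rather than our exact call (it again reuses the BASE and HEADERS placeholders from above).

```python
def emotion_scores(path):
    """Detect faces in one image and return each face's eight emotion scores."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{BASE}/detect",
            params={"returnFaceAttributes": "emotion"},
            headers={**HEADERS, "Content-Type": "application/octet-stream"},
            data=f.read())
    resp.raise_for_status()
    # One dict of confidence scores (anger, contempt, ..., surprise) per face.
    return [face["faceAttributes"]["emotion"] for face in resp.json()]

def strongest_emotion(scores):
    """Pick the single emotion label with the highest confidence for one face."""
    return max(scores, key=scores.get)
```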
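The plots themselves are straightforward to reproduce. A sketch with matplotlib, assuming clinton_scores and trump_scores are lists of the per-face score dictionaries from the step above:

```python
import matplotlib.pyplot as plt

EMOTIONS = ["anger", "contempt", "disgust", "fear",
            "happiness", "neutral", "sadness", "surprise"]

def emotion_boxplot(face_scores, title):
    """Draw one box-whisker per emotion from a list of per-face score dicts."""
    data = [[scores[e] for scores in face_scores] for e in EMOTIONS]
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.boxplot(data, labels=EMOTIONS, showfliers=False)
    ax.set_ylabel("confidence score")
    ax.set_title(title)
    fig.tight_layout()
    return fig

# e.g. emotion_boxplot(clinton_scores, "Clinton").savefig("clinton_emotions.png")
```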
In contrast, the photos depicting Trump were classified across a large range of emotions. We argue this supports the perception of Trump being cast as an expressive, vivacious candidate.
One research question that popped up related to potential differences between partisan sources - did they select more emotionally loaded images of these candidates? However, when we generated these charts by media source we saw no discernible difference in how each depicted either person. This could be interesting to dig into more with a wider set of sources.
It is useful here to pause and think about the context of this analysis - detecting emotions from images. Humans have a whole section of our brain devoted to this, yet we still struggle to do it well! Many of the faces classified as "neutral" showed emotions that a handful of real humans we checked with could detect. The Microsoft emotion detector seems to pick up exaggerated facial expressions. As with sentiment analysis, the algorithmic solution tends to work best at the extremes. So perhaps a better way to say it is that Trump exaggerates his facial expressions to the point that they are more easily detectable than Clinton's. If you wanted to assess emotions in politicians' faces you would probably want to build a dedicated model - a "political face detector" - trained on their mannerisms, because those likely don't reflect the general population (and of course those norms differ culturally). This could be a fun project, but one that would require more specific funding.
Emotions over Time
After the 3rd presidential debate, the Economist ran a similar analysis of the candidates' emotions during the actual debates. The charts at the end of the video, measuring emotion live over the course of the debate, show a similar pattern to what we found. Clinton's chart is mostly flat, with small spikes of surprise and contempt, while Trump's chart shows spikes across all four emotions they tracked.
We were curious about whether depictions of the two changed over time in response to key events during the campaign. To analyze this we charted the frequency of the emotional classifications of the images for each candidate over time. These charts include one row for each emotion, with spikes indicating stronger detection of that emotion in the photos for that day. At a high level, for Clinton we see spikes of anger and surprise against a backdrop of sadness and neutrality.
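A sketch of how charts like these can be built, assuming a pandas DataFrame with one row per identified face, a datetime publish_date column, the candidate's name, and the strongest-emotion label from above (the column names are ours):

```python
import matplotlib.pyplot as plt
import pandas as pd

def emotion_timeline(faces, candidate):
    """Plot daily counts of each strongest-detected emotion, one row per emotion."""
    subset = faces[faces["candidate"] == candidate]
    daily = (subset
             .groupby([pd.Grouper(key="publish_date", freq="D"), "emotion"])
             .size()
             .unstack(fill_value=0))
    fig, axes = plt.subplots(len(daily.columns), 1, sharex=True, figsize=(10, 8))
    for ax, emotion in zip(axes, daily.columns):
        ax.plot(daily.index, daily[emotion])
        ax.set_ylabel(emotion, rotation=0, ha="right")
    fig.suptitle(f"Daily emotion counts in photos of {candidate}")
    return fig
```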
Looking at the same chart for photos of Trump, we see spikes of contempt after the 1st and 2nd debates (Sept 26th and Oct 9th), perhaps reflecting images from the debate itself being published in the news.
If we choose to analyze this dataset more, we would likely chart more key events during the campaign and layer them on top of these charts. An interactive visualization like that would then let us scan the photos that appeared after those events to explore any causal relationship between the event and a shift in emotions presented in photos of the candidates. This is a step beyond the scope of this prototype research.
Conclusion
So what does this all mean? We've shown that facial analysis can be applied to images in the Media Cloud corpus of news reporting. We've argued that the photos of Clinton and Trump selected by a handful of media sources during the 2016 campaign reflected popular perceptions of the candidates. However, we know that facial recognition is a nascent technology that hasn't been stress-tested, despite its widespread roll-out. Our colleague Joy Buolamwini's Gender Shades project has demonstrated that facial recognition services reflect our culture's strong intersectional biases (for instance, their performance on African women is particularly poor). Can we rely on these algorithms at all when it comes to identification and emotion detection? To us the jury is still out, so these results should be taken with a big grain of salt. At the same time, they suggest another tool on our toolbelt for analyzing media coverage of issues. The photos included in news stories certainly serve to highlight the main topic, content, and framing of an article. Perhaps one day we can add a "photos" feature to Media Cloud that would support this kind of research en masse… though probably not until these algorithms have been exercised and validated to a larger degree.