Mining the Media II: Have We Actually Learned Anything?

Continued from previous post.

If my analysis has only served to illustrate media bias, text mining is not worth the trouble. The process of locating, saving, cleaning, uploading, and analyzing a substantial corpus is time-consuming and often tedious. While my experiment with Voyant is only a starting point, I believe that text mining digital media sources on a larger scale can be useful to both researchers and the general public.

Below, I outline four uses for text-mining online news content.

1. Evaluating new media sources

I included Axios in my analysis partly because it claims to be neutral, but also because it is fairly obscure and most people – myself included – do not have preconceptions about its political leaning. Axios’ tweets do relay a nuanced narrative about topics like defunding the police and the 2020 election, one that is often critical of both political parties. More than other news sources, Axios emphasizes technology and business news on Twitter. These tweets include events and interviews with CEOs of organizations like Verizon, Bank of America, and Qualcomm. In fact, Verizon and Qualcomm were among Axios’ unique frequent terms. Despite tweeting far less often, Axios tweeted about the economy almost as many times as CNN during the same one-month period.

This focus on technology and economy, while not inherently political, suggests that Axios appeals to an audience that is deeply invested in business, finance, and innovation. Such an interest may affect how Axios covers other topics, including domestic and international social unrest. Text analysis allows us to read Axios with a more critical eye, going beyond the claims of nonpartisan neutrality.

2. Uncovering the unexpected

My approach to mining the media has many limitations, not least because I started with presuppositions about the biases in my corpus. Despite this methodological flaw, some of my findings still surprised me. For example, I assumed that CNN Twitter would capitalize on race and immigration during the 2020 election. Instead, I found that all five sources talked about racism at about the same frequency, and CNN mostly avoided the topics of immigration, BLM, and police brutality.

3. Detecting unique language and loaded terms

Many people, including Vice President-elect Kamala Harris, have used the term “dogwhistling” this year. Currently, “dogwhistling” often describes how white supremacist groups communicate publicly via seemingly innocuous words and phrases that only other white supremacists will recognize. The term, however, can apply to any use of coded insider language. Text mining may help researchers to identify different groups’ dogwhistles and make this information available to the public. For example, does saying “coronavirus” instead of “COVID” suggest a more laissez-faire attitude to preventing spread? Does using the phrase “big tech” instead of just “tech companies” suggest a critical or even conspiratorial attitude towards tech corporations? Here, text mining can help us read between the lines.

4. Indexing the future digital archive

If researchers carefully mine the media to document trends in content and language, future historians will be better able to interpret our digital record. Picture this:

It’s 2121. You steer your hoverchair towards the 360-degree hologram station and open Newswebsites.com. (You’ve created a new email account so you can get another free 30-day trial.) Your grandmother told you stories about the 2020 COVID-19 pandemic, stories that sparked your interest and led you to research it for your dissertation. After all, 2020 is the reason you live on a Mars colony and drink melted ice caps instead of SmartWater.

You want to include diverse voices in your dissertation, but hundreds of thousands of articles clamor for your attention. Who should you listen to?

Suppose you had a map to guide you from source to source, showing you where to focus your research. For a far-right, populist perspective, try Breitbart. If it’s all about the money, look at WSJ. The New York Times went up in flames, but a blend of CNN and The New Yorker will help you understand what many academics and East Coast professionals were reading.

Text-miners can build a LibGuide for the future archive by indexing topic frequency, key terms, and unique phrases for different media outlets. This bias map would facilitate insightful research that captures a wide array of controversial, complex voices.
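As a rough illustration of what one entry in such a bias map might look like, here is a minimal Python sketch of a topic-frequency index. Everything in it is an assumption made for the example: the outlet names, the sample tweets, the topic keyword lists, and the `topic_index` helper are all hypothetical, and the substring matching is deliberately crude. Reporting a share alongside the raw count matters because outlets tweet at very different volumes.

```python
import json

# Hypothetical topic definitions: each topic is a list of lowercase keywords.
TOPICS = {
    "economy": ["economy", "jobs", "stocks"],
    "pandemic": ["covid", "coronavirus"],
}

def topic_index(tweets_by_outlet, topics=TOPICS):
    """For each outlet, count the tweets that mention each topic.

    A tweet counts once per topic if it contains any of that topic's
    keywords as a substring (a crude match, good enough for a sketch).
    """
    index = {}
    for outlet, tweets in tweets_by_outlet.items():
        lowered = [tweet.lower() for tweet in tweets]
        entry = {}
        for topic, keywords in topics.items():
            hits = sum(any(k in tweet for k in keywords) for tweet in lowered)
            share = round(hits / len(tweets), 3) if tweets else 0.0
            entry[topic] = {"tweets": hits, "share": share}
        index[outlet] = entry
    return index

# Invented sample data, not taken from the real corpus.
sample_tweets = {
    "OutletA": [
        "Stocks rally as jobs report lands",
        "Coronavirus cases climb",
        "Weather update",
    ],
    "OutletB": ["The economy added jobs"],
}

index = topic_index(sample_tweets)
print(json.dumps(index, indent=2))
```

Because the result is plain nested dictionaries, it can be saved as JSON and extended later with key terms or unique phrases for each outlet.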

Mining the Media I: Clues and Coded Language

Since the 2016 election, oxymorons like “alternative facts” and “fake news” have become part of the American vernacular. Social media, rife with echo chambers and ethically dubious algorithms, has accelerated the spread of misinformation. A record 60% of Americans do not trust the mass media. Concerned citizens like me often feel overwhelmed by the task of parsing truth from fiction.

No message comes without bias. Once equipped to identify political agendas and predispositions, media consumers can perhaps get a little closer to the truth. (Cue the digital humanists.) Can researchers use text mining to detect media bias?

To answer this question, I downloaded all Twitter posts for five different news sources between October 27 and November 28. I chose to work with Twitter to avoid the complications of selecting and formatting thousands of online news articles. I used Vicinitas to download the Tweets in five Excel spreadsheets (one for each news source), converted each spreadsheet into a Word document, and uploaded the five documents to Voyant as a corpus.
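The corpus-building step could also be scripted rather than done by hand. The following is a minimal Python sketch, not the workflow described above. It assumes each spreadsheet has been saved as CSV and that the tweet text sits in a column named "Text"; that column name, the `load_tweets` and `build_corpus` helpers, and the sample rows are all hypothetical, so check the real export header before adapting it.

```python
import csv
import io

def load_tweets(csv_file, text_column="Text"):
    """Return the non-empty tweet texts from one open CSV export.

    `text_column` is an assumed column name, not a documented one.
    """
    reader = csv.DictReader(csv_file)
    return [
        row[text_column].strip()
        for row in reader
        if row.get(text_column) and row[text_column].strip()
    ]

def build_corpus(exports, text_column="Text"):
    """Map each source name to one plain-text document, one tweet per line."""
    return {
        source: "\n".join(load_tweets(csv_file, text_column))
        for source, csv_file in exports.items()
    }

# In-memory stand-ins for two exported spreadsheets (invented rows).
sample_exports = {
    "SourceA": io.StringIO(
        "Text,Date\nFirst tweet,2020-10-27\nSecond tweet,2020-10-28\n"
    ),
    "SourceB": io.StringIO("Text,Date\n,2020-10-27\nOnly tweet,2020-10-29\n"),
}

corpus = build_corpus(sample_exports)
print(corpus["SourceA"])
```

With real files, each `io.StringIO` object would be replaced by a file opened with `open(path, newline="", encoding="utf-8")`. The resulting plain-text documents can be saved and uploaded to a tool like Voyant or analyzed directly.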

Lines represent the volume of tweets from each source between October 27 and November 28.

CNN posts on Twitter most frequently, while Fox posts rarely. Many of Fox’s posts were just links to online articles or videos, further limiting the potential for text analysis. Ideally, future text mining would also include Facebook posts, online news articles, and/or headlines.
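Tweet volume per source can be tallied with a few lines of Python. This is a minimal sketch under assumptions: the timestamp format, the `tweets_per_day` helper, and the sample timestamps are hypothetical and would need to match whatever the real export contains.

```python
from collections import Counter
from datetime import datetime

def tweets_per_day(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count tweets per calendar day from timestamp strings.

    `fmt` is an assumed timestamp format; adjust it to the real export.
    """
    return Counter(
        datetime.strptime(stamp, fmt).date().isoformat() for stamp in timestamps
    )

# Invented timestamps for two placeholder sources.
sample_timestamps = {
    "SourceA": [
        "2020-10-27 09:00:00",
        "2020-10-27 13:30:00",
        "2020-10-28 08:15:00",
    ],
    "SourceB": ["2020-10-28 10:00:00"],
}

# Daily counts per source, plus the total for the whole period.
volume = {src: tweets_per_day(ts) for src, ts in sample_timestamps.items()}
totals = {src: sum(days.values()) for src, days in volume.items()}
print(totals)
```

The totals are worth keeping on hand, since any later comparison of word counts across sources should account for how much each source tweets.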

My methodology relies on two main assumptions:

1. Twitter posts represent what news outlets consider to be their most breaking, urgent, interesting, or “clickable” stories, thus making archived Tweets a reasonable corpus for analysis.

2. I assume the political leanings of the five media sources: CNN is a mainstream left news source, Fox is mainstream right, Vox is farther left, and Breitbart is far right. I will discuss Axios, an allegedly more objective and neutral news source, in the next post.

CNN, Fox, Vox, and Breitbart are household names, with political slants that are hardly subtle. I selected them for two main purposes: first, to measure the accuracy of my text mining process in determining bias and second, to identify the specific language and content that communicates bias. Perhaps CNN is left of center, but what makes it so? And how does Vox mark itself as farther left than CNN?

Below are three types of “clues” that may help text miners recognize and examine media bias.

1. Content

Four topics dominated all five sources: Donald Trump, Joe Biden, the 2020 presidential election, and COVID-19.  

Word cloud of frequent terms for entire corpus. I edited the stop words in Voyant to filter out pronouns, hyperlinks, and other words irrelevant to content analysis.
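The counting behind a word cloud like this one can be approximated in Python. This is a minimal sketch, not Voyant's own implementation: the stop-word list is a tiny hypothetical sample, and the tokenizer, the `frequent_terms` helper, and the sample sentence are invented for illustration.

```python
import re
from collections import Counter

URL_PATTERN = re.compile(r"https?://\S+")
# Words, optionally joined by hyphens or straight apostrophes ("covid-19").
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is",
    "he", "she", "it", "they", "his", "her",
}

def tokenize(text):
    """Lowercase the text, drop hyperlinks, and split it into word tokens."""
    return TOKEN_PATTERN.findall(URL_PATTERN.sub(" ", text.lower()))

def frequent_terms(text, stop_words=STOP_WORDS, top_n=10):
    """Return the top_n most frequent tokens that are not stop words."""
    counts = Counter(tok for tok in tokenize(text) if tok not in stop_words)
    return counts.most_common(top_n)

# Invented sample sentence, not a real tweet.
sample = (
    "The election and the vote: he says the election is over "
    "https://example.com/story"
)
top_terms = frequent_terms(sample, top_n=3)
print(top_terms)
```

Editing `STOP_WORDS` plays the same role as editing the stop-word list in Voyant: it decides which words are treated as noise before anything is counted.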

In the context of the election, tweets about BLM and law enforcement become especially interesting. Between October 27 and November 28, CNN, Vox, and Axios did not promote journalism about Black Lives Matter or police brutality on Twitter. Fox and Breitbart, on the other hand, dedicated a significant portion of their monthly tweets to those subjects.

“Obama’s daughters joined summer protests against police brutality.”

Fox News, November 27, 2020

“Defund-police supporters tell Biden they’re ‘not going away.’”

Fox News, November 26, 2020

“BLM-NBA Woke Update: The Sacramento Kings fired an announcer who said, ‘all lives matter,’ only to replace him by hiring someone who claimed Donald Trump is a ‘white supremacist terrorist.'”

Breitbart News, November 18, 2020

“Antifa and BLM protestors took over roads and harassed drivers in a Portland suburb during a protest Saturday night.”

Breitbart News, November 15, 2020

Fox News tweets aimed to draw connections between the Democratic party and anti-police protests, while Breitbart overtly disparaged BLM. Breitbart also focuses on cultural manifestations of BLM in Hollywood, the NBA, and the NFL, rather than electoral politics.

This content analysis suggests a few key takeaways:

1. Fox News seemed to wield BLM and related protests as weapons against the Democratic Party during the 2020 election.

2. CNN had little interest in covering BLM, Defund the Police, or related topics during the 2020 election cycle.

3. Breitbart News preferred to connect BLM to specific local protests, cultural elites in Hollywood, and an allegedly predatory Antifa movement, rather than the 2020 election.

2. Vocabulary and word choice

The “Summary” function on Voyant lists terms unique to each document in the corpus. After viewing those terms, I used the “Context” and “Word tree” functions to better understand how each source employed its unique terms. For example, only Vox used the word “coup,” as in “Trump is attempting a coup in plain sight.” While CNN critiqued Donald Trump’s response to the election, it avoided the more extreme language of a “coup.”
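The same two steps, finding terms unique to one source and then viewing them in context, can be sketched in Python. This is an illustration under assumptions, not a reproduction of Voyant's tools, which may select and rank terms differently. The `unique_terms` and `contexts` helpers are hypothetical, and the "OtherSource" sentence is invented; only the Vox line is the quote given above.

```python
import re

TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")

def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())

def unique_terms(docs):
    """For each source, return the terms that appear in no other source."""
    vocab = {source: set(tokens(text)) for source, text in docs.items()}
    unique = {}
    for source, terms in vocab.items():
        others = set().union(*(t for s, t in vocab.items() if s != source))
        unique[source] = terms - others
    return unique

def contexts(text, term, width=3):
    """Return each occurrence of term with `width` tokens on either side."""
    toks = tokens(text)
    return [
        " ".join(toks[max(0, i - width): i + width + 1])
        for i, tok in enumerate(toks)
        if tok == term
    ]

docs = {
    "Vox": "Trump is attempting a coup in plain sight.",
    # Invented comparison text for a placeholder source.
    "OtherSource": "Trump is contesting the election results in court.",
}

unique = unique_terms(docs)
coup_contexts = contexts(docs["Vox"], "coup", width=2)
print(sorted(unique["Vox"]), coup_contexts)
```

On a real corpus the unique-term sets will be large and noisy, so filtering stop words and ranking the terms by frequency first would make the output easier to read.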

All five sources discussed COVID-19, but they called the virus by different names. CNN and Vox tended to say “Covid-19,” while Fox and Breitbart were more likely to say “coronavirus.” Given that Fox is more conservative than Vox or CNN and Breitbart is at times overtly anti-mask, this difference in word choice could suggest a pattern. Perhaps “coronavirus” indicates a source that endorses more lenient regulations, while “COVID-19” elevates the language of the CDC and public health officials. This merits further investigation.
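One way to begin that investigation is to measure how often each source uses each name for the virus. The sketch below is a minimal Python illustration under assumptions: the two regular expressions, the `variant_shares` helper, and the sample text are hypothetical, and other spellings (such as "COVID19" without a hyphen) are not covered.

```python
import re

# Assumed spelling variants; both patterns are case-insensitive.
VARIANTS = {
    "covid": re.compile(r"\bcovid(?:-19)?\b", re.IGNORECASE),
    "coronavirus": re.compile(r"\bcoronavirus\b", re.IGNORECASE),
}

def variant_shares(text):
    """Count each naming variant and its share of all virus mentions."""
    counts = {name: len(pat.findall(text)) for name, pat in VARIANTS.items()}
    total = sum(counts.values())
    shares = {
        name: (count / total if total else 0.0)
        for name, count in counts.items()
    }
    return counts, shares

# Invented sample text, not real tweets.
sample = (
    "Covid-19 cases rise. New coronavirus rules. "
    "COVID testing expands. Covid-19 vaccine news."
)
counts, shares = variant_shares(sample)
print(counts, shares)
```

Comparing shares rather than raw counts keeps a high-volume source from looking more attached to either term simply because it tweets more.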

3. Insider-outsider language

Insider-outsider language offers a simple way to identify sources that lean far to the right or left, since major news outlets are less likely to critique “mainstream media.” Text analysis shows that Breitbart News uses the words “radical” and “establishment” at an extremely high rate. Vox uses “radical” to call for major political reforms, while “radical” mainly appears on Fox Twitter in direct quotes from Donald Trump. CNN avoids all three terms (“mainstream media,” “radical,” and “establishment”), suggesting an aversion to language that separates and alienates. Breitbart, a source that predicates its existence on mistrust of mainstream media, relies heavily on divisive language.
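A rate of use is easier to compare across sources when it is normalized by corpus size. The following is a minimal Python sketch that reports marker terms per 1,000 tokens. The marker list mirrors the terms discussed above, but the `rate_per_thousand` helper, the tokenizer, and the sample sentence are assumptions made for illustration.

```python
import re

TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")
MARKERS = ["mainstream media", "radical", "establishment"]

def rate_per_thousand(text, markers=MARKERS):
    """Return occurrences of each marker per 1,000 tokens of text."""
    lowered = text.lower()
    n_tokens = len(TOKEN_PATTERN.findall(lowered))
    rates = {}
    for marker in markers:
        # Whole-word (or whole-phrase) matches only.
        hits = len(re.findall(r"\b" + re.escape(marker) + r"\b", lowered))
        rates[marker] = 1000 * hits / n_tokens if n_tokens else 0.0
    return rates

# Invented sample sentence, not a real tweet.
sample = (
    "The radical establishment and the mainstream media "
    "ignore the radical left again"
)
rates = rate_per_thousand(sample)
print(rates)
```

A count like this cannot tell a source's own voice from a direct quote, which is why reading the terms in context remains necessary.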

In the following post, I will discuss the potential application of these media-mining clues.