Mining the Media II: Have We Actually Learned Anything?

Continued from previous post.

If my analysis has only served to illustrate media bias, text mining is not worth the trouble. The process of locating, saving, cleaning, uploading, and analyzing a substantial corpus is time-consuming and often tedious. While my experiment with Voyant is only a starting point, I believe that text mining digital media sources on a larger scale can be useful to both researchers and the general public.

Below, I outline four uses for text-mining online news content.

1. Evaluating new media sources

I included Axios in my analysis partially because it claims to be neutral, but also because it is fairly obscure and most people – myself included – do not have preconceptions about its political leaning. Axios’ tweets do relay a nuanced narrative about topics like defunding the police and the 2020 election, one that is often critical of both political parties. More than other news sources, Axios emphasizes technology and business news on Twitter. These tweets include events and interviews with CEOs of organizations like Verizon, Bank of America, and Qualcomm. In fact, Verizon and Qualcomm were among Axios’ unique frequent terms. Despite tweeting far less often, Axios tweeted about the economy almost as many times as CNN during the same 1-month period.

This focus on technology and economy, while not inherently political, suggests that Axios appeals to an audience that is deeply invested in business, finance, and innovation. Such an interest may affect how Axios covers other topics, including domestic and international social unrest. Text analysis allows us to read Axios with a more critical eye, going beyond the claims of nonpartisan neutrality.

2. Uncovering the unexpected

My approach to mining the media has many limitations, not least because I started with presuppositions about the biases in my corpus. Despite this methodological flaw, some of my findings still surprised me. For example, I assumed that CNN Twitter would capitalize on race and immigration during the 2020 election. Instead, I found that all five sources talked about racism at about the same frequency, and CNN mostly avoided the topics of immigration, BLM, and police brutality.

3. Detecting unique language and loaded terms

Many people, including Vice President-elect Kamala Harris, have used the term “dogwhistling” this year. Currently, “dogwhistling” often describes how white supremacist groups communicate publicly via seemingly innocuous words and phrases that only other white supremacists will recognize. The term, however, can apply to any use of coded insider language. Text mining may help researchers to identify different groups’ dogwhistles and make this information available to the public. For example, does saying “coronavirus” instead of “COVID” suggest a more laissez-faire attitude to preventing spread? Does using the phrase “big tech” instead of just “tech companies” suggest a critical or even conspiratorial attitude towards tech corporations? Here, text mining can help us read between the lines.

4. Indexing the future digital archive

If researchers carefully mine the media to document trends in content and language, future historians will be better able to interpret our digital record. Picture this:

It’s 2121. You steer your hoverchair towards the 360-degree hologram station and open (You’ve created a new email account so you can get another free 30-day trial.) Your grandmother told you stories about the 2020 COVID-19 pandemic, stories that sparked your interest and led you to research it for your dissertation. After all, 2020 is the reason you live on a Mars colony and drink melted ice caps instead of SmartWater.

You want to include diverse voices in your dissertation, but hundreds of thousands of articles clamor for your attention. Who should you listen to?

Suppose you had a map to guide you from source to source, showing you where to focus your research. For a far-right, populist perspective, try Breitbart. If it’s all about the money, look at WSJ. The New York Times went up in flames, but a blend of CNN and The New Yorker will help you understand what many academics and East Coast professionals were reading.

Text-miners can build a LibGuide for the future archive by indexing topic frequency, key terms, and unique phrases for different media outlets. This bias map would facilitate insightful research that captures a wide array of controversial, complex voices.

Mining the Media I: Clues and Coded Language

Since the 2016 election, oxymorons like “alternative facts” and “fake news” have become part of the American vernacular. Social media, rife with echo chambers and ethically dubious algorithms, has accelerated the spread of misinformation. A record 60% of Americans do not trust the mass media. Concerned citizens like myself often feel overwhelmed by the task of parsing truth from fiction.

No message comes without bias. Once equipped to identify political agendas and predispositions, media consumers can perhaps get a little closer to the truth. (Cue the digital humanists.) Can researchers use text mining to detect media bias?

To answer this question, I downloaded all Twitter posts from five different news sources between October 27 and November 28. I chose to work with Twitter to avoid the complications of selecting and formatting thousands of online news articles. I used Vicinitas to download the tweets into five Excel spreadsheets (one for each news source), converted each spreadsheet into a Word document, and uploaded the five documents to Voyant as a corpus.

Lines represent the volume of tweets from each source between October 27 and November 28.

CNN posts on Twitter most frequently, while Fox posts rarely. Many of Fox’s posts were just links to online articles or videos, further limiting the potential for text analysis. Ideally, future text mining would also include Facebook posts, online news articles, and/or headlines.

My methodology relies on two main assumptions:

1. Twitter posts represent what news outlets consider to be their most breaking, urgent, interesting, or “clickable” stories, thus making archived Tweets a reasonable corpus for analysis.

2. I assume the political leanings of the five media sources: CNN is a mainstream left news source, Fox is mainstream right, Vox is farther left, and Breitbart is far right. I will discuss Axios, an allegedly more objective and neutral news source, in the next post.

CNN, Fox, Vox, and Breitbart are household names, with political slants that are hardly subtle. I selected them for two main purposes: first, to measure the accuracy of my text mining process in determining bias and second, to identify the specific language and content that communicates bias. Perhaps CNN is left of center, but what makes it so? And how does Vox mark itself as farther left than CNN?

Below are three types of “clues” that may help text miners recognize and examine media bias.

1. Content

Four topics dominated all five sources: Donald Trump, Joe Biden, the 2020 presidential election, and COVID-19.  

Word cloud of frequent terms for entire corpus. I edited the stop words in Voyant to filter out pronouns, hyperlinks, and other words irrelevant to content analysis.
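For readers who would rather script this step than use Voyant, here is a minimal Python sketch of a frequent-terms count with a stop list. It is not how Voyant works internally. The stop list, the function name frequent_terms, and the sample tweets are all made up for illustration.

```python
import re
from collections import Counter

# Illustrative stop list only (an assumption); it does not reproduce
# Voyant's built-in list or the edits described above.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
              "he", "she", "they", "we", "you"}

URL_PATTERN = re.compile(r"https?://\S+")
# Words may contain internal hyphens or apostrophes, e.g. "covid-19".
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")


def frequent_terms(tweets, stop_words=STOP_WORDS, top_n=10):
    """Return the top_n (term, count) pairs after dropping links and stop words."""
    counts = Counter()
    for tweet in tweets:
        text = URL_PATTERN.sub(" ", tweet.lower())  # remove hyperlinks
        tokens = TOKEN_PATTERN.findall(text)
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up sample tweets, not taken from the corpus.
    sample = [
        "Trump and Biden debate the election https://t.co/abc",
        "Biden leads in the election count",
    ]
    for term, count in frequent_terms(sample, top_n=5):
        print(term, count)
```

Running the same function once per source, rather than on the whole corpus, gives a rough per-outlet version of the word cloud.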

In the context of the election, tweets about BLM and law enforcement become especially interesting. Between October 27 and November 28, CNN, Vox, and Axios did not promote journalism about Black Lives Matter or police brutality on Twitter. Fox and Breitbart, on the other hand, dedicated a significant portion of their monthly tweets to those subjects.

“Obama’s daughters joined summer protests against police brutality.”

Fox News, November 27, 2020

“Defund-police supporters tell Biden they’re ‘not going away.’”

Fox News, November 26, 2020

“BLM-NBA Woke Update: The Sacramento Kings fired an announcer who said, ‘all lives matter,’ only to replace him by hiring someone who claimed Donald Trump is a ‘white supremacist terrorist.'”

Breitbart News, November 18, 2020

“Antifa and BLM protestors took over roads and harassed drivers in a Portland suburb during a protest Saturday night.”

Breitbart News, November 15, 2020

Fox News tweets aimed to draw connections between the Democratic Party and anti-police protests, while Breitbart overtly disparaged BLM. Breitbart also focused on cultural manifestations of BLM in Hollywood, the NBA, and the NFL, rather than electoral politics.

This content analysis suggests a few key takeaways:

1. Fox News seemed to wield BLM and related protests as weapons against the Democratic Party during the 2020 election.

2. CNN had little interest in covering BLM, Defund the Police, or related topics during the 2020 election cycle.

3. Breitbart News preferred to connect BLM to specific local protests, cultural elites in Hollywood, and an allegedly predatory Antifa movement, rather than the 2020 election.

2. Vocabulary and word choice

The “Summary” function on Voyant lists terms unique to each document in the corpus. After viewing those terms, I used the “Context” and “Word tree” functions to better understand how each source employed its unique terms. For example, only Vox used the word “coup,” as in “Trump is attempting a coup in plain sight.” While CNN critiqued Donald Trump’s response to the election, it avoided the more extreme language of a “coup.”
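The idea of "terms unique to each source" can be sketched in a few lines of Python. This is a plain set comparison, not Voyant's own scoring, which is not reproduced here. The function names and the CNN sample line are hypothetical; the Vox line is the quote above.

```python
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")


def term_counts(tweets):
    """Count lowercase word tokens across a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(TOKEN_PATTERN.findall(tweet.lower()))
    return counts


def unique_terms(corpus, min_count=1):
    """Map each source to the sorted terms that no other source uses.

    corpus is a dict of source name -> list of tweet strings.
    min_count drops terms the source itself used fewer than min_count times.
    """
    counts = {source: term_counts(tweets) for source, tweets in corpus.items()}
    result = {}
    for source, own in counts.items():
        # Vocabulary of every other source combined.
        others = set()
        for other_source, other_counts in counts.items():
            if other_source != source:
                others.update(other_counts)
        result[source] = sorted(
            term for term, n in own.items()
            if n >= min_count and term not in others
        )
    return result


if __name__ == "__main__":
    corpus = {
        "Vox": ["Trump is attempting a coup in plain sight."],
        # Made-up line for illustration, not an actual CNN tweet.
        "CNN": ["Trump is disputing the election results."],
    }
    print(unique_terms(corpus))
```

On a real corpus, raising min_count filters out one-off words and typos, which would otherwise dominate the "unique" lists.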

All five sources discussed COVID-19, but they called the virus by different names. CNN and Vox tended to say “Covid-19,” while Fox and Breitbart were more likely to say “coronavirus.” Given that Fox is more conservative than Vox or CNN and Breitbart is at times overtly anti-mask, this difference in word choice could suggest a pattern. Perhaps “coronavirus” indicates a source that endorses more lenient regulations, while “COVID-19” elevates the language of the CDC and public health officials. This merits further investigation.
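One way to follow up on this question is to measure, for each source, what share of its virus mentions use each name. The Python sketch below does that with two regular expressions. The patterns, the function name variant_shares, and the sample tweets are assumptions for illustration, not results from the corpus.

```python
import re

# Assumed spellings to look for; a real study might add "COVID19", "the virus", etc.
VARIANTS = {
    "covid": re.compile(r"\bcovid(?:-19)?\b", re.IGNORECASE),
    "coronavirus": re.compile(r"\bcoronavirus\b", re.IGNORECASE),
}


def variant_shares(tweets):
    """Return each variant's share of all variant mentions in the tweets."""
    counts = {
        name: sum(len(pattern.findall(tweet)) for tweet in tweets)
        for name, pattern in VARIANTS.items()
    }
    total = sum(counts.values())
    if total == 0:
        # No mentions at all: report zero for every variant.
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Made-up tweets for illustration only.
    sample = [
        "Covid-19 cases rise",
        "New coronavirus rules",
        "COVID vaccine news",
        "Covid-19 update",
    ]
    print(variant_shares(sample))
```

Using shares rather than raw counts matters because the sources tweet at very different volumes.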

3. Insider-outsider language

Insider-outsider language offers a simple way to identify sources that lean far to the right or left. Major news outlets are less likely to critique “mainstream media.” Text analysis shows that Breitbart News uses the words “radical” and “establishment” at an extremely high rate. Vox uses “radical” to call for major political reforms, while “radical” mainly appears on Fox Twitter in direct quotes from Donald Trump. CNN avoids all three terms (“mainstream media,” “radical,” and “establishment”), suggesting an aversion to language that separates and alienates. Breitbart, a source that predicates its existence on mistrust of mainstream media, relies heavily on divisive language.
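Telling a source's own voice apart from a direct quote requires seeing each term in its surrounding words. The Python sketch below is a bare-bones keyword-in-context listing, similar in spirit to Voyant's “Context” tool but not a reimplementation of it. The function name and window size are assumptions.

```python
def keyword_in_context(tweets, keyword, window=4):
    """List (left, match, right) for each occurrence of keyword.

    left and right hold up to `window` words on either side of the match.
    The match keeps its original punctuation, so quotation marks stay visible.
    """
    target = keyword.lower()
    rows = []
    for tweet in tweets:
        words = tweet.split()
        for i, word in enumerate(words):
            # Strip surrounding punctuation and quotes before comparing.
            if word.strip(".,;:!?\"'“”‘’()").lower() == target:
                left = " ".join(words[max(0, i - window):i])
                right = " ".join(words[i + 1:i + 1 + window])
                rows.append((left, word, right))
    return rows


if __name__ == "__main__":
    # The Vox quote cited earlier in this post.
    tweets = ["Trump is attempting a coup in plain sight."]
    for left, match, right in keyword_in_context(tweets, "coup", window=2):
        print(f"{left} [{match}] {right}")
```

Because the matched word is returned unstripped, a term that arrives wrapped in quotation marks is easy to spot when scanning the output.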

In the following post, I will discuss the potential application of these media-mining clues.

Message in a Bottle

Connecting educators and students to online museum resources during COVID-19

Museums are often the first institutions voted off the funding island when money gets tight. Many of us have fond memories of grade-school field trips (mine is riding a tricycle across a wire at the Witte Museum to learn about the physics of balance) but, during a recession, outings like these vanish from school budgets. Between 2008 and 2011, over 40% of school administrators reported eliminating field trips. As of 2015, only 12% of schools had returned to their pre-recession field trip levels. The COVID-19 pandemic has not only provoked a new recession; it has also temporarily suspended the physical operation of both schools and museums.

Recession and pandemic aside, access to museums has never been equal for all students. Museum trips can be time-consuming and expensive for underfunded school districts. One museum educator conducted a regional study of the three counties surrounding the Wellin Museum of Art in Clinton, NY to learn why local teachers, who overwhelmingly agreed that museums were valuable institutions, did not actually take their students on any museum field trips. Surveys showed three main obstacles: lack of time, lack of funds, and difficulty justifying a museum trip to their supervising administrator. Even with a willing administration, distance alone makes museum visits impossible for schools in some rural areas.

Research suggests that museums help school-age children to not only learn information, but also to develop critical thinking skills and empathy. These effects are even stronger for students in rural and low-income districts. Because educators have long recognized the impact museums can have on students’ intellectual and personal development, they have developed creative strategies to overcome obstacles like budget and distance. Some museums, like the Boston Museum of Science, have developed programs in which museum educators visit classrooms to teach students about topics like paleontology and space through hands-on activities. In the age of COVID, however, these kinds of visits are impossible.

Physical distancing requirements have demanded unprecedented creativity from both classroom and museum educators and, in the process, produced new opportunities for collaboration that can and should outlast the pandemic. Take, for example, the Carnegie Museum System in Western Pennsylvania, composed of four separate institutions: Carnegie Museum of Art, Carnegie Science Center, Carnegie Museum of Natural History, and the Andy Warhol Museum. The Carnegie Museum System has long offered resources for schools and educators, but has added new resources for online classrooms since the onset of COVID. Given that many school districts in the U.S. are still at least partially online, many parents also find themselves in the role of “educator.”

Carnegie Museum educational resources include an in-home learning page for all four museums, as well as online classroom materials from the Carnegie Science Center and the Carnegie Museum of Natural History. Resources for in-home learning focus mainly on kids. These activities vary in topic and are meant to fill time, rather than follow a curriculum or lesson plan. For instance, every day at 8:00 a.m., the Carnegie Science Center posts something to read, something to watch, and something to do on all its social media platforms. In contrast, the Online Educator Resources are structured as lesson plans and activities for grades K-2, 3-5, and 6-12. Many of these “lesson plans” include activities and videos from the Read, Watch, Do series.

Many of the online resources generated by the Carnegie Museum System could be better integrated into curriculum-type structures to make them accessible and useful for classroom educators. For example, the Ask a Scientist Series from the Carnegie Museum of Natural History (which went viral on TikTok this summer) contains 37 informative videos and counting. Currently, the video gallery is filed under “Research>Science Videos” on the website menu and consists of four pages with about nine video tiles each. This layout suits curious individual visitors, but the videos are not labeled or structured for classroom use. Incorporating materials like these videos into online lesson plans and tagging them as such would enhance the Carnegie Museum System’s digital classroom offerings.

Museums have to work to make their resources accessible and useful to schools; simply producing resources is not enough. When staff from Montana’s Museum of the Rockies wanted to reach students in remote areas, they collaborated with teachers and educators from around the state to create a rich 181-page dinosaur curriculum packed with lesson themes, standards, images, and activities. This curriculum came from a larger survey project that asked over 400 educators across Montana about how the Museum of the Rockies could meet their needs. Because the museum designed this curriculum in thoughtful cooperation with teachers, it has now reached 9,067 educators in 32 counties. While these numbers do not show how many educators actually used the lesson plans in their classrooms, they do demonstrate that an exhibit-based curriculum traveled to thousands of teachers in a traceable, verifiable way.

Too many museums are sending out COVID-era digital resources like messages in a bottle, not knowing who they will reach or how they will be used. Digital materials can help democratize school-museum relationships in an exciting way, but they will be most effective when designed with and for actual educators. Careful rather than haphazard collaboration will help museums stay relevant during a global pandemic.

Works Cited

Geary, Amber. “What Makes K-12 Public School Educators Choose to Use a Museum as Part of Their Curriculum?” American Alliance of Museums, May 6, 2019.

Greene, Jay P., Brian Kisida, and Daniel H. Bowen. “The Educational Value of Field Trips.” Education Next, August 21, 2020.

Horsley, Scott. “It’s Official: U.S. Economy Is In A Recession.” NPR, June 8, 2020.

Lewin, Tamar. “Museums Take Their Lessons to the Schools.” The New York Times, April 22, 2010.

“Map: Where Schools Are Reopening in the US.” CNN. Accessed October 5, 2020.

“MOR School Outreach Participation.” Museum of the Rockies. Accessed October 5, 2020.

Patterson, Emily. “How One Montana Museum Doubled Field Trip Attendance.” Institute of Museum and Library Services, February 4, 2019.

Reeves, Richard V., and Edward Rodrigue. “Fewer Field Trips Mean Some Students Miss More than a Day at the Museum.” Social Mobility Memos. Brookings, August 2, 2016.