Media Cloud is an open-source platform for media analysis.

Media Cloud is a consortium research project across multiple institutions, including the University of Massachusetts Amherst, Northeastern University, and the Berkman Klein Center for Internet & Society at Harvard University.

Why Media Cloud?

Researchers have long studied the production and dissemination of news to understand its impact on democracy, beliefs, and behaviors. While the digitization of news has made many kinds of research newly possible, the volume of content and the shift toward proprietary platforms have significantly raised the barriers for researchers hoping to build on the scholarly traditions of studying attention, representation, influence, and language in online news.

In addition, the democratization of content authoring has significantly expanded the number of news sources reporting on any given topic or geographic area. Discovering which online media sources to study has become more involved: how does one identify the set of media outlets that exist and decide which are influential? Furthermore, once a set of media sources has been assembled, collecting their content introduces many technological difficulties.

About Media Cloud

Media Cloud is an open-source data corpus and suite of web-based analytic tools that support research into open web media ecosystems across the globe. Since 2008, Media Cloud has collected over 1.7 billion stories. These data are available to the public through an open-source code base on GitHub, a suite of free web tools, and an extensive open API. Copyright restrictions prevent Media Cloud from sharing the full text of news articles with external partners. Instead, we surface metadata, data about the text content of documents, and the URLs of indexed documents.

Researchers have used Media Cloud in work on:

  • Mapping national political discourse
  • Online mis/dis-information
  • Media self-analysis
  • Human rights and social justice media impact
  • Media-based art and advocacy


  • javascript
  • python
  • react
  • docker
  • postgresql


  • Other
  • media
  • news-media
  • media-analytics
  • civic-tech
  • research

Media Cloud 2021 Projects