What is Media Cloud?

Media Cloud is an open source and open data platform for storing, retrieving, visualizing, and analyzing online news.

What type of data does Media Cloud collect?

The bulk of the data we collect consists of news stories from media sites around the web. To support insightful analyses of media ecosystems, we also optionally collect data such as hyperlinks, author bylines, and publication metadata.

How does Media Cloud get the data?

Media Cloud collects most of its content through the RSS feeds of the media sources we follow. We only have data for a source from the time we started scraping its RSS feeds. Additionally, we complement the RSS data with sitemap data.
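To give a sense of what an RSS feed provides, here is a minimal sketch that pulls the title, link, and publication date out of each item in an RSS 2.0 document. It is an illustration only, not Media Cloud's actual ingestion code; the function name parse_rss_items and the returned field names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml):
    """Return a list of dicts (title, link, pub_date) for each <item> in an RSS 2.0 string.

    Hypothetical helper for illustration; not part of any Media Cloud library.
    """
    root = ET.fromstring(rss_xml)
    items = []
    # RSS 2.0 places stories in <item> elements under <channel>.
    for item in root.iter("item"):
        items.append(
            {
                # findtext returns None when the child element is missing.
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
                "pub_date": (item.findtext("pubDate") or "").strip(),
            }
        )
    return items
```

A feed typically lists only a source's most recent stories, which is consistent with data for a source starting at the time its feeds were first collected.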

What languages does Media Cloud support?

Media Cloud supports searching for content in approximately 20 different languages. See the list of languages we currently support. Supporting a language for search means that we have stopword libraries in place for it.

What tools exist so that I can explore this data?

At this moment, we support two main tools.

  • Search lets you search our database of global news, visualize the results of your search, and download a CSV file with the URLs of the stories in our database that match your query.
  • Directory lets you explore the different sources and media collections from which we collect data, and add new ones.

How do I get data?

Our tools are designed to visualize the data we have in different ways, and also to let you download it and transfer it to other tools. The analysis widgets that we display, such as attention over time, total attention, and top words, have download options that provide a CSV of the results.

What data can I have access to?

We are committed to sharing as much data as we possibly can, so you can access and download to your own computer all the data that we have, with one exception: due to copyright restrictions, we cannot release the actual text of a story.

Can I download the content of the stories?

Due to copyright restrictions, we cannot provide the actual news content, but we can give you a complete list of URLs so you can check the content yourself.
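Once you have a downloaded CSV, a short script can pull out the story URLs for further processing. The following is a minimal sketch; the column name "url" and the function name read_story_urls are assumptions made for this example, so check the header row of your own file and adjust accordingly.

```python
import csv


def read_story_urls(csv_file, url_column="url"):
    """Return the unique, non-empty URLs from an open CSV file, in file order.

    `url_column` is an assumed column name; pass the name used in your download.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or url_column not in reader.fieldnames:
        raise KeyError(f"column {url_column!r} not found in CSV header")
    seen = set()
    urls = []
    for row in reader:
        url = (row.get(url_column) or "").strip()
        # Skip blank cells and URLs that were already collected.
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

To use it on a file on disk, open the file with open(path, newline="", encoding="utf-8") and pass the resulting file object to read_story_urls.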

Can I add sources to the database?

If a source or a set of sources is not already part of our database, you can suggest its addition through the Directory tool, and we will carefully consider your suggestion. Our first inclination is to say yes to suggestions.

Can you do media research for me?

Media Cloud has a nonprofit research arm, the Media Ecosystems Analysis Group (MEAG). Please contact info@mediaecosystems.org to discuss partnering on your research project.

What is the quota limit for Media Cloud Search? Is there rate-limiting on API requests?

There is a quota of 4000 requests per week. Certain API endpoints are rate-limited to 2 requests per minute.
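If you call the API from a script, pacing requests on your side helps you stay under these limits. Below is a minimal sketch of a client-side sliding-window limiter, defaulting to 2 calls per 60 seconds. The class name SlidingWindowLimiter and its parameters are hypothetical and not part of any Media Cloud client; it does not know which endpoints are rate-limited, so you decide where to call it.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `period` seconds, sleeping when needed.

    Hypothetical helper for illustration; the defaults mirror the
    2-requests-per-minute limit described above.
    """

    def __init__(self, max_calls=2, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._calls = deque()    # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed; return the seconds slept."""
        now = self._clock()
        # Forget calls that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        delay = 0.0
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            delay = self.period - (now - self._calls[0])
            self._sleep(delay)
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)
        return delay
```

Create one limiter and call its wait() method immediately before each request to a rate-limited endpoint. The weekly quota of 4000 requests is a separate limit; a simple counter that you persist between runs is one way to track it.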

How do I cite Media Cloud?

Citation: Roberts, H., Bhargava, R., Valiukas, L., Jen, D., Malik, M. M., Bishop, C. S., Ndulue, E. B., Dave, A., Clark, J., Etling, B., Faris, R., Shah, A., Rubinovitz, J., Hope, A., D’Ignazio, C., Bermejo, F., Benkler, Y., & Zuckerman, E. (2021). Media Cloud: Massive Open Source Collection of Global News on the Open Web. Proceedings of the International AAAI Conference on Web and Social Media, 15(1), 1034-1045. https://doi.org/10.1609/icwsm.v15i1.18127

Still have questions?

Send us an email at support@mediacloud.org or fill out our support form.