Media Cloud

Getting Started Guide


This page provides guidance for doing basic research with Media Cloud using our Explorer and Source Manager tools. Documentation for the Topic Mapper tool will be added in the future. For a more thorough, downloadable version of this guide, download the Getting Started with Media Cloud PDF.



General search approach


1. Register for a Media Cloud account.

On the landing page at explorer.mediacloud.org, scroll down to the registration/login box and click REGISTER NOW. Registering allows you to save your searches and other work. After you submit the registration form, you'll be e-mailed an activation link. Click the link to finish your registration.

 

2. Enter your search terms into the search box on the Explorer landing page, and click Search. 

From the Explorer landing page at explorer.mediacloud.org, enter your search terms into the search box, then click Search.


The page will refresh and take you into the Explorer tool, which shows a query pane above an analysis pane. The query pane, shown below, displays your search term (in this case, "sundance"), a default media collection of U.S. Top Online News 2017 (top news websites as of August 2017 according to comScore, Activate, and Alexa), and a default date range covering the month before the date of your search.

(Screenshot: the query pane, showing the search term, media collection, and date range.)

3. Modify the query parameters of media and dates as appropriate. 

Change or add media

If your research question is not focused on the United States, or concerns a topic covered mostly at a local or regional level, click the + Add Media button to add or edit your source media. (For help on selecting the right sources for your search, see the section on Determining the right sources.) After you click + Add Media, a pop-up window opens. You can then:

  • Choose a listed Featured source collection, if appropriate, by clicking the + button to the right of the description; or
  • Search for a collection or sources by:
    • Choosing a filter for your source search (Geographic, All, or Individual Sources) on the left, then
    • Entering a keyword for the collection or source name on the right; and
    • Clicking Search. When you find a source or collection you want to add, click the + button to the right of the description.
  • Remove any unwanted selections by clicking the X button next to them in the lower left corner of the dialog window.
  • Click the OK button to add your source selections to your query.

After you click OK, the pop-up closes and you return to the query page, where your newly selected media sources are listed in the query pane. To remove any of the media sources you added, click the X button next to them.

Change dates

If you're interested in a longer or shorter timeframe, you can edit the default date range under For dates. Media Cloud can search as far back as 2008, but for the most accurate and comprehensive results, we recommend a timeframe that goes back no more than one year from the present.

4. Click the Search button to run the search again with your edited source media and dates.

You can then click the Save Search button to save your search, and use the Load Saved Search button to load it again at any time.

5. Analyze your results.

Scroll down past the query pane to the analysis pane, and open the Attention, Language, and People & Places tabs to view the three categories of outputs. More information about using the data visualizations for analysis is available in the online help (click "Learn more" under each section's introduction on the left) and in the Analyzing your results section of this guide.


Question and answer approach

The most interesting analyses in Media Cloud tend to be those conducted to investigate and answer specific questions about news media. The question and answer approach also helps you decide what to focus on among the many slices of analysis available in your Media Cloud results. To decide whether this approach is right for you, it helps to first get an idea of the types of questions Media Cloud can help answer. There are two basic types: comparative questions, and questions focused on one aspect of analysis.

Types of research questions

Comparative questions  

Comparative questions are probably the most common type of research question: investigating news coverage using two or more different points of comparison, such as different countries, different timeframes, different entities (or subjects), different U.S. partisan readership sources, or different subtopics. Some examples:

  • Media Cloud question: How does media “hype” from the last two years compare with the truth and reality around artificial intelligence in the U.S.?
    • To answer this: You might run two queries: one using the U.S. Top Online News collection, which covers mainstream, “general interest” articles and op-eds that are more likely to include emotionally provocative (and sometimes deceptive) content, and one on scientifically focused publications.
  • Media Cloud question: Which kinds of issues, angles and slants related to my topic has the target audience for my documentary been exposed to?
    • To answer this: You might run separate queries for your topic on different media types, such as traditional print publications versus digital-native publications, or on source collections by community, if applicable (for example, African American, Tech blogs, Parenting blogs). Similarly, looking at coverage in ideologically diverse media, such as Left versus Right partisan collections in the U.S., can help you compare differences in agenda and framing driven by different audiences. You can often get more information on audience data for different publications from resources outside Media Cloud.
  • Media Cloud question: How does the coverage of climate change compare between conservative media in the U.S. and the rest of the world?
    • To answer this: You might write several queries covering different keyword aspects of climate change, and run each one against different source collections: one set against the U.S. conservative partisan collections, and identical queries against major media sources from around the world. It might be interesting to compare media in nations most affected by recent climate-related disasters with those that haven't been, or media in developing nations with media in developed nations.

Questions focused on one aspect of analysis

Other kinds of questions focus on one aspect of analysis offered in Media Cloud, such as Representation. Some examples:

  • Media Cloud question: Which people, organizations, and places are being covered most often in news media about poverty?
    • To answer this: You can run one query using the keyword “poverty” (possibly combined with other search terms, which you'll learn about in the section on Writing your query) and focus on the results in the Representation tab.

Forming your own research question using a research worksheet

The examples above may have given you an idea for a research question, or you may still need help formulating one that Media Cloud can answer. Either way, completing a research worksheet, as described below, can help you finalize your question and turn it into a query that gets the best results. Writing good search queries, selecting the right sources, and knowing what to look for in the outputs are the most challenging aspects of using Media Cloud, and the worksheet helps with all three. A bit of preparation before you search (brainstorming and listing details and key terms about various aspects of your topic) leads to much more meaningful outputs. These details will also help you create or finalize a research question and translate it into the query you'll enter in Media Cloud.

  1. Open the research worksheet. You’ll see an example filled in for you in the second column.
  2. Start filling out the worksheet at the top of the third column by naming your topic of interest.
  3. Type in a question in the next row down, if you have one in mind. It can be broad or vague, or you can skip this step if you’re not sure of a point of focus yet.
  4. Follow the instructions in the left column of the worksheet for each row. In row 10, you’ll probably want to browse Source Manager for interesting sources and source collections that you’ll want to select for your query. See the section on Determining the right sources for details.
  5. Use the brainstorming you did in rows 4-10 to finalize the research question you'd like to answer, and type it in row 11. Questions consist of some combination of people, places, locations, time ranges, media types, readership types (such as U.S. partisan-focused readerships), events, or responses. Read through the question and answer examples and combine different elements of what you've brainstormed. In row 12, you'll translate this question into a query search string that you'll enter into Media Cloud. Note that your query inputs also include date ranges (row 8) and sources (row 10); row 12 refers only to the query search string, which you'll enter in the query text box in Explorer. (Important: see the section on Writing your query for instructions on how to form your query.)
  6. Set up and run your query in Media Cloud. Then, proceed with analyzing your results and refining your query as needed.
  7. (Optional) After you've answered your research question using Media Cloud, you can revisit this worksheet to type your answer in row 13, and add supporting evidence and screenshots of the visualizations in row 14.

Writing your query

You have probably heard of Boolean logic, which uses connectors such as AND, OR, and NOT to focus a search on the results you really want. Crafting the right Boolean query is critical in Media Cloud, and to get the best results you'll probably need to try several queries with different combinations of keywords. Aim to make your search as broad as possible without picking up too much “noise,” that is, unrelated content. The following is a guide to Boolean query terms. Example queries are written in [brackets]; the brackets are not part of the query and should not be typed into your queries.

Boolean Connectors

  • Search for stories that match any of a list of keywords by using OR

This is the default connector for queries. That is, if you enter a list of words without a Boolean connector, or a list of words connected by OR, the query will retrieve stories that contain any of the words in your list. Example: [Donald Trump] will return the same results as [Donald OR Trump]. These results will include any stories that match the word “Donald” or the word “Trump.” For example, you may get a story about Donald Duck, or about Ivanka Trump, that does not mention Donald Trump.

  • Search for stories that match all of your listed keywords by using AND

Using AND to connect terms in your query finds stories in which all of the terms appear. This is particularly useful when the terms you are looking for do not always appear in the same order. If the terms always appear in a fixed order, use quotation marks to search for them as a phrase. If you want to find stories in which two terms appear close to each other, use a proximity search, described in the Proximity search section below. Example: Searching for [child AND marriage] returns all stories that contain both the word “child” and the word “marriage,” which is likely to cover a broad range of subtopics (e.g., news of famous people having children, articles about contraception, lifestyle editorials). If you want to research the issue of child marriage specifically, search for ["child marriage"] to return only stories in which “marriage” immediately follows “child.”

  • Search for stories that exclude your keyword(s) by using NOT.

In order to refine a search, it is often useful to eliminate from your query results those stories in which a certain term appears. When you use NOT between your terms, the tool will return stories that match the first term but do not include the second term.  You can also use a minus sign to negate a given phrase as an alternative to NOT. Example: If you are interested in finding information about the Zika virus, but not about its incidence in Brazil, you can search for [zika NOT Brazil] or [zika -Brazil].
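If you build many queries, the three connectors can also be assembled programmatically. The following is a minimal Python sketch; `any_of`, `all_of`, and `excluding` are hypothetical helper names, not part of Media Cloud or any client library, and they only produce query text for you to paste into Explorer. They add no parentheses, so nest their output as described in the section on parentheses below.

```python
# Hypothetical helpers (not part of any Media Cloud library) that join
# keywords into the Boolean query strings described above.


def any_of(*terms: str) -> str:
    """Stories matching any of the terms: OR (also the default connector)."""
    return " OR ".join(terms)


def all_of(*terms: str) -> str:
    """Stories matching every term: AND."""
    return " AND ".join(terms)


def excluding(query: str, unwanted: str) -> str:
    """Stories matching `query` that do not contain `unwanted`: NOT."""
    return f"{query} NOT {unwanted}"


print(any_of("Donald", "Trump"))    # Donald OR Trump
print(all_of("child", "marriage"))  # child AND marriage
print(excluding("zika", "Brazil"))  # zika NOT Brazil
```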

OTHER CHARACTERS USED IN SEARCH

  • Search for stories that include an exact phrase by using "" (QUOTATION MARKS)

When searching for a phrase (a series of words that always appear in the same order), you must enclose it in quotation marks. Use straight quotation marks ("). Word processors such as Microsoft Word often convert these into curly “smart” quotes, which Media Cloud does not recognize, so type your query directly into the search box or paste it from a plain-text editor such as Notepad. Example: To research climate change, search ["climate change"] to find stories that contain that exact phrase (the two words in that order, with no words in between). If you search [climate change] without quotation marks, you will get stories that contain the word “climate” or the word “change” anywhere in the story.
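Because curly quotes break phrase searches, it can help to clean a query before pasting it into the search box. This Python sketch (the helper names are hypothetical) maps the common Unicode curly quotation marks to their straight equivalents and wraps a phrase in straight double quotes.

```python
# Map curly "smart" quotes, as inserted by word processors, to straight ones.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
})


def clean_query(query: str) -> str:
    """Replace curly quotes pasted from a word processor with straight quotes."""
    return query.translate(SMART_QUOTES)


def phrase(words: str) -> str:
    """Wrap words in straight double quotes so they match as an exact phrase."""
    return f'"{words}"'


print(clean_query("\u201cclimate change\u201d"))  # "climate change"
print(phrase("climate change"))                    # "climate change"
```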

  • Search for different forms of a stem word by using * or ? (ASTERISK OR QUESTION MARK)

To search for all the different forms of a word, use the wildcard symbol *. This returns stories that contain any word beginning with the stem you searched. To match exactly one character, use the ? character instead. Note that you can only use the wildcard symbol at the end of a word, not at the beginning (i.e., you can search key* but not *key). Example: Searching for [key*] retrieves stories that contain a word of any length that begins with “key,” such as key, keys, keyboard, keystone, keynote, Keynesian, and keywords. Searching for [key?] returns only stories that contain a four-letter word beginning with “key,” such as keys.
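To see how * and ? differ, the sketch below uses Python's standard `fnmatch` module, whose * and ? wildcards behave like the ones described above when matched against single words. This is only a local illustration under the assumption that matching happens word by word and ignores case. It does not reproduce how Media Cloud's search index actually processes words.

```python
from fnmatch import fnmatchcase

# A few sample words to test patterns against.
WORDS = ["key", "keys", "keyboard", "keynote", "Keynesian", "monkey"]


def wildcard_matches(pattern: str, words: list[str]) -> list[str]:
    """Return the words matched by a wildcard pattern, ignoring case."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]


print(wildcard_matches("key*", WORDS))  # ['key', 'keys', 'keyboard', 'keynote', 'Keynesian']
print(wildcard_matches("key?", WORDS))  # ['keys']
```

"monkey" is never matched because the pattern is anchored at the start of the word. This mirrors the rule that wildcards only extend the end of a stem.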

  • Run a complex query by using () (PARENTHESES)

If your query is complex, you will probably need parentheses to structure it and nest search terms. All of the rules above, including the Boolean connectors, still apply within the parentheses. Example: The query [("illegal immigration") AND (politics OR economy OR campaign)] retrieves stories that contain the phrase “illegal immigration” and at least one of the other three terms.
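It is easy to leave nested queries unbalanced when you type them by hand. As a minimal sketch, a hypothetical `group` helper can join terms with a connector and always wrap them in parentheses, so every opening parenthesis has a matching close. The result is logically equivalent to the example above.

```python
def group(connector: str, *parts: str) -> str:
    """Join parts with a Boolean connector (AND/OR) and wrap them in parentheses."""
    return "(" + f" {connector} ".join(parts) + ")"


topics = group("OR", "politics", "economy", "campaign")
query = group("AND", '"illegal immigration"', topics)
print(query)  # ("illegal immigration" AND (politics OR economy OR campaign))
```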

CHARACTERS THAT DO NOT WORK IN SEARCH

Media Cloud does not support searching for any punctuation other than the characters listed above. Other punctuation, such as the @ or # symbol, is not recognized by the search feature. Queries are not case sensitive. Hyphens are not recognized either: if you include a hyphen in your query, Media Cloud converts it into a blank space and treats the hyphenated words as separate terms. As a result, searching for a hyphenated word in quotation marks retrieves the hyphenated word and also any consecutive occurrence of its two parts. Example: To search for the term “well-being,” use the query ["well-being"], which retrieves stories that contain “well-being” or the words “well” and “being” used consecutively. If you instead search [well AND being], you will get stories that contain the words “well” and “being” anywhere in the story.
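The hyphen behavior described above can be approximated in a few lines of Python. This sketch is an assumption-based stand-in, not Media Cloud's actual tokenizer. It lowercases the text, treats hyphens and other punctuation as word breaks, and checks whether a phrase's words appear consecutively.

```python
import re


def tokenize(text: str) -> list[str]:
    """Rough stand-in: lowercase the text and split on anything that is not a
    word character, so hyphens (and other punctuation) act like spaces."""
    return re.findall(r"\w+", text.lower())


def phrase_matches(phrase: str, text: str) -> bool:
    """True if the phrase's words appear consecutively somewhere in the text."""
    target = tokenize(phrase)
    words = tokenize(text)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


print(phrase_matches("well-being", "Her well-being improved."))  # True
print(phrase_matches("well-being", "A well being index"))        # True
print(phrase_matches("well-being", "Being well matters."))       # False
```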

SEARCHING IN ANOTHER LANGUAGE

Media Cloud supports searching for stories by the primary language in which each story was written. To run a search query for stories written in a specific language, type your term and then use the Boolean connector AND to add the language search tag of “language:” followed by the two-character code for the language you want to search. Example: The query [queso AND language:es] will retrieve stories that contain the word “queso” that have also been detected by our system as being written in Spanish.
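If you run the same query in several languages, a small helper keeps the language tag consistent. In this sketch, `in_language` is a hypothetical name. It wraps the original query in parentheses so that the language filter applies to all of it, even when the query contains OR. The two-letter codes (es, fr, de, and so on) are assumed to follow the ISO 639-1 convention that the es example suggests.

```python
def in_language(query: str, code: str) -> str:
    """Limit `query` to stories detected as written in the language `code`."""
    return f"({query}) AND language:{code}"


print(in_language("queso", "es"))                # (queso) AND language:es
print(in_language("queso OR quesadilla", "es"))  # (queso OR quesadilla) AND language:es
```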

SEARCHING FOR HEADLINES

You can search only in the headline or title of an article by using the “title:” search tag. Example: To find articles that have “stem cell” or “stem cells” in the title, use the query [title:("stem cell*")].
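In the same way, a hypothetical `in_title` helper can wrap any query in the title: tag. It uses parentheses so that a multi-term query is restricted to headlines as a whole.

```python
def in_title(query: str) -> str:
    """Restrict `query` to article headlines using the title: tag."""
    return f"title:({query})"


print(in_title('"stem cell*"'))  # title:("stem cell*")
```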

PROXIMITY SEARCH

A normal search in Media Cloud returns stories that match your query anywhere in their text. To narrow your search to keywords that appear close together, for example within the same sentence, use a proximity search. Enter your keywords in quotation marks, followed by a tilde (~) and the maximum number of words that may separate them. For a typical sentence-level search, we recommend using 10. Proximity searches also support the wildcard symbol * for matching different forms of a word. Example: To find stories that mention “Trump” and the Republican party (which could be “Republican” or “Republicans”) in the same sentence, use the query ["Trump republican*"~10]. This returns stories in which “Trump” and a word of any length beginning with “republican” are within 10 words of each other.
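A proximity query follows a fixed pattern: the keywords in straight quotation marks, a tilde, and a number. The hypothetical `near` helper below builds one and defaults to the distance of 10 recommended above.

```python
def near(words: str, distance: int = 10) -> str:
    """Build a proximity query: the words must appear within `distance` words of each other."""
    if distance < 0:
        raise ValueError("distance must be zero or more")
    return f'"{words}"~{distance}'


print(near("Trump republican*"))  # "Trump republican*"~10
```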

FREQUENCY SEARCH

To find stories that mention a keyword multiple times, you can adapt the proximity search method to search by frequency. Enter your keyword in quotation marks, repeated as many times as you want it to appear in the story. After the closing quotation mark, add a tilde (~) followed by 1000. This sets the proximity window to 1,000 words, which should cover the full length of most news stories. Example: To find stories that mention “children” at least three times, use the query ["children children children"~1000].
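The frequency query can be generated in the same way. The `at_least` helper in this sketch (a hypothetical name) repeats the keyword the requested number of times inside straight quotation marks and appends the ~1000 window.

```python
def at_least(keyword: str, times: int, window: int = 1000) -> str:
    """Require `keyword` to appear `times` times within a `window`-word span."""
    if times < 1:
        raise ValueError("times must be at least 1")
    repeated = " ".join([keyword] * times)
    return f'"{repeated}"~{window}'


print(at_least("children", 3))  # "children children children"~1000
```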

Advanced Search Options

Media Cloud uses a Solr cluster for its text searching. For a full description of the Solr search syntax, see the Solr query documentation. In addition to the features described above, the following fields are indexed, and any of them can be searched using the field:value syntax:

  • stories_id
  • processed_stories_id
  • text
  • title
  • publish_date
  • publish_day
  • publish_week
  • publish_month
  • publish_year
  • tags_id_stories
  • tags_id_media
  • media_id
  • timespans_id

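Example: To find stories published in 2016 by the New York Times (media_id 1), within the overall timespan (timespans_id 74322) of the 2016 U.S. Presidential Election topic, that mention Trump or Clinton and immigration within 10 words of each other, search [publish_year:2016 AND timespans_id:74322 AND media_id:1 AND ("trump immigra*"~10 OR "clinton immigra*"~10)]. Write the connectors in uppercase, as elsewhere in this guide, and group the two proximity clauses in parentheses so that OR applies only to them.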
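When a query combines several fields, building it from parts makes its structure easier to check. This sketch assembles the example query with a hypothetical `field` helper. It writes the Boolean connectors in uppercase, like the other examples in this guide, and groups the two proximity clauses in parentheses so that OR applies only to them. The IDs are the ones from the example above.

```python
def field(name: str, value: object) -> str:
    """Format one field:value clause for the indexed fields listed above."""
    return f"{name}:{value}"


# Trump or Clinton within 10 words of a word starting with "immigra".
mentions = '("trump immigra*"~10 OR "clinton immigra*"~10)'

query = " AND ".join([
    field("publish_year", 2016),
    field("timespans_id", 74322),  # 2016 U.S. Presidential Election topic timespan
    field("media_id", 1),          # New York Times
    mentions,
])

print(query)
```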


Using a comparative query

If you’ve set up a general search query in Explorer, you know the basics of query crafting. But if you have a comparative question you’re trying to answer, you’ll want to set up two or more comparative queries that run simultaneously, so you can compare and contrast the outputs easily. From the query pane, use the Add query function to run multiple queries at the same time. The screenshot below shows two queries that have been set up, plus the Add query button.

(Screenshot: two queries set up in the query pane, with the Add query button.)

When you set up and run multiple queries this way, you get a single analysis below the query pane with side-by-side comparison features, such as the one shown below from the Language tab. It displays the top words from one query on the left, the top words from the other query on the right, and the top words common to both queries in the middle.

(Screenshot: Language tab comparison of the top words for two queries.)

Determining the right sources

You'll get the best results in Media Cloud's Explorer when your search is broad enough to capture your topic without picking up too much “noise” (unrelated aspects of a broad topic). You achieve this partly by writing a good query and partly by selecting the right sources. On your research worksheet, you may have brainstormed events that took place in locations whose media you'd like to research. You may also want to compare media conversations by community or special interest (such as Indigenous, parenting, or U.S. partisan media). Use your worksheet brainstorming to narrow your source selection by considering the dates covered, countries of publication, countries of readership, state and local perspectives, community perspectives, and media type (such as digital native or print).

Source Manager lets you browse Media Cloud’s full database of sources and collections, which are groups of sources. To browse sources:


1. Open Source Manager by clicking its name in the menu bar at the top of the Explorer application window.

2. Click the triangle next to the Source Manager menu to reveal menu options:

  • To browse collections of sources by country, use the Browse Geographic Collections menu option. These source collections are based on country of publication.
  • To browse specially curated collections, based on topics, communities, blogs, and more, use the Browse Other Collections menu option. Note that popular, curated U.S. media collections are listed in the Other Collections section; these are based on U.S. readership and include sources published in countries outside the U.S.  
  • If you’re looking for sources in languages other than English, sources of a certain media type, or are interested in other advanced search parameters, use the Search menu option.

After using any of the options above, click the name of any listed source or collection to review its details (such as the sources included, date ranges, and top words). To save a source or collection to your list of Starred Sources and Collections, click the star next to its name. If you still can't find what you want, click Suggest a Source to request that a source be added to the Media Cloud database.

If you're unsure whether your sources are comprehensive enough, it's better to select multiple collections for your search. However, including collections that aren't relevant enough can introduce “noise” into your findings, so selecting sources is a balancing act. Major national collections are usually the best starting point. For any other collection, click its name to open the collection's page and read about the kinds of sources and content you'll be searching.


Analyzing your results

After your results load, use Media Cloud’s categories of analysis to start exploring ways to interpret your results.

Attention tab

The Attention tab will be selected by default when your query finishes running, displaying results for the Attention analysis category.

Attention over time


Compare key event dates with the amount of coverage. Do you see the peaks you would expect around these dates? Try clicking a peak or valley before or after your key dates. Details about the coverage on that date load below the graph (scroll down a bit to see them). Browse the stories and the word cloud, which shows the most frequently used words. What was the media saying before and after your key dates, whether or not they have corresponding peaks?

If you see a peak that isn't related to a key event, what else could be driving coverage? Again, click the peak and browse the stories and frequently used words for clues about how the narrative was taking shape, or for other events that weren't an obvious focus for your topic. If you don't see peaks, or they are smaller than you expected, go back and brainstorm other key events you may not be capturing, or simply widen your date range if you're unsure of the key event dates for your topic.

If you're getting sample stories that don't fit your topic, think of ways to exclude that kind of content from your query using Boolean logic. You may also need to narrow your topic. If you've run multiple queries to answer a comparative question, you'll see the Attention Over Time line for each query in a single graph, so you can compare the amount of coverage across your defined time range.
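If you have daily story counts for your query (for example, noted down from the Attention Over Time chart), a short script can flag candidate peaks to investigate. This Python sketch applies a basic local-maximum rule: a day counts as a peak if it is higher than the day before, at least as high as the day after, and at least twice the median daily count. The rule, its threshold, and the counts below are illustrative assumptions, not how Media Cloud itself identifies peaks.

```python
from statistics import median


def find_peaks(daily_counts: dict[str, int], min_ratio: float = 2.0) -> list[str]:
    """Return dates that are local maxima and at least `min_ratio` times the median count."""
    dates = list(daily_counts)
    counts = list(daily_counts.values())
    threshold = min_ratio * median(counts)
    peaks = []
    # The first and last days have only one neighbor, so they are skipped.
    for i in range(1, len(counts) - 1):
        above_previous = counts[i] > counts[i - 1]
        at_least_next = counts[i] >= counts[i + 1]
        if above_previous and at_least_next and counts[i] >= threshold:
            peaks.append(dates[i])
    return peaks


# Invented daily story counts, for illustration only.
daily = {
    "2018-01-01": 12, "2018-01-02": 15, "2018-01-03": 14, "2018-01-04": 60,
    "2018-01-05": 30, "2018-01-06": 13, "2018-01-07": 11,
}
print(find_peaks(daily))  # ['2018-01-04']
```

Each flagged date is then a candidate to click in the chart, so you can read the stories behind it.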

Total attention

Total Attention tells you the total number of stories matching your query in the sources and date range you've defined. View these numbers by mousing over the circles. They give you an idea of how much attention the media paid to your topic during the timeframe you specified. Don't be discouraged by a lack of stories and data; often, this is a finding in itself. If you've run multiple queries to answer a comparative question, you can compare Total Attention for each query visually by the size of the circles, and then mouse over each circle to reveal its exact number of stories.

Top themes

The results in this section come from an automated classifier trained on a large sample of New York Times stories (chosen because the Times covers a broad range of subjects). It detects “themes,” or categories of similar content, in the stories you've found. Mouse over the circles to view the percentage of your stories assigned each theme. Themes are especially helpful when an unexpected or seemingly unrelated theme is detected. This suggests that a significant amount of attention went to something slightly off-topic, which may warrant a closer look at those stories. It can also be helpful to note themes you expected that weren't detected. You can browse the full list of themes here.

Sample stories

This section lists a random sample of the stories matching your query, so you can browse stories right from the screen. For access to all the stories Media Cloud found for your query, use the Download Options function to get a CSV file with a URL for each story. Browsing the headlines of the full set of stories in the CSV may give you enough clues to answer your research question, but often you'll want to read the text of some of the stories themselves.
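Once you have the downloaded CSV, a few lines of Python can summarize the most common headline words. This sketch assumes the file has a column named title (and a url column). Check the header row of your actual download and adjust `title_column` if it differs. The stopword list and the sample rows are small, hypothetical examples.

```python
import csv
import io
import re
from collections import Counter

# Small illustrative stopword list; extend it for real use.
STOPWORDS = {"a", "an", "the", "in", "of", "on", "and", "to", "for", "new"}


def headline_word_counts(csv_text: str, title_column: str = "title", top: int = 10):
    """Count the most common non-stopword words in the headline column."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        words = re.findall(r"\w+", row[title_column].lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top)


# Hypothetical sample rows standing in for a downloaded file.
SAMPLE = """title,url
Zika cases rise in Florida,https://example.com/1
Florida mosquito control expands,https://example.com/2
New zika vaccine trial,https://example.com/3
"""

print(headline_word_counts(SAMPLE, top=2))
```

For a real download, read the file first, for example `open("stories.csv", newline="", encoding="utf-8").read()`, where stories.csv is a placeholder name. Then pass the text to `headline_word_counts`.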

Language Tab

Under the Language tab, you'll find results for Top Words, Word Space, and a comparison of Top Words if you've run comparative queries.

Top Words

Browse the word cloud for unexpected terms, or for terms that might reveal clues to help answer your question. Are there words that seem to express conflicting biases or slants? More “contentious” angles often reveal themselves in the word clouds. Clicking a word in the word cloud adds it to your query with the Boolean connector AND. You can think of this modified query as a subtopic, or one aspect of your overall topic, and may want to save it separately for comparison.

Word Space

This visualization helps you analyze patterns in the use of top words in the found stories. Here’s a key to the visualization:

  • Size and opacity = how frequently the word appears
  • Distance between words = words that show up closer together have a high probability of being used in similar contexts; words that are far apart are not likely to be used in the same contexts
  • Orange cone = words that are used in similar contexts or narratives

The Word Space visualization can help give you a sense for angles that may dominate the set of stories you’ve found.

Representation (the People & Places tab)

Top People and Top Organizations

Under this tab, you'll find results on the people and organizations discussed most often for your query. These are often the subjects of stories, or are referenced or quoted in your area of interest. Note that some figures, such as the President of the United States, often appear at the top of any list simply because so much news mentions them at least once. Some Media Cloud users use these results to make lists of accounts to follow on social media.

Geographic coverage

Mouse over the map to view the percentage of stories mentioning each country. (Note: this map does not show where stories were published.) What inferences might you draw from the countries the media has focused on in your area of interest?


Furthering your research

Refining your query

The process of refining a query can include many steps we’ve already reviewed:

  • Combining different elements of your worksheet data (date ranges for different events, topical keywords, sources, comparative issues, media types, key people, and so on) to create and explore different queries with different sources.
  • Browsing stories for additional key terms to add to your query, once you’ve discovered different potential angles.
  • Clicking a word in the word cloud, under Language, to add an interesting new term with the Boolean connector AND to your query.

Supporting your conclusions

When you’ve settled on an answer to your research question, write it out. Use the data visualizations in Explorer and citations from your story results to support your conclusions. You can also further elucidate your answers by creating your own layered Attention peak graphs with dates, or network maps using a tool such as Gephi.

Using Other Media Cloud tools

Explorer is a great tool for practicing searching and analyzing online news media. Once you've developed some skill and confidence with querying and analysis techniques, and can wait more than a few days for results, you can try Topic Mapper. It lets you dive even more deeply into online data and work with more types of outputs for analysis, including analysis of influence using Facebook shares and inlink data.