Media Cloud

Query Guide

This guide will help you structure queries in the Media Cloud tools Explorer and Topic Mapper. Example queries are written in [brackets]. The brackets are not part of the query itself and should not be typed into your queries.

Keywords

Building a query consists of choosing the words or phrases to search for and entering them into the “search for” field. Media Cloud’s tools search at the story level for the keywords you enter.

  • Example: [Refugees] will return all stories that include the term refugees anywhere in the story content.

Boolean Connectors

OR

This is the default connector for queries. That is, if you enter a list of words without a Boolean connector, or a list of words connected by OR, the query will retrieve stories that contain any of the words in your list.

  • Example: [Donald Trump] will return the same results as [Donald OR Trump]. These results will include any stories that match the word “Donald” or the word “Trump.” For example, you may get a story about Donald Duck, or about Ivanka Trump, that does not mention Donald Trump.

AND

Using AND to connect terms in your query will allow you to find stories in which all the terms appear. This is particularly useful when the terms you are looking for do not always appear in the same order. If the terms always appear together in a fixed order, use quotation marks instead. If you want to find stories in which two terms are used close to each other, you can use a proximity search, detailed later in this guide.

  • Example: Searching for [child AND marriage] will return all stories that contain both the word “child” and the word “marriage,” which is likely to cover a broad range of subtopics (e.g., news of famous people having children, articles about contraception, lifestyle editorials). If you want to research the issue of child marriage specifically, search for ["child marriage"] to return only stories in which “marriage” immediately follows “child.”

NOT

To refine a search, it is often useful to exclude stories in which a certain term appears. When you use NOT between your terms, the tool will return stories that match the first term but do not include the second term. As an alternative to NOT, you can place a minus sign directly before the term or phrase you want to exclude.

  • Example: If you are interested in finding information about the Zika virus, but not about its incidence in Brazil, you can search for [zika NOT Brazil] or [zika -Brazil].
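The three connectors can be illustrated with a small Python sketch. This is not how Media Cloud is implemented; it simply treats each story as a set of lowercased words and shows which stories a query would match. The function names and sample stories are made up for illustration.

```python
import re

def words(story):
    """Split a story into a set of lowercased words (a simplification)."""
    return set(re.findall(r"[a-z0-9]+", story.lower()))

def matches_or(story, terms):
    """[a OR b]: at least one term appears in the story."""
    found = words(story)
    return any(t.lower() in found for t in terms)

def matches_and(story, terms):
    """[a AND b]: every term appears somewhere in the story."""
    found = words(story)
    return all(t.lower() in found for t in terms)

def matches_not(story, wanted, unwanted):
    """[a NOT b]: the first term appears and the second does not."""
    found = words(story)
    return wanted.lower() in found and unwanted.lower() not in found

# Hypothetical one-line "stories" used only for this illustration.
duck = "Donald Duck turns 90 this year"
zika_brazil = "Zika cases rise in Brazil"
zika_florida = "Zika cases reported in Florida"

print(matches_or(duck, ["Donald", "Trump"]))         # True
print(matches_and(duck, ["Donald", "Trump"]))        # False
print(matches_not(zika_brazil, "zika", "Brazil"))    # False
print(matches_not(zika_florida, "zika", "Brazil"))   # True
```

The first two lines of output show why [Donald Trump] can return a story about Donald Duck while [Donald AND Trump] does not.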

Other Search Parameters

Capital letters

Queries are not case sensitive, so using lowercase or capital letters does not make a difference.

  • Example: Searching for [West] will produce the same results as [west]. If you are looking for articles about Kanye West and do not want articles about the direction west, you should search ["Kanye West"] or [Kanye AND West].

Quotations

When searching for a phrase (a series of words that always appear in the same order), you must enclose it in quotation marks. If you cut and paste a query that uses quotation marks from a program like Word, Media Cloud will not understand them, most likely because word processors replace straight quotation marks (") with curly ones (“ ”). We recommend typing your query directly into the search field, or pasting it from a plain-text editor like Notepad.

  • Example: To search for stories about climate change, search ["climate change"] to find stories that match that phrase (i.e., match the words in that order, without any words in between them). If you search only [climate change], you will get stories that contain the word “climate” or the word “change” anywhere in the story.
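If you prepare queries in a script or keep them in a document, a small cleanup step can help before you paste them into the search field. The Python sketch below assumes the problem is the curly quotation marks that word processors substitute for straight ones; the function name is hypothetical and not part of any Media Cloud tool.

```python
# Curly double quotation marks that word processors commonly substitute for ".
CURLY_DOUBLE_QUOTES = "\u201c\u201d\u201e\u201f"

def straighten_quotes(query):
    """Replace curly double quotes with the plain " character."""
    table = {ord(ch): '"' for ch in CURLY_DOUBLE_QUOTES}
    return query.translate(table)

# A phrase query as it might look after being copied from a word processor.
pasted = "\u201cclimate change\u201d"
print(straighten_quotes(pasted))  # "climate change"
```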

Hyphens

When searching for a multi-word term written with a hyphen (well-being, for instance), place the term between quotation marks. Otherwise, Media Cloud will convert the hyphen into a blank space and treat the term’s words as separate. Keep in mind that a quoted hyphenated term matches not only the hyphenated form but also the two words appearing consecutively without a hyphen.

  • Example: To search for the term “well-being,” use the query ["well-being"], which will retrieve stories that contain “well-being” or the words “well” and “being” consecutively. If you were instead to search [well AND being], you would get stories that had the words “well” and “being” anywhere in the story.

Different forms of words

To search for all the different forms of a word, use the wildcard symbol *. This will return stories that match any word beginning with the stem you searched. To match exactly one additional character instead, use the ? character, which stands for any single character. Note that you can use a wildcard only at the end of a word, not at the beginning (i.e., you can search key* but not *key).

  • Example: Searching for [key*] will retrieve stories that contain a word of any length that begins with “key,” such as key, keys, keyboard, keystone, keynote, keynesian, keywords, etc. Searching for [key?] will match only four-letter words that begin with “key,” such as keys.
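The difference between * and ? can be checked against a word list with Python's standard fnmatch module, whose shell-style wildcards happen to give * and ? the same meaning as described here. This is only an illustration of the matching rule, not Media Cloud's own matcher, and the vocabulary is made up.

```python
from fnmatch import fnmatchcase

def matching_words(pattern, candidates):
    """Return the candidate words that match a trailing-wildcard pattern."""
    return [w for w in candidates if fnmatchcase(w.lower(), pattern.lower())]

# Hypothetical vocabulary; "monkey" ends with "key" but does not begin with it.
vocabulary = ["key", "keys", "keyboard", "keynote", "monkey"]

print(matching_words("key*", vocabulary))  # ['key', 'keys', 'keyboard', 'keynote']
print(matching_words("key?", vocabulary))  # ['keys']
```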

Parentheses

If your query is somewhat complex, you will probably need to use parentheses to structure it and nest search terms. All of the above rules and the Boolean connectors still apply within the parentheses.

  • Example: A query such as [("illegal immigration") AND (politics OR economy OR campaign)] will retrieve any story containing the phrase “illegal immigration” AND any of the other three terms.
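When queries are assembled by a script, wrapping every group in parentheses keeps the nesting explicit. The following Python sketch builds a query like the one above from small helper functions; the helper names are hypothetical and the result is only a string to paste into the search field.

```python
def phrase(text):
    """Wrap a multi-word phrase in straight double quotes."""
    return f'"{text}"'

def any_of(*terms):
    """Join terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(terms) + ")"

def all_of(*terms):
    """Join terms with AND inside one pair of parentheses."""
    return "(" + " AND ".join(terms) + ")"

query = all_of(
    phrase("illegal immigration"),
    any_of("politics", "economy", "campaign"),
)
print(query)
# ("illegal immigration" AND (politics OR economy OR campaign))
```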

Other characters

Media Cloud does not support searching for punctuation. The only special characters it recognizes are the query operators described in this guide, such as parentheses, hyphens, asterisks, and quotation marks. Any other punctuation, such as the @ symbol or the # symbol, will not be recognized by the search feature.

  • Example: Searching for [@POTUS] or [#POTUS] is equivalent to searching for [POTUS], even if you place the term between quotation marks.

Searching in another language

Media Cloud supports searching for stories by the primary language in which each story was written. To run a search query for stories written in a specific language, type your term and then use the Boolean connector AND to add the language search tag of “language:” followed by the two-character code for the language you want to search.

  • Example: The query [queso AND language:es] will retrieve stories that contain the word “queso” that have also been detected by our system as being written in Spanish.

Searching for headlines

You can search only in the headline or title of an article by using the “title:” search tag.

  • Example: To find articles that have “stem cell” or “stem cells” in the title, use the query [title:("stem cell*")].

Proximity search

A normal search in Media Cloud will return stories that match your query anywhere in the story. If you want to search at a narrower level and find stories in which your terms appear close together, for example in the same sentence, we recommend using a proximity search. Enter your keywords in quotation marks and then add a tilde ~ followed by the maximum number of words that may separate them. For a typical sentence-level search, we recommend using the number 10. Proximity searches also support the wildcard symbol *, so you can match different forms of a word.

  • Example: To find stories that contain the word “Trump” and a reference to the Republican party in the same sentence (which could be “Republican” or “Republicans”), use the query ["Trump republican*"~10]. This will return stories in which Trump and any word of any length beginning with “republican” are within 10 words of each other.
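The idea of "within 10 words of each other" can be illustrated with a short Python sketch. It is a simplified model that measures the difference between word positions; the search engine's exact distance calculation may differ, and the function name and sample texts are made up.

```python
import re

def within_n_words(text, first, second_prefix, n):
    """True if `first` and a word starting with `second_prefix` occur within n words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [
        i for i, t in enumerate(tokens) if t.startswith(second_prefix.lower())
    ]
    return any(abs(i - j) <= n for i in first_positions for j in second_positions)

# Hypothetical texts: one with the terms close together, one with them far apart.
near = "Trump met with Senate Republicans on Tuesday"
far = "Trump spoke first. " + "filler " * 20 + "Later, a Republican replied."

print(within_n_words(near, "Trump", "republican", 10))  # True
print(within_n_words(far, "Trump", "republican", 10))   # False
```

A plain [Trump AND republican*] query would match both texts; only the proximity form distinguishes them.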

Advanced Search Options

Media Cloud uses a Solr cluster for its text searching. For a full description of the Solr search syntax, see the full Solr query documentation. In addition to the features described above, we index the following fields, any of which can be searched using the 'field:' syntax:

  • stories_id
  • processed_stories_id
  • text
  • title
  • publish_date
  • publish_day
  • publish_week
  • publish_month
  • publish_year
  • tags_id_stories
  • tags_id_media
  • media_id
  • timespans_id

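Example: To find stories published in 2016 by the New York Times, within the overall timespan of the 2016 U.S. Presidential Election topic, that mention Trump or Clinton within 10 words of a form of the word “immigration,” you can search [publish_year:2016 AND timespans_id:74322 AND media_id:1 AND ("trump immigra*"~10 OR "clinton immigra*"~10)]. Boolean connectors must be written in capital letters, and the parentheses ensure that the two proximity clauses are combined with OR before the field filters are applied.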
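A fielded query of this kind can also be assembled programmatically. The Python sketch below is a minimal illustration that joins 'field:value' filters with uppercase AND, as Solr's standard query syntax expects, and groups the alternative text clauses with OR inside parentheses. The function name is hypothetical, and the IDs are the ones from the example above.

```python
def build_fielded_query(fields, text_clauses):
    """AND together field:value filters, then AND an OR-group of text clauses."""
    parts = [f"{name}:{value}" for name, value in fields.items()]
    if text_clauses:
        parts.append("(" + " OR ".join(text_clauses) + ")")
    return " AND ".join(parts)

query = build_fielded_query(
    {"publish_year": 2016, "timespans_id": 74322, "media_id": 1},
    ['"trump immigra*"~10', '"clinton immigra*"~10'],
)
print(query)
# publish_year:2016 AND timespans_id:74322 AND media_id:1 AND ("trump immigra*"~10 OR "clinton immigra*"~10)
```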