Media Cloud

Source List CSV File

About

In the Media Cloud tools, you search within collections of sources. You can download a source list CSV file to see more information about the media sources in any collection. Users with the right permissions can also use a blank version of the source list CSV to change the media sources in a collection, edit sources in batch, or add new sources. Below is a list of the columns in the source list CSV, their descriptions, and how to use them when uploading sources.

We recommend downloading the sample source list CSV to follow along.

Source List CSV Columns

media_id

  • Column Description: This is Media Cloud's internal ID number for each source. It is assigned automatically by the system and uniquely identifies the source.
  • For Source Upload: If you are adding rows for new sources, leave this field blank.

url

  • Column Description: This is the base URL of the source.
  • For Source Upload: This is the most important field. Fill in the main (homepage) URL of the source. If the URL already exists in our system, we will update that source with the values in any other columns you have filled in. If it doesn't exist in our system yet, we will check the website to discover any RSS feeds we can use to import stories regularly.
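To make the upload rules above concrete, here is a minimal sketch of preparing an upload file for new sources with Python's standard csv module. It assumes you copy the header row from the blank source list CSV so the column names and order match exactly; the function name build_upload_csv and the shortened example header are hypothetical, for illustration only.

```python
import csv
import io


def build_upload_csv(header, new_sources):
    """Build source-upload CSV text for brand-new sources (hypothetical helper).

    header: column names copied from the first row of the blank source list
        CSV, so names and order match what the upload expects.
    new_sources: dicts with at least a "url" key; any other keys must be
        column names from header (for example "pub_country" or "media_type").
    Columns a dict does not mention are written as empty strings, which
    leaves media_id and the system-filled columns blank.
    """
    buf = io.StringIO()
    # restval="" fills unmentioned columns; the default extrasaction="raise"
    # rejects misspelled column names instead of silently dropping them.
    writer = csv.DictWriter(buf, fieldnames=header, restval="")
    writer.writeheader()
    for source in new_sources:
        if not source.get("url"):
            raise ValueError("every row needs a homepage url")
        if source.get("media_id"):
            raise ValueError("leave media_id blank for new sources")
        writer.writerow(source)
    return buf.getvalue()


# Shortened, hypothetical header; use the real header from your blank CSV.
header = ["media_id", "url", "name", "pub_country", "media_type"]
text = build_upload_csv(
    header,
    [{"url": "https://www.example.com/", "pub_country": "USA",
      "media_type": "digital_native"}],
)
print(text)
```

Reading the header from the template, rather than hard-coding it, keeps the sketch from drifting if the real export adds or renames columns.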

name

  • Column Description: This is the human-readable name of the source. We generally take it from the title of the source's homepage.
  • For Source Upload: Leave this field blank.

pub_country

  • Column Description: This is the country the source is published in. Countries are listed using "alpha-3" codes from the ISO 3166-1 standard.
  • For Source Upload: If possible, fill in this field by browsing the website to determine which country the source is published in. You can look up a country's code with the ISO search tool at https://www.iso.org/obp/ui/#search. Enter the 3-letter code in capital letters (for example, "FRA" for France).
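Before uploading, it can help to clean up pub_country values in bulk. The sketch below (hypothetical helper normalize_pub_country) trims and uppercases a value and checks that it has the shape of an ISO 3166-1 alpha-3 code. It only checks the format; it cannot tell whether a code is actually assigned, so codes should still be looked up in the ISO search tool. Allowing a blank value is an assumption based on the "if possible" wording above.

```python
import re

# Shape of an ISO 3166-1 alpha-3 code: exactly three capital letters.
ALPHA3 = re.compile(r"[A-Z]{3}")


def normalize_pub_country(value):
    """Return a pub_country value ready for upload, or "" if left blank.

    Trims whitespace and uppercases the input, then checks the format only.
    Raises ValueError for anything that is not three letters.
    """
    code = value.strip().upper()
    if code and not ALPHA3.fullmatch(code):
        raise ValueError(f"pub_country must be a 3-letter code, got {value!r}")
    return code


print(normalize_pub_country(" fra "))
```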

pub_state

  • Column Description: This indicates the subdivision (e.g., state or province) the source is published in. Values in this column are codes from the ISO 3166-2 standard.
  • For Source Upload: If possible, fill in this column by browsing the website to determine which subdivision the source is published in. Search for the country using the ISO tool at https://www.iso.org/obp/ui/#search, then click on the country to open a page listing all of its subdivisions and their codes. A subdivision code starts with the 2-letter code for the country, followed by a hyphen and then up to three letters and/or numbers that identify the state, province, or region. For example, the code for the metropolitan region of "Corse" in France is "FR-COR". Enter the full code in capital letters.
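The subdivision code format described above (2-letter country code, hyphen, then one to three letters and/or digits) translates directly into a regular expression. This is a minimal format check under that description, using a hypothetical helper name; it does not confirm the code exists, and it does not cross-check against pub_country, since that column uses 3-letter codes and would need a separate alpha-3 to alpha-2 mapping.

```python
import re

# ISO 3166-2 shape: 2-letter country code, hyphen, 1-3 letters/digits.
SUBDIVISION = re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}")


def normalize_pub_state(value):
    """Return an uppercased pub_state code, or "" if left blank.

    Checks the format only; look codes up in the ISO tool to confirm them.
    """
    code = value.strip().upper()
    if code and not SUBDIVISION.fullmatch(code):
        raise ValueError(f"pub_state must look like 'FR-COR', got {value!r}")
    return code


print(normalize_pub_state("fr-cor"))
```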

primary_language

  • Column Description: This is the main language the source publishes in, encoded as a 2-letter ISO 639-1 code (N.B.: in the ISO 639 code list, the 2-letter 639-1 codes are in the second column). It is determined algorithmically by our language detection system. If this column is empty or says "none", we do not have enough stories to reliably determine the primary language.
  • For Source Upload: Leave this field blank.

subject_country

  • Column Description: This is the main country that stories from this source are about. It is determined algorithmically by our geoparsing and geolocation engine (called CLIFF-CLAVIN). Countries are represented by their full official name. If this column is empty or says "none", we do not have enough stories to reliably determine it.
  • For Source Upload: Leave this field blank.
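When analyzing a downloaded source list, primary_language and subject_country share the same convention: an empty cell or "none" means there were not enough stories to decide. A minimal sketch of summarizing either column, treating both of those cases as unknown (None), might look like this; the helper names and sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def detected_value(cell):
    """Return the cell's value, or None when it is empty or "none"."""
    value = (cell or "").strip()
    return None if value.lower() in ("", "none") else value


def count_by(csv_text, column):
    """Count sources per detected value of `column` in a downloaded CSV."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[detected_value(row.get(column))] += 1
    return counts


# Hypothetical sample rows standing in for a real download.
sample = (
    "url,primary_language,subject_country\n"
    "https://a.example/,en,France\n"
    "https://b.example/,none,\n"
    "https://c.example/,en,none\n"
)
print(count_by(sample, "primary_language"))
print(count_by(sample, "subject_country"))
```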

media_type

  • Column Description: This indicates the type of media source. The values come from a fixed taxonomy we created in collaboration with the Media Cloud community:
    • print_native: This source is primarily a print publication. Use this for newspapers and magazines. Examples: New York Times, The Economist.
    • digital_native: This source is internet-based. Use this for news sources that started on the internet, organizational websites, and blogs. Examples: CDC, Vox, Scroll.in.
    • video_broadcast: This source is primarily a broadcast TV station (i.e., its stories are video transcriptions or closed captions). Examples: CNN, Fox News.
    • audio_broadcast: This source is primarily a broadcast radio station or podcast (i.e., its stories are audio transcriptions). Examples: NPR.
    • other: This source doesn't fit in any of the other categories. Examples: AP, Reuters.
  • For Source Upload: Enter the appropriate value from the list above (for example, print_native).
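Because media_type uses a fixed set of values, a quick check before upload can catch typos. This sketch (hypothetical helper check_media_type) trims and lowercases the input and rejects anything outside the five values listed above. Accepting a blank cell is an assumption, since this page does not say whether the column is required.

```python
# The five media_type values listed above.
MEDIA_TYPES = frozenset({
    "print_native",
    "digital_native",
    "video_broadcast",
    "audio_broadcast",
    "other",
})


def check_media_type(value):
    """Return a cleaned media_type value, or "" if left blank.

    Raises ValueError for anything outside the fixed taxonomy.
    """
    cleaned = value.strip().lower()
    if cleaned and cleaned not in MEDIA_TYPES:
        allowed = ", ".join(sorted(MEDIA_TYPES))
        raise ValueError(f"unknown media_type {value!r}; expected one of: {allowed}")
    return cleaned


print(check_media_type(" Print_Native "))
```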

public_notes

  • Column Description: These are any public notes we have made about this source, such as why we added it.
  • For Source Upload: You can enter notes about the source if desired. Please be brief, and keep in mind that notes you enter here will be publicly viewable.

EDITOR_NOTES

  • Column Description: These are any internal notes we have made about this source, such as background on custom importing or details about caveats to have in mind while editing it.
  • For Source Upload: You can enter any internal notes here. Please be brief. These notes will not be shown publicly; only people who have permission to edit sources will see them.

stories_per_day

  • Column Description: This is the average number of stories per day collected from this source during the last 90 days.
  • For Source Upload: Leave this field blank.

first_story

  • Column Description: This is the date of the oldest story we have from this source in our database. For sources that we regularly collect content from, this can be a good indication of how long we have been collecting data from that source.
  • For Source Upload: Leave this field blank.
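The stories_per_day and first_story columns together give a rough picture of how much and for how long a source has been collected. A minimal sketch of extracting both from a downloaded CSV is below. It assumes stories_per_day is a plain number and that first_story begins with an ISO date (YYYY-MM-DD); this page does not specify the exact formats, so check them against a real download. The helper name and sample rows are hypothetical.

```python
import csv
import io
from datetime import date


def collection_summary(csv_text, today):
    """Return (url, stories_per_day, days_since_first_story) per source.

    Assumes first_story starts with a YYYY-MM-DD date; blank cells give None.
    """
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        spd = (row.get("stories_per_day") or "").strip()
        first = (row.get("first_story") or "").strip()
        # Only the date part is parsed; any time portion is ignored.
        days = (today - date.fromisoformat(first[:10])).days if first else None
        results.append((row["url"], float(spd) if spd else None, days))
    return results


# Hypothetical sample rows standing in for a real download.
sample = (
    "url,stories_per_day,first_story\n"
    "https://a.example/,12.5,2020-01-01 00:00:00\n"
    "https://b.example/,,\n"
)
for entry in collection_summary(sample, today=date(2020, 1, 31)):
    print(entry)
```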