Data Services: Finding Existing Data Sets

Locating Existing Data Sets

The lists below are in no way a comprehensive listing of secondary datasets. Please contact the Data Services Librarian or a subject specialist to assist you in finding appropriate datasets.

This guide provides resources for locating existing data sets: data that someone else has already collected and that you can reuse, either by downloading it or by requesting access.

Secondary data is data that a researcher has not collected or created themselves. It can encompass an enormous range of original and extensive studies, including some of the largest and most carefully assembled collections of data.

Secondary data is usually easy for researchers and other individuals to access because most of it is shared publicly. The trade-off is that the data is usually general rather than tailored to the researcher's specific needs, as primary data is.

Things to consider when using secondary data sets:

  • Who created the data?
  • How is the data ...
  • Do you understand the metadata?
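One practical way to check whether you understand a data set's metadata is to profile the downloaded file and compare the result with the codebook or documentation. The following is a minimal sketch in Python, assuming the data set is a CSV file; the function names `guess_type` and `profile_csv`, the sample columns, and the sample values are all hypothetical and only for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded secondary data set.
SAMPLE = "respondent_id,age,region\n1,34,Northeast\n2,,South\n3,51,\n"


def guess_type(value):
    """Label a non-empty cell as 'number' if it parses as a float, else 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def profile_csv(text):
    """Count filled and missing cells per column and guess each column's type."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    counts = {
        name: {"filled": 0, "missing": 0, "types": Counter()} for name in columns
    }
    for row in reader:
        for name in columns:
            value = row.get(name)
            # Treat absent or blank cells as missing values.
            if value is None or value.strip() == "":
                counts[name]["missing"] += 1
            else:
                counts[name]["filled"] += 1
                counts[name]["types"][guess_type(value)] += 1
    profile = {}
    for name, c in counts.items():
        most_common = c["types"].most_common(1)
        profile[name] = {
            "filled": c["filled"],
            "missing": c["missing"],
            "type": most_common[0][0] if most_common else "unknown",
        }
    return profile


if __name__ == "__main__":
    for column, info in profile_csv(SAMPLE).items():
        print(column, info)
```

If a column's guessed type or number of missing values does not match what the documentation describes, that is a good question to bring to the Data Services Librarian or a subject specialist before relying on the data.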

 

General

  • Google Dataset Search: a search engine for discovering datasets hosted in thousands of repositories across the Web.
  • data.world: a cloud-based data catalog platform that also hosts publicly shared datasets.
  • Kaggle: an online community platform for data scientists and machine learning enthusiasts. Users can collaborate, find and publish datasets, use GPU-integrated notebooks, and compete in data science challenges.
  • re3data: a global registry of research data repositories covering a wide range of academic disciplines.
  • Data Hub: a platform for data from the creators of the Comprehensive Knowledge Archive Network (CKAN).
  • Natural Language Processing Datasets: an alphabetical list of free or public domain datasets with text data for use in Natural Language Processing (NLP).
  • Pew Research Center Datasets: datasets from the Pew Research Center.

Global

Federal Datasets

Humanities

Natural Sciences

Social Sciences
