Wondering where to find free and open data sets for your next data project? There’s no need to look far…
If you’re looking for a job in data analytics, you’ll need a portfolio to demonstrate your expertise. Of course, if you’re new to data analysis, you may not have much experience yet! No worries. Not having worked on a paid project yet doesn’t mean you can’t build an attractive portfolio using some practice data sets.

Fortunately, the internet is awash with these, and most of them can be downloaded completely free (thanks to the open data initiative). In this post, we’ll highlight some first-class repositories where you can find data on everything from business to finance, planetary science, and crime.

Ready to dive in? Let’s get started:

  1. Google Dataset Search
    Data type: Miscellaneous
    Data compiled by: Google
    Access: Free to search, but includes some paid search results
    Sample data set: Global coffee prices, 1990 to present

It seems we turn to Google for everything these days, and data is no exception. Launched in 2018, Google Dataset Search is like Google’s standard search engine, but strictly for data. While it’s not the best tool if you prefer to browse, it won’t disappoint if you have a specific topic or keyword in mind. Google Dataset Search aggregates data from external sources and provides a clear summary of what’s available: a description of the data, who provided it, and when it was last updated. It’s a great place to start.

  1. Kaggle
    Data type: Miscellaneous
    Data compiled by: Kaggle
    Access: Free, but registration required
    Sample data set: Daily temperatures of major cities

Like Google Dataset Search, Kaggle offers aggregated data sets, but it is a community hub rather than a search engine. Kaggle launched in 2010 with a number of machine learning competitions, which went on to solve problems for the likes of NASA and Ford. Since then, it has evolved into a well-known open data platform, offering cloud-based collaboration for data scientists, educational tools for learning artificial intelligence and data analysis techniques, and, of course, lots of great data sets covering almost any topic you can imagine.

  1. Data.Gov
    Data type: Government
    Data compiled by: U.S. Federal Government
    Access: Free, no registration required
    Sample data set: Lobster Report on Feeder and Sales

In 2015, the U.S. government made all of its data publicly available. With over 200,000 data sets covering everything from climate change to crime, you can lose yourself in the database for hours. For a government website, it has some surprisingly user-friendly search functions, including the ability to drill down by geographic area, organization type, and file format. Search results are also clearly labeled at the federal, state, county, and city levels. If you’re interested in more general data about the U.S. population, you can also check out the U.S. Census Bureau, which offers a wide selection of data about U.S. citizens, geography, education, and population growth.

  1. Datahub
    Data type: Mostly business and finance
    Data compiled by: Datahub
    Access: Mostly free, no registration required
    Sample data set: Monthly gold price since 1950

The goal of many data analysts is to help drive smart business decisions, so economic or business data sets are well worth considering for your portfolio project. While Datahub covers a variety of topics from climate change to entertainment, it mainly focuses on areas such as stock market data, property prices, inflation, and logistics. Since much of the data on the portal is updated monthly (or even daily), you’ll always have fresh information to work with, as well as data that spans long periods of time.
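
As a first exercise with a monthly price series like the sample data set above, you might roll the months up into yearly averages and look at the year-on-year change. The sketch below is only an illustration: the `Date` and `Price` column names, the `YYYY-MM` date format, and the sample values are all assumptions made up for this example, so adjust them to match whatever file you actually download.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only; these are not real gold prices.
# The column names "Date" and "Price" are assumptions about the file layout.
SAMPLE_CSV = """Date,Price
2019-01,100.0
2019-02,110.0
2020-01,120.0
2020-02,132.0
"""


def yearly_averages(csv_text):
    """Return {year: mean price} for a monthly Date,Price CSV."""
    totals = defaultdict(lambda: [0.0, 0])  # year -> [running sum, row count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["Date"][:4])
        totals[year][0] += float(row["Price"])
        totals[year][1] += 1
    return {year: total / count for year, (total, count) in sorted(totals.items())}


def year_over_year_change(averages):
    """Return {year: percent change versus the previous year in the data}."""
    years = sorted(averages)
    return {
        curr: (averages[curr] - averages[prev]) / averages[prev] * 100
        for prev, curr in zip(years, years[1:])
    }


averages = yearly_averages(SAMPLE_CSV)
changes = year_over_year_change(averages)
print(averages)
print(changes)
```

To try it on a real file, you could read the file’s contents into a string and pass that to `yearly_averages` in place of `SAMPLE_CSV`.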

  1. UCI Machine Learning Repository
    Data type: Machine learning
    Data compiled by: University of California Irvine
    Access: Free, no registration required
    Sample data set: Behavior of urban traffic in Sao Paulo, Brazil

General repositories are great if you’re happy to browse. But if you’re looking for something more specific, why not specialize? Enter the UCI Machine Learning Repository. Launched 30 years ago by the University of California Irvine, the site has a distinctly 90s vibe, but don’t let that fool you: the UCI repository is highly regarded among students, teachers, and researchers as a home for machine learning data. Data sets are clearly categorized by task (e.g. classification, regression, or clustering), attribute type (e.g. categorical, numerical), data type, and area of expertise. This makes it easy to find something suitable, whatever machine learning project you’re working on.
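
To make the categorical versus numerical distinction concrete, here is a rough sketch of how you might label the columns of a downloaded CSV file yourself. It is not how the repository assigns its categories; the rule used here (a column is numerical if every value parses as a number) is a deliberately simple assumption, and the sample columns and values are invented for illustration.

```python
import csv
import io

# Hypothetical sample; the column names and values are invented.
SAMPLE_CSV = """hour,slowness_pct,weather
7,4.1,rain
8,6.6,clear
9,8.7,rain
"""


def is_number(text):
    """Return True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def attribute_types(csv_text):
    """Label each column 'numerical' if every value parses as a float,
    otherwise 'categorical'. Assumes the file has at least one data row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        name: "numerical" if all(is_number(row[name]) for row in rows) else "categorical"
        for name in rows[0]
    }


types = attribute_types(SAMPLE_CSV)
print(types)
```

A simple rule like this will mislabel some columns (for example, numeric codes that really stand for categories), so treat the result as a starting point and check it against the data set’s own description.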

  1. Earth Data
    Data type: Earth science
    Data compiled by: NASA
    Access: Free, no registration required
    Sample data set: Environmental conditions during the fall elk hunting season in Alaska, 2000-2016

If you think space is great (let’s face it, space is great!), then look no further than Earth Data. Publicly available since 1994, this repository provides access to all of NASA’s satellite observation data for our tiny green planet. As you can imagine, there is a lot to explore, from weather and climate measurements to atmospheric observations, ocean temperatures, vegetation mapping, and more. If Earth-based data isn’t your thing, NASA’s Planetary Data System goes one step further with data from interplanetary missions, such as the Cassini probe (which orbited Saturn between 2004 and 2017). Who knows, you might even make a scientific discovery…

  1. CERN Open Data Portal
    Data type: Particle physics
    Data compiled by: CERN
    Access: Free, no registration required
    Sample data set: Higgs candidate collision events from 2011 and 2012

Want to demonstrate your ability to work with complex data sets? Head to the CERN Open Data Portal. It provides access to more than two petabytes of information, including data sets from the Large Hadron Collider particle accelerator. Honestly, these data sets are not for the faint of heart, but if you are interested in particle physics, they are worth a look. While even the names of these data sets are quite complex, each entry has a helpful breakdown of what is included, along with related data sets and guidance on how to analyze them. In many cases, they even offer sample code to get you started (thanks, CERN!).

  1. Global Health Observatory data repository
    Data type: Health
    Data compiled by: World Health Organization (UN)
    Access: Free, no registration required
    Sample data set: Estimated polio vaccination coverage by region

The Global Health Observatory data repository is the UN World Health Organization’s gateway to health-related statistics from around the world. If you’re looking to break into the healthcare industry (a primary focus for many data scientists, especially in the field of machine learning), then these data sets are a good choice for your portfolio. Covering everything from malaria to HIV/AIDS, antibiotic resistance, and vaccination rates, the portal even has a nice little feature that lets you preview the data tables before downloading. Not absolutely necessary, but definitely good to have!

  1. BFI film industry statistics
    Data type: Entertainment and movies
    Data compiled by: British Film Institute
    Access: Free, no registration required
    Sample data set: Weekend box office figures from 2001 to present

If you’re looking for data that’s a little easier to digest, then the next few entries will be right up your street. First up: the British Film Institute’s industry statistics. Throughout the year, the BFI collects and publishes data on everything from UK box office figures to audience demographics, home entertainment, film production costs, and more. The best part, though, is its annual statistical yearbook. This breaks down the year’s data with some great statistical analysis and visual reporting, which is ideal if you’re new to data analytics and want to check your own work against it.

  1. NYC taxi trip data
    Data type: Transportation
    Data compiled by: New York City Taxi and Limousine Commission
    Access: Free, no registration required
    Sample data set: Take your pick!

This one is strangely fascinating… Since 2009, the NYC Taxi and Limousine Commission has been collecting transportation data from across New York City. You’ll find data sets that include pick-up and drop-off times and locations, trip distances, fares, rate types, payment types, passenger counts, and more. It is interesting to compare how the figures have changed from 2009 to the present day, especially within such a small geographical area. The site also offers a number of additional resources, including user guides, taxi zone maps, data dictionaries (to explain the spreadsheet labels), and annual industry reports. All are very intuitive and make for a pretty useful tutorial if you are new to data analysis.
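
With fields like these, a natural first analysis is to group trips by payment type and compare fare per mile and average trip length. The following is a minimal sketch under stated assumptions: the column names (`pickup_time`, `dropoff_time`, `trip_distance`, `fare_amount`, `payment_type`), the timestamp format, the use of miles, and the sample rows are all invented for illustration, so check the site’s data dictionary for the real labels.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Invented sample rows; the column names are assumptions, so check the
# data dictionary for the real labels before adapting this.
SAMPLE_CSV = """pickup_time,dropoff_time,trip_distance,fare_amount,payment_type
2020-01-01 08:00:00,2020-01-01 08:15:00,2.0,10.0,card
2020-01-01 09:00:00,2020-01-01 09:30:00,5.0,20.0,cash
2020-01-01 10:00:00,2020-01-01 10:10:00,3.0,15.0,card
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def summarize_by_payment(csv_text):
    """Return per-payment-type trip count, fare per mile, and mean duration."""
    totals = defaultdict(
        lambda: {"trips": 0, "miles": 0.0, "fare": 0.0, "minutes": 0.0}
    )
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = datetime.strptime(row["pickup_time"], TIME_FORMAT)
        end = datetime.strptime(row["dropoff_time"], TIME_FORMAT)
        bucket = totals[row["payment_type"]]
        bucket["trips"] += 1
        bucket["miles"] += float(row["trip_distance"])
        bucket["fare"] += float(row["fare_amount"])
        bucket["minutes"] += (end - start).total_seconds() / 60

    summary = {}
    for payment, t in totals.items():
        summary[payment] = {
            "trips": t["trips"],
            # Guard against a payment type with zero recorded distance.
            "fare_per_mile": t["fare"] / t["miles"] if t["miles"] > 0 else None,
            "avg_minutes": t["minutes"] / t["trips"],
        }
    return summary


summary = summarize_by_payment(SAMPLE_CSV)
for payment, stats in sorted(summary.items()):
    print(payment, stats)
```

Running the same summary on files from two different years would be one simple way to make the 2009-versus-today comparison mentioned above.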

  1. FBI Crime Data Explorer
    Data type: Crime and drugs
    Data compiled by: Federal Bureau of Investigation
    Access: Free, no registration required
    Sample data set: Number of homicides in Point Pleasant, 2008-2018

If you’re fascinated by crime, then the FBI Crime Data Explorer is the one for you. It offers an extensive collection of crime statistics from a range of state organizations (universities and local law enforcement agencies) and governments (at the local, regional, and state levels). You can get data on hate crimes, assaults on officers, murders, and more. Like the previous few entries on our list, it also includes some useful guides to help you navigate the data. Each data set also comes with some pretty good visual analysis, so you can see whether it has the features you’re looking for before downloading.

Next steps
If you’re anything like us, you could spend hours just browsing these vast repositories. From the quirky to the downright eccentric, there is no better proof of just how prevalent data is in our lives. So what do you do once you’ve found your data set and analyzed it? If you want to showcase your analysis as a project in your portfolio, you’ll need to take certain steps. You can learn how to create your data analytics portfolio in this tutorial.

If you’re completely new to data analytics, why not try a free, five-day introductory short course? You will get a practical introduction to the field, complete with access to a working data set.

