Businesses are well aware of the data they possess on their customers, marketing campaigns, social media, and more.
This type of data comes from internal sources like CRM software, ERP systems, marketing automation tools, databases, and other repositories. Through data analytics, businesses gather and analyze this information to make more calculated decisions.
But as the world becomes more data-driven and as volumes of big data increase, it’s important to consider the data that lies outside the scope of your organization. A sizable portion of it is considered "open data."
What is open data?
Open data consists of large datasets that are available to you, me, and anyone else with an internet connection.
This type of data stems from external sources around the world. It can be anything from public data collected by government agencies to economic trend roundups from banks and financial conglomerates.
Why is open data important? Well, open data is publicly accessible knowledge for anyone to use. In terms of business, this data can be used for predictive intelligence and forecasting, unveiling buying patterns of demographic groups, finding new opportunities for innovation, and so much more.
With the advent of big data, businesses shouldn’t be consumed by their own data alone. That’s why we compiled the top 50 open data sources ready to be used right now.
50 open data sources
During the data analysis process, part of generating accurate insights is pulling data from relevant places. Browse the categories below to find an open dataset that’s relevant to your business.
Government and global data
1. Data.gov – From science and research to manufacturing and climate, data.gov is one of the most comprehensive data sources around the globe. Datasets are available in typical formats such as CSV, JSON, and XML. Metadata is frequently updated as well, giving the user complete transparency and clarity.
2. U.S. Census Bureau – For demographic data on U.S. inhabitants, this open data source is extremely useful. The Census Bureau draws its data from federal, state, and local governments, as well as commercial entities.
3. Data.gov.uk – Just as data.gov covers U.S. data, there’s an equivalent portal for the entire United Kingdom. Reports contain data on everything from crime and justice to defense and government spending.
4. UK Data Service – A perfect complement to data.gov.uk is the UK Data Service, a search engine for recent datasets on social media trends, politics, finance, international relations, and more happening in the UK.
5. European Union Open Data Portal – With almost 14,000 datasets available, EUROPA is one of the best open data providers in the EU for insights on energy, education, commerce, agriculture, international issues, and much more.
6. Open Data Network – This source allows users to look for data using a robust search engine. Apply advanced filters to your searches, and pull data on public safety, finance, infrastructure, housing and development, and more.
7. UNICEF – These valuable open datasets monitor and report on the situations of children and women everywhere. The latest updates on disease outbreaks, gender and education, attitudes on social norms, and other datasets are widely available through UNICEF, as are data visualizations.
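Since portals like data.gov publish datasets as CSV, JSON, and XML, it can help to see how each format is read into the same row structure. The following is a minimal sketch using only Python's standard library. The sample payloads, the field names (region, value), and the function names are made up for illustration and do not come from any real portal; an actual download will have its own columns and nesting.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Made-up placeholder payloads (not real portal data). A real download
# would be read from a file or an HTTP response instead.
CSV_TEXT = "region,value\nA,10\nB,20\n"
JSON_TEXT = '[{"region": "A", "value": 10}, {"region": "B", "value": 20}]'
XML_TEXT = (
    "<rows>"
    "<row><region>A</region><value>10</value></row>"
    "<row><region>B</region><value>20</value></row>"
    "</rows>"
)


def rows_from_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"region": r["region"], "value": int(r["value"])} for r in reader]


def rows_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return [{"region": r["region"], "value": int(r["value"])} for r in json.loads(text)]


def rows_from_xml(text):
    """Parse <rows><row>...</row></rows> XML into a list of dicts."""
    root = ET.fromstring(text)
    return [
        {"region": row.findtext("region"), "value": int(row.findtext("value"))}
        for row in root.findall("row")
    ]


if __name__ == "__main__":
    loaders = (
        (rows_from_csv, CSV_TEXT),
        (rows_from_json, JSON_TEXT),
        (rows_from_xml, XML_TEXT),
    )
    for loader, payload in loaders:
        print(loader.__name__, loader(payload))
```

Normalizing every format into the same list of dictionaries means the rest of an analysis does not need to care which format a given source happened to publish.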
Financial and economic data
8. World Bank Open Data – This is one of the most frequently updated and complete open data sources for information on GDP rates, logistics, global energy consumption, disbursement and management of global funds, and much more. There are even visualization tools for some datasets.
9. Financial Times – The Financial Times may look like an online newspaper but is actually one of the most robust data sources for global markets, the Americas, Europe and Africa, and Asia-Pacific.
10. Global Financial Data – With a free subscription, users can access GFD’s complete datasets and research to analyze major global markets and economies. Sources are periodicals, books, and numerous archives.
11. UN Comtrade Database – Maintained by the United Nations Statistics Division, this free-access database holds mountains of datasets on global trade and is accessible via API. There are also data visualization and data extraction tools available.
12. International Monetary Fund – For insights on the global economic outlook, financial stability, fiscal monitoring, and more, IMF datasets should have you covered.
13. Bureau of Economic Analysis – Curated by the U.S. Department of Commerce, this wide-ranging open data source is frequently updated with datasets on GDP, international trade in goods and services, international transactions, and more.
16. Federal Reserve Economic Data (FRED) – Maintained by the Federal Reserve Bank of St. Louis, FRED hosts nearly 530,000 U.S. and international data series. Some examples include consumer price indexes, GDP, industrial production indexes, foreign exchange rates, and others.
Crime and drug data
17. Uniform Crime Reporting Program – Curated by the FBI, the UCR Program aggregates data points from more than 18,000 cities, universities and colleges, counties, states, tribes, and federal law enforcement agencies.
18. Bureau of Justice Statistics – While the UCR Program has more crime-specific statistics, this open data source collects data on everything from arrest-related deaths and the Census of Public Defender Offices (CPDO) to emergency room stats and annual firearm inquiries.
19. National Archive of Criminal Justice Data – The NACJD is a comprehensive resource for discovering both public and restricted access datasets on recidivism, gang violence, terrorism, hate crimes, and much more.
21. United Nations Office on Drugs and Crime – For datasets on drug production and trafficking, global studies on homicide rates, organized crime, corruption, and more, the UNODC has frequently updated publications.
Health and scientific data
22. World Health Organization – One of the most complete open data repositories for global mortality rates, disease outbreaks, mental illnesses, health financing, and more is the World Health Organization.
23. Food and Drug Administration – Commonly known as the FDA, this open data source serves as an educational library on everything from foodborne illness and contaminants to dietary supplement news and recalls in the U.S.
24. HealthData.gov – Containing over 3,000 datasets over a 125-plus year span, this open data source is dedicated to making high-value data accessible to entrepreneurs, researchers, and policymakers.
25. Broad Institute – The Broad Institute is a clear-cut open data source with health and scientific research specifically on the many types of cancers.
26. National Cancer Institute – A complement to the Broad Institute is the NCI, which is part of the National Institutes of Health (NIH). With advanced filters, users can create hyper-targeted search results for a variety of open datasets relating to cancer.
27. Centers for Disease Control and Prevention – Access a wide variety of open datasets on chronic illnesses, cancers, heart diseases, birth defects, and much more through the CDC.
28. NHS Digital – For high-quality datasets on the state of health and social care systems in England, NHS Digital is an easy-to-use free service to consider.
29. Open Science Data Cloud – With more than a petabyte of big datasets on-hand, the OSDC enables scientific researchers to easily manage, share, and analyze open data.
30. NASA Planetary Data System – Require planetary data? Well, NASA has you covered. Whether you’re a researcher, educator, student, or just part of the general public, search thousands of open datasets on our solar system’s planets.
31. NASA Earth Data – Want to scale it back to just planet Earth? Access NASA’s complete open data source for Earth science. Monitor the atmosphere, the cryosphere, land, ocean, calibrated radiance, and solar radiance.
32. Google Scholar – In search engine fashion, Google Scholar lets users search for datasets like they would with any other Google search. Find educational, peer-reviewed sources of data on just about any topic!
33. Pew Research Center – Pew is one of the largest open data sources in the U.S. with datasets aggregated through high-quality surveys. Data from surveys are typically released two years after reports are issued. You’ll have to create a free login to access Pew Research Center.
34. National Center for Education Statistics – Open datasets like those from the NCES are widely used by educational institutions today to improve student retention rates and degree attainment, understand learning habits, and much more.
35. Climate Data Online – For historical and near-real-time climate datasets around the globe, the CDO acts as a great open data source. Search daily summaries, marine data, weather radars, and more.
36. National Center for Environmental Health – Curated by the CDC, this open data source highlights major data systems with a national scope where public health and environmental data can be collected.
37. IEA Atlas of Energy – When it comes to global energy and electricity consumption rates, the IEA has compiled open datasets and map visualizations for everyone to access.
Business directory data
38. Glassdoor – The review site for jobs also has a wealth of open data ready for analysis. Some examples include Glassdoor’s frequently updated gender pay analysis, monthly salary reports, local pay reports, and more.
39. Yelp – Tap into the millions of existing business reviews using Yelp’s open datasets to gain a deeper understanding of sentiment toward businesses, as well as any patterns and trends.
40. Open Corporates – One of the largest open databases of companies in the world, it holds hundreds of millions of records covering essentially any country.
Media and journalism data
41. FiveThirtyEight – One of the most comprehensive and high-quality data sources on everything from politics to sports is FiveThirtyEight.
42. The New York Times Developer Network – By creating an account and registering your app, you can tap NYT abstracts, links, multimedia, books, listings, stories, and other media dating all the way back to 1851.
43. Associated Press Developer – Similar to the NYT dev network, you can build powerful integrations with Associated Press’ services for developers. This consists of news content, polling data, metadata, and more.
Marketing and social media data
44. Graph API – Curated by Facebook, Graph API is the primary way for apps to read and write to the Facebook social graph. It is essentially a representation of all information on Facebook now and in the past.
45. Social Mention – Acquire real-time data on social sentiment, keyword usage, users, and hashtags using the Social Mention search engine.
46. Google Trends – Search what the world is searching using Google Trends datasets on latest search trends. Marketers can pinpoint timely campaigns using this data.
47. Kaggle – Owned by Google, Kaggle is an online community of data scientists who publish datasets on a seemingly random range of topics, from tracking the frequency of internet memes to “last words of death row inmates.”
48. Datasets Subreddit – Reddit is a vast online community, and this particular subreddit (r/datasets) is made up of Redditors who find, share, and request interesting datasets from around the web.
49. DBpedia – Think of Wikipedia, except as a database. DBpedia extracts structured information from Wikipedia, so users can explore its millions of entries and the relationships between them. This has helped companies like Apple, Google, and IBM support artificial intelligence projects.
50. Google Public Data Explorer – Many of the sources included on this list are actually consolidated on the Google Public Data Explorer. If you’re not sure where to start pulling data from, this could be a good starting point. There’s also free access to the Google Dataset Search engine.
Using open data
Whether you’re running an exploratory analysis or just pulling data for fun, open data makes it easy to share valuable information. Consider a few of the sources above for your next analysis.
Devin is a former senior content specialist at G2. Prior to G2, he helped scale early-stage startups out of Chicago's booming tech scene. Outside of work, he enjoys watching his beloved Cubs, playing baseball, and gaming. (he/him/his)