Before you begin searching for data to mine on data.gov, the federal data portal, you must understand one thing: There is no data on the site. Data.gov is home to a data catalog, a list of dataset names with details such as descriptions, formats, and URLs for obtaining data and additional information. The data itself is hosted and shared by the individual government agencies that create it, and each agency does things in its own way.
Although a lot of data (more than 100,000 datasets) is cataloged on data.gov, it still covers only a fraction of what's available from government agencies. Agencies are required to maintain some new data in electronic form and list it on data.gov, but they may have additional resources that are not listed there. So data.gov is a good starting point in your search for government data, but it's not a comprehensive source.
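To make the distinction between a catalog and the data concrete, here is a minimal sketch of what a catalog entry might look like. The class names, field names, and the sample entry are hypothetical and chosen only for illustration; they are not data.gov's actual record format.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Resource:
    """One link listed under a catalog entry (hypothetical shape)."""
    format: str  # for example "CSV" or "HTML"
    url: str     # where the agency hosts the file or page


@dataclass
class CatalogEntry:
    """Metadata describing a dataset; the data itself is not stored here."""
    name: str
    description: str
    agency: str
    resources: list[Resource] = field(default_factory=list)


def hosting_sites(entry: CatalogEntry) -> set[str]:
    """Return the hostnames that actually serve this entry's resources."""
    return {urlparse(r.url).netloc for r in entry.resources}


# Made-up entry; the name, agency, and URLs are placeholders.
entry = CatalogEntry(
    name="Example Rainfall Totals",
    description="Monthly rainfall totals by county.",
    agency="Example Agency",
    resources=[
        Resource("CSV", "https://data.example-agency.gov/rainfall.csv"),
        Resource("HTML", "https://www.example-agency.gov/rainfall/about"),
    ],
)
print(hosting_sites(entry))
```

The entry holds only descriptions and links. Every hostname that `hosting_sites` returns belongs to the agency, which is where the data actually lives.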
Here's how to begin:
- Go to the data.gov home page, and in the box that says "Get Started," enter keywords for the type of data you need.
You'll get a list of datasets whose descriptions mention your keywords. The list may include thousands of results.
On the left side of the screen, you will see options for narrowing your search based on tags, data formats, the agency that produced the data, and other factors. There is even an interactive map that lets you indicate the geographic area you have in mind.
- Narrow your search to get a shorter, more relevant list of datasets.
- When you find a dataset description that looks appropriate for your needs, click on the name.
You'll get a more detailed description of the dataset. In some cases, this information will include the location of a data dictionary (documentation that explains the data fields), the email address of a contact person, or other information you may need.
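If you would rather script the keyword search and narrowing steps above than click through them, the following sketch shows one way it might look. It assumes the catalog offers a CKAN-style `package_search` endpoint at `catalog.data.gov` and that the filter fields `res_format` and `organization` are available; treat the endpoint, the field names, and the response layout as assumptions to check against the site's current documentation.

```python
from urllib.parse import urlencode

# Assumed endpoint: a CKAN-style search action on the catalog site.
SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(keywords, data_format=None, agency=None, rows=10):
    """Build a search URL from keywords plus optional narrowing filters."""
    filters = []
    if data_format:
        filters.append(f'res_format:"{data_format}"')  # assumed field name
    if agency:
        filters.append(f'organization:"{agency}"')     # assumed field name
    params = {"q": keywords, "rows": rows}
    if filters:
        params["fq"] = " AND ".join(filters)
    return f"{SEARCH_URL}?{urlencode(params)}"


def summarize(response):
    """Pull each title and its resource links out of a response dict."""
    summary = []
    for dataset in response.get("result", {}).get("results", []):
        links = [(r.get("format", ""), r.get("url", ""))
                 for r in dataset.get("resources", [])]
        summary.append({"title": dataset.get("title", ""), "links": links})
    return summary


# "example-agency" is a placeholder, not a real agency identifier.
print(build_search_url("rainfall", data_format="CSV", agency="example-agency"))
```

Fetching the printed URL and passing the decoded JSON to `summarize` would give you, for each dataset, the same kind of information the detail page shows: a title and a list of links that point elsewhere.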
Note that download buttons don't always take you straight to the data. Often, they link to another web page, either on data.gov or on an agency site. You may find yourself navigating a number of pages on an agency site before actually reaching the data itself.
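One rough way to tell whether a link leads to a file or to yet another web page is to look at the content type the server reports. The sketch below is a heuristic under stated assumptions, not a reliable test: the helper names are hypothetical, some servers refuse HEAD requests, and others report misleading content types.

```python
import urllib.request

# Content types that usually indicate a web page rather than a data file.
PAGE_TYPES = {"text/html", "application/xhtml+xml"}


def is_landing_page(content_type):
    """Return True if a Content-Type header value looks like a web page."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in PAGE_TYPES


def probe(url, timeout=10):
    """Send a HEAD request and report whether the link leads to a page."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
    return is_landing_page(content_type)
```

Calling `probe` on a download link before fetching it can save you from parsing an HTML page as if it were a CSV file. When it returns `True`, expect more clicking before you reach the data.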