Many governments throughout the world are participating in “Open Data” initiatives. Open Data are data collected, usually at taxpayer expense, to help inform
policy and decision making. Once you have identified a government with open data, find an area or theme that interests you. For example, you might be
interested in response times of the fire department, crimes, how much government employees are paid, or the health ratings of restaurants. Some cities
organize data by theme, such as Finance, Public Safety, and so on. You might have to do some poking around to find a data set that is interesting.
Your goal is to download a data file. But not just any data file; we need one that your statistical software will understand. To do this, you will want to look for
some sort of option to “Export” or “Download” a file. In some cases, these words might not be used and you are expected to simply click on a link.
There are three things you need to pay attention to: the format, the structure, and the existence of metadata.
Format refers to the types of files provided and the software that can open them. For example, files that end in
“.txt” or “.csv” are text files and can usually be opened by many types of software. StatCrunch accepts files that end in .xlsx, .xls (Excel spreadsheets), .ods
(OpenOffice spreadsheet), or .csv, .txt, or .tsv (text files).
Data files often use a delimiter to determine where one entry ends and another begins. For example, if a data file contains the text string “1345” you do not
know whether it is the number one thousand three hundred forty-five or the four digits 1, 3, 4, 5 or the two numbers 13 and 45 or anything else. A delimiter
tells you when one value ends and the next begins. A comma-delimited file, which usually ends in the extension .csv, uses commas to display “1, 34, 5” for the
values one, thirty-four, and five. A tab-delimited file (.tsv or .txt) uses tabs instead, as shown:
1	34	5
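If you would like to see a delimiter at work outside of StatCrunch, here is a minimal sketch in Python using the standard csv module. The function name parse_delimited and the two sample strings are made up for illustration; they are not part of any data set or of StatCrunch.

```python
import csv
import io

def parse_delimited(text, delimiter):
    """Split each line of text into a list of values at the given delimiter."""
    # skipinitialspace drops the blank that follows each comma in "1, 34, 5"
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, skipinitialspace=True)
    return [row for row in reader]

# Hypothetical one-line files holding the values one, thirty-four, and five
comma_text = "1, 34, 5\n"
tab_text = "1\t34\t5\n"

comma_rows = parse_delimited(comma_text, ",")
tab_rows = parse_delimited(tab_text, "\t")

print(comma_rows)  # [['1', '34', '5']]
print(tab_rows)    # [['1', '34', '5']]
```

Both files yield the same three values. Without a delimiter, a line such as "1345" comes back as a single entry, which is exactly the ambiguity described above.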
Metadata is documentation that helps you understand the data. Sometimes it is called a “codebook” or “data dictionary.” It should answer the who, what,
when, where, why, and how of the data. But most important, it should tell you what the variable names mean and what the values mean. With any luck, it will
also tell you the data type (for example, whether it is a number or a character).
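When the metadata does not state the data type, you can make a rough guess from the values themselves. The sketch below is only an illustration in Python: the function guess_type and its rule (a column is a number if every non-empty value can be read as a decimal number) are assumptions made here, not how StatCrunch or any particular data portal decides.

```python
def guess_type(values):
    """Return 'number' if every non-empty value parses as a float, else 'character'."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "unknown"  # nothing to judge from
    for v in non_empty:
        try:
            float(v)
        except ValueError:
            return "character"
    return "number"

print(guess_type(["1", "34", "5"]))   # number
print(guess_type(["A", "B", "A"]))    # character
print(guess_type(["90210", "10001"])) # number
```

Note the last line: ZIP codes parse as numbers even though they are labels, not quantities. A guess like this cannot replace a codebook that tells you what the values mean.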
Warning: you might have to look at a great many data files before you find one that is useful.
How to Submit
Put your answers to the following questions in a document and upload it here.
What is the URL of the page from which you downloaded your data?
How many variables are there? How many observations?
Do you have at least two categorical and two quantitative variables in your data set? If so, please choose two of each and list them.
What does a row in the data set represent? In other words, what is an observation or case in context?
Why were these data collected? What is their purpose? How were they collected? Why were you interested in this data set? Give some examples of what you
want to know from these data.
Try to upload the data to StatCrunch. Describe what happens. Did you get what you expected? Did you get an error? Describe the outcome. If you succeed in
uploading the data set, please share it in our class StatCrunch group.
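If you are comfortable with a little programming, you can check the number of variables and observations before uploading. The following is a minimal sketch in Python, assuming a comma-delimited file whose first row holds the variable names. The function name and the tiny sample text are invented for illustration and do not come from a real data set.

```python
import csv
import io

def count_variables_and_observations(csv_text):
    """Return (number of variables, number of observations) for CSV text with a header row."""
    # Skip completely blank lines so they are not counted as observations
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return 0, 0
    header, data = rows[0], rows[1:]
    return len(header), len(data)

# Hypothetical sample: three variables, two observations
sample = "name,grade,score\nCafe A,A,95\nDiner B,B,82\n"

n_variables, n_observations = count_variables_and_observations(sample)
print(n_variables, n_observations)  # 3 2
```

For a file you downloaded, you would read its contents into a string first and pass that in. Here each row after the header is one observation, so this check only makes sense once you know what a row represents.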