Changing the World with Open Data
A few weeks ago, I attended a conference called Brooklyn Beta. In its second year, the conference is basically a collection of designers, developers, entrepreneurs, and crazy ne’er-do-wells. This conference usually leaves me feeling inspired, excited, intimidated, and ready to conquer the world, and this year was no different.
While there were a few underlying themes running through the conference, the main theme was “Make Something You Love.” An intriguing and simple statement, but how many of us really are making something we love? How many of us are using our powers for Awesome versus just Meh? Creating something different, listening to your heart when all others think you’re doing it wrong, helping society and others around you, and changing the lives of others were all common themes at this conference. Whatever tools you choose or language you code in, it doesn’t matter. What matters is that at the end of the day, you made a difference. You left your mark on society. You made the world a better place to be in. You created something that wasn’t there the day before.
Admittedly, I’m a bit of a data geek, so when I saw a wonderful speaker named Todd Park give a talk on healthdata.gov, it really got my attention. (Todd is also a superbly dynamic speaker who can make even the topic of healthcare in America interesting and engaging.) Healthdata.gov is an open initiative that takes data from Medicaid/Medicare, hospitals, pharmacies, doctors’ offices, and other related agencies, and releases it to the public. To clarify, they aren’t releasing personal, private information about you and your medical history; that would be illegal. But they have been able to release aggregated information such as:
- HIV/AIDS facilities and research
- Foster care and adoption directory and statistics
- Fertility statistics
- Eldercare locator data
- Foodborne illness data
- Organ transplant data
- Designated Health Professional Shortage Areas
- FDA Drug Label data
- FDA recall data
- National Health Expenditures data
- Clinical trials for patients in need
- Toxic chemicals and environment data
- Medicare costs
- Cancer research
- Health and Well-Being statistical data
And this is only a sampling. Currently, there are 248 sources of data at this one site, just waiting for you to have a look. The best part is, they’re working hard to encourage people to do cool things with the data they provide. Check out the Apps Expo on their site for some amazing apps that have been written so far, and their list of developer challenges and Code-A-Thons.
You might think some of this data sounds kind of boring. But what if you took it one step further and cross-matched it with other, unrelated data? Are there trends, connections, or correlations? For example, could a hospital prepare in advance for flu cases based on weather reports and trends? Could you compare cost-of-living figures with the areas that have a shortage of health professionals, to help nursing students figure out where they should live? What other kinds of seemingly unrelated data could we cross-match? For that matter, what other data is out there?
When we think of open data, we usually think about commercially released APIs from sites like Twitter and Facebook. But there are hundreds of terabytes of other data just waiting for us to twist and turn and mold into something that wasn’t there before. Like a sculptor creating a masterpiece from a block of stone or clay, we have a block of raw numbers that we can put together in new ways.
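To make the idea of cross-matching a little more concrete, here is a minimal Python sketch of the flu-versus-weather question. The weekly numbers are invented purely for illustration (they are not from healthdata.gov or NOAA), and the function names `cross_match` and `pearson` are my own, not part of any published API. It pairs up two datasets on a shared key and measures how strongly they move together.

```python
from math import sqrt

# Hypothetical weekly figures, invented purely for illustration.
avg_temp_f = {"2011-W40": 61.0, "2011-W41": 55.0, "2011-W42": 48.0, "2011-W43": 42.0}
flu_visits = {"2011-W40": 12, "2011-W41": 19, "2011-W42": 31, "2011-W43": 40, "2011-W44": 47}


def cross_match(left, right):
    """Pair up values from two datasets that share a key (here, a week label)."""
    shared = sorted(left.keys() & right.keys())
    return [(left[key], right[key]) for key in shared]


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# Weeks present in only one dataset (2011-W44 here) are dropped by the match.
pairs = cross_match(avg_temp_f, flu_visits)
r = pearson(pairs)
print(f"{len(pairs)} matched weeks, correlation = {r:.2f}")
```

A strong correlation in a toy example like this proves nothing on its own, of course. It only tells you where it might be worth digging deeper with real data.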
For instance, did you know that besides the government’s health data, we also have at our disposal data on the following?
- NOAA weather and climate patterns
- Current NASA Space Mission Instrument and science data
- Consumer Broadband speed test statistics
- US Per Capita Food Availability data
- Demographic statistics (age/sex/race/geography) and historical data
- Air Quality data
- Disaster management for emergency networks
- Airport service and status data in realtime
- Railroad inspection, accident, and incident data
- Automobile safety ratings data
- Patent applications and assignments
- Vegetation and Drought data
- Threatened and Endangered Species data
- Historical climate, precipitation, freeze/frost data
There are over 100 XML feeds, APIs, and other data sources at http://www.usgovxml.com/, including other topics such as court findings, treasury information, social security data, etc. The list above is just a sampling.
Besides data available at the government level, let’s also not forget public records data at the state and county levels. Publicly available information includes:
- Abandoned and Unclaimed Properties
- Grants
- Criminal records and data
- Real estate sales, transfers, and deeds
- Marriage/divorce/childbirth records
- Most Wanted and Fugitive data
- Obituaries
- Stolen vehicles/boats data
- Historical Newspaper Headline data
- Missing persons
- Economic data (exports, demographics, property values, salaries, etc.)
- Tourism data
A link to each individual state government website can be found on USA.gov.
What happens if we cross-reference traffic patterns with the age of the people driving, or motorcycle licenses with crash reports, so we can better educate our motorcycle-riding population and reduce the number of injuries? What kinds of motorcycle crashes are the most common? What part of the country has the lowest motorcycle fatality rate? Does the average age of riders or the weather have an effect on this? What makes the safest regions successful? What can we learn from that?
What happens if we overlap sports statistics with education? Or what if we look at health records and divorce records versus marriage licenses? Does a population with a high divorce rate also die earlier? Does it go to the hospital more? Are there diseases that are more common in married people versus single people? What if we cross-referenced dog ownership with divorce rates? Or used health records as an indicator of where dog therapy programs should be?
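As a rough sketch of the motorcycle question, the snippet below joins licensing counts with crash counts by state and ranks the states by crash rate. The state names and all of the numbers are hypothetical placeholders I made up for the example, and `crashes_per_10k` is just an illustrative helper, not something from a real dataset or library.

```python
# Hypothetical counts, invented for illustration only.
licensed_riders = {"State A": 120_000, "State B": 45_000, "State C": 80_000}
crash_reports = {"State A": 900, "State B": 270, "State C": 760, "State D": 50}


def crashes_per_10k(riders, crashes):
    """Crash rate per 10,000 licensed riders, for states present in both datasets."""
    rates = {}
    for state, rider_count in riders.items():
        # Skip states with no crash data or no licensed riders to divide by.
        if state in crashes and rider_count > 0:
            rates[state] = crashes[state] / rider_count * 10_000
    return rates


rates = crashes_per_10k(licensed_riders, crash_reports)

# List the states from the lowest crash rate to the highest.
for state, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{state}: {rate:.1f} crashes per 10,000 licensed riders")
```

Normalizing by the number of licensed riders matters here: raw crash counts mostly tell you which states have the most riders, not which ones are the safest.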
What if we wanted to write an app that would help veterans or homeless advocates find abandoned and unclaimed properties, and match them with available grants? Or what if we wanted to help photographers get that perfect shot by matching up sunset times and beach locations with weather patterns, so they know when and where they can find a perfect cloud-free beach sunset?
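The photographer idea boils down to a simple filter once the data has been matched up. Here is a small Python sketch, assuming you have already combined sunset times and cloud-cover forecasts into one record per beach per day. The `SunsetForecast` record, the beach names, and every value below are hypothetical, not taken from a real weather feed.

```python
from dataclasses import dataclass


@dataclass
class SunsetForecast:
    beach: str
    date: str             # ISO date, e.g. "2011-11-05"
    sunset: str           # local time, "HH:MM"
    cloud_cover_pct: int  # forecast cloud cover around sunset, 0 to 100


# Hypothetical records, invented for illustration only.
forecasts = [
    SunsetForecast("North Beach", "2011-11-05", "17:42", 80),
    SunsetForecast("South Cove", "2011-11-05", "17:44", 5),
    SunsetForecast("North Beach", "2011-11-06", "17:41", 10),
    SunsetForecast("South Cove", "2011-11-06", "17:43", 35),
]


def clear_sunsets(records, max_cloud_pct=10):
    """Return forecasts at or below the cloud-cover threshold, earliest first."""
    matches = [f for f in records if f.cloud_cover_pct <= max_cloud_pct]
    return sorted(matches, key=lambda f: (f.date, f.sunset))


for f in clear_sunsets(forecasts):
    print(f"{f.date} {f.sunset} at {f.beach} ({f.cloud_cover_pct}% cloud cover)")
```

The 10% threshold is an arbitrary choice for the example. A real app would probably let the photographer tune it.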
What things can we predict? What correlations can be made? How can we help people do their jobs better or live better lives? How can we use our skills to make data available to others? What other data should we have access to, and who can we talk to about that?
It should also be known that there are hundreds of other data sources for other countries, not just the US. Some other resources for you, if you’re interested in looking at international data:
Knowledge is power, especially when it can be backed by looking at data in a new and exciting way. It’s open and waiting for all of us. What will you do with it?
Share your thoughts with @engineyard on Twitter