Open Data Now
The Secret to Hot Startups, Smart Investing, Savvy Marketing, and Fast Innovation
By Joel Gurin
McGraw-Hill Education
Copyright © 2014 Joel Gurin
All rights reserved.
An Opportunity as Big as the Web
In November 2012 I was sitting in a packed conference room at the brand-new Open Data Institute (ODI) in London, a public-private partnership launched with a 10-million-pound grant from the British government. The ODI has all the look and feel of a well-funded tech startup. The Institute is situated in London's equivalent of Silicon Valley, known as Silicon Roundabout for the Old Street roundabout where the nearest tube station is. It's an area much like the tech hives in San Francisco's South of Market neighborhood or Silicon Alley in lower Manhattan—underdeveloped neighborhoods with ample space for new companies to stake out their territory, innovate like mad, and drive up the cost of real estate. While this international gathering happened to be in London, it could have been in either of those technology centers.
I was at the ODI with two dozen colleagues from business, government, and nonprofit organizations who had come together to talk about Open Data—the new movement to make large amounts of data available for public use. On this November Monday, we were in the second of two back-to-back meetings at the ODI, put together with funding from the MacArthur Foundation. We were meeting as representatives of the White House; 10 Downing Street; the U.K. government's Cabinet Office and its Department of Business Development; the World Bank; a major British retailer; two high-tech consulting firms; a leading tech publisher; university departments of law, computer science, artificial intelligence, physics, and cognitive neuroscience; two foundations; and nonprofits working on corporate transparency, civic engagement, and green business practices.
We were greeted by the Institute's CEO, Gavin Starks, an entrepreneur who has done Internet development for two decades in places as diverse as Virgin, Google, the British government, and UNICEF. He was making the provocative case that Open Data would have the same impact as the invention of the World Wide Web.
Open Data today, he said, "is very much like the web was for me in 1994, when I was still trying to convince people that e-mail was a good thing or that they might want to launch a website. Everyone was excited about its potential, but no one knew quite what shape it would take. Over the last 20 years we've seen a lot of innovation, creativity, and disruption. Today, we don't know exactly where Open Data will lead, but we do know that it will be transformative. Some ways of doing business will start, some will evolve, and learning how to navigate that will be a challenge for all of us. But the potential that we saw in the early days of the web is what I see now with Open Data."
This isn't the first time a new tech development has been compared to the dawn of the web. But the leaders of the Open Data movement have the credentials to make that claim. The president and cofounder of the Open Data Institute is Sir Tim Berners-Lee, whose bio says, simply and accurately, that he "invented the World Wide Web" while at CERN, the European Particle Physics Laboratory. The ODI's chairman, Sir Nigel Shadbolt, is a pioneer in web science and artificial intelligence and was instrumental in creating the British government's Open Data policies. They are typical of the visionaries in the United States, the United Kingdom, and other countries who are developing Open Data now.
Open Data can best be described as accessible public data that people, companies, and organizations can use to launch new ventures, analyze patterns and trends, make data-driven decisions, and solve complex problems. It's very different from Big Data—more on that in a minute—although the two overlap. Open Data is data with a mission: it's designed to provide free, open, transparent data that can transform the way we do business, run government, and manage all kinds of transactions. Like our gathering at the ODI, the people behind Open Data are a diverse group, including leaders from the corporate world, technology, government, academia, nonprofits, and fields such as health, education, and environmental science.
The Open Data movement began with democratic goals, fueled by the idea that governments should make the data they collect available to the taxpayers who've paid to collect it. But in addition to its social benefits, Open Data has created tremendous new business opportunities, which are the focus of this book. It's worth remembering that the Internet itself began as a government-funded initiative, the ARPANET, created by the Advanced Research Projects Agency that President Eisenhower launched as a response to Sputnik. That government research project became one of the major economic drivers of our time. In a similar way, government's drive to release Open Data is creating a major economic resource and the infrastructure to manage it.
The Open Data policies developed by the U.S. and U.K. governments are driven by a push for economic growth and job creation. President Obama made this clear when he announced his administration's new Open Data Policy in May 2013. This policy, which will make unprecedented amounts of federal data available in highly usable forms, has a business agenda first and foremost. Significantly, the president didn't make his announcement at a Washington press conference or in the Rose Garden but on a visit to a technology center in Austin, Texas. There he promised that government Open Data is going to help launch new businesses of all kinds in ways "that we haven't even imagined yet."
The Open Data Policy includes a detailed description of the criteria for government data to be released as Open Data, drawing on work done by the Open Knowledge Foundation in the United Kingdom, the Washington-based Sunlight Foundation, and others. This book goes further: the Open Data I'm writing about includes data from other sources as well as government.
I use "Open Data" to include data from any source that's made available in an "open" form that anyone can access and that meets a few specific conditions. All Open Data must be licensed in a way that allows for its reuse. It should be in a form that can be easily read by computers, although here there are gradations of "openness." And there's general agreement that Open Data should be free of charge or cost just a minimal amount.
Open Data includes federal, state, and local data; scientific data released by researchers; data that companies release about their own operations; user reviews and tweets written by ordinary people; and any kind of data that can be found through Google or scraped from websites. By using these many kinds of Open Data:
Entrepreneurs are building new businesses that generate many millions of dollars in revenue. Open Data released by the National Oceanic and Atmospheric Administration beginning in the 1970s and GPS data released more recently spawned new industries that do billions of dollars in business each year. New businesses using open health data may soon match that, and opportunities in energy, finance, education, and other fields are increasing as well.
Governments are providing new, centralized data resources for business development. Data.gov, a website launched by the Obama administration, now makes hundreds of thousands of government datasets open and available for anyone to use for free. The United Kingdom has launched its own version, Data.gov.uk, and other countries are using a platform distributed by the United States as "Data.gov in a box" to start their own data hubs.
Companies are developing new marketing strategies, evaluating competitors and partners more accurately, and building their brands' value. The new technique of sentiment analysis gathers information from Twitter, blogs, news feeds, and other public sources, uses text analysis to turn this information into Open Data, and turns the mass of public opinion into quantifiable business insights.
Investors are finding companies with the greatest promise and avoiding those that pose high levels of risk. Through new data-driven websites that provide online tools and data visualizations, investors can quickly get in-depth information on companies ranging from innovative startups to globally traded public corporations.
Companies are becoming more transparent about their operations, to their benefit. Between government-required disclosures and voluntary reporting, companies are making more Open Data available about their environmental, social, and governance practices. By releasing this data, a company can attract new investment, recruit more effectively, and improve its corporate image.
Scientists and researchers are accelerating the pace of new discoveries. In the physical sciences and biomedicine, researchers are taking the bold step of sharing their data early and openly so that online networks of both experts and amateurs can work with their data to achieve new breakthroughs. Even the secretive world of drug research is beginning to make more data public.
Websites are helping consumers make better, more informed choices for all kinds of products and services. New businesses are developing online and mobile "choice engines" that give consumers the data they need to make complex, important decisions. They help consumers access detailed, interactive Open Data to choose the options that are best for them, whether they're choosing healthcare, a mortgage, a credit card, or a college education.
Open Data vs. Big Data: Related but Very Different
Open Data should not be confused with Big Data, one of the most talked-about developments in information science over the last few years. Big Data involves processing very large datasets to identify patterns and connections in the data. It's made possible by the incredible amount of data that is generated, accumulated, and analyzed every day with the help of ever-increasing computer power and ever-cheaper data storage. It uses the "data exhaust" that all of us leave behind through our daily lives. Our mobile phones' GPS systems report back on our location as we drive; credit card purchase records show what we buy and where; Google searches are tracked; smart meters in our homes record our energy usage. All are grist for the Big Data mill.
While Big Data and Open Data each have important commercial uses, they are very different in philosophy, goals, and practice. For example, large companies may use Big Data to analyze customer databases and target their marketing to individual customers, while they use Open Data for market intelligence and brand building. National governments may use Big Data to track citizens in the name of security, while they use Open Data to engage with their citizens and foster participatory democracy. It's telling that the recent book Big Data, the best general presentation of the field, devotes only two-and-a-half pages to Open Data. The two are not the same.
With Big Data, the data sources are generally passive, and the data is often kept private. Big Data usually comes from sources that passively generate data without purpose, without direction, or without even realizing that they're creating it. And the companies and organizations that use Big Data usually keep the data private for business or security reasons. This includes the data that large retailers hold on customers' buying habits, that hospitals hold about their patients, that banks hold about their credit card holders, or that government agencies collect about millions of cell-phone calls.
At this writing in the fall of 2013, I've found that every time I mention the word data it triggers a discussion about the National Security Agency and its PRISM program. We're still trying to figure out exactly what data the NSA has collected, how much, and why. The NSA revelations have rekindled a national debate about data privacy, which is a good thing (more on that in Chapter 11). PRISM is a prime example of the disturbing side of Big Data: it's a massive collection of data without the participation, or even the awareness, of the people whose data is being collected, and it's been kept hidden from the public until recently. It's also the antithesis of Open Data. In fact, even the idea of Open Data for national security is an oxymoron.
In contrast to most Big Data, Open Data is public and purposeful. It's data that is consciously released in a way that anyone can access, analyze, and use as he or she sees fit. (I don't count Edward Snowden's revelations as Open Data; to be truly open, data should be released by someone who has the authority to do so, not by someone who has pilfered it.) Open Data is also often released with a specific purpose in mind—whether the goal is to spur research and development, fuel new businesses, improve public health and safety, or achieve any number of other objectives.
Having said all that, Big Data and Open Data do overlap, and when they do, the result can be powerful. Some government agencies have made very large amounts of data open with major economic benefits. National weather data and GPS data are the most frequently cited examples. U.S. census data and data collected by the Securities and Exchange Commission are others. And nongovernmental research has produced large amounts of data, particularly in biomedicine, that is now being shared openly to accelerate the pace of scientific discovery.
While Open Data is related to Big Data on one hand, it's also related to the Open Government movement on the other. Open Government includes collaborative strategies to engage citizens in governing as well as the government releasing Open Data to the public. This book's Appendix B, "Defining Data Categories," gives a more detailed analysis of how Big Data, Open Government, and Open Data are related, complete with a Venn diagram.
The Open Business Opportunities
Although there's widespread agreement that both Big Data and Open Data will be important business resources, no one is sure exactly what they'll be worth. Determining the overall value of Open Data is far from easy. Many companies that use it are so new that it's too early to measure their success. On the other hand, many established companies use open government data as just one resource for their work, making it hard to figure out how much it contributes to their business.
The Open Data 500 study, which I'm now directing at the GovLab at New York University, will give economists and other researchers a new information base to help assess Open Data's value. This study, which is funded by the Knight Foundation, is the first real-world, comprehensive study of American companies that use government Open Data in health, finance, education, energy, and other sectors. We're identifying 500 of these companies and surveying them to see how they use government Open Data and how they think government agencies can make their data more useful. By early 2014, we plan to make our findings available on a website where researchers can download our data, new companies can complete our survey, and members of the Open Data community can suggest future research.
To identify different kinds of Open Data companies, my colleagues and I began by looking at other research that had already been done. Since 2012, the Open Data Institute in London has been working with the consulting firm Deloitte to study Open Data's potential. In a series of studies led by Harvey Lewis, a research director in Deloitte's Insight Team, the firm has identified five Open Data business "archetypes":
Suppliers publish their data as Open Data that can be easily used. While they don't charge for the data—if they did, it wouldn't be Open Data—they increase customer loyalty and enhance their reputations by releasing it.
Aggregators collect Open Data, analyze it, and charge for their insights or make money from the data in other ways.
Developers "design, build, and sell web-based, tablet, or smart-phone applications" using Open Data as a free resource.
Enrichers are "typically large, established businesses" that use Open Data to "enhance their existing products and services," for example by using demographic data to understand their customers better.
Enablers charge companies to make it easier for them to use Open Data.
I've found these categories useful and have also come up with two simple categories of my own.
Excerpted from Open Data Now by Joel Gurin. Copyright © 2014 Joel Gurin. Excerpted by permission of McGraw-Hill Education.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.