Beautiful Data: The Stories Behind Elegant Data Solutions

In this insightful book, you'll learn from the best data practitioners in the field just how wide-ranging -- and beautiful -- working with data can be. Join 39 contributors as they explain how they developed simple and elegant solutions on projects ranging from the Mars lander to a Radiohead video.

With Beautiful Data, you will:

  • Explore the opportunities and challenges involved in working with the vast number of datasets made available by the Web
  • Learn how to visualize trends in urban crime, using maps and data mashups
  • Discover the challenges of designing a data processing system that works within the constraints of space travel
  • Learn how crowdsourcing and transparency have combined to advance the state of drug research
  • Understand how new data can automatically trigger alerts when it matches or overlaps pre-existing data
  • Learn about the massive infrastructure required to create, capture, and process DNA data

That's only a small sample of what you'll find in Beautiful Data. For anyone who handles data, this is a truly fascinating book. Contributors include:

  • Nathan Yau
  • Jonathan Follett and Matt Holm
  • J.M. Hughes
  • Raghu Ramakrishnan, Brian Cooper, and Utkarsh Srivastava
  • Jeff Hammerbacher
  • Jason Dykes and Jo Wood
  • Jeff Jonas and Lisa Sokol
  • Jud Valeski
  • Alon Halevy and Jayant Madhavan
  • Aaron Koblin with Valdean Klump
  • Michal Migurski
  • Jeff Heer
  • Coco Krumme
  • Peter Norvig
  • Matt Wood and Ben Blackburne
  • Jean-Claude Bradley, Rajarshi Guha, Andrew Lang, Pierre Lindenbaum, Cameron Neylon, Antony Williams, and Egon Willighagen
  • Lukas Biewald and Brendan O'Connor
  • Hadley Wickham, Deborah Swayne, and David Poole
  • Andrew Gelman, Jonathan P. Kastellec, and Yair Ghitza
  • Toby Segaran

    eBook

    $35.99 


    Product Details

    ISBN-13: 9781449379292
    Publisher: O'Reilly Media, Incorporated
    Publication date: 07/14/2009
    Sold by: Barnes & Noble
    Format: eBook
    Pages: 386
    File size: 18 MB

    About the Author

    Toby Segaran is the author of Programming Collective Intelligence, a very popular O'Reilly title. He was the founder of Incellico, a biotech software company later acquired by Genstruct. He currently holds the title of Data Magnate at Metaweb Technologies and is a frequent speaker at technology conferences.

    Jeff Hammerbacher is the Vice President of Products and Chief Scientist at Cloudera. Jeff was an Entrepreneur in Residence at Accel Partners immediately prior to joining Cloudera. Before Accel, he conceived, built, and led the Data team at Facebook. The Data team was responsible for driving many of the statistics and machine learning applications at Facebook, as well as building out the infrastructure to support these tasks for massive data sets. The team produced several academic papers and two open source projects: Hive, a system for offline analysis built above Hadoop, and Cassandra, a structured storage system on a P2P network. Before joining Facebook, Jeff was a quantitative analyst on Wall Street. Jeff earned his Bachelor's Degree in Mathematics from Harvard University.

    Table of Contents

    Preface

    1 Seeing Your Life in Data (Nathan Yau)
        Personal Environmental Impact Report (PEIR)
        your.flowingdata (YFD)
        Personal Data Collection
        Data Storage
        Data Processing
        Data Visualization
        The Point
        How to Participate

    2 The Beautiful People: Keeping Users in Mind When Designing Data Collection Methods (Jonathan Follett and Matthew Holm)
        Introduction: User Empathy Is the New Black
        The Project: Surveying Customers About a New Luxury Product
        Specific Challenges to Data Collection
        Designing Our Solution
        Results and Reflection

    3 Embedded Image Data Processing on Mars (J. M. Hughes)
        Abstract
        Introduction
        Some Background
        To Pack or Not to Pack
        The Three Tasks
        Slotting the Images
        Passing the Image: Communication Among the Three Tasks
        Getting the Picture: Image Download and Processing
        Image Compression
        Downlink, or, It's All Downhill from Here
        Conclusion

    4 Cloud Storage Design in a PNUTShell (Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava)
        Introduction
        Updating Data
        Complex Queries
        Comparison with Other Systems
        Conclusion

    5 Information Platforms and the Rise of the Data Scientist (Jeff Hammerbacher)
        Libraries and Brains
        Facebook Becomes Self-Aware
        A Business Intelligence System
        The Death and Rebirth of a Data Warehouse
        Beyond the Data Warehouse
        The Cheetah and the Elephant
        The Unreasonable Effectiveness of Data
        New Tools and Applied Research
        MAD Skills and Cosmos
        Information Platforms As Dataspaces
        The Data Scientist
        Conclusion

    6 The Geographic Beauty of a Photographic Archive (Jason Dykes and Jo Wood)
        Beauty in Data: Geograph
        Visualization, Beauty, and Treemaps
        A Geographic Perspective on Geograph Term Use
        Beauty in Discovery
        Reflection and Conclusion

    7 Data Finds Data (Jeff Jonas and Lisa Sokol)
        Introduction
        The Benefits of Just-in-Time Discovery
        Corruption at the Roulette Wheel
        Enterprise Discoverability
        Federated Search Ain't All That
        Directories: Priceless
        Relevance: What Matters and to Whom?
        Components and Special Considerations
        Privacy Considerations
        Conclusion

    8 Portable Data in Real Time (Jud Valeski)
        Introduction
        The State of the Art
        Social Data Normalization
        Conclusion: Mediation via Gnip

    9 Surfacing the Deep Web (Alon Halevy and Jayant Madhavan)
        What Is the Deep Web?
        Alternatives to Offering Deep-Web Access
        Conclusion and Future Work

    10 Building Radiohead's House of Cards (Aaron Koblin with Valdean Klump)
        How It All Started
        The Data Capture Equipment
        The Advantages of Two Data Capture Systems
        The Data
        Capturing the Data, aka "The Shoot"
        Processing the Data
        Post-Processing the Data
        Launching the Video
        Conclusion

    11 Visualizing Urban Data (Michal Migurski)
        Introduction
        Background
        Cracking the Nut
        Making It Public
        Revisiting
        Conclusion

    12 The Design of sense.us (Jeffrey Heer)
        Visualization and Social Data Analysis
        Data
        Visualization
        Collaboration
        Voyagers and Voyeurs
        Conclusion

    13 What Data Doesn't Do (Coco Krumme)
        When Doesn't Data Drive?
        Conclusion

    14 Natural Language Corpus Data (Peter Norvig)
        Word Segmentation
        Secret Codes
        Spelling Correction
        Other Tasks
        Discussion and Conclusion

    15 Life in Data: The Story of DNA (Matt Wood and Ben Blackburne)
        DNA As a Data Store
        DNA As a Data Source
        Fighting the Data Deluge
        The Future of DNA

    16 Beautifying Data in the Real World (Jean-Claude Bradley, Rajarshi Guha, Andrew Lang, Pierre Lindenbaum, Cameron Neylon, Antony Williams, and Egon Willighagen)
        The Problem with Real Data
        Providing the Raw Data Back to the Notebook
        Validating Crowdsourced Data
        Representing the Data Online
        Closing the Loop: Visualizations to Suggest New Experiments
        New Experiments
        Building a Data Web from Open Data and Free Services

    17 Superficial Data Analysis: Exploring Millions of Social Stereotypes (Brendan O'Connor and Lukas Biewald)
        Introduction
        Preprocessing the Data
        Exploring the Data
        Age, Attractiveness, and Gender
        Looking at Tags
        Which Words Are Gendered?
        Clustering
        Conclusion

    18 Bay Area Blues: The Effect of the Housing Crisis (Hadley Wickham, Deborah F. Swayne, and David Poole)
        Introduction
        How Did We Get the Data?
        Geocoding
        Data Checking
        Analysis
        The Influence of Inflation
        The Rich Get Richer and the Poor Get Poorer
        Geographic Differences
        Census Information
        Exploring San Francisco
        Conclusion

    19 Beautiful Political Data (Andrew Gelman, Jonathan P. Kastellec, and Yair Ghitza)
        Example 1: Redistricting and Partisan Bias
        Example 2: Time Series of Estimates
        Example 3: Age and Voting
        Example 4: Public Opinion and Senate Voting on Supreme Court Nominees
        Example 5: Localized Partisanship in Pennsylvania
        Conclusion

    20 Connecting Data (Toby Segaran)
        What Public Data Is There, Really?
        The Possibilities of Connected Data
        Within Companies
        Impediments to Connecting Data
        Possible Solutions
        Conclusion

    Contributors

    Index
