Data Lake Architecture: Designing the Data Lake and Avoiding the Garbage Dump

Organizations invest incredible amounts of time and money obtaining and then storing big data in data stores called data lakes. But how many of these organizations can actually get the data back out in a usable form? Very few can turn the data lake into an information gold mine. Most wind up with garbage dumps. Data Lake Architecture explains how to build a useful data lake, where data scientists and data analysts can solve business challenges and identify new business opportunities. Learn how to structure data lakes as well as analog, application, and text-based data ponds to provide maximum business value. Understand the role of the raw data pond and when to use an archival data pond. Leverage the four key ingredients for data lake success: metadata, integration mapping, context, and metaprocess. Bill Inmon opened our eyes to the architecture and benefits of a data warehouse, and now he takes us to the next level of data lake architecture.

by Bill Inmon

Paperback

$24.95 


Product Details

ISBN-13: 9781634621175
Publisher: Technics Publications
Publication date: 04/29/2016
Pages: 168
Product dimensions: 6.00(w) x 8.90(h) x 0.40(d)
Language: English

Table of Contents

Introduction

Chapter 1 Data Lakes

Enter Big Data

Enter the Data Lake

"One Way" Data Lake

In Summary

Chapter 2 Transforming the Data Lake

Metadata

Integration Mapping

Context

Metaprocess

Data Scientist

General Usability

In Summary

Chapter 3 Inside the Data Lake

Analog Data

Application Data

Textual Data

Another Perspective

In Summary

Chapter 4 Data Ponds

Conditioning Data

Raw Data Pond

Analog Data Pond

Application Data Pond

Textual Data Pond

Data Passing Directly Into the Data Ponds

Archival Data Pond

In Summary

Chapter 5 Generic Structure of the Data Pond

Pond Descriptor

Pond Target

Pond Data

Pond Metadata

Pond Metaprocess

Pond Transformation Criteria

In Summary

Chapter 6 Analog Data Pond

Analog Data Issues

Data Descriptor

Capturing Raw Data/Transforming Raw Data

Transforming/Conditioning Raw Analog Data

Data Excision

Clustering Data

Data Relationships

Probability of Future Usage

Outliers

Specialized Ad Hoc Analysis

In Summary

Chapter 7 Application Data Pond

DNA of Data

Descriptors

Standard Database Format

Basic Organization of Data

Integration of Data

Data Model

Necessity of Integration

Pointing From One Application to the Next

Intersecting Applications

Subsets of Data in the Application Data Pond

In Summary

Chapter 8 Textual Data Pond

Uniform Data and the Computer

Valuable Text

Textual Disambiguation

Text Sent to the Data Pond

Output of Textual Disambiguation

Inherent Complexity

Textual Disambiguation Functionality

Taxonomies and Ontologies

Value of Text and Context

Tracing Text Back to the Source

Mechanics of Disambiguation

Analyzing the Database

Visualizing the Results

In Summary

Chapter 9 Comparing the Ponds

Similarities Across the Data Ponds

Dissimilarities Across the Data Ponds

Relational Format for Final State Data

Technology Differences

Total Expected Volume of Data in the Data Pond

Moving Data From Pond to Pond

Doing Analysis From Multiple Ponds

Using Metadata to Relate Data From Different Ponds

What if…?

In Summary

Chapter 10 Using the Infrastructure

"One Way" Data Lake

Transforming the Data Lake

Transformation Technology

Some Analytical Questions

Querying Textual Data

Real Analysis

In Summary

Chapter 11 Search and Analysis

Confusion Spread by the Vendors

In Summary

Chapter 12 Business Value in the Data Ponds

Business Value in the Analog Data Pond

Business Value in the Application Data Pond

Business Value in the Textual Data Pond

Percent of Records That Have Business Value

In Summary

Chapter 13 Additional Topics

High System Level Documentation

Detailed Data Pond Level Documentation

What Data Flows Into the Data Lake/Data Pond?

Where Does Analysis Occur?

The Age of Data

Security of Data

In Summary

Chapter 14 Analytical and Integration Tools

Visualization

Search and Qualify

Textual Disambiguation

Statistical Analysis

Classical ETL Processing

In Summary

Chapter 15 Archiving Data Ponds

Criteria for Removal

Structural Alteration

Creating Independent Indexes for Archival Data

In Summary

Glossary

References

Index
