Data Science with Java: Practical Methods for Scientists and Engineers
Data Science is booming thanks to R and Python, but Java brings the robustness, convenience, and ability to scale critical to today’s data science applications. With this practical book, Java software engineers looking to add data science skills will take a logical journey through the data science pipeline. Author Michael Brzustowicz explains the basic math theory behind each step of the data science process, as well as how to apply these concepts with Java.

You’ll learn the critical roles that data IO, linear algebra, statistics, data operations, learning and prediction, and Hadoop MapReduce play in the process. Throughout this book, you’ll find code examples you can use in your applications.

  • Examine methods for obtaining, cleaning, and arranging data into its purest form
  • Understand the matrix structure that your data should take
  • Learn basic concepts for testing the origin and validity of data
  • Transform your data into stable and usable numerical values
  • Understand supervised and unsupervised learning algorithms, and methods for evaluating their success
  • Get up and running with MapReduce, using customized components suitable for data science algorithms

by Michael Brzustowicz

Paperback

$59.99 


Product Details

ISBN-13: 9781491934111
Publisher: O'Reilly Media, Incorporated
Publication date: 06/25/2017
Pages: 233
Product dimensions: 7.00 in. (w) x 9.10 in. (h) x 0.60 in. (d)

About the Author

Michael Brzustowicz is a physicist turned data scientist. After a PhD from Indiana University, Michael spent his postdoctoral years at Stanford University, where he shot high-powered X-rays at tiny molecules. Jumping ship from academia, he worked at many startups (including his own) and has been pioneering big data techniques all the way. Michael specializes in building distributed data systems and extracting knowledge from massive data. He spends most of his time writing customized, multithreaded code for statistical modeling and machine learning approaches to everyday big data problems. Michael now teaches Big Data, part-time, at the University of San Francisco.

Table of Contents

1 Data I/O

What Is Data, Anyway?

Data Models

Univariate Arrays

Multivariate Arrays

Data Objects

Matrices and Vectors

JSON

Dealing with Real Data

Nulls

Blank Spaces

Parse Errors

Outliers

Managing Data Files

Understanding File Contents First

Reading from a Text File

Reading from a JSON File

Reading from an Image File

Writing to a Text File

Mastering Database Operations

Command-Line Clients

Structured Query Language

Java Database Connectivity

Visualizing Data with Plots

Creating Simple Plots

Plotting Mixed Chart Types

Saving a Plot to a File

2 Linear Algebra

Building Vectors and Matrices

Array Storage

Block Storage

Map Storage

Accessing Elements

Working with Submatrices

Randomization

Operating on Vectors and Matrices

Scaling

Transposing

Addition and Subtraction

Length

Distances

Multiplication

Inner Product

Outer Product

Entrywise Product

Compound Operations

Affine Transformation

Mapping a Function

Decomposing Matrices

Cholesky Decomposition

LU Decomposition

QR Decomposition

Singular Value Decomposition

Eigen Decomposition

Determinant

Inverse

Solving Linear Systems

3 Statistics

The Probabilistic Origins of Data

Probability Density

Cumulative Probability

Statistical Moments

Entropy

Continuous Distributions

Discrete Distributions

Characterizing Datasets

Calculating Moments

Descriptive Statistics

Multivariate Statistics

Covariance and Correlation

Regression

Working with Large Datasets

Accumulating Statistics

Merging Statistics

Regression

Using Built-in Database Functions

4 Data Operations

Transforming Text Data

Extracting Tokens from a Document

Utilizing Dictionaries

Vectorizing a Document

Scaling and Regularizing Numeric Data

Scaling Columns

Scaling Rows

Matrix Scaling Operator

Reducing Data to Principal Components

Covariance Method

SVD Method

Creating Training, Validation, and Test Sets

Index-Based Resampling

List-Based Resampling

Mini-Batches

Encoding Labels

A Generic Encoder

One-Hot Encoding

5 Learning and Predictions

Learning Algorithms

Iterative Learning Procedure

Gradient Descent Optimizer

Evaluating Learning Processes

Minimizing a Loss Function

Minimizing the Sum of Variances

Silhouette Coefficient

Log-Likelihood

Classifier Accuracy

Unsupervised Learning

k-Means Clustering

DBSCAN

Gaussian Mixtures

Supervised Learning

Naive Bayes

Linear Models

Deep Networks

6 Hadoop MapReduce

Hadoop Distributed File System

MapReduce Architecture

Writing MapReduce Applications

Anatomy of a MapReduce Job

Hadoop Data Types

Mappers

Reducers

The Simplicity of a JSON String as Text

Deployment Wizardry

MapReduce Examples

Word Count

Custom Word Count

Sparse Linear Algebra

A Datasets

Index
