Table of Contents
Chapter 1
Introduction
DATA SCIENCE
BIG DATA
JULIA
JULIA PACKAGES
R PACKAGES
DATASETS
Overview
Beer Data
Coffee Data
Leptograpsus Crabs Data
Food Preferences Data
x2 Data
Iris Data
OUTLINE OF THE CONTENTS OF THIS MONOGRAPH
Chapter 2
Core Julia
VARIABLE NAMES
TYPES
Numeric
Floats
Strings
Tuples
DATA STRUCTURES
Arrays
Dictionaries
CONTROL FLOW
Compound Expressions
Conditional Evaluation
Loops
Basics
Loop termination
Exception Handling
FUNCTIONS
Chapter 3
Working with Data
DATAFRAMES
CATEGORICAL DATA
IO
USEFUL DATAFRAME FUNCTIONS
SPLIT-APPLY-COMBINE STRATEGY
QUERY.JL
Chapter 4
Visualizing Data
GADFLY.JL
VISUALIZING UNIVARIATE DATA
DISTRIBUTIONS
VISUALIZING BIVARIATE DATA
ERROR BARS
FACETS
SAVING PLOTS
Chapter 5
Supervised Learning
INTRODUCTION
CROSS-VALIDATION
Overview
K-Fold Cross-Validation
K-NEAREST NEIGHBOURS CLASSIFICATION
CLASSIFICATION AND REGRESSION TREES
Overview
Classification Trees
Regression Trees
Comments
BOOTSTRAP
RANDOM FORESTS
GRADIENT BOOSTING
Overview
Beer Data
Food Data
COMMENTS
Chapter 6
Unsupervised Learning
INTRODUCTION
PRINCIPAL COMPONENTS ANALYSIS
PROBABILISTIC PRINCIPAL COMPONENTS ANALYSIS
EM ALGORITHM FOR PPCA
Background: EM Algorithm
E-step
M-step
Woodbury Identity
Initialization
Stopping Rule
Implementing the EM Algorithm for PPCA
K-MEANS CLUSTERING
MIXTURE OF PPCAS
Model
Parameter Estimation
Illustrative Example: Coffee Data
Chapter 7
R Interoperability
ACCESSING R DATASETS
INTERACTING WITH R
EXAMPLE: CLUSTERING AND DATA REDUCTION FOR THE COFFEE DATA
Coffee Data
PGMM Analysis
VSCC Analysis
EXAMPLE: FOOD DATA
Overview
Random Forests