Data Science on AWS: Implementing End-to-End, Continuous AI and Machine Learning Pipelines

With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance.

Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more
Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot
Dive deep into the complete model development lifecycle for a BERT-based NLP use case, including data ingestion, analysis, model training, and deployment
Tie everything together into a repeatable machine learning operations pipeline
Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka
Learn security best practices for data science projects and workflows, including identity and access management, authentication, authorization, and more

Paperback

$79.99

Product Details

ISBN-13:	9781492079392
Publisher:	O'Reilly Media, Incorporated
Publication date:	05/11/2021
Pages:	521
Product dimensions:	7.00(w) x 9.19(h) x 1.05(d)

About the Author

Chris Fregly is a Developer Advocate for AI and Machine Learning at AWS, based in San Francisco, California. He is also the founder of the Advanced Spark, TensorFlow, and Kubeflow Meetup Series based in San Francisco. Chris regularly speaks at AI and Machine Learning conferences across the world, including the O’Reilly AI, Strata, and Velocity Conferences. Previously, Chris was the founder of PipelineAI, where he worked with many AI-first startups and enterprises to continuously deploy ML/AI pipelines using Apache Spark ML, Kubernetes, TensorFlow, Kubeflow, Amazon EKS, and Amazon SageMaker. He is also the author of the O’Reilly Online Training Series “High Performance TensorFlow in Production with GPUs.”

Antje Barth is a Developer Advocate for AI and Machine Learning at AWS, based in Düsseldorf, Germany. She is also a co-founder of the Düsseldorf chapter of the Women in Big Data Meetup. Antje frequently speaks at AI and Machine Learning conferences and meetups around the world, including the O’Reilly AI and Strata conferences. Besides ML/AI, Antje is passionate about helping developers leverage Big Data, container, and Kubernetes platforms in the context of AI and Machine Learning. Prior to joining AWS, Antje worked in technical evangelist and solutions engineering roles at MapR and Cisco.

Table of Contents

Preface

1 Introduction to Data Science on AWS

Benefits of Cloud Computing

Data Science Pipelines and Workflows

MLOps Best Practices

Amazon AI Services and AutoML with Amazon SageMaker

Data Ingestion, Exploration, and Preparation in AWS

Model Training and Tuning with Amazon SageMaker

Model Deployment with Amazon SageMaker and AWS Lambda Functions

Streaming Analytics and Machine Learning on AWS

AWS Infrastructure and Custom-Built Hardware

Reduce Cost with Tags, Budgets, and Alerts

Summary

2 Data Science Use Cases

Innovation Across Every Industry

Personalized Product Recommendations

Detect Inappropriate Videos with Amazon Rekognition

Demand Forecasting

Identify Fake Accounts with Amazon Fraud Detector

Enable Privacy-Leak Detection with Amazon Macie

Conversational Devices and Voice Assistants

Text Analysis and Natural Language Processing

Cognitive Search and Natural Language Understanding

Intelligent Customer Support Centers

Industrial AI Services and Predictive Maintenance

Home Automation with AWS IoT and Amazon SageMaker

Extract Medical Information from Healthcare Documents

Self-Optimizing and Intelligent Cloud Infrastructure

Cognitive and Predictive Business Intelligence

Educating the Next Generation of AI and ML Developers

Program Nature's Operating System with Quantum Computing

Increase Performance and Reduce Cost

Summary

3 Automated Machine Learning

Automated Machine Learning with SageMaker Autopilot

Track Experiments with SageMaker Autopilot

Train and Deploy a Text Classifier with SageMaker Autopilot

Automated Machine Learning with Amazon Comprehend

Summary

4 Ingest Data into the Cloud

Data Lakes

Query the Amazon S3 Data Lake with Amazon Athena

Continuously Ingest New Data with AWS Glue Crawler

Build a Lake House with Amazon Redshift Spectrum

Choose Between Amazon Athena and Amazon Redshift

Reduce Cost and Increase Performance

Summary

5 Explore the Dataset

Tools for Exploring Data in AWS

Visualize Our Data Lake with SageMaker Studio

Query Our Data Warehouse

Create Dashboards with Amazon QuickSight

Detect Data-Quality Issues with Amazon SageMaker and Apache Spark

Detect Bias in Our Dataset

Detect Different Types of Drift with SageMaker Clarify

Analyze Our Data with AWS Glue DataBrew

Reduce Cost and Increase Performance

Summary

6 Prepare the Dataset for Model Training

Perform Feature Selection and Engineering

Scale Feature Engineering with SageMaker Processing Jobs

Share Features Through SageMaker Feature Store

Ingest and Transform Data with SageMaker Data Wrangler

Track Artifact and Experiment Lineage with Amazon SageMaker

Ingest and Transform Data with AWS Glue DataBrew

Summary

7 Train Your First Model

Understand the SageMaker Infrastructure

Deploy a Pre-Trained BERT Model with SageMaker JumpStart

Develop a SageMaker Model

A Brief History of Natural Language Processing

BERT Transformer Architecture

Training BERT from Scratch

Fine Tune a Pre-Trained BERT Model

Create the Training Script

Launch the Training Script from a SageMaker Notebook

Evaluate Models

Debug and Profile Model Training with SageMaker Debugger

Interpret and Explain Model Predictions

Detect Model Bias and Explain Predictions

More Training Options for BERT

Reduce Cost and Increase Performance

Summary

8 Train and Optimize Models at Scale

Automatically Find the Best Model Hyper-Parameters

Use Warm Start for Additional SageMaker Hyper-Parameter Tuning Jobs

Scale Out with SageMaker Distributed Training

Reduce Cost and Increase Performance

Summary

9 Deploy Models to Production

Choose Real-Time or Batch Predictions

Real-Time Predictions with SageMaker Endpoints

Auto-Scale SageMaker Endpoints Using Amazon CloudWatch

Strategies to Deploy New and Updated Models

Testing and Comparing New Models

Monitor Model Performance and Detect Drift

Monitor Data Quality of Deployed SageMaker Endpoints

Monitor Model Quality of Deployed SageMaker Endpoints

Monitor Bias Drift of Deployed SageMaker Endpoints

Monitor Feature Attribution Drift of Deployed SageMaker Endpoints

Perform Batch Predictions with SageMaker Batch Transform

AWS Lambda Functions and Amazon API Gateway

Optimize and Manage Models at the Edge

Deploy a PyTorch Model with TorchServe

TensorFlow-BERT Inference with AWS Deep Java Library

Reduce Cost and Increase Performance

Summary

10 Pipelines and MLOps

Machine Learning Operations

Software Pipelines

Machine Learning Pipelines

Pipeline Orchestration with SageMaker Pipelines

Automation with SageMaker Pipelines

More Pipeline Options

Human-in-the-Loop Workflows

Reduce Cost and Improve Performance

Summary

11 Streaming Analytics and Machine Learning

Online Learning Versus Offline Learning

Streaming Applications

Windowed Queries on Streaming Data

Streaming Analytics and Machine Learning on AWS

Classify Real-Time Product Reviews with Amazon Kinesis, AWS Lambda, and Amazon SageMaker

Implement Streaming Data Ingest Using Amazon Kinesis Data Firehose

Summarize Real-Time Product Reviews with Streaming Analytics

Setting Up Amazon Kinesis Data Analytics

Amazon Kinesis Data Analytics Applications

Classify Product Reviews with Apache Kafka, AWS Lambda, and Amazon SageMaker

Reduce Cost and Improve Performance

Summary

12 Secure Data Science on AWS

Shared Responsibility Model Between AWS and Customers

Applying AWS Identity and Access Management

Isolating Compute and Network Environments

Securing Amazon S3 Data Access

Encryption at Rest

Encryption in Transit

Securing SageMaker Notebook Instances

Securing SageMaker Studio

Securing SageMaker Jobs and Models

Securing AWS Lake Formation

Securing Database Credentials with AWS Secrets Manager

Governance

Auditability

Reduce Cost and Improve Performance

Summary

Index
