Hadoop Security: Protecting Your Big Data Platform

As more corporations turn to Hadoop to store and process their most valuable data, the risk of a potential breach of those systems increases exponentially. This practical book not only shows Hadoop administrators and security architects how to protect Hadoop data from unauthorized access, it also shows how to limit the ability of an attacker to corrupt or modify data in the event of a security breach.

Authors Ben Spivey and Joey Echeverria provide in-depth information about the security features available in Hadoop, and organize them according to common computer security concepts. You’ll also get real-world examples that demonstrate how you can apply these concepts to your use cases.

  • Understand the challenges of securing distributed systems, particularly Hadoop
  • Use best practices for preparing Hadoop cluster hardware as securely as possible
  • Get an overview of the Kerberos network authentication protocol
  • Delve into authorization and accounting principles as they apply to Hadoop
  • Learn how to use mechanisms to protect data in a Hadoop cluster, both in transit and at rest
  • Integrate Hadoop data ingest into enterprise-wide security architecture
  • Ensure that security architecture reaches all the way to end-user access


Product Details

ISBN-13: 9781491901342
Publisher: O'Reilly Media, Incorporated
Publication date: 06/29/2015
Sold by: Barnes & Noble
Format: eBook
Pages: 340
File size: 2 MB

About the Author

Ben Spivey is currently a Solutions Architect at Cloudera. During his time with Cloudera, he has worked in a consulting capacity to assist customers with their Hadoop deployments. Ben has worked with many Fortune 500 companies across multiple industries, including financial services, retail, and health care. His primary expertise is the planning, installation, configuration, and securing of customers' Hadoop clusters.

In addition to his consulting responsibilities, Ben writes extensively for customer document deliverables, including Hadoop best practices, security integration, and cluster administration.

Prior to Cloudera, Ben worked for the National Security Agency and with a defense contractor as a software engineer. During this time, Ben built applications that, among other things, integrated with enterprise security infrastructure to protect sensitive information.

Ben holds a Bachelor’s degree in Computer Science and a Master’s degree in Information Technology, with a focus on Information Assurance. Ben’s final Master’s project was focused on designing an enterprise IT infrastructure with a defense-in-depth approach.


Joey Echeverria is a Software Engineer at ScalingData, where he builds the next generation of IT Operations Analytics on the Apache Hadoop platform. Joey is also a committer on the Kite SDK, an Apache-licensed data API for the Hadoop ecosystem. Joey was previously a Software Engineer at Cloudera, where he contributed to a number of ASF projects, including Apache Flume, Apache Sqoop, Apache Hadoop, and Apache HBase.

While at Cloudera, Joey also served as the Director of Federal Field Technical Services, overseeing the public sector Professional Services and Systems Engineering teams. Joey started at Cloudera as a Solutions Architect, where he helped customers to design, develop, and deploy production Hadoop applications and clusters. When needed, he has also filled in for Cloudera’s support and training teams and has taught Cloudera’s administrator and Apache HBase courses.



Joey’s background is in building and deploying secure data processing applications, with the last 7 years focused on Hadoop-based applications. In the past, he has worked on resource-constrained data processing and a clustered implementation of the Snort intrusion detection system, and he built a distributed index system on Hadoop when he worked for the NSA.

Table of Contents

Foreword

Preface

1 Introduction

Security Overview

Confidentiality

Integrity

Availability

Authentication, Authorization, and Accounting

Hadoop Security: A Brief History

Hadoop Components and Ecosystem

Apache HDFS

Apache YARN

Apache MapReduce

Apache Hive

Cloudera Impala

Apache Sentry (Incubating)

Apache HBase

Apache Accumulo

Apache Solr

Apache Oozie

Apache ZooKeeper

Apache Flume

Apache Sqoop

Cloudera Hue

Summary

Part I Security Architecture

2 Securing Distributed Systems

Threat Categories

Unauthorized Access/Masquerade

Insider Threat

Denial of Service

Threat and Risk Assessment

User Assessment

Environment Assessment

Vulnerabilities

Defense in Depth

Summary

3 System Architecture

Operating Environment

Network Security

Network Segmentation

Network Firewalls

Intrusion Detection and Prevention

Hadoop Roles and Separation Strategies

Master Nodes

Worker Nodes

Management Nodes

Edge Nodes

Operating System Security

Remote Access Controls

Host Firewalls

SELinux

Summary

4 Kerberos

Why Kerberos?

Kerberos Overview

Kerberos Workflow: A Simple Example

Kerberos Trusts

MIT Kerberos

Server Configuration

Client Configuration

Summary

Part II Authentication, Authorization, and Accounting

5 Identity and Authentication

Identity

Mapping Kerberos Principals to Usernames

Hadoop User to Group Mapping

Provisioning of Hadoop Users

Authentication

Kerberos

Username and Password Authentication

Tokens

Impersonation

Configuration

Summary

6 Authorization

HDFS Authorization

HDFS Extended ACLs

Service-Level Authorization

MapReduce and YARN Authorization

MapReduce (MR1)

YARN (MR2)

ZooKeeper ACLs

Oozie Authorization

HBase and Accumulo Authorization

System, Namespace, and Table-Level Authorization

Column- and Cell-Level Authorization

Summary

7 Apache Sentry (Incubating)

Sentry Concepts

The Sentry Service

Sentry Service Configuration

Hive Authorization

Hive Sentry Configuration

Impala Authorization

Impala Sentry Configuration

Solr Authorization

Solr Sentry Configuration

Sentry Privilege Models

SQL Privilege Model

Solr Privilege Model

Sentry Policy Administration

SQL Commands

SQL Policy File

Solr Policy File

Policy File Verification and Validation

Migrating From Policy Files

Summary

8 Accounting

HDFS Audit Logs

MapReduce Audit Logs

YARN Audit Logs

Hive Audit Logs

Cloudera Impala Audit Logs

HBase Audit Logs

Accumulo Audit Logs

Sentry Audit Logs

Log Aggregation

Summary

Part III Data Security

9 Data Protection

Encryption Algorithms

Encrypting Data at Rest

Encryption and Key Management

HDFS Data-at-Rest Encryption

MapReduce2 Intermediate Data Encryption

Impala Disk Spill Encryption

Full Disk Encryption

Filesystem Encryption

Important Data Security Consideration for Hadoop

Encrypting Data in Transit

Transport Layer Security

Hadoop Data-In-Transit Encryption

Data Destruction and Deletion

Summary

10 Securing Data Ingest

Integrity of Ingested Data

Data Ingest Confidentiality

Flume Encryption

Sqoop Encryption

Ingest Workflows

Enterprise Architecture

Summary

11 Data Extraction and Client Access Security

Hadoop Command-Line Interface

Securing Applications

HBase

HBase Shell

HBase REST Gateway

HBase Thrift Gateway

Accumulo

Accumulo Shell

Accumulo Proxy Server

Oozie

Sqoop

SQL Access

Impala

Hive

WebHDFS/HttpFS

Summary

12 Cloudera Hue

Hue HTTPS

Hue Authentication

SPNEGO Backend

SAML Backend

LDAP Backend

Hue Authorization

Hue SSL Client Configurations

Summary

Part IV Putting It All Together

13 Case Studies

Case Study: Hadoop Data Warehouse

Environment Setup

User Experience

Summary

Case Study: Interactive HBase Web Application

Design and Architecture

Security Requirements

Cluster Configuration

Implementation Notes

Summary

Afterword

Index
