Kafka in Action
Master the wicked-fast Apache Kafka streaming platform through hands-on examples and real-world projects.

In Kafka in Action you will learn:

Understanding Apache Kafka concepts
Setting up and executing basic ETL tasks using Kafka Connect
Using Kafka as part of a large data project team
Performing administrative tasks
Producing and consuming event streams
Working with Kafka from Java applications
Implementing Kafka as a message queue

Kafka in Action is a fast-paced introduction to every aspect of working with Apache Kafka. Starting with an overview of Kafka's core concepts, you'll immediately learn how to set up and execute basic data movement tasks and how to produce and consume streams of events. Advancing quickly, you’ll soon be ready to use Kafka in your day-to-day workflow, and start digging into even more advanced Kafka topics.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology
Think of Apache Kafka as a high-performance software bus that facilitates event streaming, logging, analytics, and other data pipeline tasks. With Kafka, you can easily build features like operational data monitoring and large-scale event processing into both large- and small-scale applications.
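
As a taste of what working with Kafka looks like in code, here is a minimal, illustrative sketch of publishing a single event with the Apache Kafka Java producer client, the kind of client code the book develops in Part 2. The broker address, topic name, key, and value below are placeholder assumptions, not examples taken from the book:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class HelloKafkaProducer {
        public static void main(String[] args) {
            // Minimal producer configuration; localhost:9092 is a placeholder broker address.
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            // The producer appends the event to a partition of the (hypothetical) "alerts" topic,
            // where any number of independent consumers can later read it.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("alerts", "sensor-1", "temperature=71"));
            }
        }
    }

Consumers subscribe to the same topic independently, which is what makes the software-bus pattern work: one event stream can feed monitoring, analytics, and other pipelines at the same time.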

About the book
Kafka in Action introduces the core features of Kafka, along with relevant examples of how to use it in real applications. In it, you’ll explore the most common use cases such as logging and managing streaming data. When you’re done, you’ll be ready to handle both basic developer- and admin-based tasks in a Kafka-focused team.

What's inside

Kafka as an event streaming platform
Kafka producers and consumers from Java applications
Kafka as part of a large data project

About the reader
For intermediate Java developers or data engineers. No prior knowledge of Kafka required.

About the authors
Dylan Scott is a software developer in the insurance industry. Viktor Gamov and Dave Klein are developer advocates at Confluent who help developers, teams, and enterprises harness the power of event streaming with Apache Kafka.

Table of Contents
PART 1 GETTING STARTED
1 Introduction to Kafka
2 Getting to know Kafka
PART 2 APPLYING KAFKA
3 Designing a Kafka project
4 Producers: Sourcing data
5 Consumers: Unlocking data
6 Brokers
7 Topics and partitions
8 Kafka storage
9 Management: Tools and logging
PART 3 GOING FURTHER
10 Protecting Kafka
11 Schema registry
12 Stream processing with Kafka Streams and ksqlDB

Product Details

ISBN-13: 9781617295232
Publisher: Manning
Publication date: 02/15/2022
Series: In Action
Pages: 272
Product dimensions: 7.38(w) x 9.25(h) x 0.60(d) inches

About the Authors

Dylan Scott is a software developer with over ten years of experience in Java and Perl. He has implemented Kafka as a messaging system for a large data migration and uses it in his work in the insurance industry.

Viktor Gamov is a developer advocate at Confluent.

Dave Klein is a developer advocate at Confluent, with over 28 years of experience in the technology industry.

Table of Contents

Foreword xv
Preface xvi
Acknowledgments xviii
About this book xx
About the authors xxiii
About the cover illustration xxiv

Part 1 Getting Started 1

1 Introduction to Kafka 3
  1.1 What is Kafka? 4
  1.2 Kafka usage 8
    Kafka for the developer 8
    Explaining Kafka to your manager 9
  1.3 Kafka myths 10
    Kafka only works with Hadoop® 10
    Kafka is the same as other message brokers 11
  1.4 Kafka in the real world 11
    Early examples 12
    Later examples 13
    When Kafka might not be the right fit 14
  1.5 Online resources to get started 15
  References 15

2 Getting to know Kafka 17
  2.1 Producing and consuming a message 18
  2.2 What are brokers? 18
  2.3 Tour of Kafka 23
    Producers and consumers 23
    Topics overview 26
    ZooKeeper usage 27
    Kafka's high-level architecture 28
    The commit log 29
  2.4 Various source code packages and what they do 30
    Kafka Streams 30
    Kafka Connect 31
    AdminClient package 32
    ksqlDB 32
  2.5 Confluent clients 33
  2.6 Stream processing and terminology 36
    Stream processing 37
    What exactly-once means 38
  References 39

Part 2 Applying Kafka 41

3 Designing a Kafka project 43
  3.1 Designing a Kafka project 44
    Taking over an existing data architecture 44
    A first change 44
    Built-in features 44
    Data for our invoices 47
  3.2 Sensor event design 49
    Existing issues 49
    Why Kafka is the right fit 51
    Thought starters on our design 52
    User data requirements 53
    High-level plan for applying our questions 54
    Reviewing our blueprint 57
  3.3 Format of your data 57
    Plan for data 58
    Dependency setup 59
  References 64

4 Producers: Sourcing data 66
  4.1 An example 67
    Producer notes 70
  4.2 Producer options 70
    Configuring the broker list 71
    How to go fast (or go safer) 72
    Timestamps 74
  4.3 Generating code for our requirements 76
    Client and broker versions 84
  References 85

5 Consumers: Unlocking data 87
  5.1 An example 88
    Consumer options 89
    Understanding our coordinates 92
  5.2 How consumers interact 96
  5.3 Tracking 96
    Group coordinator 98
    Partition assignment strategy 100
  5.4 Marking our place 101
  5.5 Reading from a compacted topic 103
  5.6 Retrieving code for our factory requirements 103
    Reading options 103
    Requirements 105
  References 108

6 Brokers 111
  6.1 Introducing the broker 111
  6.2 Role of ZooKeeper 112
  6.3 Options at the broker level 113
    Kafka's other logs: Application logs 115
    Server log 115
    Managing state 116
  6.4 Partition replica leaders and their role 117
    Losing data 119
  6.5 Peeking into Kafka 120
    Cluster maintenance 121
    Adding a broker 122
    Upgrading your cluster 122
    Upgrading your clients 122
    Backups 123
  6.6 A note on stateful systems 123
  6.7 Exercise 125
  References 126

7 Topics and partitions 129
  7.1 Topics 129
    Topic-creation options 132
    Replication factors 134
  7.2 Partitions 134
    Partition location 135
    Viewing our logs 136
  7.3 Testing with EmbeddedKafkaCluster 137
    Using Kafka Testcontainers 138
  7.4 Topic compaction 139
  References 142

8 Kafka storage 144
  8.1 How long to store data 145
  8.2 Data movement 146
    Keeping the original event 146
    Moving away from a batch mindset 146
  8.3 Tools 147
    Apache Flume 147
    Red Hat® Debezium™ 149
    Secor 149
    Example use case data storage 150
  8.4 Bringing data back into Kafka 151
    Tiered storage 152
  8.5 Architectures with Kafka 152
    Lambda architecture 153
    Kappa architecture 154
  8.6 Multiple cluster setups 155
    Scaling by adding clusters 155
  8.7 Cloud- and container-based storage options 155
    Kubernetes clusters 156
  References 156

9 Management: Tools and logging 158
  9.1 Administration clients 159
    Administration in code with AdminClient 159
    kcat 161
    Confluent REST Proxy API 162
  9.2 Running Kafka as a systemd service 163
  9.3 Logging 164
    Kafka application logs 164
    ZooKeeper logs 166
  9.4 Firewalls 166
    Advertised listeners 167
  9.5 Metrics 167
    JMX console 167
  9.6 Tracing option 170
    Producer logic 171
    Consumer logic 172
    Overriding clients 173
  9.7 General monitoring tools 174
  References 176

Part 3 Going Further 179

10 Protecting Kafka 181
  10.1 Security basics 183
    Encryption with SSL 183
    SSL between brokers and clients 184
    SSL between brokers 187
  10.2 Kerberos and the Simple Authentication and Security Layer (SASL) 187
  10.3 Authorization in Kafka 189
    Access control lists (ACLs) 189
    Role-based access control (RBAC) 190
  10.4 ZooKeeper 191
    Kerberos setup 191
  10.5 Quotas 191
    Network bandwidth quota 192
    Request rate quotas 193
  10.6 Data at rest 194
    Managed options 194
  References 195

11 Schema registry 197
  11.1 A proposed Kafka maturity model 198
    Level 0 198
    Level 1 199
    Level 2 199
    Level 3 200
  11.2 The Schema Registry 200
    Installing the Confluent Schema Registry 201
    Registry configuration 201
  11.3 Schema features 202
    REST API 202
    Client library 203
  11.4 Compatibility rules 205
    Validating schema modifications 205
  11.5 Alternative to a schema registry 207
  References 208

12 Stream processing with Kafka Streams and ksqlDB 209
  12.1 Kafka Streams 210
    KStreams API DSL 211
    KTable API 215
    GlobalKTable API 216
    Processor API 216
    Kafka Streams setup 218
  12.2 ksqlDB: An event-streaming database 219
    Queries 220
    Local development 220
    ksqlDB architecture 222
  12.3 Going further 223
    Kafka Improvement Proposals (KIPs) 223
    Kafka projects you can explore 223
    Community Slack channel 224
  References 224

Appendix A Installation 227
Appendix B Client example 234

Index 239
