Getting Started with Natural Language Processing

Hit the ground running with this in-depth introduction to the NLP skills and techniques that allow your computers to speak human.

In Getting Started with Natural Language Processing you’ll learn about:

    Fundamental concepts and algorithms of NLP
    Useful Python libraries for NLP
    Building a search algorithm
    Extracting information from raw text
    Predicting sentiment of an input text
    Author profiling
    Topic labeling
    Named entity recognition

Getting Started with Natural Language Processing is an enjoyable and understandable guide that helps you engineer your first NLP algorithms. Your tutor is Dr. Ekaterina Kochmar, lecturer at the University of Bath, who has helped thousands of students take their first steps with NLP. Full of Python code and hands-on projects, each chapter provides a concrete example with practical techniques that you can put into practice right away. If you’re a beginner to NLP and want to upgrade your applications with functions and features like information extraction, user profiling, and automatic topic labeling, this is the book for you.

About the technology
From smart speakers to customer service chatbots, apps that understand text and speech are everywhere. Natural language processing, or NLP, is the key to this powerful form of human/computer interaction. And a new generation of tools and techniques makes it easier than ever to get started with NLP!

About the book
Getting Started with Natural Language Processing teaches you how to upgrade user-facing applications with text- and speech-based features. From the accessible explanations and hands-on examples in this book, you’ll learn how to apply NLP to sentiment analysis, user profiling, and much more. As you go, each new project builds on what you’ve previously learned, introducing new concepts and skills. Handy diagrams and intuitive Python code samples make it easy to get started, even if you have no background in machine learning!

What's inside

    Fundamental concepts and algorithms of NLP
    Extracting information from raw text
    Useful Python libraries
    Topic labeling
    Building a search algorithm

About the reader
You’ll need basic Python skills. No experience with NLP required.

About the author
Ekaterina Kochmar is a lecturer at the Department of Computer Science of the University of Bath, where she is part of the AI research group.

Table of Contents
1 Introduction
2 Your first NLP example
3 Introduction to information search
4 Information extraction
5 Author profiling as a machine-learning task
6 Linguistic feature engineering for author profiling
7 Your first sentiment analyzer using sentiment lexicons
8 Sentiment analysis with a data-driven approach
9 Topic analysis
10 Topic modeling
11 Named-entity recognition


Product Details

ISBN-13:	9781638350927
Publisher:	Manning
Publication date:	11/15/2022
Sold by:	SIMON & SCHUSTER
Format:	eBook
Pages:	456
File size:	11 MB

About the Author

Ekaterina Kochmar is an Affiliated Lecturer and a Senior Research Associate at the Natural Language and Information Processing group of the Department of Computer Science and Technology, University of Cambridge. She holds an MA degree in Computational Linguistics, an MPhil in Advanced Computer Science, and a PhD in Natural Language Processing.

Preface

Acknowledgments

About this book

About the author

About the cover illustration

1 Introduction

1.1 A brief history of NLP

1.2 Typical tasks

Information search

Advanced information search: Asking the machine precise questions

Conversational agents and intelligent virtual assistants

Text prediction and language generation

Spam filtering

Machine translation

Spell- and grammar checking

2 Your first NLP example

2.1 Introducing NLP in practice: Spam Filtering

2.2 Understanding the task

Step 1 Define the data and classes

Step 2 Split the text into words

Step 3 Extract and normalize the features

Step 4 Train a classifier

Step 5 Evaluate the classifier

2.3 Implementing your own spam filter

Step 1 Define the data and classes

Step 2 Split the text into words

Step 3 Extract and normalize the features

Step 4 Train the classifier

Step 5 Evaluate your classifier

2.4 Deploying your spam filter in practice

3 Introduction to information search

3.1 Understanding the task

Data and data structures

Boolean search algorithm

3.2 Processing the data further

Preselecting the words that matter: Stopwords removal

Matching forms of the same word: Morphological processing

3.3 Information weighing

Weighing words with term frequency

Weighing words with inverse document frequency

3.4 Practical use of the search algorithm

Retrieval of the most similar documents

Evaluation of the results

Deploying search algorithm in practice

4 Information extraction

4.1 Use cases

Case 1

Case 2

Case 3

4.2 Understanding the task

4.3 Detecting word types with part-of-speech tagging

Understanding word types

Part-of-speech tagging with spaCy

4.4 Understanding sentence structure with syntactic parsing

Why sentence structure is important

Dependency parsing with spaCy

4.5 Building your own information extraction algorithm

5 Author profiling as a machine-learning task

5.1 Understanding the task

Case 1 Authorship attribution

Case 2 User profiling

5.2 Machine-learning pipeline at first glance

Original data

Testing generalization behavior

Setting up the benchmark

5.3 A closer look at the machine-learning pipeline

Decision Trees classifier basics

Evaluating which tree is better using node impurity

Selection of the best split in Decision Trees

Decision Trees on language data

6 Linguistic feature engineering for author profiling

6.1 Another close look at the machine-learning pipeline

Evaluating the performance of your classifier

Further evaluation measures

6.2 Feature engineering for authorship attribution

Word and sentence length statistics as features

Counts of stopwords and proportion of stopwords as features

Distributions of parts of speech as features

Distribution of word suffixes as features

Unique words as features

6.3 Practical use of authorship attribution and user profiling

7 Your first sentiment analyzer using sentiment lexicons

7.1 Use cases

7.2 Understanding your task

Aggregating sentiment score with the help of a lexicon

Learning to detect sentiment in a data-driven way

7.3 Setting up the pipeline: Data loading and analysis

Data loading and preprocessing

A closer look into the data

7.4 Aggregating sentiment scores with a sentiment lexicon

Collecting sentiment scores from a lexicon

Applying sentiment scores to detect review polarity

8 Sentiment analysis with a data-driven approach

8.1 Addressing multiple senses of a word with SentiWordNet

8.2 Addressing dependence on context with machine learning

Data preparation

Extracting features from text

Scikit-learn's machine-learning pipeline

Full-scale evaluation with cross-validation

8.3 Varying the length of the sentiment-bearing features

8.4 Negation handling for sentiment analysis

8.5 Further practice

9 Topic analysis

9.1 Topic classification as a supervised machine-learning task

Data

Topic classification with Naive Bayes

Evaluation of the results

9.2 Topic discovery as an unsupervised machine-learning task

Unsupervised ML approaches

Clustering for topic discovery

Evaluation of the topic clustering algorithm

10 Topic modeling

10.1 Topic modeling with latent Dirichlet allocation

Exercise 10.1: Question 1 solution

Exercise 10.1: Question 2 solution

Estimating parameters for the LDA

LDA as a generative model

10.2 Implementation of the topic modeling algorithm

Loading the data

Preprocessing the data

Applying the LDA model

Exploring the results

11 Named-entity recognition

11.1 Named entity recognition: Definitions and challenges

Named entity types

Challenges in named entity recognition

11.2 Named-entity recognition as a sequence labeling task

The basics: BIO scheme

What does it mean for a task to be sequential?

Sequential solution for NER

11.3 Practical applications of NER

Data loading and exploration

Named entity types exploration with spaCy

Information extraction revisited

Named entities visualization

Appendix Installation instructions

Index
