Table of Contents
Preface v
Part I Introduction
1 The Nature of Language 3
1.1 Syntax versus Semantics 3
1.2 Meaning and Context 8
1.3 The Symbol Grounding Problem 13
Part II Mathematics
2 Relations 19
2.1 Operations with Relations 22
2.2 Homogeneous Relations 25
2.3 Order Relations 32
2.4 Lattices. Complete Lattices 34
2.5 Graphical Representation of Ordered Sets 36
2.6 Closure Systems. Galois Connections 38
3 Algebraic Structures 43
3.1 Functions 43
3.2 Binary Operations 46
3.3 Associative Operations. Semigroups 47
3.4 Neutral Elements. Monoids 49
3.5 Morphisms 50
3.6 Invertible Elements. Groups 54
3.7 Subgroups 58
3.8 Group Morphisms 61
3.9 Congruence Relations 64
3.10 Rings and Fields 65
4 Linear Algebra 68
4.1 Vectors 68
4.2 The Space R^n 70
4.3 Vector Spaces Over Arbitrary Fields 72
4.4 Linear and Affine Subspaces 74
4.5 Linearly Independent Vectors. Generator Systems. Basis 79
4.5.1 Every vector space has a basis 83
4.5.2 Algorithm for computing the basis of a generated subspace 90
5 Conceptual Knowledge Processing and Formal Concept Analysis 94
5.1 Introduction 94
5.2 Context and Concept 96
5.3 Many-valued Contexts 106
5.4 Finding all Concepts 107
Part III Knowledge Representation for NLP
6 Measuring Word Meaning Similarity 121
6.1 Introduction 121
6.2 Baseline Methods and Algorithms 122
6.2.1 Intertwining space models and metrics 122
6.2.2 Measuring similarity 126
6.3 Summary and Main Conclusions 128
7 Semantics and Query Languages 129
7.1 Introduction 129
7.2 Baseline Methods and Algorithms 131
7.2.1 The methodology 131
7.2.2 The theory on semantics 133
7.2.3 Automata theory and (query) languages 137
7.2.4 Exemplary algorithms and data structures 145
7.3 Summary and Major Conclusions 160
8 Multi-Lingual Querying and Parametric Theory 162
8.1 Introduction 162
8.2 Baseline Methods and Algorithms 164
8.2.1 Background theory 164
8.2.2 An example 168
8.2.3 An indicative approach 170
8.2.4 An indicative system architecture and implementation 174
8.3 Summary and Major Conclusions 179
Part IV Knowledge Extraction and Engineering for NLP
9 Word Sense Disambiguation 183
9.1 Introduction 183
9.1.1 Meaning and context 184
9.2 Methods and Algorithms: Vectorial Methods in WSD 186
9.2.1 Associating vectors to the contexts 186
9.2.2 Measures of similarity 188
9.2.3 Supervised learning of WSD by vectorial methods 189
9.2.4 Unsupervised approach. Clustering contexts by vectorial method 190
9.3 Methods and Algorithms: Non-vectorial Methods in WSD 192
9.3.1 Naive Bayes classifier approach to WSD 192
9.4 Methods and Algorithms: Bootstrapping Approach of WSD 192
9.5 Methods and Algorithms: Dictionary-based Disambiguation 196
9.5.1 Lesk's algorithms 196
9.5.2 Yarowsky's bootstrapping algorithm 197
9.5.3 WordNet-based methods 198
9.6 Evaluation of WSD Task 207
9.6.1 The benefits of WSD 209
9.7 Conclusions and Recent Research 210
10 Text Entailment 213
10.1 Introduction 213
10.2 Methods and Algorithms: A Survey of RTE-1 and RTE-2 214
10.2.1 Logical aspect of TE 216
10.2.2 Logical approaches in RTE-1 and RTE-2 218
10.2.3 The directional character of the entailment relation and some directional methods in RTE-1 and RTE-2 218
10.2.4 Text entailment recognition by similarities between words and texts 220
10.2.5 A few words about RTE-3 and the last RTE challenges 223
10.3 Proposal for Direct Comparison Criterion 223
10.3.1 Lexical refutation 224
10.3.2 Directional similarity of texts and the comparison criterion 227
10.3.3 Two more examples of the comparison criterion 228
10.4 Conclusions and Recent Research 229
11 Text Segmentation 231
11.1 Introduction 231
11.1.1 Topic segmentation 232
11.2 Methods and Algorithms 233
11.2.1 Discourse structure and hierarchical segmentation 233
11.2.2 Linear segmentation 236
11.2.3 Linear segmentation by Lexical Chains 244
11.2.4 Linear segmentation by FCA 248
11.3 Evaluation 256
11.4 Conclusions and Recent Research 260
12 Text Summarization 262
12.1 Introduction 262
12.2 Methods and Algorithms 267
12.2.1 Summarization starting from linear segmentation 267
12.2.2 Summarization by Lexical Chains (LCs) 271
12.2.3 Methods based on discourse structure 274
12.2.4 Summarization by FCA 275
12.2.5 Summarization by sentence clustering 280
12.2.6 Other approaches 283
12.3 Multi-document Summarization 287
12.4 Evaluation 291
12.4.1 Conferences and Corpora 294
12.5 Conclusions and Recent Research 295
13 Named Entity Recognition 297
13.1 Introduction 297
13.2 Baseline Methods and Algorithms 298
13.2.1 Hand-crafted rules based techniques 298
13.2.2 Machine learning techniques 303
13.3 Summary and Main Conclusions 309
Bibliography 311
Index 331