In both the linguistic and the language engineering communities, the creation and use of annotated text collections (or annotated corpora) is currently a hot topic. Annotated texts are of interest both for research and for the development of natural language processing (NLP) applications. Unfortunately, annotating text material, especially with the more interesting kinds of linguistic annotation, is still a difficult task and can require a substantial amount of human involvement. All over the world, work is under way to replace as much of this human effort as possible with computer processing. At the frontier of what can already be done (mostly) automatically lies syntactic wordclass tagging: annotating each individual word in a text with an indication of its morphosyntactic classification. This book describes the state of the art in syntactic wordclass tagging.

As an attempt to give an overall view of the field, this book is of interest to (at least) two, possibly very different, types of reader. The first type consists of people who are using, or are planning to use, tagged material and taggers. They will want to know what tagging can and cannot do, but are not necessarily interested in the internal workings of automatic taggers. Those internal workings, on the other hand, are the main interest of the second type of reader: the builders of automatic taggers and other natural language processing software.
Table of Contents
Preface.
Contributing Authors.
Part I: The User's View.
1. Orientation; A. Voutilainen.
2. A Short History of Tagging; A. Voutilainen.
3. The Use of Tagging; G. Leech, N. Smith.
4. Tagsets; J. Cloeren.
5. Standards for Tagsets; G. Leech, A. Wilson.
6. Performance of Taggers; H. van Halteren.
7. Selection and Operation of Taggers; H. van Halteren.
Part II: The Implementer's View.
8. Automatic Taggers: An Introduction; H. van Halteren, A. Voutilainen.
9. Tokenization; G. Grefenstette.
10. Lexicons for Tagging; A. Schiller, L. Karttunen.
11. Standardization in the Lexicon; M. Monachini, N. Calzolari.
12. Morphological Analysis; K. Oflazer.
13. Tagging Unknown Words; E. Brill.
14. Hand-Crafted Rules; A. Voutilainen.
15. Corpus-Based Rules; E. Brill.
16. Hidden Markov Models; M. El-Beze, B. Merialdo.
17. Machine Learning Approaches; W. Daelemans.
Appendix A: Example tagsets.
References.
Index.