Spoken Language Processing: A Guide to Theory, Algorithm and System Development / Edition 1

Hardcover (Print)

Overview

  • New advances in spoken language processing: theory and practice
  • In-depth coverage of speech processing, speech recognition, speech synthesis, spoken language understanding, and speech interface design
  • Many case studies from state-of-the-art systems, including examples from Microsoft's advanced research labs

Spoken Language Processing draws on the latest advances and techniques from multiple fields: computer science, electrical engineering, acoustics, linguistics, mathematics, psychology, and beyond. Starting with the fundamentals, it presents all this and more:

  • Essential background on speech production and perception, probability and information theory, and pattern recognition
  • Extracting information from the speech signal: useful representations and practical compression solutions
  • Modern speech recognition techniques: hidden Markov models, acoustic and language modeling, improving resistance to environmental noise, search algorithms, and large vocabulary speech recognition
  • Text-to-speech: analyzing documents, pitch and duration controls, trainable synthesis, and more
  • Spoken language understanding: dialog management, spoken language applications, and multimodal interfaces

To illustrate the book's methods, the authors present detailed case studies based on state-of-the-art systems, including Microsoft's Whisper speech recognizer, Whistler text-to-speech system, Dr. Who dialog system, and the MiPad handheld device. Whether you're planning, designing, building, or purchasing spoken language technology, this is the state of the art—from algorithms through business productivity.

Product Details

  • ISBN-13: 9780130226167
  • Publisher: Prentice Hall
  • Publication date: 4/25/2001
  • Edition description: New Edition
  • Edition number: 1
  • Pages: 1008
  • Sales rank: 578,505
  • Product dimensions: 6.90 (w) x 9.00 (h) x 2.00 (d) in.

Meet the Author

XUEDONG HUANG is founder and head of the Speech Technology Group at Microsoft Research. He received his Ph.D. from the University of Edinburgh. He is an IEEE Fellow.

ALEX ACERO and HSIAO-WUEN HON are Senior Researchers at Microsoft Research and Senior Members of the IEEE. Both received doctorates from Carnegie Mellon University.

Foreword by Dr. Raj Reddy, Carnegie Mellon University

Read an Excerpt

Preface

Our primary motivation in writing this book is to share our working experience to bridge the gap between the knowledge of industry gurus and that of newcomers to the spoken language processing community. Many powerful techniques hide in conference proceedings and academic papers for years before becoming widely recognized by the research community or the industry. We spent many years pursuing spoken language technology research at Carnegie Mellon University before we started spoken language R&D at Microsoft. We fully understand that it is by no means a small undertaking to transfer a state-of-the-art spoken language research system into a commercially viable product that can truly help people improve their productivity. Our experience in both industry and academia is reflected in the content of this book, which presents a contemporary and comprehensive description of both theoretical and practical issues in spoken language processing. This book is intended for people of diverse academic and practical backgrounds. Speech scientists, computer scientists, linguists, engineers, physicists, and psychologists all have a unique perspective on spoken language processing. This book will be useful to all of these special interest groups.

Spoken language processing is a diverse subject that relies on knowledge at many levels, including acoustics, phonology, phonetics, linguistics, semantics, pragmatics, and discourse. The diverse nature of spoken language processing requires knowledge in computer science, electrical engineering, mathematics, syntax, and psychology. There are a number of excellent books on the subfields of spoken language processing, including speech recognition, text-to-speech conversion, and spoken language understanding, but there is no single book that covers both the theoretical and practical aspects of these subfields and of spoken language interface design. We devote many chapters to systematically introducing the fundamental theories needed to understand how speech recognition, text-to-speech synthesis, and spoken language understanding work. Even more important, the book highlights what works well in practice, which is invaluable if you want to build a practical speech recognizer, a practical text-to-speech synthesizer, or a practical spoken language system. Using numerous real examples from developing Microsoft's spoken language systems, we concentrate on showing how the fundamental theories can be applied to solve real problems in spoken language processing.

Table of Contents

(NOTE: Each chapter ends with Historical Perspective and Further Reading.)

1. Introduction.

Motivations. Spoken Language System Architecture. Book Organization. Target Audiences.

I. FUNDAMENTAL THEORY.

2. Spoken Language Structure.

Sound and Human Speech Systems. Phonetics and Phonology. Syllables and Words. Syntax and Semantics.

3. Probability, Statistics, and Information Theory.

Probability Theory. Estimation Theory. Significance Testing. Information Theory.

4. Pattern Recognition.

Bayes' Decision Theory. How to Construct Classifiers. Discriminative Training. Unsupervised Estimation Methods. Classification and Regression Trees.

II. SPEECH PROCESSING.

5. Digital Signal Processing.

Digital Signals and Systems. Continuous-Frequency Transforms. Discrete-Frequency Transforms. Digital Filters and Windows. Digital Processing of Analog Signals. Multirate Signal Processing. Filterbanks. Stochastic Processes.

6. Speech Signal Representations.

Short-Time Fourier Analysis. Acoustical Model of Speech Production. Linear Predictive Coding. Cepstral Processing. Perceptually Motivated Representations. Formant Frequencies. The Role of Pitch.

7. Speech Coding.

Speech Coder Attributes. Scalar Waveform Coders. Scalar Frequency Domain Coders. Code Excited Linear Prediction (CELP). Low-Bit-Rate Speech Coders.

III. SPEECH RECOGNITION.

8. Hidden Markov Models.

The Markov Chain. Definition of the Hidden Markov Model. Continuous and Semicontinuous HMMs. Practical Issues in Using HMMs. HMM Limitations.

9. Acoustic Modeling.

Variability in the Speech Signal. How to Measure Speech Recognition Errors. Signal Processing—Extracting Features. Phonetic Modeling—Selecting Appropriate Units. Acoustic Modeling—Scoring Acoustic Features. Adaptive Techniques—Minimizing Mismatches. Confidence Measures: Measuring the Reliability. Other Techniques. Case Study: Whisper.

10. Environmental Robustness.

The Acoustical Environment. Acoustical Transducers. Adaptive Echo Cancellation (AEC). Multimicrophone Speech Enhancement. Environment Compensation Preprocessing. Environment Model Adaptation. Modeling Nonstationary Noise.

11. Language Modeling.

Formal Language Theory. Stochastic Language Models. Complexity Measure of Language Models. N-Gram Smoothing. Adaptive Language Models. Practical Issues.

12. Basic Search Algorithms.

Basic Search Algorithms. Search Algorithms for Speech Recognition. Language Model States. Time-Synchronous Viterbi Beam Search. Stack Decoding (A* Search).

13. Large-Vocabulary Search Algorithms.

Efficient Manipulation of a Tree Lexicon. Other Efficient Search Techniques. N-Best and Multipass Search Strategies. Search-Algorithm Evaluation. Case Study—Microsoft Whisper.

IV. TEXT-TO-SPEECH SYSTEMS.

14. Text and Phonetic Analysis.

Modules and Data Flow. Lexicon. Document Structure Detection. Text Normalization. Linguistic Analysis. Homograph Disambiguation. Morphological Analysis. Letter-to-Sound Conversion. Evaluation. Case Study: Festival.

15. Prosody.

The Role of Understanding. Prosody Generation Schematic. Speaking Style. Symbolic Prosody. Duration Assignment. Pitch Generation. Prosody Markup Languages. Prosody Evaluation.

16. Speech Synthesis.

Attributes of Speech Synthesis. Formant Speech Synthesis. Concatenative Speech Synthesis. Prosodic Modification of Speech. Source-Filter Models for Prosody Modification. Evaluation of TTS Systems.

V. SPOKEN LANGUAGE SYSTEMS.

17. Spoken Language Understanding.

Written vs. Spoken Languages. Dialog Structure. Semantic Representation. Sentence Interpretation. Discourse Analysis. Dialog Management. Response Generation and Rendition. Evaluation. Case Study—Dr. Who.

18. Applications and User Interfaces.

Application Architecture. Typical Applications. Speech Interface Design. Internationalization. Case Study—MiPad.

Index.
