Text to Speech Synthesis: New Paradigms and Advances


High-quality speech synthesis: new advances, new paradigms


Recent advances in speech synthesis will enable the development of high-quality natural voice systems with broad application in education, business, entertainment, and medicine. Text to Speech Synthesis is the first book to comprehensively document these new research trends and paradigms, balancing coverage of research and applications. It brings together seminal research by leaders in the field, drawn from both academic and industrial laboratories worldwide.

The authors and editors offer broad coverage of several key areas, including new unit selection approaches; speech representations and modeling; data-driven synthesis schemes; and expressive speech synthesis. Coverage includes:

  • Unit Selection Methods: reducing discontinuities at synthesis time in corpus-based speech processing, voice quality variation, and join costs
  • Hidden Markov Model (HMM)-Based Synthesis: advanced uses of speech recognition technology, HMM-based multilingual speech synthesis, and new prosody control techniques
  • Expressive Speech Synthesis: challenges, questions, and avenues of research, including diphone transplantation and minimization of pitch modification
  • Speech Representation and Models: a new articulatory modeling paradigm for controlling synthesis quality

This is an essential resource for all researchers working in speech synthesis and related areas such as multimedia signal processing, linguistics, and spoken user interfaces. It will also be valuable to any engineer, developer, or manager who must evaluate the latest speech recognition technologies or integrate them into practical applications.


Product Details

  • ISBN-13: 9780131456617
  • Publisher: Prentice Hall Professional Technical Reference
  • Publication date: 8/3/2004
  • Pages: 288
  • Product dimensions: 7.22 in. (w) x 9.64 in. (h) x 0.91 in. (d)

Table of Contents

1 Reducing discontinuities at synthesis time for corpus-based speech synthesis
2 Voice quality variation in a long-term recording of a single speaker speech corpus
3 Join cost for unit selection speech synthesis
4 Articulatory modeling: a role in concatenative text to speech synthesis
5 Minimizing the amount of pitch modification in speech synthesis
6 The use of speech recognition technology in speech synthesis
7 An HMM-based approach to multilingual speech synthesis
8 Prosody control for HMM-based Japanese TTS
9 Synthesizing expressive speech: overview, challenges, and open questions
10 Unit selection synthesis of prosody: evaluation using diphone transplantation
11 Toward expressive synthetic speech


Speech synthesis research has attracted renewed interest worldwide. There have been several recent conferences on speech synthesis where current approaches and advances have been highlighted. The goal of this book is to provide an in-depth exposition of some of the recent trends and novel directions in the field. This book was inspired largely by an IEEE-sponsored workshop held in September 2002 in Santa Monica, California, and was dedicated to the memory of Mike Macon, a speech synthesis researcher who tragically passed away at a young age. The Foreword by Jan van Santen highlights some of Mike's important contributions to the field.

The chapters in this book attempt to cover a wide range of topics in speech synthesis. They are organized into four sections: Unit Selection Methods, HMM-Based Synthesis Schemes, Expressive Speech Synthesis, and Speech Representations and Models for TTS.

One of the major challenges for corpus-based speech approaches is the reduction of discontinuities at synthesis time. The chapter by Bozkurt et al. introduces signal processing schemes aimed at addressing concatenation and smoothing of speech units. Another challenge for unit selection synthesis systems is dealing with voice quality variations in the unit inventory. Kawai and Tsuzaki address the issue by considering long-term recording of a single speaker corpus. The authors attempt to derive acoustic measures that correlate with perceptual measures of voice quality variations, which in turn could be exploited for optimal unit selection. The third chapter in this section, by Vepa, King, and Taylor, focuses on the issue of defining and calculating the join (or concatenation) cost to predict the perceived discontinuity at concatenation points. They also examine underlying representations to simultaneously compute concatenation costs and smooth acoustic coefficients.

An emerging and promising approach to speech synthesis is based on hidden Markov models (HMMs), which have been used successfully in automatic speech recognition. HMM-based systems are the focus of the second section in this book. This paradigm shift in synthesis is highlighted in the chapter by Ostendorf and Bulyko, where the parallels, potential pitfalls, and missing links between synthesis and recognition using HMMs are discussed. The following chapter, by Tokuda, Zen, and Black, describes how the HMM framework can be used to develop an end-to-end multilingual synthesis system, highlighting the benefits and open research challenges. One such challenge relates to prosody control. The third paper, by Iwano, Yamada, Togawa, and Furui, addresses this issue so that the rate of the synthesized speech can be continuously and effectively modified. Their system was evaluated subjectively to assess naturalness.

A critical aspect of natural speech is its expressive quality. Recent trends in speech synthesis aim at achieving and improving the expressive nature of synthesized speech by manipulating segmental and suprasegmental properties of the speech signal. An overview of synthesizing expressive speech is provided by Bulut, Narayanan, and Johnson in their chapter. They review both rule-based and data-driven methods, data collection, and evaluation approaches and provide a summary of open questions in emotional speech synthesis. Eide, Bakis, Hamza, and Pitrelli discuss and compare methods for generating expressive speech for unlimited and limited resource scenarios. For both cases they show significant differences between expressive and neutral synthetic speech. Prosody control, an important element for achieving expressive quality, is the topic addressed by Prudon, d'Alessandro, and Boula de Mareüil. In their chapter, they focus on prosody synthesis and evaluation using both rule-based and data-driven approaches for diphone synthesis in the French language. The chapter by Klabbers, van Santen, and Wouters considers the problem of prosody control in unit selection systems in a different light. They describe and evaluate an approach of prosodic factorization, to be used while designing a unit selection system, that can help minimize the amount of pitch modification required.

An underlying technical challenge in synthesizing expressive speech by data-driven means is the ability to exercise control over the synthesis quality. The final chapter in the book, by Sondhi and Sinder, presents an alternative paradigm in achieving better parametric control in speech synthesizers by relying on articulatory representation for the speech signal. Notably, they explore the notion of using articulatory units in a corpus-based concatenative speech synthesis set-up.

Shrikanth Narayanan and Abeer Alwan, Editors


Customer Reviews

  • Anonymous

    Posted May 22, 2005

    compare to ASR

    The field of TTS has been steadily improving, but it is still not perfect. If you listen to an extended TTS audio clip, you are unlikely to mistake it for a single human recording. Here, the editors provide a set of research papers that map out the boundary of TTS. What I found most interesting was the chapter comparing it with Automatic Speech Recognition. The latter is a much harder problem, especially if you want speaker independence and the input audio can be noisy, whereas TTS is effectively noise-free: the input text is always precisely known. But the chapter points out an ironic difference that is somewhat of a mirror image. ASR accuracy can be easily and objectively measured by comparing the ASR's output text with the text transcribed by a human listener, whereas the 'goodness' of a TTS audio output is determined very subjectively. This is one major unsolved TTS problem. The chapter goes into some of the ASR methods that have been brought successfully into TTS research. Most notable is the use of Hidden Markov Models. In ASR work, this was perhaps the biggest innovation in the last 10 years. It also shows some preliminary promise for TTS.
