Content-Based Audio Classification and Retrieval for Audiovisual Data Parsing / Edition 1

Content-Based Audio Classification and Retrieval for Audiovisual Data Parsing is an up-to-date overview of audio and video content analysis. Included is extensive treatment of audiovisual data segmentation, indexing and retrieval based on multimodal media content analysis, and content-based management of audio data. In addition to the commonly studied audio types such as speech and music, the authors have included hybrid types of sounds that contain more than one kind of audio component such as speech or environmental sound with music in the background. Emphasis is also placed on semantic-level identification and classification of environmental sounds. The authors introduce a new generic audio retrieval system on top of the audio archiving schemes. Both theoretical analysis and implementation issues are presented. The developing MPEG-7 standards are explored.
Content-Based Audio Classification and Retrieval for Audiovisual Data Parsing will be especially useful to researchers and graduate-level students who design and develop fully functional audiovisual systems for audio/video content parsing of multimedia streams.

Editorial Reviews

With Moving Picture Experts Group (MPEG) technologies, multimedia has become hotter than ever. While previous research on the automatic segmentation, indexing, and retrieval of audiovisual data has focused primarily on the pictorial part, it is increasingly recognized that a fully functioning system for video content parsing requires a proper mix of audio and visual information. Accordingly, the authors devote much of this monograph to the content-based management of audio data, based on a three-stage hierarchical system. Some of the experimental results and illustrations featured are derived from the video portion of MPEG-7 test data. Zhang and Kuo are with the Integrated Media Systems Center and department of electrical engineering systems at the U. of Southern California, Los Angeles. Annotation c. Book News, Inc., Portland, OR

Table of Contents

List of Figures
List of Tables
Preface
Acknowledgments
Part I Introduction
1. Introduction
1. Significance of Proposed Research
1.1 Video Segmentation and Annotation
1.2 Audio and Visual Content Analysis
1.3 MPEG-7 Standard Development
2. Review of Previous Work
2.1 Work on Video Indexing and Retrieval
2.2 Work on Audio Content Analysis
2.3 Work on Visual Content Analysis
3. Summary of the Proposed System
3.1 Framework for Video Segmentation and Indexing
3.2 Content Analysis of the Audio Stream
3.3 Content Analysis of Image Sequences
4. Contribution of the Research
5. Outline of the Monograph
Part II Video Content Modeling
2. Video Content Modeling
1. Common Model for Video Content
2. Models for Different Video Types
2.1 News Bulletin
2.2 Variety Show Video
2.3 Sports Video
2.4 Documentaries
2.5 Feature Movies and TV Series
3. Proposed Scheme for Video Content Parsing
4. Design of Index Table for Non-linear Access
4.1 The Primary Index Table
4.2 The Secondary Index Tree
Part III Audio Content Analysis
3. Audio Feature Analysis
1. Audio Features for Coarse-Level Segmentation and Indexing of Generic Data
1.1 Short-Time Energy Function
1.2 Short-Time Average Zero Crossing Rate
1.3 Short-Time Fundamental Frequency
1.4 Spectral Peak Track
2. Audio Features for Fine-Level Classification and Retrieval of Sound Effects
2.1 Timbre Features
2.2 Rhythm Features
4. Generic Audio Data Segmentation and Indexing
1. Detection of Segment Boundaries
2. Classification of Each Segment
2.1 Detecting Silence
2.2 Separating Sounds into with and without Music Components
2.3 Detecting Harmonic Environmental Sounds
2.4 Distinguishing Pure Music
2.5 Distinguishing Songs
2.6 Separating Speech with Music Background and Environmental Sound with Music Background
2.7 Distinguishing Pure Speech
2.8 Classifying Non-harmonic Environmental Sounds
3. Post-Processing
5. Sound Effects Classification and Retrieval
1. Hidden Markov Model and Gaussian Mixture Model
1.1 The Gaussian Mixture Model
1.2 The Hidden Markov Model
1.3 Hidden Markov Model with Continuous Observation Density
1.4 Hidden Markov Model with Explicit State Duration Density
2. Clustering of Feature Vectors
3. Training of HMM Parameter Sets
3.1 The Training Process
3.2 Implementational Issues
3.3 Comparison with the Baum-Welch Method
3.4 Incorporation of the Viterbi Algorithm
4. Classification of Environmental Sound
5. Query-by-Example Retrieval of Environmental Sound
Part IV Image Sequence Analysis
6. Image Sequence Analysis
1. Histogram Difference Value in Image Sequences
1.1 Definition of the Metrics
1.2 Histogram Difference of the Y-Component
1.3 Histogram Difference of the U and V Components
1.4 Histogram Difference of the Combined Code
2. The Twin-Comparison Approach
2.1 The Original Algorithm
2.2 Experimental Results and Modifications
3. Shot Change Detection Based on Combined Y- and V-Components
3.1 Determination of the Lower Threshold
3.2 Determination of the Higher Threshold
3.3 Framework of the Proposed Scheme
4. Adaptive Keyframe Extraction and Associated Feature Analysis
4.1 Adaptive Keyframe Extraction
4.2 Feature Analysis of Keyframes
Part V Experimental Results
7. Experimental Results
1. Generic Audio Data Segmentation and Indexing
1.1 Audio Database
1.2 Coarse-Level Classification Results
1.3 Segmentation and Indexing Results
2. Environmental Sound Classification and Retrieval
2.1 Timbre Retrieval with GMM
2.2 Sound Effects Classification Results
2.3 Sound Effects Retrieval Results
3. Shot Change Detection and Keyframe Extraction
3.1 Shot Change Detection Results
3.2 Keyframe Extraction Results
4. Index Table Generation
4.1 Index Table for News Bulletin
4.2 Index Table for Documentary
Part VI Conclusion
8. Conclusion and Extensions
1. Conclusion
2. Feature Extraction in the Compression Domain
3. System Integration and Applications
4. Contributions to MPEG-7
References
Index