This book constitutes the refereed proceedings of the 5th International Workshop on Machine Learning for Multimodal Interaction, MLMI 2008, held in Utrecht, The Netherlands, in September 2008. The 12 revised full papers and 15 revised poster papers, presented together with 5 papers from a special session on user requirements and evaluation of multimodal meeting browsers/assistants, were carefully reviewed and selected from 47 submissions. The papers cover a wide range of topics related to human-human communication modeling and processing, as well as to human-computer interaction, using several communication modalities. Special focus is given to the analysis of non-verbal communication cues and social signal processing, the analysis of communicative content, audio-visual scene analysis, speech processing, and interactive systems and applications.
Table of Contents
Face, Gesture and Nonverbal Communication.- Visual Focus of Attention in Dynamic Meeting Scenarios.- Fast and Robust Face Tracking for Analyzing Multiparty Face-to-Face Meetings.- What Does the Face-Turning Action Imply in Consensus Building Communication?.- Distinguishing the Communicative Functions of Gestures.- Optimised Meeting Recording and Annotation Using Real-Time Video Analysis.- Ambiguity Modeling in Latent Spaces.
Audio-Visual Scene Analysis and Speech Processing.- Inclusion of Video Information for Detection of Acoustic Events Using the Fuzzy Integral.- Audio-Visual Clustering for 3D Speaker Localization.- A Hybrid Generative-Discriminative Approach to Speaker Diarization.- A Neural Network Based Regression Approach for Recognizing Simultaneous Speech.- Hilbert Envelope Based Features for Far-Field Speech Recognition.- Multimodal Unit Selection for 2D Audiovisual Text-to-Speech Synthesis.
Social Signal Processing.- Decision-Level Fusion for Audio-Visual Laughter Detection.- Detection of Laughter-in-Interaction in Multichannel Close-Talk Microphone Recordings of Meetings.- Automatic Recognition of Spontaneous Emotions in Speech Using Acoustic and Lexical Features.- Daily Routine Classification from Mobile Phone Data.
Human-Human Spoken Dialogue Processing.- Hybrid Multi-step Disfluency Detection.- Exploring Features and Classifiers for Dialogue Act Segmentation.- Detecting Action Items in Meetings.- Modeling Topic and Role Information in Meetings Using the Hierarchical Dirichlet Process.- Time-Compressing Speech: ASR Transcripts Are an Effective Way to Support Gist Extraction.- Meta Comments for Summarizing Meeting Speech.
HCI and Applications.- A Generic Layout-Tool for Summaries of Meetings in a Constraint-Based Approach.- A Probabilistic Model for User Relevance Feedback on Image Retrieval.- The AMIDA Automatic Content Linking Device: Just-in-Time Document Retrieval in Meetings.- Introducing Additional Input Information into Interactive Machine Translation Systems.- Computer Assisted Transcription of Text Images and Multimodal Interaction.
User Requirements and Evaluation of Meeting Browsers and Assistants.- Designing and Evaluating Meeting Assistants, Keeping Humans in Mind.- Making Remote ‘Meeting Hopping’ Work: Assistance to Initiate, Join and Leave Meetings.- Physicality and Cooperative Design.- Developing and Evaluating a Meeting Assistant Test Bed.- Extrinsic Summarization Evaluation: A Decision Audit Task.