This volume contains the proceedings of the International Conference on Advanced Data Mining and Applications (ADMA 2009), held in Beijing, China, during August 17–19, 2009. We are pleased to have a very strong program. Acceptance into the conference proceedings was extremely competitive. From the 322 submissions from 27 countries and regions, the Program Committee selected 34 full papers and 47 short papers for presentation at the conference and inclusion in the proceedings. The contributed papers cover a wide range of data mining topics and a diverse spectrum of interesting applications. The Program Committee worked very hard to select these papers through a rigorous review process and extensive discussion, and finally composed a diverse and exciting program for ADMA 2009.

An important feature of the main program was the truly outstanding keynote speakers program. Edward Y. Chang, Director of Research, Google China, gave a talk titled "Confucius and 'Its' Intelligent Disciples". Being right at the forefront of data mining applications to the world's largest knowledge and data base, the Web, Dr. Chang described how Google's Knowledge Search product helps to improve the scalability of machine learning for Web-scale applications. Charles X. Ling, a seasoned researcher in data mining from the University of Western Ontario, Canada, talked about his innovative applications of data mining and artificial intelligence to gifted child education.
Table of Contents

Keynotes.- Confucius and “Its” Intelligent Disciples.- From Machine Learning to Child Learning.- Sensitivity Based Generalization Error for Supervised Learning Problems with Application in Feature Selection.- Data Mining in Financial Markets.

Regular Papers.- Cluster Analysis Based on the Central Tendency Deviation Principle.- A Parallel Hierarchical Agglomerative Clustering Technique for Billingual Corpora Based on Reduced Terms with Automatic Weight Optimization.- Automatically Identifying Tag Types.- Social Knowledge-Driven Music Hit Prediction.- Closed Non Derivable Data Cubes Based on Non Derivable Minimal Generators.- Indexing the Function: An Efficient Algorithm for Multi-dimensional Search with Expensive Distance Functions.- Anti-germ Performance Prediction for Detergents Based on Elman Network on Small Data Sets.- A Neighborhood Search Method for Link-Based Tag Clustering.- Mining the Structure and Evolution of the Airport Network of China over the Past Twenty Years.- Mining Class Contrast Functions by Gene Expression Programming.- McSOM: Minimal Coloring of Self-Organizing Map.- Chinese Blog Clustering by Hidden Sentiment Factors.- Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach.- Collaborative Filtering Recommendation Algorithm Using Dynamic Similar Neighbor Probability.- Calculating Similarity Efficiently in a Small World.- A Framework for Multi-Objective Clustering and Its Application to Co-Location Mining.- Feature Selection in Marketing Applications.- Instance Selection by Border Sampling in Multi-class Domains.- Virus Propagation and Immunization Strategies in Email Networks.- Semi-supervised Discriminant Analysis Based on Dependence Estimation.- Nearest Neighbor Tour Circuit Encryption Algorithm Based Random Isomap Reduction.- Bayesian Multi-topic Microarray Analysis with Hyperparameter Reestimation.- Discovery of Correlated Sequential Subgraphs from a Sequence of Graphs.- Building a Text Classifier by a Keyword and Wikipedia Knowledge.- Discovery of Migration Habitats and Routes of Wild Bird Species by Clustering and Association Analysis.- GOD-CS: A New Grid-Oriented Dissection Clustering Scheme for Large Databases.- Study on Ensemble Classification Methods towards Spam Filtering.- Crawling Deep Web Using a New Set Covering Algorithm.- A Hybrid Statistical Data Pre-processing Approach for Language-Independent Text Classification.- A Potential-Based Node Selection Strategy for Influence Maximization in a Social Network.- A Novel Component-Based Model and Ranking Strategy in Constrained Evolutionary Optimization.- A Semi-supervised Topic-Driven Approach for Clustering Textual Answers to Survey Questions.- An Information-Theoretic Approach for Multi-task Learning.- Online New Event Detection Based on IPLSA.

Short Papers.- Discovering Knowledge from Multi-relational Data Based on Information Retrieval Theory.- A Secure Protocol to Maintain Data Privacy in Data Mining.- Transfer Learning with Data Edit.- Exploiting Temporal Authors Interests via Temporal-Author-Topic Modeling.- Mining User Position Log for Construction of Personalized Activity Map.- A Multi-Strategy Approach to KNN and LARM on Small and Incrementally Induced Prediction Knowledge.- Predicting Click Rates by Consistent Bipartite Spectral Graph Model.- Automating Gene Expression Annotation for Mouse Embryo.- Structure Correlation in Mobile Call Networks.- Initialization of the Neighborhood EM Algorithm for Spatial Clustering.- Classification Techniques for Talent Forecasting in Human Resource Management.- A Combination Classification Algorithm Based on Outlier Detection and C4.5.- A Local Density Approach for Unsupervised Feature Discretization.- A Hybrid Method of Multidimensional Scaling and Clustering for Determining Genetic Influence on Phenotypes.- Mining Frequent Patterns from Network Data Flow.- Several SVM Ensemble Methods Integrated with Under-Sampling for Imbalanced Data Learning.- Crawling and Extracting Process Data from the Web.- Asymmetric Feature Selection for BGP Abnormal Events Detection.- Analysis and Experimentation of Grid-Based Data Mining with Dynamic Load Balancing.- Incremental Document Clustering Based on Graph Model.- Evaluating the Impact of Missing Data Imputation.- Discovery of Significant Classification Rules from Incrementally Inducted Decision Tree Ensemble for Diagnosis of Disease.- Application of the Cross-Entropy Method to Dual Lagrange Support Vector Machine.- A Predictive Analysis on Medical Data Based on Outlier Detection Method Using Non-Reduct Computation.- VisNetMiner: An Integration Tool for Visualization and Analysis of Networks.- Anomaly Detection Using Time Index Differences of Identical Symbols with and without Training Data.- An Outlier Detection Algorithm Based on Arbitrary Shape Clustering.- A Theory of Kernel Extreme Energy Difference for Feature Extraction of EEG Signals.- Semantic Based Text Classification of Patent Documents to a User-Defined Taxonomy.- Mining Compressed Repetitive Gapped Sequential Patterns Efficiently.- Mining Candlesticks Patterns on Stock Series: A Fuzzy Logic Approach.- JCCM: Joint Cluster Communities on Attribute and Relationship Data in Social Networks.- Similarity Evaluation of XML Documents Based on Weighted Element Tree Model.- Quantitative Comparison of Similarity Measure and Entropy for Fuzzy Sets.- Investigation of Damage Identification of 16Mn Steel Based on Artificial Neural Networks and Data Fusion Techniques in Tensile Test.- OFFD: Optimal Flexible Frequency Discretization for Naïve Bayes Classification.- Handling Class Imbalance Problems via Weighted BP Algorithm.- Orthogonal Centroid Locally Linear Embedding for Classification.- CCBitmaps: A Space-Time Efficient Index Structure for OLAP.- Rewriting XPath Expressions Depending on Path Summary.- Combining Statistical Machine Learning Models to Extract Keywords from Chinese Documents.- Privacy-Preserving Distributed k-Nearest Neighbor Mining on Horizontally Partitioned Multi-Party Data.- Alleviating Cold-Start Problem by Using Implicit Feedback.- Learning from Video Game: A Study of Video Game Play on Problem-Solving.- Image Classification Approach Based on Manifold Learning in Web Image Mining.- Social Influence and Role Analysis Based on Community Structure in Social Network.- Feature Selection Method Combined Optimized Document Frequency with Improved RBF Network.