Overview of Knowledge Discovery in Traditional Chinese Medicine
As a complete medical knowledge system distinct from orthodox medicine, traditional Chinese medicine (TCM) has played an indispensable role in health care for Chinese people for thousands of years. The holistic and systematic ideas of TCM are essentially different from the reductionist thinking modes of Western medicine. With the development of modern science, people have come to realize the limitations of reductionism and have begun to place more emphasis on systematic thinking patterns, such as systems biology. Based on the methodology of holism, TCM plays a unique role in advancing the development of life science and medicine. Meanwhile, with the dramatic increase in the prevalence of chronic conditions, chemical medicines cannot fully satisfy the needs of health maintenance, disease prevention, and treatment. Human health demands the large-scale development and application of natural medicines, to which TCM experience and knowledge can contribute substantially. The ever-increasing use of Chinese herbal medicine (CHM) and acupuncture worldwide is a good indication of the public interest in TCM.
Thousands of years of TCM practice and theoretical research have accumulated a great deal of knowledge in the form of ancient books and literature. In China, the domestic collection of ancient books about TCM published before the Xinhai Revolution (1911) reaches 130,000 volumes. In addition, thousands of studies on TCM treatments are published yearly in journals all around the world. There were more than 600,000 journal articles during the period 1984–2005. With such a vast volume of TCM data, there is an urgent need to use these precious resources effectively and fully. Moreover, the last decade has been marked by unprecedented growth in both the production of biomedical data and the amount of published literature discussing it. Thus, there is both an opportunity and a pressing need to connect TCM with modern life science.
Knowledge discovery in databases (KDD) is one suitable methodology for analyzing and understanding such huge amounts of data. As an interdisciplinary area spanning artificial intelligence, databases, statistics, and machine learning, the idea of KDD came into being in the late 1980s. The most prominent definition of KDD was proposed by Fayyad et al. in 1996. In that paper, KDD was defined as "the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data." This definition may also be applied to "data mining" (DM). Indeed, in the recent literature of DM and KDD, the terms are often used interchangeably or without distinction. However, according to classical KDD methodologies, DM is the knowledge extraction step in the KDD process, which also involves the selection and preprocessing of appropriate data from various sources and proper interpretation of the mining results. Typical DM methods include concept description, association rule mining, classification and prediction, clustering analysis, time-series analysis, text mining, and so forth. During the last two decades, the field of KDD has attracted considerable interest in numerous disciplines, ranging from telecommunications, banking, and marketing to scientific analysis. This is also the case within medical environments. The discipline of medicine deals with complex organisms, processes, and relations, and KDD methodology is particularly suitable for handling such complexity. In addition, the advent of computer-based patient records (CPRs) and data warehouses contributes greatly to the availability of medical data and offers voluminous data resources for KDD. The need to increase medical knowledge also pushes researchers to carry out knowledge discovery not only in CPRs and clinical warehouses but also in biomedical literature databases. The creation of new medical knowledge with DM techniques is listed as one of the 10 grand challenges of medicine by Altman.
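The distinction between KDD as a whole process and DM as one step within it can be illustrated with a minimal sketch. The code below is not taken from any study discussed in this chapter; the toy records, the function names, and the choice of pair counting as the mining step are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records (not real data): each lists the herbs of one formula.
RAW_RECORDS = [
    {"source": "formula_db", "herbs": ["Ginseng", " licorice ", "Ginger"]},
    {"source": "formula_db", "herbs": ["ginseng", "Licorice"]},
    {"source": "formula_db", "herbs": ["Ginger", "licorice", "ginseng"]},
    {"source": "other", "herbs": ["Ginseng", "Ginger"]},
    {"source": "formula_db", "herbs": ["Ginger", "Cinnamon"]},
]


def select(records, source):
    """Selection: keep only the records from the source of interest."""
    return [r for r in records if r["source"] == source]


def preprocess(records):
    """Preprocessing: normalize herb names and drop duplicates within a record."""
    return [frozenset(h.strip().lower() for h in r["herbs"]) for r in records]


def mine_frequent_pairs(transactions, min_support):
    """Data mining step: find herb pairs whose support (the fraction of
    records containing both herbs) is at least min_support."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def interpret(pairs):
    """Interpretation: rank the patterns so a domain expert can review them."""
    return sorted(pairs.items(), key=lambda kv: (-kv[1], kv[0]))


def run_kdd(records, source="formula_db", min_support=0.5):
    """Chain the four stages; only mine_frequent_pairs is 'data mining' proper."""
    selected = select(records, source)
    transactions = preprocess(selected)
    return interpret(mine_frequent_pairs(transactions, min_support))


if __name__ == "__main__":
    for pair, support in run_kdd(RAW_RECORDS):
        print(pair, support)
```

In this sketch only `mine_frequent_pairs` corresponds to DM in the classical sense; the surrounding functions stand in for the selection, preprocessing, and interpretation stages, which in practice usually require far more effort and domain expertise than shown here.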
As Roddick et al. indicate, the application of KDD to medical datasets is a rewarding and highly challenging area. Due to the ever-increasing accumulation of biomedical data and the pressing demand to explore these resources, the methods of knowledge discovery have been widely applied to analyze medical information over the decades. Reviews of KDD in the medical area from different perspectives can be found in Refs. [9,11–16]. However, the topic of knowledge discovery in TCM (KDTCM) is not covered in these reviews.
Considering the fast-growing number of studies carried out on KDTCM, it is both necessary and helpful to provide an overview of recent KDTCM research. As a complementary medical system, TCM is quite different from Western medicine, both in practice and in theory. Because KDD technology is highly domain-specific, it is all the more necessary to gain insight into KDTCM in particular. Motivated by these needs, this chapter introduces and summarizes existing work on KDTCM. Because a great amount of KDTCM work is reported only in Chinese literature, the literature search is conducted in both English and Chinese publications, and the major KDTCM studies published there are covered in this review. For each work, the KDD methods used in the study are introduced, as well as the corresponding results. In particular, some studies with interesting results are highlighted, such as novel TCM paired drugs discovered by frequent itemset analysis, the laboratory-confirmed relationship between the CRF gene and kidney YangXu syndrome discovered by text mining, the high proportion of toxic plants in the botanical family Ranunculaceae discovered by statistical analysis, and the association between M-cholinoceptor blocking drugs and Solanaceae discovered by association rule mining. The existing work in KDTCM demonstrates that the use of KDD in TCM is both feasible and promising. Meanwhile, it should be noted that, as far as KDD methodology is concerned, the TCM field is still largely virgin soil with copious amounts of hidden gold. To ease gold mining in this field, future directions of KDTCM research are also provided in this chapter based on a discussion of existing work.
The rest of this chapter is arranged as follows. The prerequisite for applying KDD is the digitalization of the vast amount of data. Thus, an overview of currently available TCM data resources is first presented in Section 1.2. Subsequently, the review of KDTCM work is presented in four research subfields in Section 1.3, including KDD for the research of Chinese medical formulae (CMF), KDD for the research of CHM, KDD for TCM syndrome research, and KDD for TCM clinical diagnosis. Based on a discussion of these KDTCM studies, the current state and main problems of KDTCM work in each subfield are summarized in Section 1.4, and the future directions for each subfield are also presented. Finally, we conclude in Section 1.5.
1.2 The State of the Art of TCM Data Resources
Data availability is the first consideration before any knowledge discovery task can be undertaken. In this section, we introduce the current state of TCM data resources, especially those dedicated to TCM.
Because TCM is a significant part of complementary and alternative medicine (CAM), literature reporting TCM issues can be found in the main CAM databases, such as CAM on PubMed (Complementary and Alternative Medicine subset of PubMed), AMED (Allied and Complementary Medicine Database), CISCOM (Centralized Information Service for Complementary Medicine), and CAMPAIN (Complementary and Alternative Medicine and Pain Database). A more comprehensive list of TCM databases can be found in Ref. . Currently, the primary data resources specific to TCM include the China TCM Patent Database (CTCMPD), the TradiMed Database, the TCM chemical database, and the TCM-online Database System. CTCMPD was established by the Patent Data Research and Development Center, a subsidiary of the Intellectual Property Publishing House of the State Intellectual Property Office (SIPO) of China. CTCMPD contains more than 19,000 patent records and over 40,000 TCM formulae published from 1985 to the present. The TradiMed Database was built by the Natural Product Research Institute at Seoul National University, Republic of Korea. Based on various Chinese and Korean medical classics, TradiMed represents a combination of traditional medicine knowledge and modern medicine. So far, TradiMed contains information on 3199 herbs, 11,810 formulae, 20,012 chemical compositions of herbs, and 4080 diseases. The TCM chemical database was developed by the National Key Laboratory of Bio-chemical Engineering at the Institute of Process Engineering, Chinese Academy of Sciences. This database contains detailed information on 9000 chemicals isolated from nearly 4000 natural sources used in TCM and provides in-depth bioactivity data for many of the compounds.
In this section, we place our emphasis on the TCM-online Database System. To the best of our knowledge, the TCM-online Database System is currently the largest TCM data collection in the world. The prototype of TCM-online was first built in the late 1990s. In 1998, the advanCed Computing aNd sysTem (CCNT) Lab in the College of Computer Science at Zhejiang University and the China Academy of Traditional Chinese Medicine (CATCM) began to collaborate in building scientific databases for TCM and established a unified web-accessible multidatabase query system, TCMMDB, that integrates 17 branches across the country. With input from nearly 300 scientists from more than 30 colleges, universities, and academies of TCM, this system has already integrated more than 50 databases, including the Traditional Chinese Medical Literature Analysis and Retrieval System (TCMLARS), the Traditional Chinese Drug Database (TCDBASE), and the Database of Chinese Medical Formula. TCMMDB was replaced by the Grid-based system TCM-Grid in 2002, which provides more powerful functions, such as dynamic registration, binding, and associated navigation. The TCM-Grid system was further extended to a semantic-based database Grid named DartGrid in 2002. At present, these databases are available as the TCM-online Database System through a website and on CD-ROM. In addition, a large-scale ontology-based Unified TCM Language System (UTCMLS) has been under development since 2001 to support concept-based information retrieval and information integration. All these efforts help to realize the organization, storage, and sharing of TCM data, which provides a feasible environment for the effective implementation of KDD technology.
The main databases among the more than 50 integrated in the TCM-online Database System are described below.
1.2.1 Traditional Chinese Medical Literature Analysis and Retrieval System
The bibliographic system TCMLARS has two versions. So far, the Chinese version contains over 600,000 TCM periodical articles, while the English version contains 92,000. The source material for the database is drawn from about 900 biomedical journals published in China since 1984. The main fields included in TCMLARS are similar to those of MEDLINE, such as title, author, journal title, publication year, and abstract. In addition, some TCM-specific fields are included, such as pharmacology of Chinese herbs, ingredients and dosage of formulae, drug compatibility, and acupuncture and Tuina points. McCulloch et al. consider TCMLARS an important new asset for literature review and meta-analysis of CHM. It also serves as a significant data resource for KDTCM, especially for methods based on text mining.
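To make the field layout concrete, the following sketch models a bibliographic record with the MEDLINE-like fields and the TCM-specific fields named above. It is only an illustration: the class name, attribute names, types, and sample values are assumptions and do not reflect the actual TCMLARS schema or its contents.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TcmArticleRecord:
    """Hypothetical record layout; not the real TCMLARS schema."""

    # MEDLINE-like bibliographic fields
    title: str
    authors: list[str]
    journal_title: str
    publication_year: int
    abstract: str = ""
    # TCM-specific fields (names and types are assumptions)
    herb_pharmacology: str = ""
    formula_ingredients: dict[str, str] = field(default_factory=dict)  # herb -> dosage text
    drug_compatibility: str = ""
    acupuncture_tuina_points: list[str] = field(default_factory=list)


def records_using_herb(records, herb):
    """Return the records whose formula ingredients include the given herb
    (case-insensitive exact match on the herb name)."""
    wanted = herb.strip().lower()
    return [
        r for r in records
        if any(name.strip().lower() == wanted for name in r.formula_ingredients)
    ]


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample = [
        TcmArticleRecord(
            title="Example study A",
            authors=["Author One"],
            journal_title="Example Journal",
            publication_year=1999,
            formula_ingredients={"Ginseng": "9 g", "Licorice": "6 g"},
        ),
        TcmArticleRecord(
            title="Example study B",
            authors=["Author Two"],
            journal_title="Example Journal",
            publication_year=2003,
            acupuncture_tuina_points=["Example point"],
        ),
    ]
    print([r.title for r in records_using_herb(sample, "ginseng")])
```

Structured fields of this kind are what make such a bibliographic database usable for mining tasks beyond free-text search, since ingredients and dosages can be queried directly rather than extracted from abstracts.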
Excerpted from Modern Computational Approaches To Traditional Chinese Medicine by Zhaohui Wu, Huajun Chen, Xiaohong Jiang. Copyright © 2012 Zhejiang University Press Co., Ltd.. Excerpted by permission of Elsevier.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.