Bioinformatics: The Machine Learning Approach

Hardcover (Older Edition)

Overview

Bioinformatics: The Machine Learning Approach by Pierre Baldi and Søren Brunak

An unprecedented wealth of data is being generated by genome sequencing projects and other experimental efforts to determine the structure and function of biological molecules. The demands and opportunities for interpreting these data are expanding rapidly. Bioinformatics is the development and application of computer methods for management, analysis, interpretation, and prediction, as well as for the design of experiments. Machine learning approaches (e.g., neural networks, hidden Markov models, and belief networks) are ideally suited for areas where there is a lot of data but little theory, which is the situation in molecular biology. The goal in machine learning is to extract useful information from a body of data by building good probabilistic models--and to automate the process as much as possible.

In this book Pierre Baldi and Søren Brunak present the key machine learning approaches and apply them to the computational problems encountered in the analysis of biological data. The book is aimed both at biologists and biochemists who need to understand new data-driven algorithms and at those with a primary background in physics, mathematics, statistics, or computer science who need to know more about applications in molecular biology.

This new second edition contains expanded coverage of probabilistic graphical models and of the applications of neural networks, as well as a new chapter on microarrays and gene expression. The entire text has been extensively revised.

Product Details

ISBN-13: 9780262024426
Publisher: MIT Press
Publication date: 02/13/1998
Series: Adaptive Computation and Machine Learning Series
Edition description: Older Edition
Pages: 371
Product dimensions: 7.33(w) x 9.36(h) x 1.06(d)

About the Author

Pierre Baldi is Professor, Department of Information and Computer Science (ICS), University of California at Irvine (UCI), and Director, Institute for Genomics and Bioinformatics (IGB), UCI.

Søren Brunak is Director, Center for Biological Sequence Analysis, The Technical University of Denmark.

Table of Contents

Series Foreword
Preface
1 Introduction
1.1 Biological Data in Digital Symbol Sequences
1.2 Genomes-Diversity, Size, and Structure
1.3 Proteins and Proteomes
1.4 On the Information Content of Biological Sequences
1.5 Prediction of Molecular Function and Structure
2 Machine Learning Foundations: The Probabilistic Framework
2.1 Introduction: Bayesian Modeling
2.2 The Cox-Jaynes Axioms
2.3 Bayesian Inference and Induction
2.4 Model Structures: Graphical Models and Other Tricks
2.5 Summary
3 Probabilistic Modeling and Inference: Examples
3.1 The Simplest Sequence Models
3.2 Statistical Mechanics
4 Machine Learning Algorithms
4.1 Introduction
4.2 Dynamic Programming
4.3 Gradient Descent
4.4 EM/GEM Algorithms
4.5 Markov Chain Monte Carlo Methods
4.6 Simulated Annealing
4.7 Evolutionary and Genetic Algorithms
4.8 Learning Algorithms: Miscellaneous Aspects
5 Neural Networks: The Theory
5.1 Introduction
5.2 Universal Approximation Properties
5.3 Priors and Likelihoods
5.4 Learning Algorithms: Backpropagation
6 Neural Networks: Applications
6.1 Sequence Encoding and Output Interpretation
6.2 Prediction of Protein Secondary Structure
6.3 Prediction of Signal Peptides and Their Cleavage Sites
6.4 Applications for DNA and RNA Nucleotide Sequences
7 Hidden Markov Models: The Theory
7.1 Introduction
7.2 Prior Information and Initialization
7.3 Likelihood and Basic Algorithms
7.4 Learning Algorithms
7.5 Applications of HMMs: General Aspects
8 Hidden Markov Models: Applications
8.1 Protein Applications
8.2 DNA and RNA Applications
8.3 Conclusion: Advantages and Limitations of HMMs
9 Hybrid Systems: Hidden Markov Models and Neural Networks
9.1 Introduction to Hybrid Models
9.2 The Single-Model Case
9.3 The Multiple-Model Case
9.4 Simulation Results
9.5 Summary
10 Probabilistic Models of Evolution: Phylogenetic Trees
10.1 Introduction to Probabilistic Models of Evolution
10.2 Substitution Probabilities and Evolutionary Rates
10.3 Rates of Evolution
10.4 Data Likelihood
10.5 Optimal Trees and Learning
10.6 Parsimony
10.7 Extensions
11 Stochastic Grammars and Linguistics
11.1 Introduction to Formal Grammars
11.2 Formal Grammars and the Chomsky Hierarchy
11.3 Applications of Grammars to Biological Sequences
11.4 Prior Information and Initialization
11.5 Likelihood
11.6 Learning Algorithms
11.7 Applications of SCFGs
11.8 Experiments
11.9 Future Directions
12 Internet Resources and Public Databases
12.1 A Rapidly Changing Set of Resources
12.2 Databases over Databases and Tools
12.3 Databases over Databases
12.4 Databases
12.5 Sequence Similarity Searches
12.6 Alignment
12.7 Selected Prediction Servers
12.8 Molecular Biology Software Links
12.9 Ph.D. Courses over the Internet
12.10 HMM/NN simulator
A Statistics
A.1 Decision Theory and Loss Functions
A.2 Quadratic Loss Functions
A.3 The Bias/Variance Trade-off
A.4 Combining Estimators
A.5 Error Bars
A.6 Sufficient Statistics
A.7 Exponential Family
A.8 Gaussian Process Models
A.9 Variational Methods
B Information Theory, Entropy, and Relative Entropy
B.1 Entropy
B.2 Relative Entropy
B.3 Mutual Information
B.4 Jensen's Inequality
B.5 Maximum Entropy
B.6 Minimum Relative Entropy
C Probabilistic Graphical Models
C.1 Notation and Preliminaries
C.2 The Undirected Case: Markov Random Fields
C.3 The Directed Case: Bayesian Networks
D HMM Technicalities, Scaling, Periodic Architectures, State Functions, and Dirichlet Mixtures
D.1 Scaling
D.2 Periodic Architectures
D.3 State Functions: Bendability
D.4 Dirichlet Mixtures
E List of Main Symbols and Abbreviations
References
Index

What People are Saying About This

From the Publisher

"This is a very good book, written with a high level of erudition and insight." Gustavo A. Stolovitzky, Physics Today

The MIT Press
