Sequence Comparison: Theory and Methods / Edition 1

Sequence Comparison: Theory and Methods / Edition 1

by Kun-Mao Chao, Louxin Zhang
     
 

View All Available Formats & Editions

ISBN-10: 1848003196

ISBN-13: 9781848003194

Pub. Date: 10/17/2008

Publisher: Springer London

Biomolecular sequence comparison is the origin of bioinformatics. Today, powerful sequence comparison methods, together with comprehensive biological databases, have changed the practice of molecular biology and genomics.

This in-depth, state-of-the-art study of sequence alignment and homology search, covers the full spectrum of the field - from alignment methods

Overview

Biomolecular sequence comparison is the origin of bioinformatics. Today, powerful sequence comparison methods, together with comprehensive biological databases, have changed the practice of molecular biology and genomics.

This in-depth, state-of-the-art study of sequence alignment and homology search, covers the full spectrum of the field - from alignment methods to the theory of scoring matrices and alignment score statistics. Following a comprehensive introduction, this useful text/reference focuses on algorithms and techniques, as well as discusses the theory. This easy-to-follow text examines alignment methods and techniques, features a new issue of sequence comparison (the spaced-seed technique), addresses several new flexible strategies for coping with various scoring schemes, and covers the theory on the significance of high-scoring segment pairs between two unalignment sequences. Useful appendices are provided for basic concepts in molecular biology, primer in statistics and software for sequence alignment. The book is written for readers with little or no knowledge of biology, algorithms and probability.

Features:

• Presents a rigorous yet reader-friendly text on the algorithmic techniques and mathematical foundations of sequence alignment and homology search

• Offers a tutorial to aid all levels of readers

• Covers the basic algorithms and methods for sequence alignment

• Introduces popular homology search programs

• Familiarizes readers with multiple sequence alignment

• Deals with the Karlin-Altschul statistics of optimal local alignment scores

• Discusses substitution matrices

• Provides end-of-chapter bibliographic notes and further reading suggestions that report related work and recent progresses

Based on lectures given to students studying bioinformatics and mathematics, this state-of-the-art study of sequence alignment and homology search is an ideal resource and toolkit for undergraduates and will appeal to biologists who wish to know how to use homology search tools more intelligently.

Product Details

ISBN-13:
9781848003194
Publisher:
Springer London
Publication date:
10/17/2008
Series:
Computational Biology Series, #7
Edition description:
2009
Pages:
209
Product dimensions:
6.40(w) x 9.30(h) x 0.80(d)

Table of Contents

Foreword vii

Preface ix

Acknowledgments xiii

About the Authors xv

1 Introduction 1

1.1 Biological Motivations 1

1.2 Alignment: A Model for Sequence Comparison 2

1.2.1 Definition 2

1.2.2 Alignment Graph 3

1.3 Scoring Alignment 7

1.4 Computing Sequence Alignment 8

1.4.1 Global Alignment Problem 9

1.4.2 Local Alignment Problem 10

1.5 Multiple Alignment 11

1.6 What Alignments Are Meaningful? 12

1.7 Overview of the Book 12

1.8 Bibliographic Notes and Further Reading 13

Part I Algorithms and Techniques 15

2 Basic Algorithmic Techniques 17

2.1 Algorithms and Their Complexity 18

2.2 Greedy Algorithms 18

2.2.1 Huffman Codes 19

2.3 Divide-and-Conquer Strategies 21

2.3.1 Mergesort 21

2.4 Dynamic Programming 23

2.4.1 Fibonacci Numbers 24

2.4.2 The Maximum-Sum Segment Problem 25

2.4.3 Longest Increasing Subsequences 27

2.4.4 Longest Common Subsequences 29

2.5 Bibliographic Notes and Further Reading 32

3 Pairwise Sequence Alignment 35

3.1 Introduction 36

3.2 Dot Matrix 37

3.3 Global Alignment 37

3.4 Local Alignment 42

3.5 Various Scoring Schemes 46

3.5.1 Affine Gap Penalties 46

3.5.2 Constant Gap Penalties 48

3.5.3 Restricted Affine Gap Penalties 48

3.6 Space-Saving Strategies 49

3.7 Other Advanced Topics 54

3.7.1 Constrained Sequence Alignment 54

3.7.2 Similar Sequence Alignment 56

3.7.3 Suboptimal Alignment 57

3.7.4 Robustness Measurement 59

3.8 Bibliographic Notes and Further Reading 60

4 Homology Search Tools 63

4.1 Finding Exact Word Matches 64

4.1.1 Hash Tables 64

4.1.2 Suffix Trees 66

4.1.3 Suffix Arrays 67

4.2 FASTA 68

4.3 BLAST 69

4.3.1 Ungapped BLAST 69

4.3.2 Gapped BLAST 72

4.3.3PSI-BLAST 73

4.4 BLAT 74

4.5 PatternHunter 75

4.6 Bibliographic Notes and Further Reading 78

5 Multiple Sequence Alignment 81

5.1 Aligning Multiple Sequences 81

5.2 Scoring Multiple Sequence Alignment 82

5.3 An Exact Method for Aligning Three Sequences 84

5.4 Progressive Alignment 85

5.5 Bibliographic Notes and Further Reading 86

Part II Theory 89

6 Anatomy of Spaced Seeds 91

6.1 Filtration Technique in Homology Search 92

6.1.1 Spaced Seed 92

6.1.2 Sensitivity and Specificity 92

6.2 Basic Formulas on Hit Probability 93

6.2.1 A Recurrence System for Hit Probability 95

6.2.2 Computing Non-Hit Probability 97

6.2.3 Two Inequalities 98

6.3 Distance between Non-Overlapping Hits 99

6.3.1 A Formula for [mu subscript pi] 100

6.3.2 An Upper Bound for [mu subscript pi] 101

6.3.3 Why Do Spaced Seeds Have More Hits? 103

6.4 Asymptotic Analysis of Hit Probability 104

6.4.1 Consecutive Seeds 104

6.4.2 Spaced Seeds 107

6.5 Spaced Seed Selection 110

6.5.1 Selection Methods 110

6.5.2 Good Spaced Seeds 111

6.6 Generalizations of Spaced Seeds 112

6.6.1 Transition Seeds 112

6.6.2 Multiple Spaced Seeds 114

6.6.3 Vector Seed 115

6.7 Bibliographic Notes and Further Reading 115

7 Local Alignment Statistics 119

7.1 Introduction 120

7.2 Ungapped Local Alignment Scores 122

7.2.1 Maximum Segment Scores 123

7.2.2 E-value and P-value Estimation 128

7.2.3 The Number of High-Scoring Segments 130

7.2.4 Karlin-Altschul Sum Statistic 131

7.2.5 Local Ungapped Alignment 132

7.2.6 Edge Effects 133

7.3 Gapped Local Alignment Scores 134

7.3.1 Effects of Gap Penalty 134

7.3.2 Estimation of Statistical Parameters 135

7.3.3 Statistical Parameters for BLOSUM and PAM Matrices 138

7.4 BLAST Database Search 139

7.4.1 Calculation of P-values and E-values 140

7.4.2 BLAST Printouts 142

7.5 Bibliographic Notes and Further Reading 146

8 Scoring Matrices 149

8.1 The PAM Scoring Matrices 150

8.2 The BLOSUM Scoring Matrices 153

8.3 General Form of the Scoring Matrices 155

8.4 How to Select a Scoring Matrix? 157

8.5 Compositional Adjustment of Scoring Matrices 158

8.6 DNA Scoring Matrices 161

8.7 Gap Cost in Gapped Alignments 163

8.8 Bibliographic Notes and Further Reading 164

A Basic Concepts in Molecular Biology 173

A.1 The Nucleic Acids: DNA and RNA 173

A.2 Proteins 174

A.3 Genes 175

A.4 The Genomes 175

B Elementary Probability Theory 177

B.1 Events and Probabilities 177

B.2 Random Variables 178

B.3 Major Discrete Distributions 179

B.3.1 Bernoulli Distribution 179

B.3.2 Binomial Distribution 180

B.3.3 Geometric and Geometric-like Distributions 180

B.3.4 The Poisson Distribution 180

B.3.5 Probability Generating Function 181

B.4 Major Continuous Distributions 182

B.4.1 Uniform Distribution 182

B.4.2 Exponential Distribution 182

B.4.3 Normal Distribution 183

B.5 Mean, Variance, and Moments 183

B.5.1 The Mean of a Random Variable 183

B.5.2 The Variance of a Random Variable 185

B.5.3 The Moment-Generating Function 186

B.6 Relative Entropy of Probability Distributions 187

B.7 Discrete-time Finite Markov Chains 188

B.7.1 Basic Definitions 188

B.7.2 Markov Chains with No Absorbing States 189

B.7.3 Markov Chains with Absorbing States 190

B.7.4 Random Walks 191

B.7.5 High-Order Markov Chains 191

B.8 Recurrent Events and the Renewal Theorem 191

C Software Packages for Sequence Alignment 195

References 197

Index 207

Customer Reviews

Average Review:

Write a Review

and post it to your social network

     

Most Helpful Customer Reviews

See all customer reviews >