Table of Contents
Preface xv
Acknowledgments xvii
1 A brief introduction 1
1.1 A note on exploratory data analysis 3
1.2 Computing considerations and software 4
1.3 A brief outline of the book 5
1.4 Datasets and case studies 7
2 Genomics basics 11
2.1 Genes 11
2.2 DNA 12
2.3 Gene expression 13
2.4 Hybridization assays and other laboratory techniques 15
2.5 The human genome 16
2.6 Genome variations and their consequences 18
2.7 Genomics 19
2.8 The role of genomics in pharmaceutical research and clinical practice 20
2.9 Proteins 23
2.10 Bioinformatics 23
3 Microarrays 27
3.1 Types of microarray experiments 28
3.2 A very simple hypothetical microarray experiment 32
3.3 A typical microarray experiment 34
3.4 Multichannel cDNA microarrays 38
3.5 Oligonucleotide microarrays 38
3.6 Bead-based arrays 40
3.7 Confirmation of microarray results 40
4 Processing the scanned image 43
4.1 Converting the scanned image to the spotted image 44
4.2 Quality assessment 47
4.3 Adjusting for background 53
4.4 Expression level calculation for two-channel cDNA microarrays 56
4.5 Expression level calculation for oligonucleotide microarrays 58
5 Preprocessing microarray data 65
5.1 Logarithmic transformation 66
5.2 Variance stabilizing transformations 66
5.3 Sources of bias 68
5.4 Normalization 69
5.5 Intensity-dependent normalization 70
5.6 Judging the success of a normalization 81
5.7 Outlier identification 83
5.8 Nonresistant rules for outlier identification 83
5.9 Resistant rules for outlier identification 83
5.10 Assessing replicate array quality 84
6 Summarization 95
6.1 Replication 95
6.2 Technical replicates 96
6.3 Biological replicates 100
6.4 Biological replicates 100
6.5 Multiple oligonucleotide arrays 102
6.6 Estimating fold change in two-channel experiments 104
6.7 Bayes estimation of fold change 105
6.8 Estimating fold change in Affymetrix data 106
6.9 RMA Summarization of multiple oligonucleotide arrays revisited 107
6.10 FARMS summarization 108
7 Two-group comparative experiments 119
7.1 Basics of statistical hypothesis testing 120
7.2 Fold changes 123
7.3 The two-sample t test 123
7.4 Diagnostic checks 127
7.5 Robust t tests 129
7.6 The Mann-Whitney-Wilcoxon rank sum test 130
7.7 Multiplicity 132
7.8 The false discovery rate 135
7.9 Resampling-based multiple testing procedures 138
7.10 Small-variance-adjusted t tests and SAM 140
7.11 Conditional t 146
7.12 Borrowing strength across genes 149
7.13 Two-channel experiments 151
7.14 Filtering 153
8 Model-based inference and experimental design considerations 177
8.1 The F test 178
8.2 The basic linear model 179
8.3 Fitting the model in two stages 181
8.4 Multichannel experiments 182
8.5 Experimental design considerations 183
8.6 Miscellaneous issues 187
8.7 Model-based analysis of Affymetrix arrays 188
9 Analysis of gene sets 211
9.1 Methods for identifying enriched gene sets 213
9.2 ORA and Fisher’s exact test 217
9.3 Interpretation of results 217
9.4 Example 217
10 Pattern discovery 221
10.1 Initial considerations 222
10.2 Cluster analysis 223
10.3 Seeking patterns visually 241
10.4 Biclustering 254
11 Class prediction 263
11.1 Initial considerations 264
11.2 Linear Discriminant Analysis 269
11.3 Extensions of Fisher’s LDA 275
11.4 Penalized methods 278
11.5 Nearest neighbors 279
11.6 Recursive partitioning 280
11.7 Ensemble methods 285
11.8 Enriched ensemble classifiers 288
11.9 Neural networks 288
11.10 Support Vector Machines 289
11.11 Generalized enriched methods 291
11.12 Integration of genome information 301
12 Protein arrays 307
12.1 Introduction 307
12.2 Protein array experiments 308
12.3 Special issues with protein arrays 310
12.4 Analysis 310
12.5 Using antibody-antigen arrays to measure protein concentrations 311
References 317
Index 337