Introduction to Bioinformatics with R: A Practical Guide for Biologists / Edition 1 available in Hardcover, Paperback, eBook

- ISBN-10: 1138495719
- ISBN-13: 9781138495715
- Pub. Date: 11/03/2020
- Publisher: CRC Press

Overview
Key Features:
· Provides a practical course in computational data analysis suitable for students or researchers with no previous exposure to computer programming.
· Describes in detail, starting from basic principles, the theoretical basis for the statistical analysis techniques used throughout the textbook.
· Presents walk-throughs of data analysis tasks using R and example datasets. All R commands are presented and explained so that readers can carry out these tasks themselves.
· Uses outputs from a large range of molecular biology platforms including DNA methylation and genotyping microarrays; RNA-seq, genome sequencing, ChIP-seq and bisulphite sequencing; and high-throughput phenotypic screens.
· Gives worked-out examples geared towards problems encountered in cancer research, which can also be applied across many areas of molecular biology and medical research.
This book has been developed over years of training biological scientists and clinicians to analyse the large datasets available in their cancer research projects. It is appropriate for use as a textbook or as a practical book for biological scientists looking to gain bioinformatics skills.
Product Details
| Field | Value |
|---|---|
| ISBN-13 | 9781138495715 |
| Publisher | CRC Press |
| Publication date | 11/03/2020 |
| Series | Chapman & Hall/CRC Computational Biology Series |
| Pages | 310 |
| Product dimensions | 6.12(w) x 9.19(h) x (d) |
Table of Contents
Acknowledgements
1 Introduction
1.1 Why informatics is important for biologists
1.2 How to use this book
2 Introduction to R
2.1 Obtaining R
2.1.1 Downloading R
2.1.2 Installing R
2.2 R console
2.2.1 Starting the R console
2.3 The R workspace
2.3.1 Creating/deleting objects
2.3.2 The working directory
2.4 Data handling
2.4.1 Basic data types
2.4.2 Vectors
2.4.3 Arrays
2.4.4 Lists
2.4.5 Data frames
2.4.6 Data input/output
2.5 More advanced concepts: Scripts and functions
2.5.1 Simple scripts
2.5.2 Functions
2.5.3 Using 'apply'
2.5.3.1 Apply
2.5.3.2 Sapply
2.5.3.3 Lapply
2.5.3.4 Mapply
2.6 Plots
2.6.1 Simple scatterplot
2.6.2 Arguments of plot()
2.6.3 Multiple plots on one graph
2.6.4 Scatterplots of multiple variables
2.6.5 Box plots
2.6.6 Saving images to file
2.7 More advanced graphics with ggplot2
2.8 Using R help
3 An Introduction to LINUX for Biological Research
3.1 UNIX
3.2 Linux survival guide
3.3 Useful dependencies and programs
4 Statistical Methods for Data Analysis
4.1 What are statistical methods, and why do we use them in biological research?
4.1.1 A worked example
4.1.2 A brief summary
4.2 What do I need to understand statistics?
4.2.1 Probability
4.2.1.1 Random variables
4.2.1.2 Probability distributions
4.2.1.3 Hypothesis testing
4.2.2 Linear algebra
4.2.3 Summary
4.3 Normalization: Removing technical variation
4.3.1 Centering and scaling
4.3.2 An illustrative example
4.3.3 Quantile normalization
4.3.4 Batch effects
4.4 Correlation
4.4.1 Pearson correlation coefficient
4.4.2 Spearman's rank correlation
4.4.3 Examples
4.5 Clustering
4.5.1 Clustering illustration using R
4.6 Linear regression models
4.6.1 Limma
4.6.1.1 Installing limma
4.6.1.2 Categorical explanatory variables
4.6.1.3 Continuous explanatory variables
4.7 Multiple hypothesis testing
4.8 Survival analysis
4.8.1 Kaplan-Meier plots
4.8.2 Cox proportional hazards regression models
4.9 Projection methods
4.9.1 PCA
4.9.2 PLS
4.10 Resampling: Permutation tests and the bootstrap
4.11 Stability and robustness
4.12 Summary
5 Analyzing Generic Tabular Numeric Datasets in R
5.1 Introduction
5.2 Loading data into R
5.3 Data visualisation
5.3.1 Scatter plots
5.3.2 Box plots
5.3.3 Bar charts
5.4 Correlation and clustering
5.4.1 Correlation
5.4.2 Clustering
5.4.3 Heatmaps
5.5 Statistical analysis using linear models
5.5.1 Comparison of two groups
5.5.2 Alternative models
5.6 Summary
6 Functional Enrichment Analysis
6.1 Introduction
6.2 Loading gene sets into R
6.3 Over-representation
6.3.1 Online tools
6.3.2 Testing gene sets in R
6.4 Systematic enrichment
6.4.1 Online tools
6.4.2 Testing gene sets in R
6.5 Summary
7 Integrating Multiple Datasets in R
7.1 Introduction
7.2 Data import
7.3 Exploratory data analysis
7.4 Integrating multiple datasets
7.4.1 Survival analysis
7.5 Multiple molecular endpoints
7.6 Summary
8 Analyzing Microarray Data in R
8.1 Bioconductor
8.2 Accessing microarray data from GEO
8.3 Single-channel array analysis
8.4 Loading data
8.5 Data visualisation
8.5.1 Image plots
8.5.2 MA plots
8.5.3 Scatterplots
8.5.4 Box plots
8.6 Normalizing data
8.7 Differential expression (linear models)
8.7.1 Design matrix
8.7.2 Fitting linear models
8.7.3 Making use of the results
8.7.4 Postscript: Assumptions
8.8 Clustering and correlation
8.8.1 Expression profiles
8.8.2 Correlation
8.9 Clustering
8.9.1 Filtering
8.10 Survival analysis
8.10.1 Kaplan-Meier plots
8.10.2 Cox proportional hazards regression
8.11 Footnote: Correlation to explore associated functions
9 Analyzing DNA Methylation Microarray Data in R
9.1 Introduction
9.2 Importing raw data
9.3 Quality control
9.4 Normalization and estimating methylation level
9.5 Analyzing beta values
9.6 Using previously preprocessed data
9.7 Further analyses using minfi
10 DNA Analysis with Microarrays
10.1 Introduction
10.2 Genotyping
10.2.1 Normalization
10.2.2 Genotype calling
10.2.3 Downstream analysis: Genome-wide association tests
10.3 Copy number analysis
10.3.1 Normalization
10.3.2 Copy number estimation
10.3.3 Segmentation
10.3.3.1 Hidden Markov model
10.3.3.2 Circular binary segmentation
10.3.4 Downstream analysis
10.3.4.1 Mapping CNA data to genes
10.3.4.2 Finding frequently-mutated genes
10.4 Summary
11 Working with Sequencing Data
11.1 Introduction
11.2 Sequence data analysis tasks
11.3 Quality control
11.3.1 Base call quality filtering
11.3.2 Adapter trimming
11.4 Alignment
11.4.1 Bowtie
11.4.2 BWA
11.4.3 Post-alignment filtering
11.4.4 Removing duplicate reads
11.5 Obtaining sequencing data from the SRA
12 Genomic Sequence Profiling
12.1 Introduction
12.2 SNV: Single nucleotide variants
12.3 Variant filtering and annotation
12.4 Indels: Short insertions and deletions
12.5 SV: Structural variants
12.6 Making use of variant calls
12.7 Summary
13 ChIP-seq
13.1 Introduction
13.2 Cross-correlation
13.3 Filtering blacklisted reads
13.4 Peak calling
13.5 Peak annotation
13.6 Quantitative comparisons of ChIP-seq libraries
13.7 Summary
14 RNA-seq
14.1 Introduction
14.2 Obtaining RNA-seq data from GEO
14.3 Transcript quantification via pseudoalignment
14.3.1 Building a transcript index
14.3.2 Quantifying transcripts using reads
14.3.3 Downstream analysis
14.4 Analysis with transcriptome assembly
14.4.1 Building the transcriptome directly
14.4.2 Transcript quantification
14.4.3 Downstream analysis
14.5 Summary
15 Bisulphite Sequencing
15.1 Introduction
15.2 Alignment and methylation calls
15.3 Downstream analysis
15.4 Summary
16 Final Notes
Index