Building Bioinformatics Solutions: with Perl, R and MySQL


by Conrad Bessant, Ian Shadforth, Darren Oakley
     
 


ISBN-10: 0199230196

ISBN-13: 9780199230198

Pub. Date: 02/28/2009

Publisher: Oxford University Press


Overview

Modern bioinformatics encompasses a broad and ever-changing range of activities involved with the management and analysis of data from molecular biology experiments. Despite the diversity of activities and applications, the basic methodology and core tools needed to tackle bioinformatics problems are common to many projects. Building Bioinformatics Solutions provides a comprehensive introduction to this methodology, explaining how to acquire and use the most popular development tools, how to apply them to build processing pipelines, and how to make the results available through visualisations and web-based services for deployment either locally or via the Internet. The main development tools covered in this book are the MySQL database management system, the Perl programming language, and the R language for statistical computing. These industry-standard open source tools form the core of many bioinformatics projects, both in academia and industry. The methodologies introduced are platform independent, and all the featured examples have been tested on Windows, Linux, and Mac OS.

Product Details

ISBN-13: 9780199230198
Publisher: Oxford University Press
Publication date: 02/28/2009
Pages: 224
Product dimensions: 7.60(w) x 9.70(h) x 0.70(d)

Table of Contents

Acknowledgements v

Preface vii

1 Introduction 1

1.1 From data to knowledge: the aim of bioinformatics 1

1.2 Using this book 2

1.2.1 About the coverage of this book 2

1.2.2 Choice of tools 3

1.2.3 Choice of operating system 3

1.2.4 bixsolutions.net 4

1.2.5 Software engineering in bioinformatics 4

1.3 Principal applications of bioinformatics 5

1.3.1 Sequence analysis 5

1.3.2 Microarray data analysis 6

1.3.3 Proteomics 7

1.3.4 Metabolomics 7

1.3.5 Systems biology 8

1.3.6 Literature mining 8

1.3.7 Structural biology 9

1.4 Building bioinformatics solutions 9

1.5 Publicly available bioinformatics resources 10

1.5.1 Publicly available data 10

1.5.2 Publicly available analysis tools 14

1.6 Some computing practicalities 15

1.6.1 Hardware requirements 15

1.6.2 The command line 16

1.6.3 Case sensitivity 16

1.6.4 Security, firewalls, and administration rights 17

References 18

2 Building biological databases with MySQL 19

2.1 Common database types 20

2.1.1 Flat text files 20

2.1.2 XML 21

2.1.3 Relational databases 24

2.2 Relational database design: the 'natural' approach 27

2.2.1 Steps 1-3: gather, group, and name the data 28

2.2.2 Step 4: data types 33

2.2.3 Step 5: atomicity of data 37

2.2.4 Steps 6 and 7: indexing and linking tables 37

2.2.5 Departure from design 43

2.3 Installing and configuring a MySQL server 44

2.3.1 Download and installation 44

2.3.2 Creating a database and a user account 45

2.4 Alternatives to MySQL 46

2.4.1 PostgreSQL 46

2.4.2 Oracle 47

2.4.3 Microsoft Access 47

2.5 Database access using SQL 48

2.5.1 Compatibility between RDBMSs 48

2.5.2 Error messages 48

2.5.3 Creating a database 49

2.5.4 Creating tables and enforcing referential integrity 50

2.5.5 Populating the database 52

2.5.6 Removing data and tables from the database 54

2.5.7 Creating and using source files 55

2.5.8 Querying the database 56

2.5.9 Transaction handling 63

2.5.10 Copying, moving, and backing up a database 65

2.6 Summary 66

References 66

3 Automating processes using Perl 67

3.1 Downloading and installing Perl 68

3.1.1 Getting Perl on Windows 68

3.1.2 Before getting started 69

3.2 Basic Perl syntax and logic 70

3.2.1 Scalar variables 72

3.2.2 Arrays 77

3.2.3 Hashes 81

3.2.4 Control structures and logic operators 84

3.2.5 Writing interactive programs: I/O basics 89

3.2.6 Some good coding practice 93

3.2.7 Summary 95

3.3 References 96

3.3.1 Multidimensional arrays 96

3.3.2 Multidimensional hashes 99

3.3.3 Viewing data structures with Data::Dumper 102

3.4 Subroutines and modules 103

3.4.1 Making a Perl module 107

3.5 Regular expressions 108

3.5.1 Defining regular expressions 109

3.5.2 More advanced regular expressions 111

3.5.3 Regular expressions in practice 113

3.6 File handling and directory operations 115

3.6.1 Reading text files 115

3.6.2 Writing text files 116

3.6.3 Directory operations 117

3.7 Error handling 119

3.8 Retrieving files from the internet 120

3.8.1 Utilizing NCBI's eUtilities 122

3.9 Accessing relational databases using Perl DBI 124

3.9.1 Installing DBD::MySQL 124

3.9.2 Connecting to database 126

3.9.3 Querying the database 127

3.9.4 Populating the database 129

3.9.5 Database transactions and error handling 130

3.10 Harnessing existing tools 131

3.10.1 CPAN 131

3.10.2 BioPerl 133

3.10.3 System commands 133

3.11 Alternatives to Perl 133

3.11.1 Python, Ruby and other scripting languages 134

3.11.2 Java, C/C++, and other compiled languages 134

3.11.3 Workflows 135

3.12 Summary 135

References 135

4 Numerical data analysis using R 137

4.1 Introduction to R 138

4.1.1 Downloading and installing R 139

4.1.2 Basic R concepts and syntax 140

4.1.3 Vectors and data frames 142

4.1.4 The nature of experimental data 145

4.1.5 R modes, objects, lists, classes, and methods 149

4.1.6 Importing data into R 153

4.1.7 Data visualization in R 154

4.1.8 Writing programs in R 160

4.2 Multivariate data analysis 164

4.2.1 Exploratory data analysis 165

4.2.2 Scatter plots 165

4.2.3 Principal components analysis 165

4.2.4 Hierarchical cluster analysis 167

4.2.5 Classification 171

4.3 R packages 172

4.3.1 Installing and using Bioconductor packages 173

4.3.2 The RMySQL package for database connectivity 178

4.3.3 Packages for multivariate classification 180

4.3.4 Writing your own R packages 181

4.3.5 Integrating Perl and R 181

4.4 Alternatives to R 182

4.4.1 S-Plus 182

4.4.2 Matlab 182

4.4.3 Octave 184

4.5 Summary 185

References 185

5 Programming for the Web 187

5.1 Introduction to web servers and Apache 187

5.1.1 Using the Apache web server 188

5.1.2 Apache fundamentals 189

5.2 Introduction to HTML 191

5.2.1 HTML versus XHTML 192

5.2.2 Creating and editing HTML/XHTML documents 193

5.2.3 The structure of a web page 193

5.2.4 XHTML tags and general formatting 194

5.2.5 An example web page 194

5.2.6 Web standards and browser compatibility 198

5.3 CGI programming using Perl 199

5.3.1 Debugging CGI programs 201

5.3.2 Adding dynamic content to web pages 202

5.3.3 Getting user input via forms 205

5.4 Advanced web techniques and languages 211

5.4.1 Cascading style sheets 211

5.4.2 JavaScript, JavaScript libraries, and Ajax 213

5.5 Data visualization with Perl and CGI 215

5.5.1 Using R graphics in Perl 215

5.5.2 Plotting graphs with GD::Graph 218

5.5.3 Plotting graphs with SVG::TT::Graph 224

5.5.4 Low level graphics in Perl 231

5.6 Summary 231

References 231

Appendix A Using command line interfaces 233

A.1 Getting to the operating system command line 233

A.2 General command line concepts 235

A.3 Command line tips 236

Index 239
