R is fast becoming the de facto standard for statistical computing and analysis in science, business, engineering, and related fields. This book examines this complex language using simple statistical examples, showing how R operates in a user-friendly context. Both students and workers in fields that require extensive statistical analysis will find this book helpful as they learn to use R for simple summary statistics, hypothesis testing, creating graphs, regression, and much more. It covers formula notation, complex statistics, manipulating data and extracting components, and rudimentary programming.
Beginning R offers anyone who needs to perform statistical analysis the information necessary to use R with confidence.
WHAT YOU WILL LEARN IN THIS CHAPTER:
* What R is
* How to get the R program
* How to install R on your computer
* How to start running the R program
* How to use the help system and find help from other sources
* How to get additional libraries of commands
R is more than just a program that does statistics. It is a sophisticated computer language and environment for statistical computing and graphics. R is available from the R-Project for Statistical Computing website (www.r-project.org), and the following is some of its introductory material:
R is an open-source (GPL) statistical environment modeled after S and S-Plus. The S language was developed in the late 1980s at AT&T labs. The R project was started by Robert Gentleman and Ross Ihaka (hence the name, R) of the Statistics Department of the University of Auckland in 1995. It has quickly gained a widespread audience. It is currently maintained by the R core-development team, a hard-working, international team of volunteer developers. The R project webpage is the main site for information on R. At this site are directions for obtaining the software, accompanying packages, and other sources of documentation.
R is a powerful statistical program but it is first and foremost a programming language. Many routines have been written for R by people all over the world and made freely available from the R project website as "packages." However, the basic installation (for Linux, Windows or Mac) contains a powerful set of tools for most purposes.
Because R is a computer language, it functions slightly differently from most of the programs that users are familiar with. You have to type in commands, which are evaluated by the program and then executed. This sounds a bit daunting to many users, but the R language is easy to pick up and a lot of help is available. It is possible to copy and paste in commands from other applications (for example: word processors, spreadsheets, or web browsers) and this facility is very useful, especially if you keep notes as you learn. Additionally, the Windows and Macintosh versions of R have a graphical user interface (GUI) that can help with some of the basic tasks.
R can deal with a huge variety of mathematical and statistical tasks, and many users find that the basic installation of the program does everything they need. However, many specialized routines have been written by other users and these libraries of additional tools are available from the R website. If you need to undertake a particular type of analysis, there is a very good chance that someone before you also wanted to do that very thing and has written a package that you can download to allow you to do it.
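As a rough sketch of what this looks like in practice, the commands below list the packages that are already installed and load one of them. The stats package is used only because it ships with the basic installation. The install.packages() line is commented out because it needs an Internet connection, and "somepackage" is a hypothetical placeholder rather than a real package name.

```r
# List the names of all packages installed on this computer.
installed <- rownames(installed.packages())
head(installed)

# 'stats' is part of the basic installation, so this should be TRUE.
"stats" %in% installed

# Load an installed package so that its commands become available.
library(stats)

# To fetch an extra package from CRAN you would use something like this
# ("somepackage" is a placeholder, not a real package name):
# install.packages("somepackage")
```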
R is open source, which means that it is continually being reviewed and improved. R runs on most computers—installations are available for Windows, Macintosh, and Linux. It also has good interoperability, so if you work on one computer and switch to another you can take your work with you.
R handles complex statistical approaches as easily as simple ones. Once you know the basics of the R language, you can therefore tackle sophisticated analyses with little extra effort (as usual, it is the interpretation of results that can be the really hard bit).
GETTING THE HANG OF R
R is unlike most current computer programs in that you must type commands into the console window to carry out most of the tasks you require. This book illustrates the use of these commands throughout.
Where a command is illustrated in its basic form, you will see a fixed width font to mimic the R display like so:
help.start()
When the use of a particular command is illustrated, you will see the user-typed input illustrated by beginning the lines with the > character, which mimics the cursor line in the R console window like so:
> data1 = c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)
Lines of text resulting from your actions are shown without the cursor character, once again mimicking the output that you would see from R itself:
> data1
[1] 3 5 7 5 3 2 6 8 5 6 9
So, in the preceding example the first line was typed by the user and resulted in the output shown in the second line. Keep these conventions in mind as you are reading this chapter and they will come into play as soon as you have R installed and are ready to begin using it!
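Once R is running, the conventions above can be tried out directly. The following is a small sketch rather than an example from later chapters: it stores the same values, displays them, and counts them. The <- arrow is the assignment form that most R code uses; the = form shown above works equally well at the console. The prompt character is left out so that the lines can be pasted in as they are.

```r
# Store eleven values in an object called data1.
data1 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)

# Typing the name of an object displays its contents.
data1

# length() reports how many values the object holds.
length(data1)
```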
The R Website
The R website at www.r-project.org is a good place to visit to obtain the R program. It is also a good place to look for help items and general documentation as well as additional libraries of routines. If you use Windows or a Mac, you will need to visit the site to download the R program and install it. You can also find installation files for many Linux versions on the R website.
The R website is split into several parts; links to each section are on the main page of the site. The two most useful for beginners are the Documentation and Download sections.
In the Documentation section (see Figure 1-1) a Manuals link takes you to many documents contributed to the site by various users. Most of these are in HTML and PDF format. You can access these and a variety of help guides under Manuals → Contributed Documentation. These are especially useful for helping the new user to get started. Additionally, a large FAQ section takes you to a list that can help you find answers to many questions you might have. There is also a Wiki, and although this is still a work in progress, it is a good place to look for information on installing R on Linux systems.
In the Downloads section you will find the links from which you can download R. The following section goes into more detail on how to do this.
Downloading and Installing R from CRAN
The Comprehensive R Archive Network (CRAN) is a network of websites that host the R program and that mirror the original R website. The benefit of having this network of websites is improved download speeds. For all intents and purposes, CRAN is the R website and holds downloads (including old versions of software) and documentation (e.g. manuals, FAQs). When you perform searches for R-related topics on the internet, adding CRAN (or R) to your search terms improves your results. To get started downloading R, perform the following steps:
1. Visit the main R web page (www.r-project.org); you see a Getting Started box with a link to download R (see Figure 1-2). Click that link and you are directed to select a local CRAN mirror site from which to download R.
2. The starting page of the CRAN website appears once you have selected your preferred mirror site. This page has a Software section on the left with several links. Choose the R Binaries link to install R on your computer (see Figure 1-3). You can also click the link to Packages, which contains libraries of additional routines. However, you can install these from within R so you can just ignore the Packages link for now. The Other link goes to a page that lists software available on CRAN other than the R base distribution and regular contributed extension packages. This link is also unnecessary for right now and can be ignored as well.
3. Once you click the R Binaries link you move to a simple directory containing folders for a variety of operating systems (see Figure 1-4). Select the operating system on which you will be installing R and follow the link to a page containing more information and the installation files that you require.
The details for individual operating systems vary, so the following sections are split into instructions for each of Windows, Macintosh, and Linux.
Installing R on Your Windows Computer
The install files for Windows come bundled in an .exe file, which you can download from the windows folder (refer to Figure 1-4). Downloading the .exe file is straightforward (see Figure 1-5), and you can install R simply by double-clicking the file once it is on your computer.
Run the installer with all the default settings and when it is done you will have R installed.
Versions of Windows after XP require some additional steps to make R work properly. For Vista or later you need to alter the properties of the R program so that it runs with Administrator privileges. To do so, follow these steps:
1. Click the Windows button (this used to be labeled Start).
2. Select Programs.
3. Choose the R folder.
4. Right-click the R program icon to see an options menu (see Figure 1-6).
5. Select Properties from the menu. You will then see a new options window.
6. Under the Compatibility tab, tick the box in the Privilege Level section (see Figure 1-7) and click OK.
7. Run R by clicking the Programs menu, shortcut, or quick-launch icon like any other program. If the User Account Control window appears (see Figure 1-8), select Yes and R runs as normal.
Now R is set to run with administrator access and will function correctly. This is important, as you see later. R will save your data items and a history of the commands you used to the disk and it cannot do this without the appropriate access level.
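If you want to confirm that R can write to disk, a check along the following lines is one option. It is a sketch and not part of the installation procedure: getwd() reports the folder R is currently working in, and file.access() with mode = 2 tests for write permission, returning 0 when writing is allowed and -1 when it is not. Treat the answer as a hint only, because this kind of test may not be fully reliable on every version of Windows.

```r
# Show the folder that R is currently using for its files.
wd <- getwd()
wd

# Test for write permission: 0 means writable, -1 means not writable.
can_write <- file.access(wd, mode = 2)
can_write

# Report the result in plain words.
if (can_write == 0) {
  message("R can save files in ", wd)
} else {
  message("R cannot write to ", wd, "; check the access level")
}
```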
Installing R on Your Macintosh Computer
The install files for OS X come bundled in a DMG file, which you can download from the macosx folder (refer to Figure 1-4).
Once the file has downloaded it may open as a disk image or not (depending how your system is set up). Once the DMG file opens you can double-click the installer file and installation will proceed (see Figure 1-9). Installation is fairly simple and no special options are required. Once installed, you can run R from Applications and place it in the dock like any other program.
Installing R on Your Linux Computer
If you are using a Linux OS, R runs through the Terminal program. Downloadable install files are available for many Linux systems on the R website (see Figure 1-10). The website also contains instructions for installation on several versions of Linux. Many Linux systems also support a direct installation via the Terminal.
The major Linux systems allow you to install the R program directly from the Terminal, and R files are kept as part of their software repositories. These repositories are not always up-to-date, however, so if you want to install the very latest version of R, look on the CRAN website for instructions and an appropriate install file. The exact command to install directly from the Terminal varies slightly from system to system, but you will not go far wrong if you open the Terminal and type R into it. If R is not installed (the most likely scenario), the Terminal may well give you the command you need to get it (see Figure 1-11)!
In general, a command along the following lines will usually do the trick:
sudo apt-get install r-base-core
In Ubuntu 10.10, for example, this installs everything you need to get started. In other systems you may need two elements to install, like so:
sudo apt-get install r-base r-base-dev
The basic R program and its components are built from the r-base part. For many purposes this is enough, but to gain access to additional libraries of routines the r-base-dev part is needed. Once you run these commands you will connect to the Internet and the appropriate files will be downloaded and installed.
Once R is installed it can be run through the Terminal program, which is found in the Accessories part of the Applications menu. In Linux there is no GUI, so all the commands must be typed into the Terminal window.
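Whichever operating system you use, a quick way to confirm that the installation worked is to start R and ask it to identify itself. The lines below are a minimal sketch; the exact text you see depends on the version you installed and the computer you are using.

```r
# Print the version of R that is running.
R.version.string

# Show the operating system type R was built for ("unix" or "windows").
.Platform$OS.type

# Give a fuller report, including the packages currently loaded.
sessionInfo()
```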
RUNNING THE R PROGRAM
Once R is installed you can run it in a variety of ways:
* In Windows the program works like any other—you may have a desktop shortcut, a quick launch icon, or simply get to it via the Start button and the regular program list.
* On a Macintosh the program is located in the Applications folder and you can drag this to the dock to create a launcher or create an alias in the usual manner.
* On Linux the program is launched via the Terminal program, which is located in the Accessories section of the Applications menu.
Once the R program starts up you are presented with the main input window and a short introductory message that appears a little different on each OS:
* In Windows a few menus are available at the top as shown in Figure 1-12.
* On the Macintosh OS X, the welcome message is the same (see Figure 1-13). In this case you also have some menus available and they are broadly similar to those in the Windows version. You also see a few icons; these enable you to perform a few tasks but are not especially useful. Under these icons is a search box, which is useful as an alternative to typing in help commands (you look at getting help shortly).
* In Linux systems there are no icons and the menu items you see relate to the Terminal program rather than R itself (see Figure 1-14).
R is a computer language, and like any other language you must learn the vocabulary and the grammar to make yourself understood and to carry out the tasks you want. Getting to know where help is available is a good starting point, and that is the subject of the next section.
FINDING YOUR WAY WITH R
Finding help when you are starting out can be a daunting prospect. A lot of material is available for help with R and tracking down the useful information can take a while. (Of course, this book is a good starting point!) In the following sections you see the most efficient ways to access some of the help that is available, including how to access additional libraries that you can use to deal with the tasks you have.
Getting Help via the CRAN Website and the Internet
The R website is a good place to find material that supports your learning of R. Under the Manuals link are several manuals available in HTML or as PDF. You'll also find some useful beginner's guides in the Contributed Documentation section. Different authors take different approaches, and you may find one suits you better than another. Try a few and see how you get on. Additionally, preferences will change as your command of the system develops. There is also a Wiki on the R website that is a good reference forum, which is continually updated.
The Help Command in R
R contains a lot of built-in help, and how this is displayed varies according to which OS you are using and the options (if any) that you set. The basic command to bring up help is:
help(topic)
Simply replace topic with the name of the item you want help on. You can also save a bit of typing by prefacing the topic with a question mark, like so:
?topic
You can also access the help system via your web browser by typing:
help.start()
This brings up the top-level index page where you can use the Search Engine & Keywords hyperlink to find what you need. This works for all the different operating systems. Of course, you need to know what command you are looking for to begin with. If you are not quite sure, you can use the following command:
apropos('partword')
This searches for commands whose names contain the text you typed; replace 'partword' with the text you want to search for. Note that, unlike the previous help() command, you do need the quotes (single or double quotes are fine as long as they match).
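The help commands described in this section can be tried together, as in the sketch below. The topic mean is chosen purely for illustration. The calls that open help entries are wrapped in a check for an interactive session, so that the lines can also be run from a script without opening any windows.

```r
# Find every command whose name contains "mean".
matches <- apropos("mean")
matches

# The quotes are required here; by default the search ignores case.
"mean" %in% matches

# Open the help entries themselves only when working interactively.
if (interactive()) {
  help(mean)    # full form
  ?mean         # shortcut for the same entry
}
```

The result of apropos() is an ordinary list of names, so you can pick one and pass it to help() to read the full entry.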
Help for Windows Users
The Windows default help generally works fine (see Figure 1-15), but the Index and Search tabs only work within the section you are in, and it is not possible to get to the top level in the search hierarchy. If you return to the main command window and type in another help command, a new window opens so it is not possible to scroll back through entries unless they are in the same section.
Once you are done with your help window, you can close it by clicking the red X button.
Help for Macintosh Users
In OS X the default help appears in a separate window as HTML text (see Figure 1-16). The help window acts like a browser and you can use the arrow buttons to return to previous topics if you follow hyperlinks. You can also type search terms into the search box.
Scrolling to the foot of the help entry enables you to jump to the index for that section (Figure 1-17). Once at the index you can jump further up the hierarchy to reach other items.
The top level you can reach is identical to the HTML version of the help that you get if you type the help.start() command (see Figure 1-18), except that it is in a dedicated help window rather than your browser.
Once you are finished you can close the window in the usual manner by clicking the red button. If you return to the main command window and type another help item, the original window alters to display the new help. You can return to the previous entries using the arrow buttons at the top of the help window.
(Continues...)
Excerpted from Beginning R by Mark Gardener Copyright © 2012 by John Wiley & Sons, Ltd. Excerpted by permission of John Wiley & Sons. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
Chapter 1: Introducing R: What It Is and How to Get It
Getting the Hang of R
The R Website
Downloading and Installing R from CRAN
Installing R on Your Windows Computer
Installing R on Your Macintosh Computer
Installing R on Your Linux Computer
Running the R Program
Finding Your Way with R
Getting Help via the CRAN Website and the Internet
The Help Command in R
Help for Windows Users
Help for Macintosh Users
Help for Linux Users
Help For All Users
Anatomy of a Help Item in R
Command Packages
Standard Command Packages
What Extra Packages Can Do for You
How to Get Extra Packages of R Commands
How to Install Extra Packages for Windows Users
How to Install Extra Packages for Macintosh Users
How to Install Extra Packages for Linux Users
Running and Manipulating Packages
Loading Packages
Windows-Specific Package Commands
Macintosh-Specific Package Commands
Removing or Unloading Packages
Summary
Chapter 2: Starting Out: Becoming Familiar with R
Some Simple Math
Use R Like a Calculator
Storing the Results of Calculations
Reading and Getting Data into R
Using the combine Command for Making Data
Entering Numerical Items as Data
Entering Text Items as Data
Using the scan Command for Making Data
Entering Text as Data
Using the Clipboard to Make Data
Reading a File of Data from a Disk
Reading Bigger Data Files
The read.csv() Command
Alternative Commands for Reading Data in R
Missing Values in Data Files
Viewing Named Objects
Viewing Previously Loaded Named-Objects
Viewing All Objects
Viewing Only Matching Names
Removing Objects from R
Types of Data Items
Number Data
Text Items
Converting Between Number and Text Data
The Structure of Data Items
Vector Items
Data Frames
Matrix Objects
List Objects
Examining Data Structure
Working with History Commands
Using History Files
Viewing the Previous Command History
Saving and Recalling Lists of Commands
Alternative History Commands in Macintosh OS
Editing History Files
Saving Your Work in R
Saving the Workspace on Exit
Saving Data Files to Disk
Save Named Objects
Save Everything
Reading Data Files from Disk
Saving Data to Disk as Text Files
Writing Vector Objects to Disk
Writing Matrix and Data Frame Objects to Disk
Writing List Objects to Disk
Converting List Objects to Data Frames
Summary
Chapter 3: Starting Out: Working With Objects
Manipulating Objects
Manipulating Vectors
Selecting and Displaying Parts of a Vector
Sorting and Rearranging a Vector
Returning Logical Values from a Vector
Manipulating Matrix and Data Frames
Selecting and Displaying Parts of a Matrix or Data Frame
Sorting and Rearranging a Matrix or Data Frame
Manipulating Lists
Viewing Objects within Objects
Looking Inside Complicated Data Objects
Opening Complicated Data Objects
Quick Looks at Complicated Data Objects
Viewing and Setting Names
Rotating Data Tables
Constructing Data Objects
Making Lists
Making Data Frames
Making Matrix Objects
Re-ordering Data Frames and Matrix Objects
Forms of Data Objects: Testing and Converting
Testing to See What Type of Object You Have
Converting from One Object Form to Another
Convert a Matrix to a Data Frame
Convert a Data Frame into a Matrix
Convert a Data Frame into a List
Convert a Matrix into a List
Convert a List to Something Else
Summary
Chapter 4: Data: Descriptive Statistics and Tabulation
Summary Commands
Summarizing Samples
Summary Statistics for Vectors
Summary Commands With Single Value Results
Summary Commands With Multiple Results
Cumulative Statistics
Simple Cumulative Commands
Complex Cumulative Commands
Summary Statistics for Data Frames
Generic Summary Commands for Data Frames
Special Row and Column Summary Commands
The apply() Command for Summaries on Rows or Columns
Summary Statistics for Matrix Objects
Summary Statistics for Lists
Summary Tables
Making Contingency Tables
Creating Contingency Tables from Vectors
Creating Contingency Tables from Complicated Data
Creating Custom Contingency Tables
Creating Contingency Tables from Matrix Objects
Selecting Parts of a Table Object
Converting an Object into a Table
Testing for Table Objects
Complex (Flat) Tables
Making “Flat” Contingency Tables
Making Selective “Flat” Contingency Tables
Testing “Flat” Table Objects
Summary Commands for Tables
Cross Tabulation
Testing Cross-Table (xtabs) Objects
A Better Class Test
Recreating Original Data from a Contingency Table
Switching Class
Summary
Chapter 5: Data: Distribution
Looking at the Distribution of Data
Stem and Leaf Plot
Histograms
Density Function
Using the Density Function to Draw a Graph
Adding Density Lines to Existing Graphs
Types of Data Distribution
The Normal Distribution
Other Distributions
Random Number Generation and Control
Random Numbers and Sampling
The Shapiro-Wilk Test for Normality
The Kolmogorov-Smirnov Test
Quantile-Quantile Plots
A Basic Normal Quantile-Quantile Plot
Adding a Straight Line to a QQ Plot
Plotting the Distribution of One Sample Against Another
Summary
Chapter 6: Simple Hypothesis Testing
Using the Student’s t-test
Two-Sample t-Test with Unequal Variance
Two-Sample t-Test with Equal Variance
One-Sample t-Testing
Using Directional Hypotheses
Formula Syntax and Subsetting Samples in the t-Test
The Wilcoxon U-Test (Mann-Whitney)
Two-Sample U-Test
One-Sample U-Test
Using Directional Hypotheses
Formula Syntax and Subsetting Samples in the U-test
Paired t- and U-Tests
Correlation and Covariance
Simple Correlation
Covariance
Significance Testing in Correlation Tests
Formula Syntax
Tests for Association
Multiple Categories: Chi-Squared Tests
Monte Carlo Simulation
Yates’ Correction for 2 × 2 Tables
Single Category: Goodness of Fit Tests
Summary
Chapter 7: Introduction to Graphical Analysis
Box-whisker Plots
Basic Boxplots
Customizing Boxplots
Horizontal Boxplots
Scatter Plots
Basic Scatter Plots
Adding Axis Labels
Plotting Symbols
Setting Axis Limits
Using Formula Syntax
Adding Lines of Best-Fit to Scatter Plots
Pairs Plots (Multiple Correlation Plots)
Line Charts
Line Charts Using Numeric Data
Line Charts Using Categorical Data
Pie Charts
Cleveland Dot Charts
Bar Charts
Single-Category Bar Charts
Multiple Category Bar Charts
Stacked Bar Charts
Grouped Bar Charts
Horizontal Bars
Bar Charts from Summary Data
Copy Graphics to Other Applications
Use Copy/Paste to Copy Graphs
Save a Graphic to Disk
Windows
Macintosh
Linux
Summary
Chapter 8: Formula Notation and Complex Statistics
Examples of Using Formula Syntax for Basic Tests
Formula Notation in Graphics
Analysis of Variance (ANOVA)
One-Way ANOVA
Stacking the Data before Running Analysis of Variance
Running aov() Commands
Simple Post-hoc Testing
Extracting Means from aov() Models
Two-Way ANOVA
More about Post-hoc Testing
Graphical Summary of ANOVA
Graphical Summary of Post-hoc Testing
Extracting Means and Summary Statistics
Model Tables
Table Commands
Interaction Plots
More Complex ANOVA Models
Other Options for aov()
Replications and Balance
Summary
Chapter 9: Manipulating Data and Extracting Components
Creating Data for Complex Analysis
Data Frames
Matrix Objects
Creating and Setting Factor Data
Making Replicate Treatment Factors
Adding Rows or Columns
Summarizing Data
Simple Column and Row Summaries
Complex Summary Functions
The rowsum() Command
The apply() Command
Using tapply() to Summarize Using a Grouping Variable
The aggregate() Command
Summary
Chapter 10: Regression (Linear Modeling)
Simple Linear Regression
Linear Model Results Objects
Coefficients
Fitted Values
Residuals
Formula
Best-Fit Line
Similarity between lm() and aov()
Multiple Regression
Formulae and Linear Models
Model Building
Adding Terms with Forward Stepwise Regression
Removing Terms with Backwards Deletion
Comparing Models
Curvilinear Regression
Logarithmic Regression
Polynomial Regression
Plotting Linear Models and Curve Fitting
Best-Fit Lines
Adding Line of Best-Fit with abline()
Calculating Lines with fitted()
Producing Smooth Curves using spline()
Confidence Intervals on Fitted Lines
Summarizing Regression Models
Diagnostic Plots
Summary of Fit
Summary
Chapter 11: More About Graphs
Adding Elements to Existing Plots
Error Bars
Using the segments() Command for Error Bars
Using the arrows() Command to Add Error Bars
Adding Legends to Graphs
Color Palettes
Placing a Legend on an Existing Plot
Adding Text to Graphs
Making Superscript and Subscript Axis Titles
Orienting the Axis Labels
Making Extra Space in the Margin for Labels
Setting Text and Label Sizes
Adding Text to the Plot Area
Adding Text in the Plot Margins
Creating Mathematical Expressions
Adding Points to an Existing Graph
Adding Various Sorts of Lines to Graphs
Adding Straight Lines as Gridlines or Best-Fit Lines
Making Curved Lines to Add to Graphs
Plotting Mathematical Expressions
Adding Short Segments of Lines to an Existing Plot
Adding Arrows to an Existing Graph
Matrix Plots (Multiple Series on One Graph)
Multiple Plots in One Window
Splitting the Plot Window into Equal Sections
Splitting the Plot Window into Unequal Sections
Exporting Graphs
Using Copy and Paste to Move a Graph
Saving a Graph to a File
Windows
Macintosh
Linux
Using the Device Driver to Save a Graph to Disk
PNG Device Driver
PDF Device Driver
Copying a Graph from Screen to Disk File
Making a New Graph Directly to a Disk File
Summary
Chapter 12: Writing Your Own Scripts: Beginning to Program
Copy and Paste Scripts
Make Your Own Help File as Plaintext
Using Annotations with the # Character
Creating Simple Functions
One-Line Functions
Using Default Values in Functions
Simple Customized Functions with Multiple Lines
Storing Customized Functions
Making Source Code
Displaying the Results of Customized Functions and Scripts
Displaying Messages as Part of Script Output
Simple Screen Text
Display a Message and Wait for User Intervention
Summary
Appendix: Answers to Exercises
Index