This is a book about the scientific process and how you apply it to data in ecology. You will learn how to plan for data collection, how to assemble data, how to analyze data and finally how to present the results. The book uses Microsoft Excel and the powerful Open Source R program to carry out data handling as well as producing graphs. Statistical approaches covered include: data exploration; tests for difference - t-test and U-test; correlation - Spearman's rank test and Pearson product-moment; association including Chi-squared tests and goodness of fit; multivariate testing using analysis of variance (ANOVA) and Kruskal-Wallis test; and multiple regression.
Key skills taught in this book include: how to plan ecological projects; how to record and assemble your data; how to use R and Excel for data analysis and graphs; how to carry out a wide range of statistical analyses including analysis of variance and regression; how to create professional-looking graphs; and how to present your results.
New in this edition: a completely revised chapter on graphics including graph types and their uses, Excel Chart Tools, R graphics commands and producing different chart types in Excel and in R; an expanded range of support material online, including example data, exercises and additional notes & explanations; a new chapter on basic community statistics, biodiversity and similarity; chapter summaries and end-of-chapter exercises.
Praise for the first edition:
This book is a superb way in for all those looking at how to design investigations and collect data to support their findings. - Sue Townsend, Biodiversity Learning Manager, Field Studies Council
[M]akes it easy for the reader to synthesise R and Excel and there is extra help and sample data available on the free companion webpage if needed. I recommended this text to the university library as well as to colleagues at my student workshops on R. Although I initially bought this book when I wanted to discover R I actually also learned new techniques for data manipulation and management in Excel - Mark Edwards, EcoBlogging
A must for anyone getting to grips with data analysis using R and excel. - Amazon 5-star review
It has been very easy to follow and will be perfect for anyone. - Amazon 5-star review
A solid introduction to working with Excel and R. The writing is clear and informative, the book provides plenty of examples and figures so that each string of code in R or step in Excel is understood by the reader. - Goodreads, 4-star review
About the Author
Mark Gardener (www.gardenersown.co.uk) is an ecologist, lecturer, and writer working in the UK. His primary area of research was in pollination ecology and he has worked in the UK and around the world (principally Australia and the United States). Since his doctorate he has worked in many areas of ecology, often as a teacher and supervisor. He believes that ecological data, especially community data, is the most complicated and ill-behaved and is consequently the most fun to work with. He was introduced to R by a like-minded pedant whilst working in Australia during his doctorate. Learning R was not only fun but opened up a new avenue, making the study of community ecology a whole lot easier. He is currently self-employed and runs courses in ecology, data analysis, and R for a variety of organizations. Mark lives in rural Devon with his wife Christine, a biochemist who consequently has little need of statistics.
Read an Excerpt
The planning process is important, as it can save you a lot of time and effort later on.
1.1 The scientific method
Science is a way of looking at the natural world. In short, the process goes along the following lines:
You have an idea about something.
You come up with a hypothesis.
You work out a way of testing this hypothesis/idea.
You collect appropriate data in order to apply a test.
You test the hypothesis and decide if the original idea is supported or rejected.
If the hypothesis is rejected, then the original idea is modified to take the new findings into account.
The process then repeats.
In this way, ideas are continually refined and your knowledge of the natural world is expanded. You can split the scientific process into four parts (more or less): planning, recording, analysing and reporting (summarized in Table 1.1).
1.1.1 Planning stage
This is the time to get the ideas. These may be based on previous research (by you or others), arise from observation or stem from previous data you have obtained. On the other hand, you might have been given a project by your professor, supervisor or teacher. If you are going to collect new data, then you will determine what data, how much data, when it will be collected, how it will be collected and how it will be analysed, all at this planning stage. Looking at previous research is a useful start as it can tell you how other researchers went about things. If you already have old data from some historic source then you still need to plan what you are going to do with it. You may have to delve into the data to some extent to see what you have – do you have the appropriate data to answer the questions you want answered? It may be that you have to modify your ideas/questions in light of what you have. A hypothesis is a fancy term for a research question. A hypothesis is framed in a certain scientific way so that it can be tested (see more about hypotheses in Section 1.4).
1.1.2 Recording stage
Finally, you get to collect data. The planning step will have determined (possibly with the help of a pilot study) how the data will be collected and what you are going to do with it. The recording stage nevertheless is important because you need to ensure that at the end you have an accurate record of what was done and what data were collected. Furthermore, the data need to be arranged in an appropriate manner that facilitates the analysis. It is often the case, especially with old data, that the researcher has to spend a lot of time rearranging numbers/data into a new configuration before anything can be done. Getting the data layout correct right at the start is therefore important (see more about data layout in Chapter 2).
1.1.3 Analysis stage
The means of undertaking your analysis should have been worked out at the planning stage. The analysis stage is where you apply the statistics and data handling methods that make sense of the numbers collected. Graphs are a great help in understanding data. As part of the analysis, you will determine if your original hypothesis is supported or not (see more about kinds of analysis in Chapter 5).
1.1.4 Reporting stage
Of course there is some personal satisfaction in doing this work, but the bottom line is that you need to tell others what you did and what you found out. The means of reporting are varied and may be informal, as in a simple meeting between colleagues. Often the report is more formal, like a written report or paper or a presentation at a meeting. It is important that your findings are presented in such a way that your target audience understands what you did, what you found and what it means. In the context of conservation, for example, your research may determine that the current management is working well and so nothing much needs to be done apart from monitoring. On the other hand, you may determine that the situation is not good and that intervention is needed. Making the results of your work understandable is a key skill and the use of graphs to illustrate your results is usually the best way to achieve this. Your audience is much more likely to dwell on a graph than a page of figures and text. You'll see examples of how to report results throughout the text, with a summary in Chapter 13.
1.2 Types of experiment/project
As part of the planning process, you need to be aware of what you are trying to achieve. In general, there are three main types of research:
Differences: you look to show that a is different to b and perhaps that c is different again. These kinds of situations are represented graphically using bar charts and box–whisker plots.
Correlations: you are looking to find links between things. This might be that species a has increased in range over time or that the abundance of species a (or environmental factor a ) affects the abundance of species b. These kinds of situations are represented graphically using scatter plots.
Associations: similar to the above except that the type of data is a bit different, e.g. species a is always found growing in the same place as species b. These kinds of situations are represented graphically using pie charts and bar charts.
Studies that concern whole communities of organisms usually require quite different approaches. The kinds of approach required for the study of community ecology are dealt with in detail in the companion volume to this work (Community Ecology, Analytical Methods Using R and Excel, Gardener 2014).
In this volume you'll see some basic approaches to community ecology, principally diversity and sample similarity (see Chapter 12). The other statistical approaches dealt with in this volume underpin many community studies.
Once you know what you are aiming at, you can decide what sort of data to collect; this affects the analytical approach, as you shall see later. You'll return to the topic of project types in Chapter 5.
1.3 Getting data – using a spreadsheet
A spreadsheet is an invaluable tool in science and data analysis. Learning to use one is a good skill to acquire. With a spreadsheet you are able to manipulate data and summarize it in different ways quite easily. You can also prepare data for further analysis in other computer programs in a spreadsheet. It is important that you formalize the data into a standard format, as you'll see later (in Chapter 2). This will make the analysis run smoothly and allow others to follow what you have done. It will also allow you to see what you did later on (it is easy to forget the details).
Your spreadsheet is useful as part of the planning process. You may need to look at old data; these might not be arranged in an appropriate fashion, so using the spreadsheet will allow you to organize your data. The spreadsheet will allow you to perform some simple manipulations and run some straightforward analyses, looking at means, for example, as well as producing simple summary graphs. This will help you to understand what data you have and what they might show. You'll look at a variety of ways of manipulating data later (see Section 3.2).
If you do not have past data and are starting from scratch, then your initial site visits and pilot studies will need to be dealt with. The spreadsheet should be the first thing you look to, as this will help you arrange your data into a format that facilitates further study. Once you have some initial data (be it old records or pilot data) you can continue with the planning process.
1.4 Hypothesis testing
A hypothesis is your idea of what you are trying to determine. Ideally it should relate to a single thing, so "Japanese knotweed and Himalayan balsam have increased their range in the UK over the past 10 years" makes a good overall aim, but is actually two hypotheses. You should split up your ideas into parts, each of which can be tested separately:
"Japanese knotweed has increased its range in the UK over the past 10 years."
"Himalayan balsam has increased its range in the UK over the past 10 years."
You can think of hypothesis testing as being like a court of law. In law, you are presumed innocent until proven guilty; you don't have to prove your innocence.
In statistics, the equivalent is the null hypothesis. This is often written as H0 (or H₀) and you aim to reject your null hypothesis and therefore, by implication, accept the alternative (usually written as H1 or H₁).
The H0 is not simply the opposite of what you thought (called the alternative hypothesis, H1) but is written as such to imply that no difference exists, no pattern (I like to think of it as the dull hypothesis). For your ideas above you would get:
"There has been no change in the range of Japanese knotweed in the UK over the past 10 years."
"There has been no change in the range of Himalayan balsam in the UK over the past 10 years."
So, you do not say that the range of these species is shrinking, but that there is no change. Getting your hypotheses correct (and also the null hypotheses) is an important step in the planning process as it allows you to decide what data you will need to collect in order to reject the H0. You'll examine hypotheses in more detail later (Section 5.2).
1.4.1 Hypothesis and analytical methods
Allied to your hypothesis is the analytical method you will use to help test and support (or otherwise) your hypothesis. Even at this early stage you should have some idea of the statistical test you are going to apply. Certain statistical tests are suitable for certain kinds of data and you can therefore make some early decisions. You may alter your approach, change the method of analysis and even modify your hypothesis as you move through the planning stages: this is all part of the scientific process. You'll look at ways to choose which statistical test is right for your situation in Section 5.3, where you will see a decision flow-chart (Figure 5.1) and a key (Table 5.1) to help you. Before you get to that stage, though, you will need to think a little more about the kind of data you may collect.
1.5 Data types
Once you have sorted out more or less what your hypotheses are, the next step in the planning process is to determine what sort of data you can get. You may already have data from previous biological records or some other source. Knowing what sort of data you have will determine the sorts of analyses you are able to perform.
In general, you can have three main types of data:
Interval: these can be thought of as "real" numbers. You know the sizes of them and can do "proper" mathematics. Examples would be counts of invertebrates, percentage cover, leaf lengths, egg weights, or clutch size.
Ordinal: these are values that can be placed in order of size but that is pretty much all you can do. Examples would be abundance scales like DAFOR or Domin (named after a Czech botanist). You know that A is bigger than O but you cannot say that one is twice as big as the other (or be exact about the difference).
Categorical (sometimes called nominal data): this is the tricky one because it can be confused with ordinal data. With categorical data you can only say that things are different. Examples would be flower colour, habitat type, or sex.
With interval data, for example, you might count something, keep counting and build up a sample. When you are finished, you can take your list and calculate an average, look to see how much larger the biggest value is than the smallest and so on. Put another way, you have a scale of measurement. This scale might be millimetres or grams or anything else. Whenever you measure something using this scale you can see how it fits into the scheme of things because the interval of your scale is fixed (10 mm is bigger than 5 mm, 4 g is less than 12 g). Compare this to the ordinal scales described below.
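As a minimal sketch of what "proper" mathematics on interval data looks like, the lines below compute an average and a range for a small sample. The leaf lengths are made-up illustrative values, and Python is used here only for brevity; the book itself works in R and Excel.

```python
from statistics import mean

# Hypothetical leaf lengths in millimetres (illustrative values, not real data).
leaf_lengths_mm = [42, 38, 51, 47, 40]

# Because the scale has fixed intervals, averages and differences are meaningful.
average_mm = mean(leaf_lengths_mm)
range_mm = max(leaf_lengths_mm) - min(leaf_lengths_mm)

print(f"Mean leaf length: {average_mm} mm")
print(f"Largest minus smallest: {range_mm} mm")
```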
With ordinal data you might look at the abundance of a species in quadrats. It may be difficult or time consuming to be exact so you decide to use an abundance scale. The Domin scale shown in Table 1.2, for example, converts percentage cover into a numerical value from 0 to 10.
The Domin scale is generally used for looking at plant abundance and is used in many kinds of study. You can see by looking at Table 1.2 that the different classifications cover different ranges of abundance. For example, a Domin of 8 represents a range of values from about half to three-quarters coverage (51–74%). A value of 6 represents a range from about a quarter to a third coverage (26–33%). The first three divisions of the Domin scale all represent less than 4% coverage but relate to the number of individuals found. The Domin scale is useful because it allows you to collect data efficiently and still permits useful analysis. You know that 10 is a greater percentage coverage than 8 and that 8 is bigger than 6; it is just that the intervals between the divisions are unequal.
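The conversion from percentage cover to a Domin value can be sketched as a simple lookup. This is an illustration under assumptions, not a reproduction of Table 1.2: only the bands for 6 (26–33%) and 8 (51–74%) are taken from the text above, the remaining boundaries are assumed, and the function name `domin_from_cover` is hypothetical. Python is used for brevity; the book itself works in R and Excel.

```python
from typing import Optional

# (lower bound of percent cover, Domin value), highest band first.
# ASSUMPTION: only the bands for 6 (26-33%) and 8 (51-74%) come from the text;
# the other boundaries are illustrative and should be checked against Table 1.2.
DOMIN_LOWER_BOUNDS = [(91, 10), (75, 9), (51, 8), (34, 7), (26, 6), (11, 5), (4, 4)]


def domin_from_cover(percent_cover: float) -> Optional[int]:
    """Return the Domin value for a percentage cover of 4% or more.

    Below 4% cover the Domin value depends on the number of individuals
    rather than on cover, so this sketch returns None for those cases.
    """
    if not 0 <= percent_cover <= 100:
        raise ValueError("percent_cover must be between 0 and 100")
    for lower_bound, domin_value in DOMIN_LOWER_BOUNDS:
        if percent_cover >= lower_bound:
            return domin_value
    return None


for cover in (2, 30, 60, 95):
    print(cover, "->", domin_from_cover(cover))
```

Note how bands of very different widths map to adjacent values: this is what makes the scale ordinal rather than interval.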
There are many other abundance scales, and various researchers have at times worked out useful ways to simplify the abundance of organisms. The DAFOR scale is a general phrase to describe abundance scales that convert abundance into a letter code. There are many examples. Table 1.3 shows a generalized scale for vegetation analysis.
There are other letters that might be used to extend your scale. For example C for "common" might be inserted between A and F (ACFOR is a commonly used ordinal scale). You might add E and/or S for "extremely abundant" and "super abundant". You might also add N for "not found". The DAFOR type of scale can be used for any organism, not just for vegetation.
When you are finished, you can convert your DAFOR scale into numbers (ranks) and get an average, which can be converted to a DAFOR letter, but you cannot tell how much larger the biggest is than the smallest – the interval between the values is inexact.
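The rank conversion described above might be sketched as follows. The rank numbers (D = 5 down to R = 1), the rounding rule and the function name `average_dafor` are assumptions made for illustration, and the quadrat scores are invented. Python is used for brevity; the book itself works in R and Excel.

```python
# ASSUMPTION: ranks run from 5 (Dominant) down to 1 (Rare).
DAFOR_RANKS = {"D": 5, "A": 4, "F": 3, "O": 2, "R": 1}
RANK_TO_LETTER = {rank: letter for letter, rank in DAFOR_RANKS.items()}


def average_dafor(scores):
    """Average a list of DAFOR letters via their ranks and return a letter."""
    ranks = [DAFOR_RANKS[score] for score in scores]
    mean_rank = sum(ranks) / len(ranks)
    # Round half up to the nearest whole rank (an arbitrary choice for this sketch).
    nearest_rank = int(mean_rank + 0.5)
    return RANK_TO_LETTER[nearest_rank]


# Hypothetical scores from five quadrats.
quadrat_scores = ["A", "F", "F", "O", "D"]
print(average_dafor(quadrat_scores))
```

The result is a letter, not a measurement: the ranks only record order, so the averaged value says nothing exact about how much more abundant one category is than another.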
Many of the abundance scales used are derived from the work of Josias Braun-Blanquet, an eminent Swiss botanist. Table 1.4 shows a basic example of a Braun-Blanquet scale for vegetation cover.
With categorical data it is useful to think of an example. You might go out and look to see what types of insect are visiting different colours of flower. Every time you spot an insect, you record its type (bee, fly, beetle) and the flower colour. At the end you can make a table with numbers of how many of each type visited each colour. You have numbers but each value is uniquely a combination of two categories.
Table 1.5 shows an example of categorical data laid out in what is called a contingency table. The rows are one category (colour) and the columns another category (type of insect).
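To show how such a table is built from field records, here is a small sketch that tallies observations into a contingency table with colours as rows and insect types as columns. The observations are invented for illustration and are not the values in Table 1.5. Python is used for brevity; the book itself works in R and Excel.

```python
from collections import Counter

# Hypothetical field records: one (insect type, flower colour) pair per sighting.
observations = [
    ("bee", "blue"),
    ("bee", "yellow"),
    ("fly", "white"),
    ("beetle", "yellow"),
    ("bee", "blue"),
    ("fly", "yellow"),
]

# Count how often each combination of the two categories was seen.
counts = Counter(observations)
insects = sorted({insect for insect, _ in observations})
colours = sorted({colour for _, colour in observations})

# Rows are flower colours, columns are insect types; unseen pairs count as 0.
table = {
    colour: {insect: counts[(insect, colour)] for insect in insects}
    for colour in colours
}

# Print the table as plain text.
print("colour".ljust(8) + "".join(insect.rjust(8) for insect in insects))
for colour in colours:
    row = "".join(str(table[colour][insect]).rjust(8) for insect in insects)
    print(colour.ljust(8) + row)
```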
1.6 Sampling effort
Sampling effort refers to the way you collect data and how much to collect. For example, you have decided that you need to determine the abundance of some plant species in meadows across lowland Britain. How many quadrats will you use? How large will the quadrats need to be? Do you need quadrats at all?
Sample is the term used to describe the set of data that you have. Because you generally cannot measure "everything", you will usually have a subset of stuff that you've measured (or weighed or counted). Think about a field of buttercups as an example. You wish to know how many there are in the field, which is a hectare in size (i.e. 100 m × 100 m). You aren't really going to count them all (that would take too long) so you make up a square that has sides of 1 metre and count how many buttercups there are in that. Now you can estimate how many buttercups there are in the whole field. Your sample is 1/10,000th of the area, which is pretty small. The estimate is not likely to be very good (although by random chance it could be). It seems reasonable to count buttercups in a few more 1 m² areas. In this way your estimate is likely to get more "on target". Think of it this way: if you carried on and on and on, eventually you would have counted buttercups in every 1 m² of the field. Your estimate would now be spot on because you would have counted everything. So as you collect more and more data, your estimate of the true number of buttercups will likely become more and more like the true number.
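The buttercup example can be explored with a small simulation. This is a sketch under assumptions: the field is invented (each 1 m² cell is given a random count of 0 to 8 buttercups), the quadrats are placed at random without overlap, and the function name `estimate_total` is hypothetical. Python is used for brevity; the book itself works in R and Excel.

```python
import random

rng = random.Random(1)  # fixed seed so repeated runs give the same field

# ASSUMPTION: a simulated 100 m x 100 m field, one count per 1 m^2 cell.
FIELD_CELLS = 100 * 100
field = [rng.randint(0, 8) for _ in range(FIELD_CELLS)]
true_total = sum(field)


def estimate_total(n_quadrats, rng):
    """Estimate the field total from n randomly placed 1 m^2 quadrats."""
    sample = rng.sample(field, n_quadrats)  # cells chosen without replacement
    # Scale the sample count up from the sampled area to the whole field.
    return sum(sample) * FIELD_CELLS / n_quadrats


print("True total:", true_total)
for n in (1, 10, 100, 1000, FIELD_CELLS):
    print(f"{n:>6} quadrats -> estimate {estimate_total(n, rng):.0f}")
```

With a single quadrat the estimate can be far from the true total; once every cell has been counted, the estimate equals the true total by construction.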
The problem is, how many 1 m² areas will you have to count in order to get a good estimate of the true number? You will return to this issue a little later. Another problem – where do you put your 1 m² areas? Will it make a difference? Is a 1 m² quadrat the right size? You will look at these themes now.
Excerpted from "Statistics for Ecologists Using R and Excel"
Copyright © 2017 Mark Gardener.
Excerpted by permission of Pelagic Publishing.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Table of Contents
2. Data recording
3. Beginning data exploration – using software tools
4. Exploring data – looking at numbers
5. Exploring data – which test is right?
6. Exploring data – using graphs
7. Tests for differences
8. Tests for linking data – correlations
9. Tests for linking data – associations
10. Differences between more than two samples
11. Tests for linking several factors
12. Community ecology
13. Reporting results