**From the Publisher**

*Mathematics for the Life Sciences* makes it a refreshing new entry into the world of bioscience textbooks."

**—George Pryn Ford,**

*The Biologist*

The life sciences deal with a vast array of problems at different spatial, temporal, and organizational scales. The mathematics necessary to describe, model, and analyze these problems is similarly diverse, incorporating quantitative techniques that are rarely taught in standard undergraduate courses. This textbook provides an accessible introduction to these

- ISBN-13: 9781400852772
- Publisher: Princeton University Press
- Publication date: 08/17/2014
- Sold by: Barnes & Noble
- Format: NOOK Book
- Pages: 640
- Sales rank: 1,131,331
- File size: 49 MB

All rights reserved.

ISBN: 978-1-4008-5277-2

CHAPTER 1

**Basic Descriptive Statistics**

**1.1 Types of Biological Data**

Any observation or experiment in biology involves the collection of information, and this may be of several general types:

*Data on a Ratio Scale*

Consider measuring heights of plants. The difference in height between a 20-cm-tall plant and a 24-cm-tall plant is the same as that between a 26-cm-tall plant and a 30-cm-tall plant. These data have a "constant interval size." They also have a true zero point on the measurement scale, so that ratios of measurements make sense (e.g., it makes sense to state that one plant is three times as tall as another). A measurement scale that has constant interval size and a true zero point is called a "ratio scale." For example, this applies to measurements of weights (mg, kg), lengths (cm, m), volumes (cc, cu m), and lengths of time (s, min).

*Data on an Interval Scale*

Measurements that have a constant interval size but no true zero point are on an interval scale. Examples are temperatures measured in Celsius or Fahrenheit: it makes no sense to say that 40 degrees is twice as hot as 20 degrees. Absolute temperatures, however, are measured on a ratio scale.
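A quick calculation illustrates why ratios fail on an interval scale. Converting the two Celsius readings to kelvins (by adding the standard offset of 273.15) gives a ratio of absolute temperatures that is nowhere near 2:

$$\frac{40 + 273.15}{20 + 273.15} = \frac{313.15}{293.15} \approx 1.07$$

The same two temperatures expressed in Fahrenheit are 104 and 68 degrees, whose ratio is about 1.53. The apparent ratio changes with the arbitrary choice of zero, which is why only the absolute scale supports statements such as "twice as hot."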

*Data on an Ordinal Scale*

Data that can be ordered according to some measurement are on an ordinal scale. Examples are rankings based on the size of objects, the speed of one individual relative to another, the depth of the orange hue of a shirt, and so on. In some cases (e.g., size) there may be an underlying ratio scale, but information is lost when only a ranking is provided (e.g., you are told only that tomato genotype A is larger than tomato genotype B, not how much larger). Quantitative comparisons are not possible on an ordinal scale (how can one say that one shirt is half as orange as another?).

*Data on a Nominal Scale*

When a measurement is classified by an attribute rather than by a quantitative, numerical measurement, it is on a nominal scale (male or female; genotype AA, Aa, or aa; in the taxon *Pinus* or in the taxon *Abies*; etc.). Often, these are called categorical data because you classify the data elements according to their category.

*Continuous vs. Discrete Data*

When a measurement can take on any conceivable value along a continuum, it is called continuous. Weight and height are continuous variables. When a measurement can take on only one of a discrete list of values, it is discrete. The number of arms on a starfish, the number of leaves on a plant, and the number of eggs in a nest are all discrete measurements.

**1.2 Summary of Descriptive Statistics of Data Sets**

Any time a data set is summarized by its statistical information, there is a loss of information. That is, given the summary statistics, there is no way to recover the original data. Basic summary statistics may be grouped as

(i) measures of central tendency (giving in some sense the central value of a data set) and

(ii) measures of dispersion (giving a measure of how spread out that data set is).

*Measures of Central Tendency*

*Arithmetic Mean (the average)*

If the data collected as a sample from some set of observations have values $x_1, x_2, \ldots, x_n$, then the mean of this sample (denoted by $\bar{x}$) is

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Note the use of the summation ($\Sigma$) notation in the above expression, that is,

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n.$$

*Median*

The median is the middle value: half the data fall above this and half below. In some sense, this supplies less information than the mean since it considers only the ranking of the data, not how much larger or smaller the data values are. But the median is less affected than the mean by "outlier" points (e.g., a really large measurement or data value that skews the sample). The LD50 is an example of a median: the median lethal dose of a substance (half the individuals die after being given this dose, and half survive). To find the median of a list of data $x_1, x_2, \ldots, x_n$, list the values in order from smallest to largest. This is known as "ranking" the data. If $n$ is odd, the median is the number in position $1 + (n-1)/2$ on this list. If $n$ is even, the median is the average of the numbers in positions $n/2$ and $1 + n/2$ on this list.

Quartiles arise when the sample is broken into four equal parts (the right end point of the 2nd quartile is the median), quintiles when five equal parts are used, and so on.
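As a small worked illustration of the ranking rule for the median, using made-up values that are not from the text: the data 7, 3, 9, 5, 11 rank as 3, 5, 7, 9, 11, and adding a sixth value, 13, gives an even-sized list.

$$n = 5:\quad \text{position } 1 + \frac{5-1}{2} = 3 \;\Rightarrow\; \text{median} = 7$$

$$n = 6:\quad \text{positions } \frac{6}{2} = 3 \text{ and } 1 + \frac{6}{2} = 4 \;\Rightarrow\; \text{median} = \frac{7 + 9}{2} = 8$$

Replacing 11 with an outlier of 110 in the five-point list leaves the median at 7 but moves the mean from 7 to 26.8, which shows the median's relative insensitivity to extreme values.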

*Mode*

The mode is the most frequently occurring value (or values; there may be more than one) in a data set.

*Midrange*

The midrange is the value halfway between the largest and smallest values in the data set. So, if $x_{\min}$ and $x_{\max}$ are the smallest and largest values in the data set, then the midrange is

$$\bar{x}_{\text{mid}} = \frac{x_{\min} + x_{\max}}{2}.$$

*Geometric Mean*

The geometric mean of a set of $n$ data is the $n$th root of the product of the $n$ data values,

$$\bar{x}_{\text{geom}} = \sqrt[n]{x_1 x_2 \cdots x_n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}.$$

The geometric mean arises as an appropriate estimate of growth rates of a population when the growth rates vary through time or space. For positive data it is never greater than the arithmetic mean, and the two are equal only if all the data have the same value.
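To see why the geometric mean suits varying growth rates, consider a hypothetical population (values chosen only for illustration) that doubles in one year and halves in the next, so its yearly growth factors are 2 and 0.5:

$$\text{arithmetic mean} = \frac{2 + 0.5}{2} = 1.25, \qquad \text{geometric mean} = \sqrt{2 \times 0.5} = 1$$

After the two years the population is $N_0 \times 2 \times 0.5 = N_0$, exactly what a constant yearly factor of 1 predicts. A constant factor of 1.25 would instead predict $1.25^2 N_0 \approx 1.56\,N_0$, overstating the growth.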

*Harmonic Mean*

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the data,

$$\bar{x}_{\text{harm}} = \frac{1}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} \dfrac{1}{x_i}} = \frac{n}{\displaystyle\sum_{i=1}^{n} \dfrac{1}{x_i}}.$$

It also arises in some circumstances as the appropriate overall growth rate when rates vary.
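A short worked comparison of the three means, using the illustrative values 2, 3, and 6 (not data from the text):

$$\text{harmonic} = \frac{3}{\frac{1}{2} + \frac{1}{3} + \frac{1}{6}} = \frac{3}{1} = 3, \qquad \text{geometric} = \sqrt[3]{2 \cdot 3 \cdot 6} = \sqrt[3]{36} \approx 3.30, \qquad \text{arithmetic} = \frac{2 + 3 + 6}{3} \approx 3.67$$

For positive data that are not all equal, the three means fall in this order: harmonic, then geometric, then arithmetic.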

*Measures of Dispersion*

*Range*

The range is the largest minus the smallest value in the data set: $x_{\max} - x_{\min}$. This does not account in any way for the manner in which data are distributed across the range.

*Variance*

The variance is the mean of the squared deviations of the data from the arithmetic mean of the data. The *best* estimate of this (take a good statistics class to find out how *best* is defined) is the sample variance, obtained by taking the sum of the squares of the differences of the data values from the sample mean and dividing this by the number of data points minus one,

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,$$

where $n$ is the number of data points in the data set, $x_i$ is the $i$th data point in the data set $x$, and $\bar{x}$ is the arithmetic mean of the data set $x$.

*Standard Deviation*

The variance has square units, so it is usual to take its square root to obtain the standard deviation,

$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2},$$

which has the same units as the original measurements. The higher the standard deviation $s$, the more dispersed the data are around the mean.
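A worked example may help fix the steps. Take the made-up data set 2, 4, 4, 5, 10 (chosen so the arithmetic is clean, not taken from the text), with $n = 5$:

$$\bar{x} = \frac{2 + 4 + 4 + 5 + 10}{5} = 5$$

$$s^2 = \frac{(-3)^2 + (-1)^2 + (-1)^2 + 0^2 + 5^2}{5 - 1} = \frac{36}{4} = 9, \qquad s = \sqrt{9} = 3$$

Note that the sum of squared deviations is divided by $n - 1 = 4$, not by $n = 5$.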

Both the variance and the standard deviation have values that depend on the measurement scale used. So measuring body weights of newborns in grams will produce much higher variances than if the same newborns were measured in kilograms. To account for the measurement scale, it is typical to use the coefficient of variability (sometimes called the coefficient of variance): the standard deviation divided by the arithmetic mean, which is dimensionless and has no units. This coefficient of variability is thus independent of the measurement scale used.
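The scale independence can be checked directly. Suppose every measurement is rescaled by a positive constant $c$, so that $y_i = c\,x_i$ (for instance $c = 1000$ when converting kilograms to grams). Writing $s_x$ and $s_y$ for the standard deviations of the two versions of the data (notation introduced here only for this check):

$$\bar{y} = c\,\bar{x}, \qquad s_y^2 = c^2 s_x^2, \qquad s_y = c\,s_x, \qquad \frac{s_y}{\bar{y}} = \frac{c\,s_x}{c\,\bar{x}} = \frac{s_x}{\bar{x}}$$

So the variance in grams is $10^6$ times the variance in kilograms, while the ratio of standard deviation to mean is unchanged.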

*Dispersion over Nominal Scale Data and the Simpson Index*

All the above measures of dispersion apply to ratio scale data. For nominal scale data, there is no mean or variance that makes sense, but there certainly can be a measure of how spread out the data are among the various categories, a concept called diversity. In ecology, the two main factors taken into account when measuring diversity are richness and evenness. Species richness is the number of different species present, while evenness is a measure of the relative abundance of the different species making up the richness of an area. The area has uneven diversity if virtually all the individuals found are of one species with only rare individuals of the other species. The area has even diversity if all species have the same abundances. Simpson's index of diversity (SID) is one of several diversity indices. The SID represents the probability that two individuals randomly selected from a sample will belong to different species. In a certain area or sample, let

$$D = \frac{\displaystyle\sum_{i=1}^{S} n_i (n_i - 1)}{N(N-1)},$$

where $n_i$ is the number of individuals in species $i$, $N$ is the total number of individuals, and $S$ is the number of species. Then, the SID is

SID = 1 - D.

When SID is close to 1, the sample is considered to be highly diverse.
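As a hedged illustration, assume $D$ is computed as the probability that two individuals drawn without replacement belong to the same species, $D = \sum_i n_i(n_i - 1) / (N(N-1))$. Take two hypothetical samples of $N = 10$ individuals from $S = 3$ species, with invented counts. With counts 5, 3, 2:

$$D = \frac{5 \cdot 4 + 3 \cdot 2 + 2 \cdot 1}{10 \cdot 9} = \frac{28}{90} \approx 0.311, \qquad \text{SID} = 1 - D \approx 0.689$$

With the much less even counts 8, 1, 1:

$$D = \frac{8 \cdot 7 + 0 + 0}{10 \cdot 9} = \frac{56}{90} \approx 0.622, \qquad \text{SID} = 1 - D \approx 0.378$$

Both samples have the same richness, but the more even sample has the higher SID: two randomly chosen individuals are more likely to be of different species.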

**1.3 Matlab Skills**

If you are not familiar with the software Matlab, review "Getting Started with Matlab" in Appendix A.

*Entering Data Sets in Matlab*

In Matlab, data sets are entered as arrays, and arrays are denoted with square brackets: [ ]. If we wanted to enter the trees per hectare data from Example 1.2, we would type

[13.3 13.5 24.6 18.7 10.9]

into Matlab. Notice that the data points in the set are separated by spaces. If we want to refer back to this data set using Matlab, we need to name the data set. In Example 1.2, we called the data set x. To call the data set x in Matlab, we type

x = [13.3 13.5 24.6 18.7 10.9]

into Matlab. Now, whenever we want to refer back to our data set, we can just use x instead of typing the entire data set again.
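A few related basics can be sketched with the same data. The variable names other than x are illustrative choices, not names used in the text; every function called is part of base Matlab.

```matlab
% Enter the trees per hectare data as a row array.
% The trailing semicolon suppresses the echo in the command window.
x = [13.3 13.5 24.6 18.7 10.9];

% Commas are an equivalent way to separate entries in a row.
x_commas = [13.3, 13.5, 24.6, 18.7, 10.9];
same = isequal(x, x_commas);   % true: both forms give the same array

n = numel(x);        % number of data points
first = x(1);        % Matlab indexes arrays starting from 1
last = x(end);       % "end" refers to the final index
sorted = sort(x);    % ascending order, i.e., the ranked data
```

Ranking the data with sort is the first step in finding a median by hand, as described in Section 1.2.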

*Calculating Descriptive Statistics in Matlab*

Now that we know how to enter our data sets into Matlab, we can use Matlab to quickly compute basic descriptive statistics. Table 1.1 shows the commands for the descriptive statistics described earlier in this chapter.

Each of the commands in Table 1.1 returns its corresponding answer and names the answer ans. If we wish to save the answer for future use, we must name the output of the command. For example, if we wish to save the arithmetic mean, we can type

xbar = mean(x)

into Matlab. If you are typing this into the command window, you will see that the value that is returned is named xbar.

Notice there are no commands for calculating the range or the midrange. We can calculate these, however, by using the min and max commands. To calculate the midrange, we use

(min(x)+max(x))/2

and to calculate the range, we use

max(x)-min(x)

As an example, suppose we wanted to calculate the mean, median, mode, midrange, geometric mean, harmonic mean, range, variance, and standard deviation for the data set in Example 1.1.

The following shows the input typed into the command window (always preceded by ») and its corresponding output:

[ILLUSTRATION OMITTED]
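A comparable session can be sketched with the trees per hectare data entered earlier in this section (not the Example 1.1 data, which does not appear in this excerpt). The variable names are illustrative, and the list of commands is an assumption about what Table 1.1 contains. The geometric and harmonic means are built from their definitions here so the sketch runs in base Matlab; the functions geomean and harmmean are available if the Statistics and Machine Learning Toolbox is installed.

```matlab
x = [13.3 13.5 24.6 18.7 10.9];   % trees per hectare data

% Measures of central tendency
xbar  = mean(x);                  % arithmetic mean
xmed  = median(x);                % middle value of the ranked data
xmode = mode(x);                  % with no repeated values, mode returns the smallest value
xgeo  = prod(x)^(1/numel(x));     % geometric mean: nth root of the product
xharm = numel(x) / sum(1 ./ x);   % harmonic mean: n over the sum of reciprocals
xmid  = (min(x) + max(x))/2;      % midrange

% Measures of dispersion
xrange = max(x) - min(x);         % range
s2 = var(x);                      % sample variance (divides by n - 1)
s  = std(x);                      % sample standard deviation
cv = s / xbar;                    % coefficient of variability (dimensionless)

% Cross-check var against the definition of the sample variance
s2_check = sum((x - xbar).^2) / (numel(x) - 1);
```

Leaving the semicolon off any line makes Matlab print that value in the command window. For these data the mean is 16.2, the median is 13.5, and the sample variance is 30.15.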

**1.4 Exercises**

1.1 The capacity for physical exercise (in seconds) was determined for each of 11 patients who were being treated for chronic heart failure.

906 1320 711 1170 684 1200 837 1056 897 882 1008

(a) Determine the mean and the median of the data.

(b) Determine the geometric and harmonic means of the data.

(c) How do the three different measures of the mean differ?

1.2 Daily crude oil output (in millions of barrels) for the U.S. is shown below for the years 1971 to 1990.

9.45 9.40 9.25 8.75 8.30 8.10 8.25 8.70 8.55 8.60

8.55 8.65 8.70 8.70 8.91 8.60 8.20 7.70 7.20 6.75

Compute the mean, median, and mode for the data.

1.3 Suppose the scale of a data set is changed by multiplying each measurement by a positive constant. How would this affect the mean, median, mode, and range?

1.4 Ten hospital employees on a standard American diet agreed to adopt a vegetarian diet for 1 month. Below is the change in the serum cholesterol level (before − after).

49 -10 27 13 36

19 48 21 8 16

(a) Compute the median and mean change in cholesterol.

(b) Compute the range, variance, and standard deviation of the data. Are the data fairly spread out or close together?

1.5 Twelve sheep were fed pingue (a toxin-producing weed of the southwestern United States) as a part of an experiment and died as a result. The time of death in hours after the ingestion of pingue for each sheep follows:

44 27 24 24 36 36

44 120 29 36 36 36

Compute the range, variance, and standard deviation of the sample.

1.6 The National Weather Service reports data on the number of hurricanes to strike the United States in decades in the last century (using the Saffir-Simpson category). Calculate the mean of the number of hurricanes per decade.

1.7 Consider these two sets of data [70]:

*A* = {0, 5, 10, 15, 25, 30, 35, 40, 45, 50, 71, 72, 73, 74, 75, 76, 77, 78, 100}

*B* = {0, 22, 23, 24, 25, 26, 27, 28, 29, 50, 55, 60, 65, 70, 75, 85, 90, 95, 100}

For both sets of data, calculate the range, median, the first quartile, and the third quartile. Do these values adequately represent the distribution in each data set?

1.8 Suppose the mean score on a national test is 400 with a standard deviation of 50. If each score is increased by 25, what are the new mean and standard deviation?

1.9 Suppose the mean score on a national test is 400 with a standard deviation of 50. If each score is increased by 25%, what are the new mean and standard deviation?

1.10 Use the following simple data set to calculate the SID for these trees in a particular plot [18]. Interpret your results as a probability.

1.11 Below are some data from the Citizen Science program in the Great Smoky Mountains National Park that record the species of salamanders observed in a particular area in 2000 [18]. Calculate the SID for salamanders in this area using these data.

Excerpted from *Mathematics for the Life Sciences* by Erin N. Bodine, Suzanne Lenhart, Louis J. Gross. Copyright © 2014 Princeton University Press. Excerpted by permission of PRINCETON UNIVERSITY PRESS.

All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.

Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

**Erin N. Bodine** is assistant professor of mathematics at Rhodes College. **Suzanne Lenhart** is Chancellor's Professor of Mathematics at the University of Tennessee. **Louis J. Gross** is Distinguished Professor of Ecology and Evolutionary Biology and Mathematics at the University of Tennessee.
