Overview
Statistical Analysis and Modeling of Geographic Information with ArcView GIS is an update to Lee and Wong's Statistical Analysis with ArcView GIS, featuring expanded coverage of classical statistical methods, probability and statistical testing, new student exercises to facilitate classroom use, new exercises featuring interactive ArcView Avenue scripts, and a new overview of compatible spatial analytical functions in ArcGIS 9.0.
Meet the Author
David W. S. Wong, PhD, is Professor and Chair of the Earth Systems and GeoInformation Sciences Program at George Mason University in Fairfax, Virginia.
Jay Lee, PhD, is Professor and Chair of the Department of Geography at Kent State University in Kent, Ohio.
Read an Excerpt
Statistical Analysis of Geographic Information with ArcView GIS And ArcGIS
By David W. S. Wong
John Wiley & Sons
ISBN: 0471468991
Chapter One
INTRODUCTION
1.1 WHY STATISTICS AND SAMPLING?
Attempts to understand, explain, estimate, or predict events or phenomena occurring around us often start by simplifying the information we have about them. In many cases, statistics have been devised and used to digest large quantities of information and to provide streamlined and concise impressions of the events or phenomena that we are trying to comprehend. For example, population counts of the 164 cities in Ohio would provide little meaning to us unless we know the largest, smallest, or average size among these cities or the range within which these city population sizes vary. In this case, the maximum, minimum, average, and the range of population counts are among the summary information that is known as statistics because they help to describe how values in a set of numeric information, or data, are distributed.
With this understanding, we can state that, given a set of numeric data, statistics are quantitative measures derived from the data to describe various aspects of the data. If they are classified by their functions, we have descriptive statistics and inferential statistics. Descriptive statistics are calculated from a set of data to describe how the values are distributed within the set of data. For example, the maximum, minimum, range, and average of a set of data are all in this category. Inferential statistics are calculated from sample data for the purpose of making an inference to a population or for making comparisons between sets of data. Classical statistics, also called conventional statistics, are used in many application areas, such as sociology, political science, medicine, and engineering, but they have also been modified and extended to accommodate specific application areas. In this book, we will cover many statistics known as spatial statistics. These statistics are strongly based upon classical statistics but have been extended to work with data that are spatially referenced. Other extensions of classical statistics for various application areas include econometrics, psychometrics, biostatistics, geostatistics, and several others. Certain statistics discussed in this book are sometimes classified as geostatistics, which originated in geoscience.
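As a small illustration of the descriptive statistics named above, the following Python sketch computes the maximum, minimum, range, and average of a set of values. The function name `describe` and the five population counts are made up for illustration and do not come from the book.

```python
from statistics import mean


def describe(values):
    """Return the four descriptive statistics named in the text."""
    largest = max(values)
    smallest = min(values)
    return {
        "maximum": largest,
        "minimum": smallest,
        "range": largest - smallest,  # spread between the extremes
        "average": mean(values),
    }


# Hypothetical population counts for five cities (not real data).
city_populations = [12000, 8500, 43000, 5100, 21400]
summary = describe(city_populations)
print(summary)
```

Each of the four values summarizes one aspect of how the data are distributed, which is what makes them descriptive rather than inferential.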
With statistics, an analysis can be performed to understand how data values concentrate or disperse around certain values, how they are compared with each other or with another set of data, or whether they are just subsets of a larger set of data. When analyzing data statistically, each observation should be independent so that its values or data are not dependent on, or tied to, values of other observations in the same data set. This independence assumption is one of the most fundamental assumptions in statistical analysis. Unfortunately, it is often violated for data collected to describe events or phenomena that are spatially referenced. This is because, in many geographic events or phenomena, what happened in a location is highly correlated to what happened in its surroundings. Because of this characteristic of spatially referenced data, much of our discussion in this book will focus on how statistics and associated methods can be modified to analyze spatially referenced data.
When one attempts to answer a scientific question, one will rarely draw conclusions based on just one or a few observations. For instance, if there was one case or a few cases of malaria in a community, can we say that there is a real epidemic or should we treat those occurrences as accidents or events occurring by chance? To take another example, can we conclude that the soil in a farm has lost its fertility if the farmer harvested much less crop this year than last year? Could the decline in yield be a one-time event or a short-term fluctuation? Will this happen again next year? Is soil fertility the only factor determining the amount of crop yield? Before any conclusions can be drawn, we need to understand the nature of these events or occurrences. In other words, when a certain phenomenon occurs, it may be due to a random process or a systematic process. We have to determine if the process is random or systematic. If an event or a phenomenon is triggered by a random process, there may not be much that we can do to identify its underlying cause in order to explain why the event or phenomenon happens the way that it does. But if it is part of a systematic process, the numeric or spatial patterns will be interesting to study and explore. As the first step in understanding these processes, statistical analysis is usually the tool used to help us decide if the events are random or not.
Using the soil example again, if we suspect that the soil fertility of a farm is low, and if this suspicion is based on some observations made in this farm, we are essentially formulating a hypothesis. This hypothesis can be tested to see if it should be rejected. To test this hypothesis, we would need to gather more information or data about the soil. Instead of just focusing on a small plot in the farm, we may want to examine different plots around the field for soil fertility levels. For a more rigorous study, we may want to drill holes at various locations in the field to collect soil samples to conduct a soil chemical analysis in a laboratory. By selecting different locations in the field to drill holes, we are essentially collecting a sample of soil for further examination instead of examining the entire population, which will require us to examine every location in the field. Each examined location can be regarded as an observation or a case in the sample, and the number of observations selected is known as the sample size. Similarly, by examining the same location over time and treating each examination of that location as an observation, we are collecting these observations from a population along the temporal dimension. Finally, the measured value from an observation is normally referred to as a data value. When there is a set of such values, they are referred to as a dataset.
After a sample of soil is assembled from different locations, a chemical analysis can be conducted to evaluate the levels of different chemicals, such as phosphorus, nitrogen, and potassium, in the sample. A measurement of each chemical can be derived by examining all observations in the sample, such as on average 30 mg of nitrogen per 1 kg of soil. This measurement is then a statistic, because it is derived from all the observations in the sample. If the data-gathering process covers the entire population, a similar measurement is derived from that process. This measurement is then known as a parameter. For instance, in the U.S. decennial census, certain questions were asked of all individuals in the United States in principle (we know that some people were missed due to the difficulty of reaching them, the so-called undercounting). The measurements derived from those questions are parameters.
When analyzing a sample, a logical question to ask is, why should we examine a sample but not the entire population? Isn't it more accurate to enumerate the entire population? Of course, we would prefer to survey the entire population if we could. But often it is impossible and/or impractical for one or more of the following reasons:
1. The population is too large to be enumerated completely.
2. The cost of enumerating the entire population may be prohibitive.
3. The study requires a quick turnaround time, and studying the entire population may take too long.
4. If the enumeration process requires destroying the observations, such as in certain processes of quality control, then a full enumeration will destroy the entire population.
Using the soil study example again, it is impossible to evaluate the fertility level of every cubic foot of soil in the field for a complete examination. It is also very expensive and will take much too long to get the full result. In addition, if a hole is drilled for every location to gather the soil, there will be no soil left in the field. Therefore, sampling is often used instead of examining the entire population. For this reason, studying statistics becomes necessary.
The statistics on chemical levels that are generated from the soil sample may offer descriptive information about the condition of the soil in the field, including a numerical distribution of the chemical levels. Therefore, these statistics are regarded as descriptive statistics. How accurate the statistics are in describing the distribution of chemical levels in the entire field or in describing the population is dependent upon many factors. We know that these statistics will never be 100% accurate (since they are not from a complete survey of every inch of soil in the farm) and that the level of accuracy is dependent upon how representative the sample is of the population.
Fortunately, procedures have been developed, based on random processes, to allow us to draw conclusions on whether a sample is a reliable representative of the population or not. This process of drawing a conclusion about a population based on information derived from a sample is known as inference. The process of drawing an inference normally includes
1. formulating one or more hypotheses,
2. collecting relevant data by making observations,
3. computing descriptive or test statistics, and
4. deciding if the hypothesis should be rejected based on the computed statistics.
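The four steps above can be sketched for the soil example in Python. This is only an illustration under stated assumptions: the ten nitrogen readings are invented, the reference level of 30 mg per kg is borrowed from the earlier example as a hypothetical threshold, and a one-sample t statistic is one possible choice of test statistic, not necessarily the one the authors would use here. The critical value -1.833 is the standard one-sided 5% value for 9 degrees of freedom.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, reference):
    """t statistic comparing the sample mean with a fixed reference value."""
    standard_error = stdev(values) / math.sqrt(len(values))
    return (mean(values) - reference) / standard_error


# Step 1: hypothesis. Null: mean nitrogen is at least 30 mg/kg.
REFERENCE_LEVEL = 30  # hypothetical threshold, in mg of nitrogen per kg

# Step 2: observations from ten drill locations (invented values).
nitrogen_mg_per_kg = [27, 29, 31, 26, 28, 30, 25, 27, 29, 28]

# Step 3: compute the test statistic from the sample.
t_value = one_sample_t(nitrogen_mg_per_kg, REFERENCE_LEVEL)

# Step 4: decide. One-sided 5% critical value for n - 1 = 9 degrees of freedom.
CRITICAL_T = -1.833
reject_null = t_value < CRITICAL_T
print(round(t_value, 3), reject_null)
```

With these invented readings the sample mean is 28 and the statistic falls well below the critical value, so the null hypothesis would be rejected.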
If sampling is desirable or preferred because an exhaustive survey of the population is not possible, then the sampling process should be carefully considered. But how should one select sample observations from the population? There are two general sampling schemes one may adopt: random sampling and systematic sampling. Random sampling is the process of selecting observations randomly from the population without any specific predefined structure or rules. Often, random numbers are used to assist the selection process. For example, items in an ordered set of objects are selected as samples if their positions correspond to those assigned by the random numbers. Alternatively, all objects in the set can be mixed up randomly before selection.
In contrast to random sampling, systematic sampling is the process of selecting observations based on certain rules developed according to certain principles. These principles are based on the objective(s) of the studies. Often one would like to adopt a sampling principle to cover the entire spectrum of the population. For instance, one may select every fifth observation from an ordered list of objects or select the households at the northwest corner of every street block in the city.
But sometimes a study may want to emphasize a specific segment or segments of the population, such as minority groups in the general population. For this purpose, sampling can be set up so that a particular minority group is sampled more than other groups. However, this should be done only with careful consideration of what the sample may represent and how it may affect the results because it is possible that those segments may be oversampled.
Within the two general sampling schemes, additional variations of the sampling process have been developed. For instance, observations sharing certain common characteristics may be grouped into different strata. With objects in different strata or groups, either random or systematic sampling can be performed within each stratum or group. This is called stratified sampling.
For example, selecting 20 cities from the 164 cities in Ohio may be performed in several ways. In random sampling, all 164 cities may be ordered or ranked by their population sizes. Next, we can select cities if their ordered positions match the first 20 random numbers from a random number table. Or, to perform systematic sampling, we can select every 8th city from the list of 164 Ohio cities until we have selected 20. Finally, we can use stratified sampling by first dividing the 164 cities into four groups based on their locations in northeast, northwest, southeast, or southwest Ohio and then selecting, either randomly or systematically, 5 cities from each of the four groups to ensure that the sampled cities provide a good representation of Ohio cities over the entire state.
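The three schemes in this example can be sketched in Python as follows. The city names are placeholders, the function names are hypothetical, and the split into four equal regional groups of 41 cities is an assumption made only to keep the sketch short; real regional groups would differ in size.

```python
import random


def simple_random_sample(items, n, rng):
    """Select n items at random, without replacement."""
    return rng.sample(items, n)


def systematic_sample(items, step, n):
    """Select every step-th item of an ordered list, up to n items."""
    return items[step - 1::step][:n]


def stratified_sample(strata, per_stratum, rng):
    """Randomly select per_stratum items within each stratum."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}


rng = random.Random(42)  # fixed seed so the sketch is repeatable
cities = [f"city_{i:03d}" for i in range(1, 165)]  # 164 placeholder names

random_pick = simple_random_sample(cities, 20, rng)
systematic_pick = systematic_sample(cities, 8, 20)  # every 8th city

# Assumed strata: four regions of 41 cities each (illustrative only).
regions = ["northeast", "northwest", "southeast", "southwest"]
strata = {r: cities[i * 41:(i + 1) * 41] for i, r in enumerate(regions)}
stratified_pick = stratified_sample(strata, 5, rng)
```

Taking every 8th city from a list of 164 yields exactly 20 cities, the 8th through the 160th, and 5 cities from each of four strata also gives 20.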
If the sampling of observations involves objects that have geographic references, more variations are needed to accommodate the geographic dimension. The sampling scheme that is designed to accommodate the sampling of observations in the geographic space is called spatial sampling. A good summary is available for further reading in Griffith and Amrhein (1991, pp. 215).
In the spatial sampling framework, locations are randomly selected to perform random sampling. When this process is implemented in a computer environment or with a Geographic Information System (GIS), the random locations are usually defined by the x-y coordinates taken from two sets of random numbers, as shown in Figure 1.1a. If the x-coordinates and y-coordinates are randomly determined, the resulting points defined by these x-y pairs are thought to be randomly distributed. In its simplest form, systematic sampling selects regularly spaced locations to ensure complete coverage of the entire study area, such as the structure shown in Figure 1.1b. Note that the distances between adjacent points are kept the same or approximately the same along the x- and y-directions only, not along the diagonals. If one prefers a spatial systematic sampling framework with observations regularly spaced, but with equal distances to their nearest neighbors, then the structure will be a triangular lattice, which resembles a hexagonal structure.
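A minimal Python sketch of these three point layouts follows. The function names, the unit spacing, and the rectangular study area are assumptions for illustration; this is not the ArcView implementation.

```python
import math
import random


def random_points(n, xmin, ymin, xmax, ymax, rng):
    """Random sampling: x and y are drawn independently at random."""
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
            for _ in range(n)]


def square_grid(nx, ny, spacing):
    """Systematic sampling: equal spacing along x and y only."""
    return [(i * spacing, j * spacing)
            for j in range(ny) for i in range(nx)]


def triangular_lattice(nx, ny, spacing):
    """Systematic sampling with equal distances to nearest neighbors."""
    row_height = spacing * math.sqrt(3) / 2
    points = []
    for j in range(ny):
        offset = spacing / 2 if j % 2 else 0.0  # shift every other row
        for i in range(nx):
            points.append((offset + i * spacing, j * row_height))
    return points


rng = random.Random(1)
scattered = random_points(10, 0.0, 0.0, 100.0, 100.0, rng)
grid = square_grid(3, 3, 1.0)
lattice = triangular_lattice(3, 3, 1.0)

# On the square grid the diagonal neighbor is farther than the spacing.
print(math.dist(grid[0], grid[4]))        # about 1.414
# On the triangular lattice the neighbor in the next row is at the spacing.
print(math.dist(lattice[0], lattice[3]))  # about 1.0
```

The two printed distances show the point made in the text: a square grid keeps spacing equal only along the axes, while shifting alternate rows by half a spacing equalizes the distances to all nearest neighbors.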
With these two general schemes of spatial sampling, we can create more variations. For example, we can combine random sampling with systematic sampling so that the geographical space is divided systematically but sampling is done randomly within each partitioned region. Of course, the partition of the geographical space should be mutually exclusive and collectively exhaustive. Figure 1.1c combines the systematic and random sampling frameworks by first dividing the entire region into subregions and then randomly selecting a point within each subregion.
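The combined scheme can be sketched in Python as below, assuming a rectangular study area divided into equal rectangular cells; the function name and parameters are hypothetical. Because the cells tile the rectangle without overlap, the partition is mutually exclusive and collectively exhaustive.

```python
import random


def stratified_random_points(nx, ny, xmin, ymin, xmax, ymax, rng):
    """Divide the area into nx by ny cells and pick one random point per cell."""
    cell_width = (xmax - xmin) / nx
    cell_height = (ymax - ymin) / ny
    points = []
    for j in range(ny):
        for i in range(nx):
            # rng.random() is in [0, 1), so the point stays inside cell (i, j).
            x = xmin + (i + rng.random()) * cell_width
            y = ymin + (j + rng.random()) * cell_height
            points.append((x, y))
    return points


rng = random.Random(7)
sample_points = stratified_random_points(4, 3, 0.0, 0.0, 40.0, 30.0, rng)
print(len(sample_points))  # one point for each of the 12 cells
```

The systematic part guarantees coverage of the whole area, and the random part avoids the rigid alignment of a pure grid.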
One final note about spatial sampling is that our sampling unit so far is limited to locations in space, or points. There are, in fact, alternative sampling units. For example, Griffith and Amrhein (1991, p. 215) reviewed two other types of sampling units: linear units or traverses and areas. Sampling by areas will be discussed in Chapter 6, which deals with point pattern analysis. When Quadrat Analysis is used to analyze point patterns, the sampling areal units are known as quadrats.
1.2 WHAT ARE SPECIAL ABOUT SPATIAL DATA?
Techniques for statistical analysis have been very well developed and are widely used in many research fields and practical applications. However, most of the statistical techniques and models were developed not for observations with explicit geographic referencing information, but rather for data most likely compiled by selecting sample observations randomly from the population. When conventional statistical methods are used to analyze data derived from these observations, it is assumed that these observations and associated data can be considered independent. But for spatial data gathered from nearby observations or within the study region, these data tend to be related to each other. Thus, we cannot assume that observations are independent of each other. For this reason, using conventional statistical methods to analyze spatial data derived from these observations may cause problems.
(Continues...)
Table of Contents
PREFACE.
ACKNOWLEDGMENTS.
1 INTRODUCTION.
1.1 Why Statistics and Sampling?
1.2 What Are Special about Spatial Data?
1.3 Spatial Data and the Need for Spatial Analysis/Statistics.
1.4 Fundamentals of Spatial Analysis and Statistics.
1.5 ArcView Notes: Data Model and Examples.
PART I: CLASSICAL STATISTICS.
2 DISTRIBUTION DESCRIPTORS: ONE VARIABLE (UNIVARIATE).
2.1 Measures of Central Tendency.
2.2 Measures of Dispersion.
2.3 ArcView Examples.
2.4 Higher Moment Statistics.
2.5 ArcView Examples.
2.6 Application Example.
2.7 Summary.
3 RELATIONSHIP DESCRIPTORS: TWO VARIABLES (BIVARIATE).
3.1 Correlation Analysis.
3.2 Correlation: Nominal Scale.
3.3 Correlation: Ordinal Scale.
3.4 Correlation: Interval/Ratio Scale.
3.5 Trend Analysis.
3.6 ArcView Notes.
3.7 Application Examples.
4 HYPOTHESIS TESTERS.
4.1 Probability Concepts.
4.2 Probability Functions.
4.3 Central Limit Theorem and Confidence Intervals.
4.4 Hypothesis Testing.
4.5 Parametric Test Statistics.
4.6 Difference in Means.
4.7 Difference Between a Mean and a Fixed Value.
4.8 Significance of Pearson's Correlation Coefficient.
4.9 Significance of Regression Parameters.
4.10 Testing Nonparametric Statistics.
4.11 Summary.
PART II: SPATIAL STATISTICS.
5 POINT PATTERN DESCRIPTORS.
5.1 The Nature of Point Features.
5.2 Central Tendency of Point Distributions.
5.3 Dispersion and Orientation of Point Distributions.
5.4 ArcView Notes.
5.5 Application Examples.
6 POINT PATTERN ANALYZERS.
6.1 Scale and Extent.
6.2 Quadrat Analysis.
6.3 Ordered Neighbor Analysis.
6.4 K-Function.
6.5 Spatial Autocorrelation of Points.
6.6 Application Examples.
7 LINE PATTERN ANALYZERS.
7.1 The Nature of Linear Features: Vectors and Networks.
7.2 Characteristics and Attributes of Linear Features.
7.3 Directional Statistics.
7.4 Network Analysis.
7.5 Application Examples.
8 POLYGON PATTERN ANALYZERS.
8.1 Introduction.
8.2 Spatial Relationships.
8.3 Spatial Dependency.
8.4 Spatial Weights Matrices.
8.5 Spatial Autocorrelation Statistics and Notations.
8.6 Joint Count Statistics.
8.7 Spatial Autocorrelation Global Statistics.
8.8 Local Spatial Autocorrelation Statistics.
8.9 Moran Scatterplot.
8.10 Bivariate Spatial Autocorrelation.
8.11 Application Examples.
8.12 Summary.
APPENDIX: ArcGIS Spatial Statistics Tools.
ABOUT THE CD-ROM.
INDEX.