Managing Data Using Excel

Microsoft Excel is a powerful tool that can transform the way you use data. This book explains in comprehensive and user-friendly detail how to manage, make sense of, explore and share data, giving scientists at all levels the skills they need to maximize the usefulness of their data.

Readers will learn how to use Excel to:
* Build a dataset – how to handle variables and notes, rearrangements and edits to data.
* Check datasets – dealing with typographic errors, data validation and numerical errors.
* Make sense of data – including datasets for regression and correlation; summarizing data with averages and variability; and visualizing data with graphs, pivot charts and sparklines.
* Explore regression data – finding, highlighting and visualizing correlations.
* Explore time-related data – using pivot tables, sparklines and line plots.
* Explore association data – creating and visualizing contingency tables.
* Explore differences – pivot tables and data visualizations including box-whisker plots.
* Share data – methods for exporting and sharing your datasets, summaries and graphs.

Alongside the text, Have a Go exercises, Tips and Notes give readers practical experience and highlight important points, and helpful self-assessment exercises and summary tables can be found at the end of each chapter. Supplementary material can also be downloaded on the companion website.

Managing Data Using Excel is an essential book for all scientists and students who use data and are seeking to manage data more effectively. It is aimed at scientists at all levels but it is especially useful for university-level research, from undergraduates to postdoctoral researchers.

by Mark Gardener

Product Details

ISBN-13: 9781784270094
Publisher: Pelagic Publishing
Publication date: 03/16/2015
Series: Research Skills
Sold by: Barnes & Noble
Format: eBook
Pages: 326
File size: 10 MB

About the Author

Mark Gardener (www.gardenersown.co.uk) is an ecologist, lecturer, and writer working in the UK. His primary area of research was in pollination ecology and he has worked in the UK and around the world (principally Australia and the United States). Since his doctorate he has worked in many areas of ecology, often as a teacher and supervisor. He believes that ecological data, especially community data, is the most complicated and ill-behaved and is consequently the most fun to work with. He was introduced to R by a like-minded pedant whilst working in Australia during his doctorate. Learning R was not only fun but opened up a new avenue, making the study of community ecology a whole lot easier. He is currently self-employed and runs courses in ecology, data analysis, and R for a variety of organizations. Mark lives in rural Devon with his wife Christine, a biochemist who consequently has little need of statistics.


Mark Gardener began his career as an optician but returned to science and trained as an ecologist. His research is in the area of pollination ecology. He has worked extensively in the UK as well as Australia and the United States. Currently he works as an associate lecturer for the Open University and also runs courses in data analysis for ecology and environmental science.

Read an Excerpt

CHAPTER 1

ARRANGING YOUR DATA

All data are important. At the least they are important to you, as you've invested time and effort in collecting them. Your data may well be important to others as well: whether you are doing a high school project, a PhD or government research, the data you collect are important. You will use these data to help you make sense of your project and they may also be shared and presented to others. It is therefore important that your data are understandable by others. You may well take a break from your project so it is also helpful if your data are understandable by you when you return at some future date!

You may have spent a considerable time planning your work and deciding how to collect the data. You can also spend a lot of time collecting data so it is important to take care of them. The scientific process is a cyclical one and generally involves several stages:

• Planning.

• Data collection and recording.

• Analysis.

• Reporting and moving on.

You should have spent some time during the planning process to determine various aspects of your data:

• What data to collect.

• How to collect the data.

• How much data to collect.

• How to record the data.

• How to analyze and present the data.

In this book you will learn how to make best use of your data. The way you record your data underpins all your data management. It is easy to underestimate the importance of this aspect. Good data management can:

• Save time.

• Save money.

• Save effort.

• Reduce errors.

Good data management also means that you are able to add to your data at a later stage with minimal fuss.

You'll also learn how to explore your data and get some insights into the patterns that may (or may not) exist in your carefully collected data. This aspect is sometimes called data mining, and can be a useful way to look for patterns and trends.

So, the way you arrange your data is of fundamental importance to your ability to utilize it. In the next section you'll see some examples of how you might set about arranging your data.

1.1 SYSTEMS FOR DATA LAYOUT

The way you arrange your data should be part of your general scientific approach. Science is a way of looking at the natural world. In short, the scientific process goes along the following lines:

1. You have an idea about something.

2. You come up with a hypothesis.

3. You work out a way of testing this hypothesis.

4. You collect appropriate data in order to apply a test.

5. You test the hypothesis and decide if the original idea is supported or rejected.

6. If the hypothesis is rejected, then the original idea is modified to take the new findings into account.

7. The process then repeats.

In this way, ideas are continually refined and our knowledge of the natural world is expanded. You can split the scientific process into four parts (more or less): planning, recording, analyzing and reporting.

Planning: This is the stage where you work out what you are going to do: formulate your ideas, undertake background research, decide on your hypothesis and determine a method of collecting the appropriate data and a means by which the hypothesis may be tested. This is the stage where you should be thinking about how to arrange your data to make maximum use of them.

Recording: The means of data collection should be determined at the planning stage although you may undertake a small pilot study to see if it works. After the pilot stage you may return to the planning stage and refine the methodology. You collect and arrange your data in a manner that allows you to begin the analysis. The arrangement of your data should help you to check it for errors and also to add extra information at a later point. Good data layout also facilitates the following stages.

Analyzing: The method of analysis should have been determined at the planning stage. You use analytical methods (involving statistics) to test a hypothesis. Having a good arrangement of data means that the analyses run smoothly.

Reporting: Disseminating your work is vitally important. Your results need to be delivered in an appropriate manner so they can be understood by your peers (and perhaps by the public); this means summarizing your data numerically and graphically. Part of the reporting process is to determine what the future direction needs to be. Having a good data layout can help make your data understandable by others and also help you to present them as usefully as possible.

Essentially you use the planning stage to help you determine what data to collect and how to arrange it. You use the recording stage to save your data in an arrangement that allows you to proceed to the analytical stage. The recording stage should also allow you to check for errors and permit you to add extra information that you may have overlooked at the earlier planning stage. If your data are arranged sensibly the analysis and reporting stages are facilitated, and you can share your data with others more easily.

It is easier to understand the issues by looking at some examples. In the following sections you'll see examples of different ways to set out data.

1.1.1 Common ways to lay out data

How you set out your data depends somewhat on the kind of analysis you are going to do. In the following examples you'll see several kinds of experimental situation.

Comparing samples

When you are comparing samples of things the simplest way to set out data is sample by sample. In Table 1.1 you can see such a layout, where there are two samples of data. Each column represents the data from a separate sample: there is one for females and one for males. The numbers in the columns show the lengths of the mandibles in millimetres. There is seemingly no great problem with Table 1.1 and if this were annotated fully it would certainly be acceptable as a data format. If you have more samples you simply have more columns, such as in Table 1.2.

In Table 1.2 there are five columns; one for each diet. The values in each column show the wing lengths of flies fed on that particular diet. If you had carried out this experiment and recorded the data in your lab notebook you would probably have included the date and a few additional details so that you could repeat the experiment at some future date if required.

As your data becomes more complicated it becomes more difficult to maintain a sensible layout (Table 1.3).

The data in Table 1.3 show the number of breaks in a fixed length of wool for three tensions (high, medium and low); there are two types of wool (A and B). In this instance the data are set out in two separate blocks, each corresponding to wool type. Whilst this makes some sense, you can see that if additional factors were to be included, things might get a little tricky to represent using the "sample" approach. In the following example the data are in three separate blocks because there are too many factors to show easily (Table 1.4).

The data in Table 1.4 show the abundance of three species of beetle. The beetles were counted at three different sites. At each site there were two contrasting habitats. You can see that this layout has stacked blocks of data (the species); if there were any more variables you would not be able to display them sensibly in a single table.

Association analysis

In some kinds of analyses you might collect your data in the form of a frequency table (sometimes called a contingency table). An example is shown in Table 1.5 where you can see frequencies of combinations of hair and eye colour for female university students.

You can see from Table 1.5 that there are two sets of categories: the columns are for eye colour and the rows for hair colour. Each cell of the table shows the number of people (the frequency) with a particular combination of hair and eye colour.

The contingency table in Table 1.5 is certainly how you would set out your data in order to carry out the association analysis, but it might not be the most flexible approach, as you will see later. If you have more than two categories (perhaps male and female students) you would not be able to show the data in a single table.
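The link between one-row-per-person records and a frequency table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records below are invented (they are not the values in Table 1.5), and the names `records` and `contingency_table` are hypothetical.

```python
from collections import Counter

# Hypothetical records, one per person: (hair colour, eye colour).
# These values are made up for illustration, not taken from Table 1.5.
records = [
    ("black", "brown"),
    ("black", "brown"),
    ("blond", "blue"),
    ("brown", "brown"),
    ("blond", "blue"),
    ("brown", "green"),
]


def contingency_table(rows):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(rows)
    row_labels = sorted({r for r, _ in rows})
    col_labels = sorted({c for _, c in rows})
    # Nested dict: table[hair][eye] -> frequency (0 if the pair never occurs).
    return {
        r: {c: counts[(r, c)] for c in col_labels}
        for r in row_labels
    }


table = contingency_table(records)
for hair, eyes in table.items():
    print(hair, eyes)
```

Keeping the raw records and deriving the table from them means a third category (such as sex) can be added as another field without redesigning the layout.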

Correlation and regression

When you are looking for relationships between factors your data tends to come in a particular way that lends itself to a certain layout. In simple correlation you have two variables and are looking to examine the strength of any link between them. In Table 1.6 you can see an example of the link between height and weight for a group of American women.

The data in Table 1.6 are naturally arranged in pairs: a particular height is matched up with a particular weight (they are measurements from the same person of course). The arrangement is neat: each row represents data from one case (called an observation or replicate) and each column represents a variable (also called a factor). If the experiment becomes more complicated it is easy to add extra columns for the additional variables. In Table 1.7 you can see part of a larger dataset containing four environmental variables from New York.
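As a rough sketch of why the row-per-observation arrangement is convenient, the Python below computes a Pearson correlation coefficient directly from paired rows. The height and weight values are invented for illustration (they are not the data in Table 1.6), and `pearson_r` is a hypothetical helper name.

```python
import math

# Hypothetical paired observations: one row per person, (height, weight).
# Invented values for illustration only, not the data in Table 1.6.
rows = [
    (150, 52),
    (155, 55),
    (160, 58),
    (165, 63),
    (170, 68),
]


def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(rows)
print(round(r, 3))
```

Because each row already pairs the two measurements from one person, no matching step is needed before the calculation.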

The data in Table 1.7 are part of a larger set; there are 111 observations altogether. Each row shows the measurements for a day; each column shows measurements for a single environmental variable. The first column shows a simple index value here but it could easily have shown the actual date of the observations (see Section 1.4.2).

In most cases you have a response variable (also known as the dependent variable) and a number of predictor variables (also called independent variables). In Table 1.7 the Ozone variable is the response variable. The other variables are the predictors; you hope that they will help to predict ozone levels.

It is probably best to use the terms response and predictor, rather than dependent and independent. When you have multiple variables you do not really know if the predictor variables are actually independent of one another.

In many cases you are looking to see what effect the various predictor variables have on a particular response variable. Table 1.8 shows such a situation, one where you would use a regression analysis.

The first column in Table 1.8 shows the response variable, the number of cycles of loading until the yarn breaks. The other columns show the predictor variables; they have an effect on the response variable.

The data examples you've seen here are all set out in subtly different ways. However, it is possible to use a single layout that permits you to represent all these experimental situations, as you will see next.

1.1.2 A standard layout for data

To be most useful your data has to be laid out such that it:

• Shows all the information.

• Is flexible.

• Can be checked for errors easily.

• Can be analysed easily.

• Can be extended and modified easily.

Meeting all those criteria could be a tall order but it is possible with a little thought. Look back at the jackal data in Table 1.1. There are two columns, one for each sample (male and female). This layout seems to meet the criteria. If you add more samples you can simply add more columns. Then the data would look more like the fly-wing data in Table 1.2, where there are five columns, one for each sample.

One potential problem with the multi-sample layout is that many computer programs cannot carry out an analysis with the data in this form. Another problem is shown by the example in Table 1.3. Here you have the number of breaks in lengths of wool under three tensions. If you only had one sort of wool this would not be a problem but you have two sorts of wool. In order to display the data you have to make two blocks of results. In Table 1.3 the blocks are shown side by side but an alternative layout is where you have one above the other as shown in Table 1.9.

This stacked-block layout has a certain logic: you can see the experimental situation fairly clearly. There are problems though; most computer programs cannot analyze the data in this form. The more additional variables you add to the situation the harder it becomes to display the data in a convenient manner. For example, Table 1.4, which shows the abundance of different beetle species, contains about as much information in a single table as you can get.

The key to arranging your data lies in being able to split it into two types: variables and observations. This is fairly apparent when you look at data like those in Table 1.8 for example. Each column represents a single variable. The first column is the response variable and the others are predictor variables, that is, the values in the predictor variables have an influence on the magnitude of the response variable. The rows of the dataset represent the individual observations (sometimes called replicates or records), and each row is an individual set of measurements. These come from your sampling units, which might be quadrats, subjects, time periods or whatever.

Sometimes the data do not fall neatly into response and predictor variables; the data in Table 1.7 are like this. None of the variables (columns) could be accurately described as response or predictor variables but nonetheless you are interested in their relationship to one another (you'd lean towards the Ozone variable being the response and the others predictors). The rows still represent the separate observations (daily throughout the experimental period).

The variables do not have to be numeric, so you can have columns like those shown in Table 1.10, which shows an alternative layout for the wool data shown in Tables 1.3 and 1.9.

In Table 1.10 the data are shown with a response variable in the first column; this is the number of breaks for lengths of wool. The next two columns show the type of wool and the tension. Both contain simple labels. Sometimes this kind of notation is called a factor variable, as opposed to a continuous variable (that is, a number). You should note however, that in some branches of science plain numbers are used as factors. This tends to occur in medical and psychological research, where it is felt (by some) that a plain number prevents any potential bias. Thus drug 1 might be a placebo whilst drug 2 has an active ingredient.

The rows of the dataset represent the individual observations, or replicates. Not all of the data are shown in Table 1.10 but you can look back to see that there are nine replicates for each combination of wool and tension (Table 1.9).
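The move from blocks of samples to one row per observation can be sketched in Python as follows. This is a minimal illustration, assuming the block layout is held as a dictionary keyed by wool type and tension; the break counts are invented (not those in Table 1.9), only two tensions are included for brevity, and `to_long` is a hypothetical helper name.

```python
# Hypothetical "sample" layout: one list of break counts per
# (wool, tension) combination. Values are invented for illustration.
wide = {
    ("A", "low"): [10, 12, 11],
    ("A", "high"): [7, 8, 6],
    ("B", "low"): [9, 13, 10],
    ("B", "high"): [5, 6, 7],
}


def to_long(blocks):
    """Flatten {(wool, tension): [breaks, ...]} into one dict per observation."""
    records = []
    for (wool, tension), values in blocks.items():
        for breaks in values:
            # Response variable first, then the two factor (label) variables.
            records.append({"breaks": breaks, "wool": wool, "tension": tension})
    return records


long_rows = to_long(wide)
for row in long_rows[:4]:
    print(row)
```

Each resulting record carries its own labels, so adding a further factor only means adding another key, not another block.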

Most computer programs are set up to expect data in this form and if you are going to carry out much in the way of serious analysis this is the way to arrange your data. Even if you are only going to use your spreadsheet for analysis this kind of layout is advantageous. You are essentially setting out your data like a database, a form that your spreadsheet can utilize.

In fact setting out your data in database form is the key to maximizing its flexibility and usefulness, as you will see shortly.
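One way to see this flexibility is that the same records can be summarized by any combination of factors without rearranging them, much as a pivot table does. The sketch below uses invented values and a hypothetical helper, `mean_by`; it is an illustration of the idea, not a method taken from the book.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records in database-style layout (values invented).
records = [
    {"breaks": 10, "wool": "A", "tension": "low"},
    {"breaks": 12, "wool": "A", "tension": "low"},
    {"breaks": 7, "wool": "A", "tension": "high"},
    {"breaks": 9, "wool": "B", "tension": "low"},
    {"breaks": 5, "wool": "B", "tension": "high"},
    {"breaks": 6, "wool": "B", "tension": "high"},
]


def mean_by(rows, response, *factors):
    """Mean of the response variable for each combination of factor levels."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in factors)
        groups[key].append(row[response])
    return {key: mean(values) for key, values in groups.items()}


# The same records summarized two different ways.
by_wool = mean_by(records, "breaks", "wool")
by_both = mean_by(records, "breaks", "wool", "tension")
print(by_wool)
print(by_both)
```

Note that the keys are tuples of factor levels, so a single-factor summary is keyed by one-element tuples such as `("A",)`.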

1.2 RECORDING FORMAT

It does not matter what kind of data you collect, but how you record your data is of fundamental importance to your ability to make sense of them at a later stage. If you are collecting new data you can work out the recording of the data as part of your initial planning. If you have past data you may have to spend some time rearranging them before you can do anything useful with them. The time you spend planning your data layout will pay handsomely later on.

It is easy to write down a string of numbers in a notebook. You might even be able to do a variety of analyses on the spot; however, if you simply record a string of numbers and nothing else you will soon forget what the numbers represent. Worse still, nobody else will have a clue what the numbers mean and your carefully collected data will become useless.

All recorded data need to conform to certain standards in order to be useful at a later stage. First of all you need to be able to make sense of them: if you cannot remember what a column of values represents you are in big trouble. It would be even better if anyone could understand what your data represent. In most cases you add data to a spreadsheet so that each column represents a variable. Somewhere you need to add notes so that a reader can see what the variables are, such as the units and how the data were collected (see Chapters 2 and 9). Each row should be a separate observation.
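A small Python sketch of the same principle: keep a note for every variable alongside the data, with one column per variable and one row per observation. Everything here is an assumption made for illustration: the variable names, the values, the helper `write_dataset`, and the use of `#` comment lines above the header, which is a common convention rather than part of any CSV standard.

```python
import csv
import io

# Hypothetical variable notes: what each column means and its units.
notes = {
    "length": "mandible length, millimetres",
    "sex": "female or male",
}

# Hypothetical observations, one dict per row (values invented).
rows = [
    {"length": 110, "sex": "male"},
    {"length": 105, "sex": "female"},
]


def write_dataset(handle, rows, notes):
    """Write comment lines describing each variable, then the data as CSV."""
    for name, description in notes.items():
        handle.write(f"# {name}: {description}\n")
    writer = csv.DictWriter(handle, fieldnames=list(notes), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_dataset(buffer, rows, notes)
text = buffer.getvalue()
print(text)
```

In a spreadsheet the equivalent would be a separate notes area or sheet; the point is only that the description of each variable travels with the data.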

(Continues…)



Excerpted from "Managing Data Using Excel"
by Mark Gardener.
Copyright © 2015 Mark Gardener.
Excerpted by permission of Pelagic Publishing.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

Table of Contents

1. Arranging your data
2. Managing your data: building your dataset
3. Managing your data: checking your dataset
4. Making sense of your data
5. Exploring regression data
6. Exploring time-related data
7. Exploring association data
8. Exploring differences data
9. Sharing your data
Appendices
Index
