For scientists and engineers tired of trying to learn Excel with examples from accounting, this self-paced tutorial is loaded with informative samples from the world of science and engineering. Techniques covered include creating a multifactorial or polynomial trendline, generating random samples with various characteristics, and tips on when to use PEARSON instead of CORREL. Other science- and engineering-related Excel features such as making columns touch each other for a histogram, unlinking a chart from its data, and pivoting tables to create frequency distributions are also covered.
About the Author
Read an Excerpt
Excel for Scientists
By Gerard Verschuuren, Linda DeLonais
Holy Macro! BooksCopyright © 2005 Dr. Gerard M. Verschuuren
All rights reserved.
Chapter I: General Techniques
There are some general techniques in Excel that every scientist should know about. These techniques will make your work easier and faster, and you will need them in all the chapters to come.
* How do you copy information from your spreadsheets?
* How do you refer to (the contents of) other cells?
* How do you use Names in Excel spreadsheets?
* What is behind your values: Appearance or Reality?
* How do you manipulate dates?
* How do you place functions inside functions?
For a more general coverage of these issues, consult the interactive CD called "Join the Excellers League" published by MrExcel. This CD is available through www.mrexcel.com or www.amazou.com.
Making Copies and Trends
You are probably familiar with the most common copy and paste routines in Excel. Select the cell(s) to copy from, and then choose one of the following options:
1. Hit the Copy button, select a new cell, and hit the Paste button.
2. Choose Edit / Copy, select a new cell, and choose Edit/Paste button.
3. Press Ctrl+C, select a new cell, and press Ctrl+V.
4. Right-Click (R-Click) on your selection, choose Copy, R-Click on a new cell, and choose Paste.
But there are also a few other, even fancier ways that we will discuss next.
Using Paste Special
Sometimes, you don't want an exact replica of the copied cells, but a modified version instead. That's when you need the Paste Special option from either the Edit menu or from the R-Click menu.
Say you have some values in your spreadsheet based on a certain formula but you don't want these calculated values to change anymore (see Figure - 1):
1. Select all formula results.
2. Copy these cells.
3. Select Paste Special from the Edit menu.
4. Choose Paste [??] Values + hit OK.
Now all your formulas have been replaced with values!
The Paste Special menu has many more powerful options. Let's assume you have replaced formulas with values, but you need to update these static values with a certain factor – say, a multiplication by 1.1:
1. Type your multiplier 1.1 in a cell somewhere on the sheet.
2. Copy that specific cell. p.s. Don't forget this step; Excel can't read minds!
3. Select the cells that need to be updated.
4. Choose Edit / Paste Special.
5. Select Operation [??] Multiply.
How can you display all formulas at once? By using Ctrl ~ (the Ctrl key plus the ~ (Tilda) key). This is a toggle option that allows you to switch back and forth between the value view and the formula view.
To add your own customized list(s), do the following:
1. Select Tools / Options / Custom Lists.
2. Type your entries, clicking Add after each one (see Figure - 4).
Let's say that you want the same formula in the cells A1:A200. What is the most efficient way of doing such a thing?
1. Select cell A1.
2. Select Edit / GoTo / Reference: A200.
3. Do NOT hit Enter by itself; instead, hit Shift + Enter (or Shift + OK). The Shift key will also select the cells in between A1 and A200, even though you are actually still in cell A1 (white) (unless you click somewhere else, so don't do this!).
4. In the formula bar, type: =RAND()
5. Do not hit Enter, for that would take you to A2 (and you would have to type the formula again). Instead, you want this formula to be in all the selected cells, so hit Ctrl + Enter. The Ctrl key inserts the formula in all selected cells
Let's say that you want cell B1 to display the sum of A1:A200. When you select cell B1 plus the AutoSum button ([summation]), you see only: =SUM(). How can you get the total range in there?
1. Click in A1. Now you see: =SUM(A1)
2. Hold both Shift + Ctrl and press the Down-Arrow key ([down arrow]). Voila!
Understanding Relative versus Absolute
When you copy a formula in any direction, cell references such as "A1" will change accordingly because they are actually relative cell references – something like "one cell up" or "one cell to the left." To make all or part of a cell reference absolute, you need to apply the "$" sign to the row and/or column symbol. This can be done easily by hitting the F4 key (after selecting or typing a specific cell address). The F4 key is actually a cycle key: it goes from A1 to $A$1 to A$1 to $A1 and back to A1. When you copy formulas like these, they will behave as shown in Figure - 5:
Another way of creating absolute cell addresses is to give one or more cells a specific name. This name can then be used in formulas as an absolute cell reference:
1. Select the cell(s) you want to name.
2. Click inside the Name box (the first box on the formula bar).
3. Type a (unique) name — either without spaces or using underscores in place of spaces (e.g., My_Name).
4. Hit Enter.
5. The dropdown list of the Name box now holds the new name, which is accessible from any sheet in your book.
If you ever need to change or delete the address "behind" a name, go to Insert / Name / Define (see Figure - 6).
* To change: Select the name and change the reference at the bottom
* To delete: Select the name and hit the Delete button
From now on, you can use this name in formulas. This is one way to make your formulas more "readable."
Names are also great when you want to refer to a list of items or a table of values in formulas. Later on, we will see that many functions need to know the range of cells in which to search. Supply them with a name!
Another nice feature of names is that they refer not only to cell addresses but also to specific values. Say you often need to use a certain constant in your formulas:
1. Go to Insert / Name / Define.
2. Type the name of the constant, such as "signif".
3. In Reference: type the constant's value preceded by an equal sign, e.g. =0.975.
You can now use the "=A1*signif" name in formulas. Plus, whenever you change the constant behind the name, all formulas using it will automatically update!
Names function at the book level, which means that you can access them from anywhere in the book, on any sheet. However, if you want to create a sheet-level name, you must add the sheet's name followed by an exclamation mark:
Filling Adjacent Cells
Excel has a special tool, called the AutoFill Handle, that allows you to automatically fill adjacent cells. This handle is located in the cell's (or a selection's) right lower corner. When you move the cursor there, it changes into a small plus sign (+):
* You can L-Click and drag this handle down to copy cells
* You can R-Click and drag this handle down to get a menu with options to choose from.
Using a R-Click-drag, you have several options to choose from (especially for dates). Series, the last one, is the most powerful because it allows you to specify the step by which you want to increment (see Figure - 3). You can even type "– 5" to go five days back, and there is also a choice to skip weekends.
Another way of creating trends is to select two or more cells at the same time and, using the AutoFill Handle with a L-Click drag, to continue the selected pattern.
Excel also has some built in lists (see Table - 1) that the AutoFill handle can use.
Names can even be dynamic, provided you use the OFFSET function to define them under Insert / Name / Define.
=OFFSET(Start, 0, 0, Rows, Cols)
The range A1:A10 in Figure - 7 has been given a name – in this case, "Patient". You may have to change this reference if you add new patients at the bottom. Instead, you could count how many cells in column A have something in them (number or not) by using:
This count can then be used in the Patient reference, provided you make sure there are no hidden figures farther down in column A:
=OFFSET(A1, 0, 0, COUNTA(A:A))
Unlike COUNTA, COUNT will only count cells that have numbers in them.
From now on, the name "Patient" can be used in any formula and it will include all future additions (see Figure - 8).
Telling Appearance from Reality
Numbers in Excel are stored with a maximum precision of 15 digits and/or decimals. You can change the "look" of a number by using Formatting tools – such as Format/Cells or the Currency Style, Percent Style, and Increase/Decrease Decimal buttons (see Figure - 9).
However, changing the look of a number does not change the real number in the background. Excel will still use the entire number (up to 15 digits and or decimals) in its calculations. What are the consequences?
* A number like 123,456,789,123,456,789 will show up as 123456789123456000
* A number like 0.123456789123456789 will show up as 0.123456789123456
With Format/Cells/Number, you can also choose a scientific notation – which again is a matter of "looks." This way, 0.123456789123456 would be shown as 1.2346E-01 if you decide to display four decimals.
To really change a number, you need functions such as INTO, ROUNDO, ROUNDUP(), or ROUNDDOWNO. With the function ROUND(X, n), for instance, you can also specify a number's (X's) number of decimals (n). Table - 3 gives you an overview of some useful rounding functions.
Dates in Excel are also not the way they appear. When you see 1/1/1900, Excel actually stores the serial number 1. The number "2" represents "Jan 2 1900" – and today's number could be something like "38500". What's the purpose of all of this? The reason is quite simple: You can calculate with dates, adding and subtracting them like numbers!
In addition, each serial number allows for decimals as part of a day. A number like 1.5 stands for 1/1/1900 12:00 p.m. Time is expressed as a decimal based on 24 hours.
The issue of how you want to display this number is a matter of formatting:
* mm/dd/yyyy: 01/01/1900
* dddd mmm-d: Monday Jan-1
* m/d/yy h:mm:ss: 1/1/00 12:00:00
There are also some great date functions:
* =TODAY(): e.g., 38500
* =NOW(): e.g., 38500.1234567890
* =NOW() — TODAY(): e.g., 0.123456789
Given this information, we should be able to work with dates more comfortably.
1 We can add, subtract, sum, and average time information – as long as we use the format h:mm:ss.
2 When date and time are involved, the difference between two values may be more than 24 hours. This requires a special format: [h]:mm:ss.
3 If you prefer to deal with time as a decimal of hours – e.g. 8.5 instead of 8:30 – then you need to convert this time information into day decimals: Just divide by 24 (hours). If the total goes beyond 24 hours, you have to use the format [h]:mm:ss again (see Figure - 10).
Putting Functions Inside Functions
All formulas in Excel are based on calculations and/or functions (see Table - 6):
* Calculations work with operators such as (), ^, *, /, +, and -. These operators work in a certain order of precedence, so 2+4/2 is 4 in Excel, whereas (2+4)/2 returns 3.
* Functions are built in operations, such as =SUM(A1:A3), which is equivalent to the calculation =A1+A2+A3. Most functions accept or require one or more arguments inside their parentheses.
* Formulas can also be a combination of calculations and functions - for instance, =SUM(A1:A3)*0.05, or =NOWO+30.
* Formulas can also "nest" functions inside functions — such as: =ROUND(SUM(A1:A3), 2).
* Formulas always start with an equal sign. They are based on calculations and/or functions.
The question is: How do you type a function that has a complex syntax behind it? If you know its syntax and the order of arguments, you can just type the function (from Excel 2002 on, you even get help while you are typing the function).
But what if you don't know the syntax yet? There are several options on the toolbar (depending on your Excel version):
* You can use the fx button (either on the toolbar or on the formula bar).
* There may also be a drop-down button next to the [summation] button (AutoSum) on the toolbar.
The [summation] button covers one function; the fx button covers them all.
Through these options, you get a complete listing of all the functions available in Excel. Select the function you want and use the new dialog box to provide the arguments that the function calls for.
What to do if one of the function's arguments requires a new (nested) function? Do not use the fx button again, but instead click on the formula bar's first button (in this case SUM; see left-most arrow in Figure - 11). This opens a new dialog box, although the total formula remains visible in the formula bar.
Provide the information for the second function. If you are completely done, hit OK. If you need to go back to the first function, click on its name in the formula bar. This opens the first dialog box once again.
The exercises that follow are intended to reinforce the information just presented. You will find similar exercises throughout the remainder of this book. The answers to questions posed in these exercises can be found in the Answers to Exercises section on page 104.
1. Start with a new spreadsheet and type the following labels:
a. In A1: [AH]
b. In B1: [A]
c. In C1: Ka = 4.7
2. In column A, put in the percentages of acetic acid [AH]. They go from 98, 96, 94, and so forth down to 2. Fill the column the easiest and fastest way possible.
3. In column B, put a calculation for the percentage of acetate [A]:
a. Select B2 through B50 (Hint: Use Go To + Shift).
b. Type your formula: =100 – A2
c. Enter this formula in all cells.
4. In column C, calculate the pH of the solution:
a. Select C2 through C50.
b. Use this formula: =C1 + LOG10(B2/A2)
c. Make sure you have the right type of reference for C1: $C$1
d. Enter this formula in all cells.
5. Save the table.
1. Start with a new spreadsheet and use the information from Figure - 12.
2. Calculate the time lapse.
3. Format the cells properly.
4. Calculate glucose mol/min.CHAPTER 2
Chapter II: Statistical Analysis
Understanding Sampling Distributions
Measurements are usually based on a sample taken from the total population (see Figure - 13).
This sample can be either:
* a test group taken from the total population or
* a finite series of measurements done from among an infinite number of potential measurements.
Using Normal Distribution
The mean of a sample distribution is the result of observing a specific sample. Other samples could have given us different means. The collection of all possible means is the focus of a sampling distribution.
A normal distribution is a sampling distribution with the following properties:
* 1x the StDev around the mean is 70% of its total surface (= 1 x SE)
* 1.65x StDev is 90% (= 1.65 x SE)
The Z-value is plotted on the X-axis of Figure - 14. The Z-value represents the distance between a specific mean and "the Mean of the means," expressed in units of Standard Error (which is the Standard Deviation of the means).
These are some of the formulas used in the spreadsheet shown in Figure - 15:
* D1: =AVERAGE(A:A)
* D2: =STDEV(A:A)
* D6:D18: =FREQUENCY(A:A,C6:C18) (array function; see Multiple-cell arrays on page 74.)
* E6: =NORMDIST(C6,$D$1, $D$2,TRUE)
* F18: =Total*(E18 – E17) (and then copied upwards)
* G6 holds one of these three options:
1. =(C6-$D$1)/ $D$2 or
2. =NORMSINV(E6) or
* H6: =NORMSDIST(G6
A Normal Distribution can be plotted either as cumulative or as non-cumulative. The surface under the bell-shaped curve (right) is 100%, The cumulative curve (left) goes up to 100%.
To calculate the surface of a specific section of a bell-shaped curve (say, the category between 10 and 10.25), subtract the probability (51%) of the bottom value (10) from the probability (69%) of the top value (10.25), and you will find a surface of 18%. So 18% of all cases fits in this category (see Figure - 16).
A sample distribution is a frequency distribution of values, whereas a sampling distribution is a frequency distribution of means. The mean of a sample distribution is one of the means in a sampling distribution. The ideal sampling distribution is a Normal Distribution. Figure - 17 shows the relationship between both distributions.
1. Fill A1:A10 with these values: 24, 27, 20, 23, 25, 25, 24, 22, 26, 29
2. Find the Mean and StDev.
3. Create these bins: 21, 23, 25, 27, 29
4. Put the frequency next to each bin.
5. Put the Z-value that comes with that bin value next to each bin.
6. Place the cumulative surface area of each Z-value next to that Z-value.
7. Save your file.
1. Create a column with 13 Z-values: -3.0, -2.5, ... 2.5, and 3.0
2. Create a second column with the corresponding cumulative surface areas by using the correct function.
3. Save your file.
Using the Random Number Generator
The Random Number Generator is available through the Analysis ToolPak.
To install the Analysis ToolPak, select:
3. Analysis ToolPak
Excerpted from Excel for Scientists by Gerard Verschuuren, Linda DeLonais. Copyright © 2005 Dr. Gerard M. Verschuuren. Excerpted by permission of Holy Macro! Books.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
Table of Contents
About the Author,
Chapter I: General Techniques,
Chapter II: Statistical Analysis,
Chapter III: Plotting Graphs,
Chapter IV: Regression Analysis,
Chapter V: Complex Functions,
Chapter VI: Data Analysis,
Appendix A: Answers to Exercises,