Read an Excerpt
Guerilla Data Analysis Using Microsoft Excel
By Bill Jelen, Anne Troy, Linda DeLonais
Holy Macro! BooksCopyright © 2002 Bill Jelen
All rights reserved.
In the trenches
Mainframe data is never perfect. There is a long backlog for requests at the average IT Department. Because we cannot get information in a timely manner from the IT guys, the user departments have to take things into their own hands.
The MIS Department is a willing ally in this regard. Realizing that they are understaffed, they will often purchase a 4th GL reporting tool and foist that on the user community in the hope that some users can "serve themselves" and run their own reports.
I began using a product called EZIQ from Pansophic in 1989. Fourth GL was just coming to the forefront, and this product had numerous problems. It was possible to group by customer number and sum twenty fields, but if you wanted the customer name as well, the detailed version of the report was cumbersome and hard to read. You could calculate a gross profit percentage on the detail lines, but at the total line, your only choices were to sum or average the gross profit percentage from the detail lines. If you are reading this book, I don't have to tell you that neither option makes sense.
The first week of EZIQ was rough. I was a young Data Analyst, recently promoted from the MIS Department, trying to sell the concept that "Sure! You have gross profit percentage on every detail line. Why do you need the gross profit percentage for the entire division??" It didn't play well in Minnetonka.
I had been casually using Lotus 1-2-3 since 1985. I had some basic programming skills. I figured that between my knowledge of Lotus and some basic data processing concepts, I could use Lotus 1-2-3 as the engine for producing the final reports. So, after finding a way to download the Lotus *.wks file from the IBM Mainframe to Lotus, I began using the spreadsheet to actually do the bulk of the data processing.
From 1989 to 1998, some portion of my job involved taking mainframe data to Lotus, and later to Excel, and using the spreadsheet to sort, summarize, format, and print reports.
During the Fourth-GL Reporting Package sales presentation, the VPs of Finance were given a demo that made it appear that they were buying a tool that could generate a custom green-bar report in about 10 minutes. No one was smart enough during that presentation to ask if you could calculate a gross profit percentage on the total line, so we were stuck. The point is that the managers expected SPEED.
There were advantages to the fourth GL Tool, however. Early on, ninety percent of the reports that I created were ad-hoc. "We have a problem with product line A," they might have said. "Tell us everyone who has that product." These are requests that would have taken a day for a COBOL programmer, and I could have the data in a spreadsheet in about an hour.
The Fourth GL tool, even with its warts and shortcomings, did offer a faster way to get the ad-hoc data out of the mainframe and to the desks of the decision makers.
The volume of requests increased, and when we hit upon a useful report, it would be added to the list for monthly distribution.
During those first few years, I only had the ability to query history, and only at the end of the month. This kept the volume of requests to a low roar and allowed me to perfect the techniques that I teach you in this book.
I'll also walk you through several case scenarios and how we dealt with them. I'll walk you through the detailed steps of how to deal with and solve these problems in Microsoft Excel.
My goal is to explain the concept in plain English — no "programmer-speak." This book is aimed at accountants, marketers, financial or operations analysts, administrative assistants — anyone who has a large volume of data and needs to get that data into a meaningful format. You might think that I would start with that super-power of Excel data analysis — the pivot table — but I'm going to start with far more pedestrian tasks. Read through these, learn what methods are there, and then, weeks from now, you can refer to this book when you run into a similar problem.CHAPTER 2
Changing formulas to values
Excel is fantastic at performing calculations. Sometimes, we need to have Excel temporarily calculate something and we need to change the formulas to values. I refer to this technique dozens of times in this book.
One day I received a call from one of the marketing staffers in the company. She had purchased a list of data, and this data included the telephone number and the area code in separate columns as shown below.
A concatenation formula joins two or more fields together.
She wanted to combine the area code and telephone number into a single field. I showed her how to use a concatenation formula to create the desired results.
She was thrilled. Before she knew of this formula, she believed she was going to have to manually retype over 1,000 telephone numbers.
A #REF! Error indicates that one of the cells Excel needs to calculate a formula is missing.
Commonly, Excel novices would now either hide or delete columns B and C. Deleting these columns will have disastrous results. The formatted telephone numbers that were in column D will change to #REF! errors. The #REF! error is Excel's way of saying that you have a formula, and that you have deleted one of the cells needed to calculate that formula.
When this occurs, immediately hit Ctrl+Z to undo the deletion of the columns. The #REF! errors go away.
Spreadsheet veterans may remember this as the /Range Value feature of Lotus. It is just as simple in Excel.
Before you can safely delete the B and C columns, you need to change the formulas in column D to values.
The shortcut for Paste Special Values is
Throughout this book, I advise you to "Paste Special Values" into some range or another. When you see those instructions, I am referring to this technique.
I find that I do Paste Special Values a lot. My fingers can perform the shortcut keys:
Other uses for Paste Special
Performing a calculation on every cell in a range
This technique can also be used to multiply a range by a certain number, or subtract, or divide.
This is somewhat obscure, but if you wanted to add the number 2 to every cell in a range of 10,000 cells, it is simple to do so without a formula.
Changing text that looks like numbers to real numbers
Some data sources provide numbers that Excel sees as text. Text that looks like numbers is great when you need to keep the leading zeroes (for example, U.S. Zip codes), but lousy when you want to perform calculations on the numbers. Excel 2002's smart tags let you correct this automatically, but for Excel 2000 and before, use the "Multiply-by-One Technique":
Enter the number 1 in a blank cell. Step 1
Copy that cell. Select the cell and Step enter Ctl+C. 2
Highlight the range of text that looks like The text Step numbers and choose Edit [right arrow] Paste changes to 3
Special [right arrow] Multiply [right arrow] OK. numbers.
Transposing a column to a row
If you have headings running down a column and you want to have them run across the rows, the Paste Special command is the answer.
The Transpose process takes text from a column and transposes it to a row so that it can be used as a heading.
To transpose columns to rows, follow this procedure:
Highlight the original Select the range and enter Step range of data and copy it.
Move the cellpointer to a Select the cell where you want Step new location. the new heading to start. 2
Select Edit [right arrow]Paste See the illustration below. Step
Transpose [right arrow] OK.
Use the "All except borders" option to avoid changing the borders.
Look at the Paste Special dialog shown above. There are a lot of cool uses for Paste Special. I often find that if I have borders set up around a range and attempt to copy a formula, the borders routinely misbehave. Using the "All except borders" option is a great way to avoid changing the borders.CHAPTER 3
Multiple customer names/single customer number
The saga of XYZ
Let's say that your company has a Customer number to uniquely identify each customer. Now let's follow the life of the XYZ Company, customer number 12345.
Suppose that this company places monthly orders with you and has been doing so for the past five years. There have been two particular events in the history of your relationship with this company.
The first event was fairly significant: XYZ Company was originally the ABC Company until three years ago when they merged with XYZ Company. All orders for the first two years of your data show the account number as 12345, and the company as ABC Plastics, Inc. Then, three years ago, their data began appearing as XYZ Company with the account number 12345.
The second event, far less significant, was just as troubling: Your company hired a new Accounts Payable clerk. This A/P clerk was super-efficient and a little obsessive/compulsive. Who knows why, but one Friday afternoon, the A/P clerk decided to take the top ten accounts and do some customer maintenance. The customer names had abbreviations based on whoever set them up. The A/P clerk edited these names for consistency. Recent orders show XYZ as "YXZ Corporation, Boise ID".
The problem of dealing with multiple customer names for a single customer number is sure to strike you if the source of your data is an Invoice Register or Order Entry System.
The saga of XYZ explains how the XYZ Company became the XYZ Corporation, Boise, Idaho. The names look neater, huh? But now, every time you query the invoice register, you could have up to three names associated with the account number 12345. Here is a sample of the data:
1 Account #
ABC Plastics, Inc.
ABC Plastics, Inc.
XYZ Corporation, Boise ID
XYZ Corporation, Boise ID
The problem with this data is that you really want your reports to show the same customer name for all of these records.
The solution? The VLOOKUP function allows us to solve this problem. Here is the syntax for the VLOOKUP function:
#VLOOKUP (Cell to lookup, Range w/Values, Column to Return, False)
The traditional use of VLOOKUP is illustrated by this example: 10,000 rows of mainframe sales data may include a field on each record with a Sales Rep Number. This is wonderful for anyone who can memorize all 95 Sales Rep Numbers, but not so wonderful for the division VP who would rather see "Kranicki, James J." instead of "032". It is common to have a separate Excel file that lists each Sales Rep Number and the corresponding Sales Rep Name. This small file might have 95 rows and just two columns. I would typically copy this data to an out-of-the-way location on my spreadsheet, say, I2:J96.
Any number of things can go wrong with a VLOOKUP, from data that isn't found, to data that isn't properly formatted. Using a False argument can help!
The false parameter — the fourth argument in this VLOOKUP — is an evil plot intended to punish power users of Lotus 1-2-3.
Decision makers are not power users of Lotus. After the decision makers forced their companies to switch from Lotus 1-2-3 to Microsoft Excel, they had little problem using their Lotus skills in Excel — the SUM and AVERAGE formulas worked just fine. Power users, however, found their VLOOKUPs no longer worked, causing angst and consternation. This discovery always comes when you are trying to power analyze.
It wasn't until someone who had already endured the pain told me, "Just add a ',FALSE' as the fourth parameter.", that everything finally fell into place.
This formula tells Excel to take the Rep # from F2, then cruise through all of the values in I2:I96 looking for a match. When Excel finds the matching Rep #, it then returns the second column from the table in I:J, which in this case is the rep name in J.
Problems can occur with the VLOOKUP function. In our example, a common problem could be that the representative cannot be found in the Representatives table. For instance, Representative 198 was just hired last week, and has not yet been added to the table where we used VLOOKUP to get the representative's name, and we end up with the dreaded #N/A error.
But what about the opposite problem? We don't have anything forcing the representatives name to be unique, so it is common that a representative might appear twice in the lookup table.
This is what happened with Representative 032: Someone thought that they were a new rep and added them to the table, so they now appear a second time. If a sales rep number is in the table twice, the VLOOKUP formula returns the name associated with the first matching Rep # in the table.
Going back to our example, this behavior works exactly to our advantage! Our goal is to replace the diverse account names in column B with just a single account name for every occurrence of the account number.
To use the Fill Handle:
Click and drag down to copy a formula; double click to auto-copy down many cells, as long as there is data in the adjacent column(s)
To create a VLOOKUP formula, follow this procedure:
Step Procedure Description Step 1 Insert a temporary column C labeled "Company II".
Step 2 In column C, insert this formula:
Insert a temporary column Step
C labeled "Company II". 1
In column C, insert this Step formula: 2
Copy the formula down to The resulting formula Step all cells using the fill returns the exact results that 3 handle. we need.
If you have a slow PC, you may find that the VLOOKUP formula can take a long time. As soon as this calculation is complete, I like to change those formulas to values. This is always a good idea; but because of the long recalculation times of VLOOKUP, it is an especially good idea.
To convert formulas to values in any workbook, follow this procedure:
Select the range with the
Highlight the target range. Step formulas. 1
From the Edit menu, choose Step
Edit [right arrow] Copy. 2
From the Edit menu, choose
You now have a single Step
Edit [right arrow] Paste Special [right arrow] customer name for each 3
Values [right arrow] OK. account number.
However, this solution is not the best one. The name, "ABC Plastics" is from an old transaction. It would be better to have the customer name from the most recent transaction shown.
Use this procedure if you can sort your data by another column.
Solution 1a: To sort or not to sort.
If the data can be sorted and you have a transaction date, you can use this trick:
 Step 1
Sort all data by descending transaction date.
 Step 2
Create your VLOOKUP formula as shown above.
 Step 3
Convert the formulas to values.
 Step 4
Sort all of the data by the original sort criteria.
Use this procedure if you cannot sort your data by some other column.
If you are unable to sort the data by some other column, try this:
Add a temporary column called "Sequence". Step 1
Add a sequence number, i.e., 1, 2, 3 using one of the Step following methods: 2
* Type 1, 2, 3 into the first three cells. Select the cells and copy down using the Fill Handle.
* Type 1 in the first cell. Using A2 as the cell that the 1 is typed into, type =A2+1 in cell A3. Copy the formula down using the Fill Handle. Convert these formulas to values.
Create your VLOOKUP formula. Step 3
Convert the formulas to values. Step 4
Sort your data back by sorting the Sequence column. Step 5
Delete the Sequence column. Step 6
It would be cool if the third parameter of VLOOKUP could be a negative number to indicate that you want a column to the LEFT of the key, but it doesn't work this way.
VLOOKUPs are great when the information you are looking for is to the right of the key field. What happens when the field you want is to the left of your key field? In the example below, we need a way to find the revenue from column C associated with an order number in column D.
The quick and dirty way is to copy the revenue to a temporary column F, do the VLOOKUP, convert the range formulas to values, then delete the temporary column F.
There is a better way! Unfortunately, it involves two seldom-used functions. I always find that I can learn to use one new function, but trying to master two functions at once is tough. Hang with me, though, as I walk through these in English:
The above formula returns the third row and the second column from the range A2:C99. This is one of those functions that seems really useless, right? But let's see if we can use this in our customer example. We need a way to specify a Row # on the fly:
Combine MATCH with INDEX to do a lookup to the left of your key field.
Excerpted from Guerilla Data Analysis Using Microsoft Excel by Bill Jelen, Anne Troy, Linda DeLonais. Copyright © 2002 Bill Jelen. Excerpted by permission of Holy Macro! Books.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.