In the many fields beginning to use mathematics widely (business, biology, and the social and political sciences, for example) a knowledge of statistics has become almost as imperative as the ability to read and write. This useful volume promises to be the salvation of those who, despite a distaste for math, need to use statistics in their work. Approaching the topic through logic and common sense rather than via complex mathematics, the author introduces the principles and applications of statistics, and teaches his reader to extract truth and draw valid conclusions from numerical data.
An indispensable first chapter warns the reader of the ways he can be misled by numbers, and the ways in which numbers are used to misrepresent the truth (arithmetical errors, false percentages, fictitious precision, incomplete data, faulty comparisons, improper sampling, failure to allow for the effect of chance, and misleading presentation). There follows a wealth of information on probability, sampling, averages and scatter, the design of investigations, and significance tests — all presented in terms of specific, carefully worked-out cases that make them both interesting and immediately understandable to the layman. The book is so entertaining, so eminently practical, that you'll gain expertise in the laws of chance, probability formulae, sampling methods, calculating the arithmetic mean and standard deviation, finding the geometric and logarithmic means, constructing an effective experiment or investigation using statistics, a wide range of significance tests (zM test, χ² test, runs test for randomness, and a number of others), and scores of other important skills — in a form so palatable you'll hardly realize how much you are learning. Scores of tables illustrate the text, and a complete table of squares and square roots is provided for your convenience. A handy guide to significance tests helps you choose, with speed and ease, the test valid and appropriate for your data.
Written with humor, clarity, and eminent good sense by a scientist of international reputation, this book is for anyone who wants to dispel the mystery of the numbers that pervade modern life, from news articles to literary criticism. For the biologist, sociologist, experimental psychologist or anyone whose profession requires the handling of a large mass of data, it will be of incalculable value.
PRACTICAL STATISTICS SIMPLY EXPLAINED
By Russell Langley
Dover Publications, Inc.
Copyright © 1970 Dr. Russell Langley
All rights reserved.
CHAPTER 1
ON BEING MISLED BY NUMBERS
Who hasn't been fooled by numbers at one time or another? For numbers are peculiar things. On the one hand, they are undoubtedly essential for the precise description of many observations ('rockets go terribly fast' is rather vague, isn't it?), and yet on the other hand, we all know that numbers can be very misleading at times.
Some people even get to the stage of mistrusting all numerical observations. You will hear them say, 'Go on, you can prove almost anything you want to with figures.' Which implies, of course, that you can prove almost nothing with figures. I have even heard it said that with Statistics you can prove that a man is perfectly comfortable when he is standing with one foot in a pail of iced water, and the other foot in a pail of boiling water! Such jibes are due to ignorance, born out of the unhappy experience of being misled by figures in the past. But surely the answer to this is to learn enough about figures to make sure that you won't be duped again.
This book deals with this particular problem. It might even have been subtitled, 'How to Avoid Being Misled by Numbers'.
There are 8 ways in which numerical data is likely to be misleading, viz.–
(1) Arithmetical errors.
(2) False percentages.
(3) Fictitious precision.
(4) Misleading presentation.
(5) Incomplete data.
(6) Faulty comparisons.
(7) Improper sampling.
(8) Failure to allow for the effect of chance.
Let us begin by looking at the first 6 traps; the last 2 on this list will be dealt with in detail in subsequent chapters.
Arithmetical Errors
It is well known that we tend to accept things in print as being true, chiefly on the strength of the fact that they have been printed. This applies both to numbers and to ideas. Yet, in spite of all due care, arithmetical mistakes do occasionally creep into print, so if the subject is one which matters to you, it is best to check the author's calculations before accepting them.
In a critical article on the first Kinsey Report, Professor W. A. Wallis pointed out that there were so many arithmetical mistakes that it was not even clear how many men had actually been studied. On one page of the Report it is stated that the observations were made on a total of 12,214 men, while on another page is a map showing 427 dots, each of which is said to represent 50 men; if so, there were 50 × 427 = 21,350 altogether. Or again, one table shows the number of men 30 years of age or less as being 11,467, while in the very next table the same group total is shown as 11,985. When two such figures differ, it stands to reason that at least one of them must be wrong. (Journ. Amer. Statist. Assoc., 1949, pp. 463-84.)
False Percentages
We all learnt something about percentages when we were young. But sometimes we forget little details, which in the present case would leave us wide open to swallowing a heap of false figures. For instance –
(a) Beware of adding percentages. 'The price of men's haircuts must be increased. In the past 2 years, wages have risen 10%, combs, brushes, and other materials have gone up 8%, shop rentals have gone up 10%, and electric light bills have gone up 5% - a total rise of exactly 33%.' But this total is wrong. If each of the items making up the cost of each haircut had risen 10%, the total cost would only rise by 10%.
(b) Beware of decreasing percentages. 'Apples are 100% cheaper than last year.' Does this really mean they're giving apples away free? For 100% less than any quantity is zero.
How about this one: 'Because of the shocking weather conditions, this year's wheat crop is 120% less than last year's.' This is quite impossible, for this year's crop can't be less than zero.
All percentage changes must be based on the original level. So, a rise in wages from £20 to £30 is a 50% increase; if wages now fall to £20 again, the downgrading is a fall of £10 from £30, which is a 33% decrease.
(c) Beware of huge percentages. 'J. B. earns 1,000% more than Smithy.' Sounds a colossal difference, doesn't it? Yet it is exactly equivalent to saying '11 times'. (Not '10 times', as might be thought, for 100% more = twice, 200% more = 3 times, and so on.) You can take it that, as a general rule, people using huge percentages are doing so to exaggerate their claim. In which case, they are apt to be biased, anyway.
(d) Beware of percentages unaccompanied by the actual numbers. 'In a special experiment, we found that 83.3% of people got relief from Dumpties within 60 seconds.' They conveniently forgot to mention that the experiment concerned 6 people, 5 of whom got the stated relief. And if you test enough small groups, sooner or later you're almost certain to get one group to suit your purpose, purely by chance.
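Traps (a) to (c) all reduce to one rule: reckon every percentage change on the original level. The sketch below works the barber's sum through numerically; the cost breakdown per haircut is invented purely for illustration.

```python
# Percentage traps (a)-(c), checked numerically.

def pct_change(old, new):
    """Percentage change, always reckoned on the original (old) level."""
    return (new - old) / old * 100

# (a) Component rises of 10%, 8%, 10% and 5% cannot add to 33%;
# the total rise is a weighted average of the component rises,
# so it must lie between 5% and 10%. (Cost weights are invented.)
items = [("wages", 60.0, 10), ("materials", 10.0, 8),
         ("rent", 20.0, 10), ("electricity", 10.0, 5)]
old_total = sum(cost for _, cost, _ in items)
new_total = sum(cost * (1 + rise / 100) for _, cost, rise in items)
print(round(pct_change(old_total, new_total), 2))   # 9.3, not 33

# (b) A 100% fall means zero; nothing can fall by more than 100%.
print(pct_change(100, 0))                           # -100.0

# (c) £20 to £30 is +50%, but £30 back to £20 is only about -33%.
print(round(pct_change(20, 30), 2), round(pct_change(30, 20), 2))
```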
Fictitious Precision
Surely no one would be fooled by the apparent accuracy of a figure given in the World Almanac of 1950, that there were 8,001,112 people in the world who spoke Hungarian. I like that final 12. It suggests that when this count was made, exactly 12 toddlers had just learned to say 'Pa-Pa' (which is 'Dad-Dad' in Hungarian).
However, this same kind of fault can appear in much more sophisticated forms, as illustrated in the following excerpt from How to Lie with Statistics by Darrell Huff (Gollancz, 1962) –
Ask a hundred citizens how many hours they slept last night. Come out with a total of, say, 783·1. Any such data is far from precise to begin with. Most people will miss their guess by fifteen minutes or more, and there is no assurance that the errors will balance out. We all know someone who will recall five sleepless minutes as half a night of tossing insomnia. But go ahead, do your arithmetic, and announce that people sleep an average of 7·831 hours a night. You will sound as if you knew precisely what you were talking about.
So don't be too impressed by a result simply because it is quoted to 10 or so decimal places. Make sure that the degree of precision claimed is warranted by the evidence.
You will often be able to detect fictitious precision by asking: How could anyone have found that out?
While a healthy scepticism is desirable, you must be prepared sometimes to come across results which at first seem incredible, and yet which are true. For example, if someone tells you that there are 3,300 fish in a certain lake, you are entitled to wonder how anyone could possibly know such a thing. Yet it could be quite a reliable figure. It is found as follows. Catch 100 fish in the lake, tag them with special markers, and put them back in the lake. Return a couple of months later, catch another sample of 100 fish, and see how many of this sample are tagged fish from the first catch. Suppose there are 3. Then it can be inferred that the 100 tagged fish represent about 3/100 of the total fish population of the lake (i.e. 100 = 3/100 × total), so this total must be 100/3 × 100 = 3,333, or in round figures, 3,300.
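This fish count is the classic mark-and-recapture (Lincoln-Petersen) estimate, and the whole calculation fits in one small function; the figures below are the ones from the passage.

```python
# Mark-and-recapture: the proportion of tagged fish in the second
# catch estimates the proportion of tagged fish in the whole lake,
# so population ≈ tagged * second_catch / recaptured.

def petersen_estimate(tagged, second_catch, recaptured):
    return tagged * second_catch / recaptured

est = petersen_estimate(tagged=100, second_catch=100, recaptured=3)
print(round(est))   # 3333, or in round figures 3,300
```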
Misleading Presentation
One of the tricks about numbers is that there is often a variety of ways of presenting the same numerical fact, and some of these ways seem to suggest a different conclusion from others. As Darrell Huff says –
You can, for instance, express exactly the same fact by calling it a 1% return on sales, a 15% return on investment, a $10,000,000 profit, an increase in profits of 40% (compared with 1935-39 average), or a decrease of 60% from last year. The method to choose is the one that sounds best for the purpose at hand!
Even diagrams, charts, and graphs, which are excellent for presenting a numerical message so that it can be noted at a single glance, are not immune from malpresentation. Sometimes it seems that the man who prepared the chart was really hoping you'll only take a single glance, for if you look twice you may notice that units are omitted, or that the values shown do not agree with those in the body of the text (you can always call it a printing error if you're caught at this one!), or most extraordinary of all, the neat conjuring trick which Huff calls the 'gee-whiz graph' (see Fig. 1).
Incomplete Data
The little figures that aren't mentioned can result in an awful lot of numerical distortion.
Time Magazine (June 12, 1964) quoted 'some sober statistics compiled in recent months by various state authorities' concerning the safety of driving in large cars versus small cars. Three independent reports showed that the risk of being killed in an accident was up to 5½ times greater for persons in small cars than in large cars. This would make a good advertising point if you were selling large cars, wouldn't it? But, as Time pointed out, this is only part of the story. For the same official reports also showed that small cars do not get into as many accidents as large cars, so the overall risk is about the same in both.
Sometimes data is incomplete because the person collecting it felt that certain figures were plainly unreasonable, and must therefore have been caused by some fault such as a clerical error. For instance, in a list of people's heights, one would suspect such an error if one of the heights was quoted as '8 feet 6½ inches'. In such a case, the right thing to do would be to check the original measurement. But note clearly that it would be wrong to discard it simply because it seemed very unlikely. Once you start hand-picking the results, your sample becomes a biased one. The best rule is therefore never to discard a result unless there was good reason for doing so before the result was known (e.g. if the experimental apparatus was accidentally damaged).
Suppose that 3 analyses are made on a sample of ore, and that 2 results are in close agreement, while the third differs quite considerably from the other two. Many people would be tempted to accept the average of the 2 closest results, and would discard the third as being 'probably wrong'. To illustrate the unsoundness of this procedure, W. A. Wallis and H. V. Roberts (Statistics – A New Approach, Free Press of Glencoe, 1960, pp. 140-1, with permission) took 10 random samples, each of 3 measurements, from a large table of numbers which are known to vary in a natural manner around an average value of 2·000. Here are the results –
In each case, the average which is closest to the true average of 2.000 is marked with an asterisk. The Table shows that averaging the 2 closest measurements gives the better result in 3 cases, whereas averaging all 3 measurements gives the better result in 7 cases. And this is so, in spite of the fact that in 5 of the samples there is a distinct temptation to discard a 'wild' measurement (as in Sample #3).
As Wallis and Roberts point out, the ultimate folly of rejecting an extreme observation was demonstrated when 'shortly after 7 o'clock on the morning of December 7, 1941, the officer in charge of a Hawaiian radar station ignored data solely because it seemed so incredible'. For those of you too young to remember this incident, it refers to the surprise attack on Pearl Harbour by Japanese bombers, which brought the USA into the Second World War.
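The Wallis-Roberts demonstration is easy to repeat on a larger scale. This sketch draws many samples of 3 from a normal distribution centred on 2·000 (the distribution and its spread are assumptions for illustration, since their original table is not reproduced here) and counts how often the mean of all 3 measurements beats the mean of the 2 closest.

```python
import random

random.seed(1)
TRUE_MEAN = 2.000
trials = 10_000
wins_all3 = 0
for _ in range(trials):
    x = sorted(random.gauss(TRUE_MEAN, 0.1) for _ in range(3))
    # The 2 closest measurements are either the bottom or the top pair.
    pair = x[:2] if (x[1] - x[0]) < (x[2] - x[1]) else x[1:]
    mean_pair = sum(pair) / 2
    mean_all = sum(x) / 3
    if abs(mean_all - TRUE_MEAN) < abs(mean_pair - TRUE_MEAN):
        wins_all3 += 1
print(wins_all3 / trials)   # usually well above 0.5: keep all 3 measurements
```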
Faulty Comparisons
Apart from being used to describe things, numbers are often used for comparing things. Whenever this is done, care must be taken to ensure that the things being compared are genuinely fit to be compared. Darrell Huff (loc. cit.) gives a couple of good examples of illogical comparisons –
The death rate in the American Navy during the Spanish-American War was 9 per 1,000. For civilians in New York City during the same period it was 16 per 1,000. This suggests that it was safer to be in the Navy than out of it. But the groups are not really comparable. The Navy is made up mostly of young healthy men, whereas the civilian population includes infants, the old, and the ill, all of whom have a higher death rate wherever they are.
Hearing that it cost $8 a day to maintain each prisoner in Alcatraz, a U.S. senator exclaimed, 'It would be cheaper to board them in the Waldorf-Astoria!' Well, it wouldn't really, because it's not fair to compare the total maintenance cost per prisoner at Alcatraz with the rent of a hotel room; after all, guarding and feeding prisoners must cost something.
The trick in these examples is to compare 2 things which sound as if they are fit to be compared when, in fact, they are not. The very preciseness of the numbers themselves helps to carry the illusion. How about this newspaper report –
The figures just released by the National Safety Council show that the most reckless age for car drivers is 20 to 29 years. This age group accounted for 31·6% of the accidents on our roads last year, compared with 23·3% for the 30 to 39 years group, 16·2% for the 40 to 49 years group, 9·4% for the 50 to 59 years group, 11·0% for the 60 and over group, and an exemplary 8·5% for the under 20 year-olds.
Looks like those kids were full of caution, while their older brothers were full of beer. But did anyone say there were equal numbers of drivers in each of these age groups? Because otherwise these figures may indicate nothing more than the relative number of drivers in each age group.
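One invented set of driver counts is enough to show the trap. Every figure below is hypothetical (the report quoted no group sizes): dividing each group's share of the accidents by its number of drivers can invert the ranking entirely.

```python
# Hypothetical figures only: accident shares (%) from the passage,
# driver counts (millions) invented to show how the ranking can invert.
accident_share = {"under 20": 8.5, "20-29": 31.6, "30-39": 23.3}
drivers = {"under 20": 2.0, "20-29": 12.0, "30-39": 10.0}
for group, share in accident_share.items():
    # share of accidents per million drivers in the group
    print(group, round(share / drivers[group], 2))
# With these invented counts the under-20s, not the 20-29s, come out worst.
```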
However, the usual cause for being misled by comparisons is either that the things being compared are biased samples, or that the effect of chance has not been properly assessed. Which brings us to the main subject matter of this book.
CHAPTER 2
NATURE OF PROBABILITY
Absolute and Probable Truth
The only kind of conclusion which can be absolutely true is one which is implied by the premisses on which it rests. The conclusion in such a case is really contained within the meaning of the premisses. A simple example of such a deduction is that if A > B, and B > C, then A must be larger than C. (The sign '>' means 'is larger than'.)
A nice instance of absolute truth is the conclusion reached by M. Cohen and E. Nagel (in An Introduction to Logic, Routledge & Kegan Paul Ltd., 1963) that there are at least two persons in New York City who have the same number of hairs on their heads. This piece of absolute truth was not discovered by counting the hairs on the eight million inhabitants of that city, but by studies which revealed that (1) the maximum number of hairs on human scalps could never be as many as 5,000 per square centimetre, and (2) the maximum area of the human scalp could never reach 1,000 square centimetres; from these premisses one can correctly infer that no human being could ever have 5,000 × 1,000 = 5,000,000 hairs on his head. As this number is less than the population of New York City, it follows by implication that at least two New Yorkers must have the same number of scalp hairs.
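The pigeonhole arithmetic here is small enough to verify directly, using the two bounds quoted in the passage.

```python
# Pigeonhole principle: more people than possible hair counts means
# at least two people must share a count.
max_density = 5_000                 # hairs per sq cm (upper bound)
max_area = 1_000                    # scalp area in sq cm (upper bound)
max_hairs = max_density * max_area  # at most 5,000,000 distinct counts
population = 8_000_000              # New York City, per the passage
print(population > max_hairs)       # True: a shared count is guaranteed
```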
Of course, you can always question the truth of the underlying premisses, although in the present instance your chances of finding them wrong would be comparable with the likelihood of finding a man 35 feet tall. At a practical level, this means no chance at all.
Nevertheless, in the vast majority of cases we must be satisfied with conclusions based on incomplete evidence. For example, atoms can't be seen, so our belief in the atomic structure of matter rests on indirect evidence. Yet this belief is almost certainly true, because a multitude of observations all point to the same conclusion, and a host of predictions based on the atomic hypothesis has also come true to act as additional confirmation. As time goes by and more evidence accumulates, the closer our conclusions will approach absolute truth. If contrary evidence crops up, our conclusions must be revised.
The degree of probability of a thing being true is the subject we shall now investigate. It is a fascinating study, and though its origins go back over 300 years, it is only now beginning to make its impact felt on our everyday lives. It goes by the fancy name of Statistical Inference, but we shall be avoiding technical names in the interests of simplicity and clarity. Of one thing you can be sure – you'll be hearing a lot more about this subject in the future, and the people who count will be the ones with a working knowledge of it. H. G. Wells (d. 1946) foresaw this when he said –
Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.
Table of Contents
1. On Being Misled by Numbers
2. Nature of Probability
3. Sampling
4. Averages and Scatter
5. Design of Investigations
6. Significance Tests
Tables of Squares and Square Roots
Guide to Significance Tests