What you will learn from reading How to Lie with Statistics:
– How sample bias impacts all statistical information.
– 5 critical thinking questions to ask yourself when seeing someone make claims or quote statistical data.
– Why you need to be careful with extrapolations, which assume that current trends will continue.
How to Lie with Statistics Book Summary:
How to Lie with Statistics is an eye-opening book, and a must-read for anyone who wants to think more clearly about arguments presented with statistics.
Lying with Statistics:
The fact is that, despite its mathematical base, statistics is as much an art as it is a science. A great many manipulations and even distortions are possible within the bounds of propriety.
A well-wrapped statistic is better than Hitler’s ‘big lie’; it misleads, yet it cannot be pinned on you.
The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalise, inflate, confuse, and oversimplify.
Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, ‘opinion’ polls, the census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense.
Sample-Bias:
If your sample is large enough and selected properly, it will represent the whole well enough for most purposes.
If, however, it fails in either respect it may be far less accurate than an intelligent guess and have nothing to recommend it except a spurious air of scientific precision.
It is a sad truth that conclusions from such samples, biased by the method of selection, or too small, or both, lie behind much of what we read or think we know.
No conclusion that ‘sixty-seven per cent of the British people are against’ something or other should be read without the lingering question: sixty-seven per cent of which British people?
For one thing, there are at least three levels of sampling involved in work like Kinsey’s. As already noted, the samples of the population (first level) are far from random and so may not be particularly representative of any population. It is equally important to remember that any questionnaire is only a sample (another level) of the possible questions; and that the answer the gentleman or lady gives is no more than a sample (third level) of his or her attitudes and experiences on each question.
Actually, it is not necessary that a poll be rigged – that is, that the results be deliberately twisted in order to create a false impression. The tendency of the sample to be biased in this consistent direction can rig it automatically.
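As a rough illustration of how selection alone can skew a poll, here is a minimal Python sketch. It assumes a made-up population in which opinion differs between people who are easy to reach and people who are not; the group sizes, percentages and names such as `share_opposed` are invented for this example and do not come from the book.

```python
import random

# Hypothetical population of 10,000 people (all figures invented).
# 3,000 are "easy" to reach and 70% of them oppose a measure;
# 7,000 are "hard" to reach and only 40% of them oppose it.
population = (
    [("easy", True)] * 2100 + [("easy", False)] * 900
    + [("hard", True)] * 2800 + [("hard", False)] * 4200
)

def share_opposed(people):
    """Fraction of the given people who oppose the measure."""
    return sum(opposed for _, opposed in people) / len(people)

true_share = share_opposed(population)  # (2100 + 2800) / 10000 = 0.49

rng = random.Random(0)  # fixed seed so the run is repeatable

# Proper sample: every person has the same chance of being selected.
random_sample = rng.sample(population, 1000)

# Biased sample: the pollster only ever reaches the "easy" group.
easy_only = [p for p in population if p[0] == "easy"]
biased_sample = rng.sample(easy_only, 1000)

print(f"true share opposed:   {true_share:.2f}")
print(f"random sample (1000): {share_opposed(random_sample):.2f}")
print(f"biased sample (1000): {share_opposed(biased_sample):.2f}")
```

Both samples have the same size, and nobody has twisted the answers. The biased sample still lands near 70 per cent rather than the true 49 per cent, simply because of who was reachable.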
Choosing the best ‘average’:
You can’t pin it on me either time. That is the essential beauty of doing your lying with statistics. Both those figures are legitimate averages, legally arrived at. Both represent the same data, the same people, the same incomes. All the same it is obvious that at least one of them must be so misleading as to rival an out-and-out lie.
Only when there is a substantial number of trials involved is the law of averages a useful description or prediction.
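The point about the number of trials can be sketched with simulated coin tosses. This is an illustrative simulation under the assumption of a fair coin, not an example from the book; the helper names `heads_share` and `spread` are made up here.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

def heads_share(n_flips):
    """Share of heads in n_flips tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n_flips)) / n_flips

def spread(n_flips, n_experiments=200):
    """Lowest and highest share of heads seen across repeated experiments."""
    shares = [heads_share(n_flips) for _ in range(n_experiments)]
    return min(shares), max(shares)

for n in (10, 100, 10_000):
    low, high = spread(n)
    print(f"{n:>6} flips: share of heads ranged from {low:.2f} to {high:.2f}")
```

With only ten tosses per experiment the share of heads swings widely from one experiment to the next; with ten thousand it stays close to one half, which is the only setting in which 'it averages out' is a useful prediction.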
Causality:
But flaws in assumptions of causality are not always so easy to spot, especially when the relationship seems to make a lot of sense or when it pleases a popular prejudice or a preconceived notion.
The point is that when there are many reasonable explanations you are hardly entitled to pick one that suits your taste and insist on it. But many people do.
To avoid falling for the post hoc fallacy and thus wind up believing many things that are not so, you need to put any statement of relationship through a sharp inspection.
Watch out for extrapolations:
Extrapolations are useful, particularly in that form of soothsaying called forecasting trends. But in looking at the figures or the charts made from them, it is necessary to remember one thing constantly: The trend-to-now may be a fact, but the future trend represents no more than an educated guess. Implicit in it is ‘everything else being equal’ and ‘present trends continuing’. And somehow everything else refuses to remain equal, else life would be dull indeed.
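A short Python sketch of what 'present trends continuing' means in practice. It fits a straight line by ordinary least squares to a few years of figures and projects it forward; all numbers, including the later 'actual' values, are invented for illustration.

```python
# Hypothetical yearly figures, invented for illustration.
observed = [10, 20, 30, 40, 50]   # years 0 to 4: the trend-to-now
actual_later = {10: 62, 20: 65}   # invented: growth levelled off after year 4

def fit_line(ys):
    """Ordinary least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = fit_line(observed)

def forecast(year):
    """Straight-line extrapolation: assumes present trends continue."""
    return intercept + slope * year

for year, actual in actual_later.items():
    print(f"year {year}: forecast {forecast(year):.0f}, actual {actual}")
```

The line fits the first five years perfectly, and that is exactly why the forecast is tempting. Nothing in the fit can signal that the growth later flattens out.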
Critical Thinking Questions:
Here are 5 questions to ask yourself when presented with a claim backed by statistics:
Who Says So?
About the first thing to look for is bias – the laboratory with something to prove for the sake of a theory, a reputation, or a fee; the newspaper whose aim is a good story; labour or management with a wage level at stake.
Look for conscious bias. The method may be direct misstatement or it may be ambiguous statement that serves as well and cannot be convicted. It may be selection of favourable data and suppression of unfavourable. Units of measurement may be shifted, as with the practice of using one year for one comparison and sliding over to a more favourable year for another. An improper measure may be used: a mean where a median would be more informative (perhaps all too informative), with the trickery covered by the unqualified word ‘average’.
Look sharply for unconscious bias. It is often more dangerous.
How Does He/She Know?
Watch out for evidence of a biased sample, one that has been selected improperly or, as with this one, has selected itself. Ask the question we dealt with in an early chapter: Is the sample large enough to permit any reliable conclusion?
Similarly with a reported correlation: Is it big enough to mean anything? Are there enough cases to add up to any significance?
What’s Missing?
You won’t always be told how many cases. The absence of such a figure, particularly when the source is an interested one, is enough to throw suspicion on the whole thing. Similarly a correlation given without a measure of reliability (probable error, standard error) is not to be taken very seriously.
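To make 'a measure of reliability' concrete, the sketch below computes a Pearson correlation together with a standard error. The formula sqrt((1 - r^2) / (n - 2)) is the one used in the usual t-test for a correlation; it is chosen here as an assumption for illustration, since the book names standard error and probable error without this summary giving a formula. The data points are invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_standard_error(r, n):
    """Standard error of r from the usual t-test: sqrt((1 - r^2) / (n - 2))."""
    return math.sqrt((1 - r ** 2) / (n - 2))

# Six invented observations.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

r = pearson_r(xs, ys)
se_six_cases = r_standard_error(r, len(xs))
se_sixty_cases = r_standard_error(r, 60)  # same r, if it had come from 60 cases

print(f"r = {r:.2f}")
print(f"standard error with  6 cases: {se_six_cases:.2f}")
print(f"standard error with 60 cases: {se_sixty_cases:.2f}")
```

The same correlation is far less trustworthy when it rests on six cases than on sixty, which is why a correlation quoted without the number of cases or a standard error tells you very little.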
Watch out for an average, variety unspecified, in any matter where mean and median might be expected to differ substantially.
A report of a great increase in deaths from cancer in the last quarter-century is misleading unless you know how much of it is a product of such extraneous factors as these:
Cancer is often listed now where ‘causes unknown’ was formerly used; autopsies are more frequent, giving surer diagnoses; reporting and compiling of medical statistics are more complete; and people more frequently reach the most susceptible ages now.
Did Somebody Change the Subject?
When assaying a statistic, watch out for a switch somewhere between the raw figure and the conclusion. One thing is all too often reported as another.
Example:
The ‘population’ of a large area in China was 28 million. Five years later it was 105 million. Very little of that increase was real; the great difference could be explained only by taking into account the purposes of the two enumerations and the way people would be inclined to feel about being counted in each instance. The first census was for tax and military purposes, the second for famine relief.
The post hoc variety of pretentious nonsense is another way of changing the subject without seeming to. The change of something with something else is presented as because of.
Does It Make Sense?
‘Does it make sense?’ will often cut a statistic down to size when the whole rigmarole is based on an unproved assumption.
Many a statistic is false on its face. It gets by only because the magic of numbers brings about a suspension of common sense.