What you will learn from reading The Tyranny of Metrics:
– How and why measurement can become excessive measurement.
– The 6 sources of metric dysfunction.
– A 10 part checklist for understanding when and how to use metrics.
The Tyranny of Metrics Book Summary
The Tyranny of Metrics is a really interesting book, one that will open your eyes to the use and abuse of numbers in our modern society. It will make you realise that just because something can be measured doesn’t make it important, and that metrics create incentive systems that may have runaway side effects.
Jerry Z. Muller: “The title, The Tyranny of Metrics, is not meant to convey the message that metrics are intrinsically tyrannical, but rather that they are frequently used in ways that are dysfunctional and oppressive.”
There is a cultural pattern that has become ubiquitous in recent decades, engulfing an ever-widening range of institutions. Depending on taste, one could call it a cultural “meme,” an “épistème,” a “discourse,” a “paradigm,” a “self-reinforcing rhetorical system,” or simply a fashion. It comes with its own vocabulary and master terms. It affects the way in which people talk about the world, and thus how they think about the world and how they act in it. For convenience, let’s call it metric fixation.
A key premise of metric fixation concerns the relationship between measurement and improvement. There is a dictum (wrongly) attributed to the great nineteenth-century physicist Lord Kelvin: “If you cannot measure it, you cannot improve it.” In 1986 the American management guru Tom Peters embraced the motto “What gets measured gets done,” which became a cornerstone belief of metrics. In time, some drew the conclusion that “anything that can be measured can be improved.”
The key components of metric fixation are:
- the belief that it is possible and desirable to replace judgment, acquired by personal experience and talent, with numerical indicators of comparative performance based upon standardised data (metrics);
- the belief that making such metrics public (transparent) assures that institutions are actually carrying out their purposes (accountability);
- the belief that the best way to motivate people within these organisations is by attaching rewards and penalties to their measured performance, rewards that are either monetary (pay-for-performance) or reputational (rankings).
The war against judgement:
When numbers, standardised measurement of performance, and big data are seen as the wave of the future, professional judgment based upon experience and talent is seen as retrograde, almost anachronistic. Human judgment based on talent and experience has become unfashionable.
The problem with measurement:
While we are bound to live in an age of measurement, we live in an age of mis-measurement, over-measurement, misleading measurement, and counterproductive measurement. This book is not about the evils of measuring. It is about the unintended negative consequences of trying to substitute standardised measures of performance for personal judgment based on experience. The problem is not measurement, but excessive measurement and inappropriate measurement: not metrics, but metric fixation.
There are things that can be measured. There are things that are worth measuring. But what can be measured is not always what is worth measuring; what gets measured may have no relationship to what we really want to know.
The costs of measuring may be greater than the benefits. The things that get measured may draw effort away from the things we really care about. And measurement may provide us with distorted knowledge: knowledge that seems solid but is actually deceptive.
When measurement works:
Used judiciously, then, measurement of the previously unmeasured can provide real benefits. The attempt to measure performance, while pocked with pitfalls as we will see, is intrinsically desirable.
If what is actually measured is a reasonable proxy for what is intended to be measured, and if it is combined with judgment, then measurement can help practitioners to assess their own performance, both for individuals and for organisations. But problems arise when such measures become the criteria used to reward and punish, when metrics become the basis of pay-for-performance or ratings.
Issues with accountability:
Accountability ought to mean being held responsible for one’s actions. But by a sort of linguistic sleight of hand, accountability has come to mean demonstrating success through standardised measurement, as if only that which can be counted really counts. Another assumption that is often taken for granted is that “accountability” demands that measurement of performance be made public, that is, “transparent.”
When proponents of metrics advocate “accountability,” they tacitly combine two meanings of the word. On the one hand, to be accountable means to be responsible. But it can also mean “capable of being counted.” Advocates of “accountability” typically assume that only by counting can institutions be truly responsible. Performance is therefore equated with what can be reduced to standardised measurements.
How metrics lead organisations astray:
Over-measurement is a form of overregulation, just as mismeasurement is a form of misregulation.
Metric fixation leads to a diversion of resources away from frontline producers toward managers, administrators, and those who gather and manipulate data. It’s one of the big reasons why so many contemporary organisations function less well than they ought to, diminishing productivity while frustrating those who work in them.
Most organisations have multiple purposes, and that which is measured and rewarded tends to become the focus of attention, at the expense of other essential goals. Similarly, many jobs have multiple facets, and measuring only a few aspects creates incentives to neglect the rest.
In the process, the nature of work is transformed in ways that are often pernicious. Professionals tend to resent the impositions of goals that may conflict with their vocational ethos and judgment, and thus morale is lowered. Almost inevitably, many people become adept at manipulating performance indicators through a variety of methods, many of which are ultimately dysfunctional for their organisations. They fudge the data or deal only with cases that will improve performance indicators. They fail to report negative instances. In extreme cases, they fabricate the evidence.
Whenever reward is tied to measured performance, metric fixation invites gaming.
Because the theory of motivation behind pay for measured performance is stunted, results are often at odds with expectations. The typical pattern of dysfunction was formulated in 1975 by two social scientists operating on opposite sides of the Atlantic, in what appears to have been a case of independent discovery.
What has come to be called “Campbell’s Law,” named for the American social psychologist Donald T. Campbell, holds that “the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
6 Sources of Metric Dysfunction:
Measuring the most easily measurable.
There is a natural human tendency to try to simplify problems by focusing on the most easily measurable elements. But what is most easily measured is rarely what is most important, indeed sometimes not important at all. That is the first source of metric dysfunction.
Closely related is measuring the simple when the desired outcome is complex. Most jobs have multiple responsibilities and most organisations have multiple goals. Focusing measurement on just one responsibility or goal often leads to deceptive results.
Measuring inputs rather than outcomes.
It is often easier to measure the amount spent or the resources injected into a project than the results of the efforts. So organisations measure what they’ve spent, rather than what they produce, or they measure process rather than product.
Degrading information quality through standardisation.
Quantification is seductive, because it organises and simplifies knowledge. It offers numerical information that allows for easy comparison among people and institutions. But that simplification may lead to distortion, since making things comparable often means that they are stripped of their context, history, and meaning. The result is that the information appears more certain and authoritative than is actually the case: the caveats, the ambiguities, and uncertainties are peeled away, and nothing does more to create the appearance of certain knowledge than expressing it in numerical form.
Gaming through creaming.
This takes place when practitioners find simpler targets or prefer clients with less challenging circumstances, making it easier to reach the metric goal, but excluding cases where success is more difficult to achieve.
Improving numbers by lowering standards.
One way of improving metric scores is by lowering the criteria for scoring. Thus, for example, graduation rates of high schools and colleges can be increased by lowering the standards for passing. Or airlines improve their on-time performance by increasing the scheduled flying time of their flights.
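The airline example can be sketched in a few lines of code. The numbers below are invented purely for illustration; the point is that the “on-time” metric improves with no change in actual performance:

```python
# Toy sketch of lowering the standard rather than improving performance.
# The flight times and schedules here are hypothetical illustration only.

actual_minutes = [95, 100, 110, 120, 130]  # real gate-to-gate times, unchanged

def on_time_rate(scheduled_minutes):
    """Share of flights arriving within the published schedule."""
    on_time = sum(t <= scheduled_minutes for t in actual_minutes)
    return on_time / len(actual_minutes)

print(on_time_rate(100))  # 0.4 -- with a tight 100-minute schedule
print(on_time_rate(130))  # 1.0 -- same flights, padded 130-minute schedule
```

The metric doubles and more, yet every passenger spends exactly as long in the air as before; only the yardstick moved.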
Improving numbers through omission or distortion of data.
This strategy involves leaving out inconvenient instances, or classifying cases in a way that makes them disappear from the metrics. Police forces can “reduce” crime rates by booking felonies as misdemeanors, or by deciding not to book reported crimes at all.
One step beyond gaming the metrics is cheating, a phenomenon whose frequency tends to increase directly with the stakes of the metric in question. As we’ll see, when the No Child Left Behind Act raised the stakes that pupils’ test scores carried for schools, teachers and principals in many cities responded by altering students’ answers on the test.
The changing meaning of expertise:
This conception of education as machinery, tailored to the measurable production of reading, writing, and computation, and capable of being rewarded based on measurable output, ebbed and flowed in the decades that followed, reaching a flood tide at the end of the twentieth century. At each subsequent wave people pointed to the unmeasured costs of tying reward to standardised measurement.
The core of managerial expertise was now defined as a distinct set of skills and techniques, focused upon a mastery of quantitative methodologies. Decisions based on numbers were viewed as scientific, since numbers were thought to imply objectivity and accuracy.
Before that, “expertise” meant the career-long accumulation of knowledge of a specific field, as one progressed from rung to rung within the same institution or business, accumulating what economists call “task-specific know-how.” Auto executives were “car guys,” men who had spent much of their professional life in the automotive industry. They were increasingly replaced by McNamara-like “bean counters,” adept at calculating costs and profit margins.
The decline of intuitive judgement:
The role of judgment grounded in experience and a deep knowledge of context was downplayed. The premise of managerialism is that the differences among organisations, including private corporations, government agencies, and universities, are less important than the similarities. Thus the performance of all organisations can be optimised using the same toolkit of managerial techniques and skills. We might think of judgment and expertise based upon experience as the lubricant that makes organisations flourish by providing task-specific know-how. Managerialism under the spell of metrics tends to ignore, if it does not actually disdain, all that.
Metrics and new intellectual disciplines:
The new breed of “systems analysts” introduced “new standards of intellectual discipline and greatly improved bookkeeping methods, but also a trained incapacity to understand the most important aspects of military power, which happen to be nonmeasurable.” The various armed forces sought to maximise measurable “production”: the air force through the number of bombing sorties; artillery through the number of shells fired; infantry through body counts, reflecting statistical indices devised by McNamara and his associates in the Pentagon. But, as Luttwak writes, “In frontless war where there are no clear lines on the map to show victory and defeat, the only true measure of progress must be political and non-quantifiable: the impact on the enemy’s will to continue to fight.”
Military officers were themselves increasingly seeking a managerial outlook, pursuing degrees in business administration, management, or economics. That led to what Luttwak called a “materialist bias,” aimed at measuring inputs and tangible outputs (such as firepower) rather than intangible human factors, such as strategy, leadership, group cohesion, and the morale of servicemen. “What could be precisely measured tended to overshadow what was really important.” While the material inputs are all hard facts, costs precisely stated in dollars and cents, the intangibles are difficult even to define and mostly cannot be measured at all, he noted.
One vector of the metric fixation was the rise of management consultants, outfitted with the managerial skills of quantitative analysis, whose first maxim was “If you can’t measure it, you can’t manage it.” Reliance on numbers and quantitative manipulation not only gave the impression of scientific expertise based on “hard” evidence, it also minimised the need for specific, intimate knowledge of the institutions to which advice was being sold. The culture of management demanded more data: standardised, numerical data.
The need for objective criteria:
In meritocratic societies with more open and changing elites, those who reach positions of authority are less likely to feel secure in their judgments, and more likely to seek seemingly objective criteria by which to make decisions. And numbers convey the air of objectivity; they imply the exclusion of subjective judgment. Numbers are regarded as “hard,” and thus a safer bet for those disposed to doubt their own judgments.
Numerical metrics also give the appearance (if one does not analyse their genesis and relevance too closely) of transparency and objectivity. A good part of their attractiveness is that they appear to be readily understood by all.
Those at the top face, to a greater degree than most of us, a cognitive constraint that confronts all of us: making decisions despite having limited time and ability to deal with information overload. Metrics are a tempting means of dealing with this “bounded rationality” and of engaging with matters beyond one’s comprehension.
How decaying social trust leads to more metrics:
The demand for greater “accountability,” which we saw reflected in the Google Ngram, fed upon the growing distrust of institutions and resentment of authority based on expertise that marked the United States (and, to a considerable degree, other Western societies) from the 1960s onward. “Every profession is a conspiracy against the laity,” wrote George Bernard Shaw in his play The Doctor’s Dilemma.
In a vicious circle, a lack of social trust leads to the apotheosis of metrics, and faith in metrics contributes to a declining reliance upon judgment. In a series of books, Philip K. Howard has argued that the decline of trust leads to a new mindset in which “avoiding human choice in public decisions is not just a theory… but a kind of theology…. Human choice is considered too dangerous.” As a consequence, “Officials no longer are allowed to act on their best judgment” or to exercise discretion, which is judgment about what the particular situation requires.
Measurement allows initial improvement:
In one field after another, the introduction of greater measurement in the name of accountability did shine light upon real problems, including variations in professional practice that were supposedly grounded in “science,” and gaps in performance that had previously gone unnoticed or undocumented.
The impact of these revelations both diminished faith in professional judgment and created pressure to find solutions, solutions thought to entail greater measurement in order to monitor the professionals whose ethos had been cast into doubt.
How the turnover of people in power increases dependence on metrics:
CEOs, university presidents, and heads of government agencies move from one organisation to another to a greater degree now than in the past. A strange, egalitarian alchemy often assumes that there must be someone better to be found outside the organisation than within it: that no one within the organisation is good enough to ascend, but unknown people from other places might be. That assumption leads to a turnover of top leaders, executives, and managers, who arrive at their new posts with limited substantive knowledge of the institutions they are to manage. Hence their greater reliance on metrics, and preferably metrics that are similar from one organisation to another (aka “best practices”). These outsiders-turned-insiders, lacking the deep knowledge of context that comes from experience, are more dependent on standardised forms of measurement.
Multiple goals don’t yield to easy measurement:
In government and nonprofit organisations there are rarely single goals, and they cannot be readily measured.
Primary schools, for example, have their tasks of teaching reading, writing, and numeracy, and these perhaps could be monitored through standardised tests. But what about goals that are less measurable but no less important, such as instilling good behaviour, inspiring a curiosity about the world, and fostering creative thought?
Pay for performance:
Many of the problems of pay-for-performance schemes can be traced to an overly simple, indeed deeply distortive, conception of human motivation, one that assumes that people are motivated to work only by material rewards.
Some people are motivated less by extrinsic monetary rewards than by various sorts of intrinsic psychic rewards, including their commitment to the goals of the organisations for which they work, or a fascination with the complexity of the work they do, which makes it challenging, interesting, and entertaining. The existence of intrinsic as well as extrinsic motivations is obvious to anyone who has managed workers in complex tasks.
There are indeed circumstances when pay for measured performance fulfils its promise: when the work to be done is repetitive, uncreative, and involves the production or sale of standardised commodities or services; when there is little possibility of exercising choice over what one does; when there is little intrinsic satisfaction in it; when performance is based almost exclusively on individual effort, rather than that of a team; and when aiding, encouraging, and mentoring others is not an important part of the job.
Technical knowledge vs real knowledge:
The rationalist believes in the sovereignty of technique in which the only form of authentic knowledge is technical knowledge, for it alone satisfies the standard of certainty that marks real knowledge. The error of rationalism, for Oakeshott, is its failure to appreciate the necessity of practical knowledge and of knowledge of the peculiarity of circumstances.
Friedrich Hayek developed a related critique of what he called “the pretense of knowledge.” Writing in the mid-twentieth century, he chastised socialist attempts at large-scale economic planning for their “scientism,” by which he meant their attempt to engineer economic life, as if planners were in a position to know all the relevant inputs and outputs that make up life in a complex society. The advantage of the competitive market, he maintained, is that it allows individuals not only to make use of their knowledge of local conditions, but to discover new uses for existing resources or imagine new products and services hitherto unknown and unsuspected. In short, planning failed not only to consider relevant but dispersed information; it also prohibited the entrepreneurial discovery of how to meet particular needs and how to generate new goals.
One could draw together the insights of a number of thinkers into this dictum: the calculative is the enemy of the imaginative. Entrepreneurship depends on taking what the economist Frank Knight termed “unmeasurable risk,” for the potential benefits of an innovation are not subject to precise calculation. Or, in the formulation of Alfie Kohn, a longtime critic of pay-for-performance, metrics “inhibit risk taking, an inevitable concomitant of exploration and creativity. We are less likely to take chances, to play with possibilities, and to follow hunches, which may, after all, not pay off.”
“To demand or preach mechanical precision, even in principle, in a field incapable of it is to be blind and to mislead others,” as the British liberal philosopher Isaiah Berlin noted in an essay on political judgment. Indeed, what Berlin says of political judgment applies more broadly: judgment is a sort of skill at grasping the unique particularities of a situation, and it entails a talent for synthesis rather than analysis, “a capacity for taking in the total pattern of a human situation, of the way in which things hang together.” A feel for the whole and a sense for the unique are precisely what numerical metrics cannot supply.
The problem with ‘more’ data:
Growing opportunities to collect data, and the declining cost of doing so, contribute to the meme that data is the answer, for which organisations have to come up with the questions. There is an often unexamined faith that amassing data and sharing it widely within the organisation will result in improvements of some sort, even if much information has to be denuded of nuance and context to turn it into easily transferred “data.”
The search for more data means more data managers, more bureaucracy, and more expensive software systems. Ironically, in the name of controlling costs, expenditures wax.
The effect is to increase costs or to divert spending from the doers to the administrators, which usually suits the latter just fine. It is hard to find a university where the ratio of administrators to professors and of administrators to students has not risen astronomically in recent decades. And the same holds true on the national level.
How metrics have affected academia:
In academia as elsewhere, that which gets measured gets gamed.
Rankings (or “league tables” as they are known in Britain) are an important source of university prestige: alumni and members of the board of trustees are anxious to have their institutions rate highly, as are potential donors and, of course, potential students. Maintaining or improving the institution’s rankings tends to become a priority for university presidents and their top administrators. Indeed, some American university presidents are awarded contracts that specify a bonus if they are able to raise the school’s rank.
Rankings create incentives for universities to become more like what the rankings measure. What gets measured is what gets attention. That leads to homogenisation as they abandon their distinctive missions and become more like their competitors.
Other top administrators are similarly incentivised: since one factor that affects rankings is the achievement scores of incoming students, the dean of admissions of at least one law school was remunerated based in part on the scores of the admitted students.
In addition to expenditures that do nothing to raise the quality of teaching or research, the growing salience of rankings has led to ever new varieties of gaming through creaming and improving numbers through omission or distortion of data. A recent scholarly investigation of American law schools provides some examples. Law schools are ranked by USNWR based in part on the LSAT scores and GPAs of their admitted, full-time students. To improve the statistics, students with lower scores are accepted on a “part-time” or “probationary” basis, so that their scores are not included. Since the scores of transfer students are not counted, many law school admissions offices solicit students from slightly lower-ranked schools to transfer in after their first year.
Measuring academic productivity
In the attempt to replace judgments of quality with standardised measurement, some rankings organisations, government institutions, and university administrators have adopted as a standard the number of scholarly publications produced by a college or university’s faculty, and determined the number of these publications using commercial databases that aggregate such information.
When individual faculty members, or whole departments, are judged by the number of publications, whether in the form of articles or books, the incentive is to produce more publications, rather than better ones. Really important books may take many years to research and write. But if the incentive system rewards speed and volume of output, the result is likely to be a decline in truly significant works.
The importance of research
Take the practice of “impact factor measurement.” Once it was recognised that not all published articles were of equal significance, techniques were developed to try to measure each article’s impact. This took two forms: counting the number of times the article was cited, either on Google Scholar or on commercial databases; and considering the “impact factor” of the journal in which it was published, a factor determined in turn by the frequency with which articles in the journal were cited in the databases. (Of course, this method cannot distinguish between the following citations: “Jerry Z. Muller’s illuminating and wide-ranging book on the tyranny of metrics effectively slaughters the sacred cows of so many organisations” and “Jerry Z. Muller’s poorly conceived screed deserves to be ignored by all managers and social scientists.” From the point of view of tabulated impact, the two statements are equivalent.)
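Muller’s parenthetical example can be made concrete with a toy sketch (this is an illustration only, not how any real bibliometric database works): a citation counter simply tallies mentions, so a scathing review and a glowing one contribute identically to “impact.”

```python
# Toy illustration: citation counting is sentiment-blind.
# The passages below paraphrase Muller's own example of two opposite citations.

def citation_count(citing_passages, cited_author):
    """Count how many passages mention the cited author at all."""
    return sum(cited_author in passage for passage in citing_passages)

passages = [
    "Jerry Z. Muller's illuminating book slaughters many sacred cows.",
    "Jerry Z. Muller's poorly conceived screed deserves to be ignored.",
]

# Praise and dismissal each add exactly one citation to the tally.
print(citation_count(passages, "Jerry Z. Muller"))  # 2
```

From the metric’s point of view the two judgments cancel nothing out; both inflate the number that rankings and hiring committees then treat as a measure of quality.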
“All too often, ranking systems are used as a cheap and ineffective method of assessing the productivity of individual scientists. Not only does this practice lead to inaccurate assessment, it lures scientists into pursuing high rankings first and good science second. There is a better way to evaluate the importance of a paper or the research output of an individual scholar: read it.”
Increased value of administrators
What the advocates of greater government accountability metrics overlook is that the very real problem of the increasing costs of college and university education is due in part to the expanding cadres of administrators, many of whom are required in order to comply with government mandates.
One predictable effect of the new plan would have been to raise the costs of administration, both by diverting ever more faculty time from teaching and research into filling out forms to accumulate data, and by increasing the number of administrators to gather the forms, analyse the data, and hence supply the raw material for the government’s metrics.
The goal of increasing college graduation rates, for example, is at odds with increasing access, since less advantaged students tend to be not only financially poorer but also worse prepared.
The better prepared the students, the more likely they are to graduate on time. Thus community colleges and other institutions that provide greater access to the less prepared would have been penalised for their low graduation rates. They could, of course, have attempted to game the numbers in two ways. They could raise the standards for incoming students, increasing their likelihood of graduating-but at the price of access. Or they could respond by lowering the standards for graduation-at the price of educational quality and the market value of a degree.
Focus on high paying graduate jobs
The College Scorecard treats college education in purely economic terms: its sole concern is return on investment, understood as the relationship between the monetary costs of college and the increase in earnings that a degree will ultimately provide.
Those are, of course, legitimate considerations: college costs eat up an increasing percentage of familial income or entail the student taking on debt; and making a living is among the most important tasks in life. But it is not the only task in life, and it is an impoverished conception of college education that regards it purely in terms of its ability to enhance earnings. Yet that is the ideal of education that the College Scorecard embodies and encourages, as do similar metrics. If we distinguish training, which is oriented to production and survival, from education, which is oriented to making survival meaningful, then the College Scorecard is only about the former.
The hazard of metrics so purely focused on monetary return on investment is that like so many metrics, they influence behaviour. Already, universities at the very top of the rankings send a huge portion of their graduates into investment banking, consulting, and high-end law firms-all highly lucrative pursuits.
These are honourable professions, but is it really in the best interests of the nation to encourage the best and the brightest to choose these careers? One predictable effect of the weight attributed to future income in college rankings will be to incentivise institutions to channel their students into the most high-paying fields. Those whose graduates go on to careers in less remunerative fields, such as teaching or public service, will be penalised.
A capitalist society depends for its flourishing on a variety of institutions that provide a counterweight to the market, with its focus on monetary gain. To prepare pupils and university students for their roles as citizens, as friends, as spouses, and above all to equip them for a life of intellectual richness-those are among the proper roles of college. Conveying marketable skills is a proper role as well. But to subordinate higher education entirely to the capacity for future earnings is to measure with a very crooked yardstick.
How metrics have affected schooling:
Test result pressures
Under NCLB, scores on standardised tests are the numerical metric by which success and failure are judged. And the stakes are high for teachers and principals, whose raises in salary and whose very jobs sometimes depend on this performance indicator.
It is the emphasis placed on these tests as the major criterion for evaluating schools that creates perverse incentives, including focusing on the tests themselves at the expense of the broader goals of the institution.
High-stakes testing leads to other dysfunctions as well, such as creaming: studies of schools in Texas and in Florida showed that average achievement levels were increased by reclassifying weaker students as disabled, thus removing them from the assessment pool. Or out-and-out cheating, as teachers alter student answers or toss out tests by students likely to be low scorers; these phenomena are well documented in Atlanta, Chicago, Cleveland, Dallas, Houston, Washington, D.C., and other cities. Or mayors and governors moving the goalposts by diminishing the difficulty of tests or lowering the grades required to pass them, in order to raise the pass rate and thus demonstrate the success of their educational reforms.
Metrics often miss the behavioural goals of schooling
Of course, the scores on English and math achievement tests cannot measure the full benefits of K-12 education. That is not because the NAEP scores are distorted or insignificant. They do provide a useful measure of student knowledge of the subjects tested. But there is much more to school than the learning of English and mathematics: not only other academic subjects but also the stimulation of interest in the world, and the cultivation of habits of behaviour (self-control, perseverance, ability to cooperate with others) that increase the likelihood of success in the adult world. Development of these non-cognitive qualities may well be going on in classrooms and schools without being reflected in performance metrics based on test scores.
How Metrics have affected medicine:
Hospital readmissions have indeed declined, a much-touted success for performance metrics. But how much of that success is real? The falling rate of reported readmissions was due in part to gaming the system: instead of formally admitting returning patients, hospitals placed them on “observation status,” under which the patient stays in the hospital for a period of time (up to several days) and is billed for outpatient services rather than an inpatient “admission.”
This internal use of metrics of performance is of great value in helping hospitals and other medical institutions to enhance the safety and efficacy of their medical care. But metrics tend to be most successful for those interventions and outcomes that are almost entirely controlled by and within the organisation’s medical system, as in the case of checklists of procedures to minimise central line-induced infections.
When the outcomes are dependent upon more wide-ranging factors (such as patient behaviour outside the doctor’s office and the hospital), they become more difficult to attribute to the efforts or failures of the medical system.
How Metrics have affected Policing:
When the public and its politicians think of public safety, they think of the police, who are held responsible for the level of crime. However, like health and its relationship to the medical system, or education and its relationship to the school system, public safety is only partially dependent on the effectiveness of the police. It depends in part on other elements of the justice system: on the public prosecutors, the judiciary, and the penal and parole systems. It depends in good part on the propensity of the local population to engage in criminal activity, and that in turn depends on broader economic, ethnic, and cultural factors. And public safety also depends on the ease of committing crime.
Some of the decline in crime in recent decades is a product of private actions by property owners. The opportunity for car theft, burglary, and other crimes has been radically reduced by defensive measures undertaken by millions of private individuals, whose acquisition of improved car alarms and home alarms has made these crimes more difficult.
When the crime rate goes down, elected officials tout their success. When the crime index goes up, the politicians are criticised by their rivals. The politicians, in turn, put pressure on their police chiefs to reduce the crime rate, and the chiefs in turn put pressure on those below them in the police hierarchy.
All of this creates tremendous temptations to demonstrate progress in reducing crime by massaging the figures. As one Chicago detective explained, “It’s so easy.” First, the responding officer can intentionally misclassify a case or alter the narrative to record a lesser charge: a house break-in becomes “trespassing”; a garage break-in becomes “criminal damage to property”; a theft becomes “lost property.”
How Metrics have affected Wars:
Each war has its own local problems
As Kilcullen emphasises, metrics must be adapted to the particularities of the case: standardised metrics drawn from past wars in other venues simply will not work. Not only that, but use of even the best performance metrics demands judgment based upon experience: “Interpretation of indicators is critically important, and requires informed expert judgment. It is not enough merely to count incidents or conduct quantitative or statistical analysis; interpretation is a qualitative activity based on familiarity with the environment, and it needs to be conducted by experienced personnel who have worked in that environment for long enough to detect trends by comparison with previous conditions. These trends may not be obvious to personnel who are on short-duration tours in country, for example.”
Developing valid metrics of success and failure requires a good deal of local knowledge, knowledge that may be of no use in other circumstances, to the chagrin of those who look for universal templates and formulae. The hard part is knowing what to count, and what the numbers you have counted actually mean in context.
Inputs vs outputs
Kilcullen also warns against the use of “input metrics,” that is, metrics that count what the army and its allies are doing, for these may be quite distinct from the outcomes of those actions.
Business and Finance:
People do want to be rewarded for their performance, both in terms of recognition and remuneration. But there is a difference between promotions (and raises) based on a range of qualities, and direct remuneration based on measured quantities of output. For most workers, contributions to their company include many activities that are intangible but no less real: coming up with new ideas and better ways to do things, exchanging ideas and resources with colleagues, engaging in teamwork, mentoring subordinates, relating to suppliers or customers, and more. It is appropriate to reward such activities through promotions and bonuses, even if they are more difficult to document and require a greater degree of judgment by those who decide on the rewards.
The cases of Mylan and Wells Fargo are recent examples of an older and common pattern, by which policies of payment for measured performance lead employees to engage in actions that create long-run damage to a firm’s reputation.
There is nothing wrong with rating people on a scale. The problems arise when the scale is too one-dimensional, measuring only a few outputs that are most easily measured because they can be standardised.
Innovation and risk taking
The attempt to substitute precise measurement for informed judgment also limits innovation, which necessarily entails guesswork and risk.
Performance metrics as a measure of accountability help to allocate blame when things go badly, but do little to encourage success, especially when success requires imagination, innovation, and risk. Indeed, as the economist Frank Knight noted almost a century ago, entrepreneurship entails “immeasurable uncertainty,” which is not susceptible to metric calculation.
Performance indicators can certainly aid, but not replace, the key functions of management: thinking ahead, judging, and deciding.
Why transparency is an issue:
It is characteristic of our culture that we tend to assume that performance and transparency rise and fall together. But that is a fallacy, or at least a misleading generalisation. For just as there are limits to the efficacy of measured performance, there are limits to the efficacy of transparency.
In interpersonal relations, even the most intimate ones, success depends on a degree of ambiguity and opacity, on not knowing everything that the other is doing, never mind thinking.
Our very sense of self is possible only because our thoughts and desires are not transparent to others. The possibility of intimacy depends on our ability to make ourselves more transparent to some people than to others.
As Tom Daschle, the Democratic former majority leader of the Senate, has recently observed, the “idea that Washington would work better if there were TV cameras monitoring every conversation gets it exactly wrong…. The lack of opportunities for honest dialogue and creative give-and-take lies at the root of today’s dysfunction.” That is also why effective politicians must to some degree be two-faced, pursuing more flexibility in closed negotiations than in their public advocacy. Only when multiple compromises have been made and a deal has been reached can it be subjected to public scrutiny, that is, made transparent.
A thriving polity, like a healthy marriage, relegates some matters to the shadows. In international relations, as in interpersonal ones, many practices are functional so long as they remain ambiguous and opaque. Clarity and publicity kill. The ability to negotiate between couples or states often involves coming up with formulas that allow each side to save face or retain self-esteem, and that requires compromising principles, or ambiguity.
The fact that allies spy on one another to a certain degree to determine intentions, capacities, and vulnerabilities is well known to practitioners of government. But it cannot be publicly acknowledged, since it represents a threat to the amour propre of other nations. Moreover, in domestic politics and in international relations, as in interpersonal ones, there is a role for a certain amount of hypocrisy, for practices that are tolerable and useful but that can’t be fully justified by international law and explicit norms.
To quote Moshe Halbertal: “A degree of legitimate concealment is necessary to maintain the state and its democratic institutions. Military secrets, techniques for fighting crime, intelligence gathering, and even diplomatic negotiations that will fall apart if they become exposed: all these domains have to stay shrouded in secrecy in order to allow the functioning of ordinary transparency in the other institutions of the state. Our transparent open conversation rests upon a rather extensive dark and hidden domain that insures its flourishing.”
In our post-privacy society, people are inclined to overlook the value of secrecy. Thus, the power of “transparency” as a magic formula is such that its counterproductive effects are often ignored. “Sunlight is the best disinfectant” has become the credo of the new faith of Wikileakism: the belief that making public the internal deliberations of all organisations and governments will make the world a better place. But more often, the result is paralysis.
What are recurring issues with metrics:
Before we turn to the proper use of measured performance, let us gather together some lessons from our case studies about the recurrent perils of metrics.
Goal displacement through diversion of effort to what gets measured.
Goal displacement comes in many varieties. When performance is judged by a few measures, and the stakes are high (keeping one’s job, getting a raise, raising the stock price at the time that stock options are vested), people will focus on satisfying those measures-often at the expense of other, more important organisational goals that are not measured.
Measured performance encourages what Robert K. Merton called “the imperious immediacy of interests… where the actor’s paramount concern with the foreseen immediate consequences excludes consideration of further or other consequences.”
Costs in employee time.
To the debit side of the ledger must also be added the transactional costs of metrics: the expenditure of employee time by those tasked with compiling and processing the metrics, not to speak of the time required to actually read them. That is exacerbated by the “reporting imperative”: the perceived need to constantly generate information, even when nothing significant is going on. Sometimes the metric of success is the number and size of the reports generated, as if nothing is accomplished unless it is extensively documented.
Sometimes, newly introduced performance metrics will have immediate benefits in discovering poorly performing outliers. Having gleaned the low-hanging fruit, there is a tendency to expect a continually bountiful harvest. The problem is that the metrics continue to be collected from everyone. And soon the marginal costs of assembling and analysing the metrics exceed the marginal benefits.
In an attempt to staunch the flow of faulty metrics through gaming, cheating, and goal diversion, organisations institute a cascade of rules. Complying with them further slows down the institution’s functioning and diminishes its efficiency.
Rewarding luck.
Measuring outcomes when the people involved have little control over the results is tantamount to rewarding luck.
It means that people are rewarded or penalised for outcomes that are actually independent of their efforts. Those penalised rightly feel that they’ve been treated unfairly.
Discouraging risk-taking.
Attempts to measure productivity through performance metrics have other, more subtle effects: they not only promote short-termism, as noted earlier, but also discourage initiative and risk-taking.
When people are judged by performance metrics, they are incentivised to do what the metrics measure, and what the metrics measure will be some established goal. But that impedes innovation, which means doing something that is not yet established, indeed hasn’t been tried out. Innovation involves experimentation.
Discouraging cooperation and common purpose.
Rewarding individuals for measured performance diminishes the sense of common purpose as well as the social relationships that provide the unmeasurable motivation for cooperation and institutional effectiveness. Reward based on measured performance tends to promote not cooperation but competition.
Degradation of work.
Compelling the people in an organisation to focus their efforts on the narrow range of what gets measured leads to a degradation of the experience of work.
Costs to productivity.
Economists who specialise in measuring economic productivity report that in recent years the only increase in total factor productivity in the American economy has been in the information technology-producing industries. A question that ought to be asked is to what extent the culture of metrics, with its costs in employee time, morale, and initiative, and its promotion of short-termism, has itself contributed to economic stagnation.
Metrics aren’t all bad:
There is nothing intrinsically pernicious about counting and measuring human performance. We all tend to project broad-ranging conclusions based on our inevitably limited experience, and measured data can serve as a useful counterpoint to those subjective judgments.
The challenge in such cases is to abandon universal templates and discover what is worth counting, and what the numbers actually mean in their local context.
As we’ve seen time and again, measurement is not an alternative to judgment: measurement demands judgment: judgment about whether to measure, what to measure, how to evaluate the significance of what’s been measured, whether rewards and penalties will be attached to the results, and to whom to make the measurements available.
THE METRIC CHECKLIST
1. What kind of information are you thinking of measuring?
The more the object to be measured resembles inanimate matter, the more likely it is to be measurable: that is why measurement is indispensable in the natural sciences and in engineering. When the objects to be measured are influenced by the process of measurement, measurement becomes less reliable. Measurement becomes much less reliable the more its object is human activity, since the objects, people, are self-conscious and are capable of reacting to the process of being measured. And if rewards and punishments are involved, they are more likely to react in a way that skews the measurement’s validity. By contrast, the more they agree with the goals behind those rewards, the more likely they are to react in a way that enhances the measurement’s validity.
2. How useful is the information?
Always begin by reminding yourself that the fact that some activity is measurable does not make it worth measuring; indeed, the ease of measuring may be inversely proportional to the significance of what is measured.
To put it another way, ask yourself: is what you are measuring a proxy for what you really want to know?
3. How useful are more metrics?
Remember that measured performance, when useful, is most effective in identifying outliers, especially poor performers or true misconduct.
4. What are the costs of not relying upon standardised measurement?
Are there other sources of information about performance, based on the judgment and experience of clients, patients, or parents of students?
5. To what purposes will the measurement be put, or to put it another way, to whom will the information be made transparent?
Here a key distinction is between data to be used for purposes of internal monitoring of performance by the practitioners themselves versus data to be used by external parties for reward and punishment.
Measurement instruments, such as tests, are invaluable, but they are most useful for internal analysis by practitioners rather than for external evaluation by public audiences who may fail to understand their limits.
Such measurement can be used to inform practitioners of their performance relative to their peers, offering recognition to those who have excelled and assistance to those who have fallen behind. To the extent that the measures are used to determine continuing employment and pay, they will be subject to gaming of the statistics or to outright fraud.
6. What are the costs of acquiring the metrics?
Information is never free, and often it is expensive in ways that rarely occur to those who demand more of it. Collecting data, processing it, analysing it: all of these take time, and their expense lies in the opportunity costs of the time put into them. To put it another way, every moment you or your colleagues or employees devote to the production of metrics is time not devoted to the activities being measured.
7. Ask why the people at the top of the organisation are demanding performance metrics.
As we’ve noted, the demand for performance measures sometimes flows from the ignorance of executives about the institutions they’ve been hired to manage, and that ignorance is often a result of parachuting into an organisation with which one has little experience.
8. How and by whom are the measures of performance developed?
Accountability metrics are less likely to be effective when they are imposed from above, using standardised formulas developed by those far from active engagement with the activity being measured. Measurements are more likely to be meaningful when they are developed from the bottom up, with input from teachers, nurses, and the cop on the beat. That means asking those with the tacit knowledge that comes from direct experience to provide suggestions about how to develop appropriate performance standards.
Metrics work best when those measured buy into their purposes and validity.
9. Remember that even the best measures are subject to corruption or goal diversion.
Insofar as individuals are agents out to maximise their own interests, there are inevitable drawbacks to all schemes of measured reward.
That doesn’t mean that performance measures should be abandoned just because they have some negative outcomes. Such metrics may still be worth using, despite their anticipatable problems: it’s a matter of trade-offs. And that too is a matter of judgment.
10. Remember that sometimes, recognising the limits of the possible is the beginning of wisdom.
Not all problems are soluble, and even fewer are soluble by metrics. It’s not true that everything can be improved by measurement, or that everything that can be measured can be improved. Nor is making a problem more transparent necessarily a step to its solution. Transparency may make a troubling situation more salient, without making it more soluble.
Ultimately, the issue is not one of metrics versus judgment, but metrics as informing judgment, which includes knowing how much weight to give to metrics, recognising their characteristic distortions, and appreciating what can’t be measured. In recent decades, too many politicians, business leaders, policymakers, and academic officials have lost sight of that.