Top Quotes: “Noise: A Flaw in Human Judgment” — Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein

Austin Rose
13 min read · Feb 20, 2024


“A much larger study, conducted in 1981, involved 208 federal judges who were exposed to the same sixteen hypothetical cases. Its central findings were stunning:

In only 3 of the 16 cases was there a unanimous agreement to impose a prison term. Even where most judges agreed that a prison term was appropriate, there was a substantial variation in the lengths of prison terms recommended. In one fraud case in which the mean prison term was 8.5 years, the longest term was life in prison. In another case the mean prison term was 1.1 years, yet the longest prison term recommended was 15 years.

As revealing as they are, these studies, which involve tightly controlled experiments, almost certainly understate the magnitude of noise in the real world of criminal justice. Real-life judges are exposed to far more information than what the study participants received in the carefully specified vignettes of these experiments. Some of this additional information is relevant, of course, but there is also ample evidence that irrelevant information, in the form of small and seemingly random factors, can produce major differences in outcomes.”

“Even something as irrelevant as outside temperature can influence judges. A review of 207,000 immigration court decisions over four years found a significant effect of daily temperature variations: when it is hot outside, people are less likely to get asylum. If you are suffering political persecution in your home country and want asylum elsewhere, you should hope and maybe even pray that your hearing falls on a cool day.”

“Confidence is nurtured by the subjective experience of judgments that are made with increasing fluency and ease, in part because they resemble judgments made in similar cases in the past. Over time, as this underwriter learned to agree with her past self, her confidence in her judgments increased. She gave no indication that — after the initial apprenticeship phase — she had learned to agree with others, had checked to what extent she did agree with them, or had even tried to prevent her practices from drifting away from those of her colleagues.

Averaging two guesses by the same person does not improve judgments as much as does seeking out an independent second opinion. As Vul and Pashler put it, “You can gain about 1/10th as much from asking yourself the same question twice as you can from getting a second opinion from someone else.” This is not a large improvement. But you can make the effect much larger by waiting to make a second guess. When Vul and Pashler let three weeks pass before asking their subjects the same question again, the benefit rose to one-third the value of a second opinion.”

“Instead of merely asking their subjects to produce a second estimate, they encouraged people to generate an estimate that — while still plausible — was as different as possible from the first one. This request required the subjects to think actively of information they had not considered the first time. The instructions to participants read as follows:

First, assume that your first estimate is off the mark. Second, think about a few reasons why that could be. Which assumptions and considerations could have been wrong? Third, what do these new considerations imply? Was the first estimate rather too high or too low? Fourth, based on this new perspective, make a second, alternative estimate.

Like Vul and Pashler, Herzog and Hertwig then averaged the two estimates thus produced. Their technique, which they named dialectical bootstrapping, produced larger improvements in accuracy than did a simple request for a second estimate immediately following the first. Because the participants forced themselves to consider the question in a new light, they sampled another, more different version of themselves — two “members” of the “crowd within” who were further apart. As a result, their average produced a more accurate estimate of the truth. The gain in accuracy with two immediately consecutive “dialectical” estimates was about half the value of a second opinion.

The upshot for decision makers, as summarized by Herzog and Hertwig, is a simple choice between procedures: if you can get independent opinions from others, do it — this real wisdom of crowds is highly likely to improve your judgment. If you cannot, make the same judgment yourself a second time to create an “inner crowd.” You can do this either after some time has passed — giving yourself distance from your first opinion — or by actively trying to argue against yourself to find another perspective on the problem.”
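The trade-off Herzog and Hertwig describe can be illustrated with a small simulation (the parameters below are invented for illustration and are not taken from Vul and Pashler's data). Model each guess as the true value plus a stable personal bias plus occasion-to-occasion noise: averaging two guesses from the same person cancels only the occasion noise, while averaging guesses from two different people also dilutes the bias.

```python
import random
import statistics

random.seed(0)
TRUE_VALUE = 100.0
N = 20_000  # number of simulated questions

def guess(bias):
    # one judgment = truth + the judge's stable bias + occasion noise
    return TRUE_VALUE + bias + random.gauss(0, 10)

def mse(errors):
    return statistics.fmean(e * e for e in errors)

single, inner, outer = [], [], []
for _ in range(N):
    bias_a = random.gauss(0, 10)  # person A's stable bias
    bias_b = random.gauss(0, 10)  # person B's stable bias
    first = guess(bias_a)
    again = guess(bias_a)   # the same person, asked a second time
    other = guess(bias_b)   # an independent second opinion
    single.append(first - TRUE_VALUE)
    inner.append((first + again) / 2 - TRUE_VALUE)
    outer.append((first + other) / 2 - TRUE_VALUE)

print(f"one guess:      MSE {mse(single):.0f}")
print(f"inner crowd:    MSE {mse(inner):.0f}")
print(f"second opinion: MSE {mse(outer):.0f}")
```

Under these assumed parameters the inner crowd recovers only part of the benefit of a true second opinion; in general the gain depends on how correlated the two self-guesses are, which is why distance in time, or a deliberately "dialectical" second estimate, widens the gap between them and helps.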

“People in a good mood are more cooperative and elicit reciprocation. They tend to end up with better results than do unhappy negotiators. Of course, successful negotiations make people happy, too, but in these experiments, the mood is not caused by what is going on in the negotiation; it is induced before people negotiate. Also, negotiators who shift from a good mood to an angry one during the negotiation often achieve good results — something to remember when you’re facing a stubborn counterpart!

On the other hand, a good mood makes us more likely to accept our first impressions as true without challenging them. In one of Forgas’s studies, participants read a short philosophical essay, to which a picture of the author was appended. Some readers saw a stereotypical philosophy professor — male, middle-aged, and wearing glasses. Others saw a young woman. As you can guess, this is a test of the readers’ vulnerability to stereotypes: do people rate the essay more favorably when it is attributed to a middle-aged man than they do when they believe that a young woman wrote it? They do, of course. But importantly, the difference is larger in the good-mood condition. People who are in a good mood are more likely to let their biases affect their thinking.”

“Bad weather is associated with improved memory; judicial sentences tend to be more severe when it is hot outside; and stock market performance is affected by sunshine. In some cases, the effect of the weather is less obvious. Uri Simonsohn showed that college admissions officers pay more attention to the academic attributes of candidates on cloudier days and are more sensitive to nonacademic attributes on sunnier days. The title of the article in which he reported these findings is memorable enough: “Clouds Make Nerds Look Good.””

“The key finding was that group rankings were wildly disparate: across different groups, there was a great deal of noise. In one group, “Best Mistakes” could be a spectacular success, while “I Am Error” could flop. In another group, “I Am Error” could do exceedingly well, and “Best Mistakes” could be a disaster. If a song benefited from early popularity, it could do really well. If it did not get that benefit, the outcome could be very different.

To be sure, the very worst songs (as established by the control group) never ended up at the very top, and the very best songs never ended up at the very bottom. But otherwise, almost anything could happen. As the authors emphasize, “The level of success in the social influence condition was more unpredictable than in the independent condition.” In short, social influences create significant noise across groups. And if you think about it, you can see that individual groups were noisy, too, in the sense that their judgment in favor of one song, or against it, could easily have been different, depending on whether it attracted early popularity.

As Salganik and his coauthors later demonstrated, group outcomes can be manipulated fairly easily, because popularity is self-reinforcing. In a somewhat fiendish follow-up experiment, they inverted the rankings in the control group (in other words, they lied about how popular songs were), which meant that people saw the least popular songs as the most popular, and vice versa. The researchers then tested what the website’s visitors would do. The result was that most of the unpopular songs became quite popular, and most of the popular songs did very poorly.

27% of juries chose an award as high as, or even higher than, that of their most severe member. Not only were deliberating juries noisier than statistical juries, but they also accentuated the opinions of the individuals composing them.”
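The self-reinforcing popularity that Salganik and his coauthors exploited can be sketched as a simple urn process (a toy model of cumulative advantage, not a reconstruction of their actual experiment): two songs of identical quality, where each new listener chooses in proportion to current download counts.

```python
import random

random.seed(2)

def simulate_market(listeners=1000):
    # two equally good songs; each listener picks in proportion
    # to current downloads, so early luck compounds
    downloads = [1, 1]
    for _ in range(listeners):
        p_first = downloads[0] / sum(downloads)
        pick = 0 if random.random() < p_first else 1
        downloads[pick] += 1
    return downloads[0] / sum(downloads)

shares = sorted(simulate_market() for _ in range(200))
print(f"song 1's final share over 200 runs: "
      f"min {shares[0]:.0%}, median {shares[100]:.0%}, max {shares[-1]:.0%}")
```

Rerunning the same market produces wildly different winners even though nothing about the songs ever changes — the across-group noise the study reports, generated entirely by early popularity feeding on itself.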

“Everything seems to depend on early popularity. We’d better work hard to make sure that our new release has a terrific first week.”

“As I always suspected, ideas about politics and economics are a lot like movie stars. If people think that other people like them, such ideas can go far.”

“Their striking finding was that any linear model, when applied consistently to all cases, was likely to outdo human judges in predicting an outcome from the same information. In one of the three samples, 77% of the ten thousand randomly weighted linear models did better than the human experts. In the other two samples, 100% of the random models outperformed the humans. Or, to put it bluntly, it proved almost impossible in that study to generate a simple model that did worse than the experts did.”

“In predictive judgments, human experts are easily outperformed by simple formulas — models of reality, models of a judge, or even randomly generated models. This finding argues in favor of using noise-free methods: rules and algorithms.”
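The result described above (from the work on randomly weighted linear models) is easy to reproduce in a toy setting. In the hypothetical setup below, the outcome really is a weighted sum of three cues plus luck; the "judge" knows the right cues but applies them inconsistently, while each random-weight model is applied with perfect consistency.

```python
import random

random.seed(1)

def corr(xs, ys):
    # Pearson correlation coefficient
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# hypothetical data: three predictive cues per case
cases = [[random.gauss(0, 1) for _ in range(3)] for _ in range(500)]
signal = [0.5 * a + 0.3 * b + 0.2 * c for a, b, c in cases]
outcome = [s + random.gauss(0, 0.7) for s in signal]

# a judge who weighs the cues correctly but noisily (inconsistently)
judge = [s + random.gauss(0, 0.8) for s in signal]
judge_r = corr(judge, outcome)

# many random-weight models, each applied without any noise at all
TRIALS = 1000
wins = 0
for _ in range(TRIALS):
    weights = [random.random() for _ in range(3)]  # random positive weights
    model = [sum(w * x for w, x in zip(weights, case)) for case in cases]
    if corr(model, outcome) > judge_r:
        wins += 1

print(f"random models beating the noisy judge: {wins / TRIALS:.0%}")
```

Consistency is the whole advantage in this sketch: the random models do not know the right weights, but they never have a bad day, so most of them track the outcome better than the judge does.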

“Cowgill developed a machine-learning algorithm to screen the résumés of candidates and trained it on more than three hundred thousand submissions that the company had received and evaluated. Candidates selected by the algorithm were 14% more likely than those selected by humans to receive a job offer after interviews. When the candidates received offers, the algorithm group was 18% more likely than the human-selected group to accept them. The algorithm also picked a more diverse group of candidates, in terms of race, gender, and other metrics; it was much more likely to select “nontraditional” candidates, such as those who did not graduate from an elite school, those who lacked prior work experience, and those who did not have a referral. Human beings tended to favor résumés that checked all the boxes of the “typical” profile for a software engineer, but the algorithm gave each relevant predictor its proper weight.”

“When a finding is described as “significant,” we should not conclude that the effect it describes is a strong one. It simply means that the finding is unlikely to be the product of chance alone. With a sufficiently large sample, a correlation can be at once very “significant” and too small to be worth discussing.”
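The point is mechanical: the t statistic for a correlation grows with the square root of the sample size, so with enough data even a trivial effect clears any significance threshold. A quick back-of-the-envelope check, with invented numbers:

```python
import math

# an (invented) tiny correlation measured in a very large sample
r, n = 0.03, 1_000_000

# standard t statistic for testing whether r differs from zero
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# two-sided p-value via the normal approximation (fine at this n)
p = math.erfc(abs(t) / math.sqrt(2))

print(f"t = {t:.1f}, p = {p:.1e}, variance explained = {r * r:.2%}")
```

Here the correlation is overwhelmingly "significant" (t around 30, p effectively zero) yet explains less than a tenth of a percent of the variance — precisely the distinction the passage draws.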

“When we give in to this feeling of inevitability, we lose sight of how easily things could have been different — how, at each fork in the road, fate could have taken a different path. Jessica could have kept her job. She could have quickly found another one. A relative could have come to her aid. You, the social worker, could have been a more effective advocate. The building manager could have been more understanding and allowed the family a few weeks of respite, making it possible for Jessica to find a job and catch up with the rent.

These alternate narratives are as unsurprising as the main one — if the end is known. Whatever the outcome (eviction or not), once it has happened, causal thinking makes it feel entirely explainable, indeed predictable.”

“In the valley of the normal, events unfold just like the Joneses’ eviction: they appear normal in hindsight, although they were not expected, and although we could not have predicted them. This is because the process of understanding reality is backward-looking. An occurrence that was not actively anticipated (the eviction of the Jones family) triggers a search of memory for a candidate cause (the tough job market, the inflexible manager). The search stops when a good narrative is found. Given the opposite outcome, the search would have produced equally compelling causes (Jessica Jones’s tenacity, the understanding manager).”

“When the search for an obvious cause fails, our first resort is to produce an explanation by filling a blank in our model of the world. This is how we infer a fact we had not known before (for instance, that the manager was an unusually kind person). Only when our model of the world cannot be tweaked to generate the outcome do we tag this outcome as surprising and start to search for a more elaborate account of it. Genuine surprise occurs only when routine hindsight fails.”

“Your neighbor may smile as your paths cross or may appear preoccupied and just nod — neither of these events will attract much attention if both have been reasonably frequent in the past. If the smile is unusually wide or the nod unusually perfunctory, you may well find yourself searching your memory for a possible cause. Causal thinking avoids unnecessary effort while retaining the vigilance needed to detect abnormal events.”

“Beyond an elementary level, statistical thinking also demands specialized training. This type of thinking begins with ensembles and considers individual cases as instances of broader categories. The eviction of the Joneses is not seen as resulting from a chain of specific events but is viewed as a statistically likely (or unlikely) outcome, given prior observations of cases that share predictive characteristics with the Joneses.

The distinction between these two views is a recurring theme of this book. Relying on causal thinking about a single case is a source of predictable errors. Taking the statistical view, which we will also call the outside view, is a way to avoid these errors.”

“Whether you’re haggling in a bazaar or sitting down for a complex business transaction, you probably have an advantage in going first, because the recipient of the anchor is involuntarily drawn to think of ways your offer could be reasonable. People always attempt to make sense of what they hear; when they encounter an implausible number, they automatically bring to mind considerations that would reduce its implausibility.”

“In a revealing study, consumers were found to be more likely to be affected by calorie labels if they were placed to the left of the food item rather than the right. When calories are on the left, consumers receive that information first and evidently think “a lot of calories!” or “not so many calories!” before they see the item. Their initial positive or negative reaction greatly affects their choices. By contrast, when people see the food item first, they apparently think “delicious!” or “not so great!” before they see the calorie label.”

“The advantage of comparative judgments applies to many domains. If you have a rough idea of people’s wealth, you will do better comparing individuals in the same range than you would by labeling their wealth individually. If you grade essays, you will be more precise when you rank them from best to worst than you are when you read and grade essays one by one. Comparative or relative judgments are more sensitive than categorical or absolute ones. As these examples suggest, they are also more effortful and time-consuming.”

“Both of us say this movie is very good, but you seem to have enjoyed it a lot less than I did. We’re using the same words, but are we using the same scale?”

“The true expert who has “solved” a judgment problem knows not only why her explanatory story is correct; she is equally fluent in explaining why other stories are wrong. Here again, a person can gain confidence of equal strength but poorer quality by failing to consider alternatives or by actively suppressing them.”

“The examiners turned out to be susceptible to bias. When the same examiners considered the same prints they had seen earlier, but this time with biasing information, their judgments changed. In the first study, four out of five experts altered their previous identification decision when presented with strong contextual information that suggested an exclusion. In the second study, six experts reviewed four pairs of prints; biasing information led to changes in four of the twenty-four decisions. To be sure, most of their decisions did not change, but for these kinds of decisions, a shift of one in six can be counted as large. These findings have since been replicated by other researchers.”

“The susceptibility of forensic experts to confirmation bias is not just a theoretical concern because, in reality, no systematic precautions are in place to make sure that forensic experts are not exposed to biasing information. Examiners often receive such information in the transmittal letters that accompany the evidence submitted to them. Examiners are also often in direct communication with police, prosecutors, and other examiners.

Confirmation bias raises another problem. An important safeguard against errors, built into the ACE-V procedure, is the independent verification by another expert before an identification can be confirmed. But most often, only identifications are independently verified. The result is a strong risk of confirmation bias, as the verifying examiner knows that the initial conclusion was an identification. The verification step therefore does not provide the benefit normally expected from the aggregation of independent judgments, because verifications are not, in fact, independent.”

“When pathologists analyzed skin lesions for the presence of melanoma — the most dangerous form of skin cancer — there was only “moderate” agreement. The eight pathologists reviewing each case were unanimous or showed only one disagreement just 62% of the time.

Another study at an oncology center found that the diagnostic accuracy of melanomas was only 64%, meaning that doctors misdiagnosed melanomas in one of every three lesions. A third study found that dermatologists at New York University failed to diagnose melanoma from skin biopsies 36% of the time.”

“If all you know about two candidates is that one appeared better than the other in the interview, the chances that this candidate will indeed perform better are about 56 to 61%. Somewhat better than flipping a coin, for sure, but hardly a fail-safe way to make important decisions.”

“There is strong evidence, for instance, that hiring recommendations are linked to impressions formed in the informal rapport-building phase of an interview, those first two or three minutes where you just chat amicably to put the candidate at ease. First impressions turn out to matter — a lot.”

“The first seconds of an interview reflect exactly the sort of superficial qualities you associate with first impressions: early perceptions are based mostly on a candidate’s extraversion and verbal skills. Even the quality of a handshake.”

“In a traditional interview, interviewers are at liberty to steer the interview in the direction they see fit. They are likely to ask questions that confirm an initial impression. If a candidate seems shy and reserved, for instance, the interviewer may want to ask tough questions about the candidate’s past experiences of working in teams but perhaps will neglect to ask the same questions of someone who seems cheerful and gregarious. The evidence collected about these two candidates will not be the same. One study that tracked the behavior of interviewers who had formed a positive or negative initial impression from résumés and test scores found that initial impressions have a deep effect on the way the interview proceeds. Interviewers with positive first impressions, for instance, ask fewer questions and tend to “sell” the company to the candidate.

Your chances of picking the better candidate with a structured interview are between 65 and 69%, a marked improvement over the 56 to 61% chance an unstructured interview would give you.

Google uses other data as inputs on some of the dimensions it cares about. To test job-related knowledge, it relies in part on work sample tests, such as asking a candidate for a programming job to write some code. Research has shown that work sample tests are among the best predictors of on-the-job performance. Google also uses “backdoor references,” supplied not by someone the candidate has nominated but by Google employees with whom the candidate has crossed paths.”
