Top Quotes: “Everybody Lies” — Seth Stephens-Davidowitz
Background: Stephens-Davidowitz is a data scientist who uses Google searches along with other modern data sources to shed light on topics which surveys are often unable to accurately represent. He attempts to answer with data how many American men are in the closet, what percent of Americans are explicitly racist (and where do they live?), and whether more people regret having or not having children. It’s a fun read that made me question some things I’d taken for granted and definitely made clear to me why pollsters are often so off in their predictions. *TRIGGER WARNING* There’s some very disturbing data about explicit racism in here :/
Introduction
“In the U.S., the n-word is included in roughly the same number of searches as migraines or Lakers. 20% of those searches also included the word ‘jokes.’”
“On Obama’s first election night, 1 in 100 searches that included the word Obama also included KKK or the n-word. In some states, there were more searches for ‘n-word President’ than ‘first black President.’”
“The highest racist search rates were in upstate New York, parts of the Rust Belt, West Virginia, South Louisiana, and Mississippi. The true divide isn’t North vs. South but East vs. West — you don’t get this sort of thing West of the Mississippi.”
“Obama lost roughly 4% nationwide just from explicit racism when you overlay Google Trends with voting results vs. Kerry in 2004.”
“In the previous three elections, the candidate who appeared first in the most searches containing both of the candidates’ names won.”
Data: Big and Small
“Winter climate swamped all other correlations between an area’s Google searches for depression and other factors. In winter months, warm climates have 40% fewer depression searches than cold climates. Antidepressants only decrease the incidence of depression by about 20%.”
“Black men are 40 times more likely than white men to reach the NBA. But a black kid born in one of the wealthiest U.S. counties is more than twice as likely to make the NBA as a black kid born in one of the poorest counties.”
“Black kids born into poverty are nearly twice as likely to have a name that is given to no other child born in that same year.”
“Each additional inch of height doubles your chance of making the NBA. 1 in 20 million men over six feet tall reach the NBA; 1 in 5 men over 7 feet tall do.”
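To get a feel for how fast "doubling per inch" compounds, here is a minimal sketch. It assumes the doubling holds exactly for every inch, which is my simplification, and the function name and default factor are made up for illustration. The quoted 1-in-N figures describe whole height ranges, so they need not line up exactly with this per-inch multiplier.

```python
def relative_odds(extra_inches: int, factor: float = 2.0) -> float:
    """Multiplier on someone's NBA odds after gaining `extra_inches` of height.

    Assumes (hypothetically) that every additional inch multiplies the odds
    by the same `factor`; the quote puts that factor at roughly 2.
    """
    return factor ** extra_inches


# Going from exactly 6'0" to exactly 7'0" is 12 extra inches.
six_to_seven = relative_odds(12)
print(f"12 extra inches multiplies the odds by about {six_to_seven:,.0f}x")
```

Under that assumption, a single foot of height is worth a multiplier in the thousands, which is why the odds at the two ends of the range look like they belong to different sports.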
The Powers of Big Data
“Big data provides new data, honest data, the ability to zoom in on small subsets of people, and the ability to do causal experiments.”
“Pizza went from virtually no mention in books before 1980 to nine mentions per million words in 1990.”
“‘The United States are’ was more commonly used until 1880, 15 years after the Civil War. The complete domination of ‘the United States is’ wasn’t until the 1920s.”
“On a first date, when someone uses hedge words and phrases like ‘probably’ and ‘I guess,’ they are less likely to want a second date. They are likely to be interested when they talk about themselves or use self-marking phrases like ‘You know?’ or ‘I mean.’”
“Daters like people who follow their lead, laugh at their jokes, and keep the conversation on topics they introduce rather than constantly changing the topic to things the other person wants to discuss.”
“Daters like people who express support — ‘that’s awesome’ ‘that’s really cool’ — or who express sympathy ‘that’s tough’ or ‘you must be sad.’”
“If there are a lot of questions asked on a date, it’s less likely either will report a connection — questions are signs of boredom and things people might say when the conversation stalls.”
“Women use the word ‘tomorrow’ much more than men do — perhaps because men aren’t so great at thinking ahead.”
“A huge percentage of stories — based on coding of words as positive or negative — fit into six relatively simple structures: rags to riches, riches to rags, man in a hole (fall, then rise), Icarus (rise, then fall), Cinderella (rise, then fall, then rise), or Oedipus (fall, then rise, then fall).”
“Positive news articles are more likely to go viral.”
“Analysis of politically charged phrases like ‘death tax’ reveals how biased newspapers are. Newspapers tend to reflect the political views of local residents more than their owners’ views.”
“People used to think of photos like paintings, and since a person sitting for a portrait couldn’t hold a smile for many hours, subjects adopted a serious look. In the ’50s, Kodak was frustrated by the limited number of photos people were taking and changed ads to associate photos with happiness.”
“Night light data can be used for GDP estimates. And one company employs workers in developing countries to take photos of things like gas stations or bins of fruit in the supermarket and these photos are turned into data that can estimate economic output and inflation.”
“Social desirability bias is the term for when people underreport or lie on surveys because they want to look good even though surveys are anonymous. This is extremely common.”
“About 1/3 of the time, people lie in real life. We also lie to ourselves. Teenagers often lie just to mess with surveys. People have no incentive to tell the truth on surveys.”
“The more impersonal the conditions, the more honest people will be. An internet survey is better than a phone survey, which is better than an in-person survey.”
“When Googling, you do have an incentive to tell the truth: you will get the information or thing you’re looking for. And you can reveal things you haven’t admitted to yourself: not googling your polling place before an election, looking up why you’re having crying jags.”
“People are seven times more likely to ask Google whether they’ll regret not having kids than having them. But after making their decision, adults with children are four times more likely than those without kids to tell Google they regret their decision.”
“Judging by porn searches, 5% of American men are gay and there is less variation between tolerant and intolerant places than you’d think — 4.8% in Mississippi vs. 5.2% in Rhode Island.”
“‘Is my husband gay?’ is searched eight times more often than ‘Is my husband an alcoholic?’ and ten times more often than ‘Is my husband depressed?’ In 21 out of 25 states where this question is most frequently asked, support for same-sex marriage is below the national average. Craigslist ads for males seeking casual encounters with men are more common in less tolerant states; Kentucky, Louisiana, and Alabama are among the states with the highest percentages. Searches for a ‘gay test’ are twice as prevalent in the least tolerant states. Clearly many men remain in the closet.”
“25% of straight porn searches by women emphasize pain or humiliation placed upon a woman. 5% of searches by women are for nonconsensual sex.”
“On Google, there are 16 times more complaints about a spouse not wanting sex than about a married partner not wanting to talk.”
“For every search women make about a partner’s phallus, men make roughly 170 searches about their own. 40% of complaints about a partner’s penis size say that it’s too big.”
“According to sites people visit, interest in beauty and fitness is 42% male (vs. female), weight loss is 33% male, and cosmetic surgery is 39% male.”
“After the San Bernardino shooting, Americans searched for the phrase ‘kill Muslims’ with about the same frequency as ‘martini recipes’ or ‘migraine symptoms.’”
“After Obama’s ensuing speech about inclusion and toleration, all negative Muslim searches increased and all positive Muslim searches decreased.”
“Searches for the n-word or n-word jokes are highest when black people are in the media — like after Hurricane Katrina or on Martin Luther King, Jr. Day.”
“Parents are 2.5 times more likely to ask Google if their son is gifted vs. their daughter. But at young ages, girls have larger vocabularies and are 9% more likely to be in gifted programs.”
“Parents google ‘Is my daughter overweight?’ twice as frequently as they do for their sons. They’re three times more likely to ask if their daughter (rather than son) is ugly.”
“There’s a higher percentage of white people on white nationalist sites in diverse states and 19 is the most common age at which they join. There’s no correlation with unemployment rate. A high percentage of them view the New York Times site regularly.”
“Internet news sites are far closer to perfect desegregation of liberals and conservatives than to perfect segregation, partially because large news sites get 50% of views.”
“Because we have weak ties on Facebook, we’re more likely to see opposing views shared there than offline.”
“Child abuse rose during the ’08 recession, as revealed by kids Googling things like ‘my dad hit me.’ Official reports missed this because people who tend to report or handle child abuse cases were more likely to be overworked or out of work. There’s a strong correlation between a region’s unemployment rate and child abuse.”
“In 2015, there were 700,000 searches for self-induced abortion and 3.4 million searches for abortion clinics. Clearly a significant proportion of women contemplated doing abortions themselves. Self-induced abortion searches jumped 40% in 2011 as 92 state provisions that restrict access to abortions were enacted. Looking at live birth and abortion data by state reveals that there are likely self-induced abortions occurring in restrictive states.”
“The more rich people in a city, the longer the life expectancy of poor people there — perhaps because of contagious habits like exercising more, smoking less, and eating healthier.”
“College towns, large cities, and areas with lots of foreign-born people create more famous people — likely because of early exposure.”
“The founder of Patients Like Me hopes you can find people of similar age and gender with your history reporting symptoms similar to yours — and see what worked for them.”
“Medical researcher Isaac Kohane wants to organize and collect all of our health information so instead of using a one-size-fits-all approach, doctors can find patients just like you. Then they can employ more personalized, focused diagnoses and treatments.”
“The datasets doctors use to make diagnoses are small — based on a doctor’s experience with the population of patients she’s treated and supplemented by academic papers from small populations that other researchers have encountered. To get really good, diagnoses would have to include many more cases.”
“More than half of middle school students in rural India can’t read a simple sentence; one potential reason students struggle so much is that teachers don’t show up consistently: more than 40% are absent on a given day. In a randomized experiment, teachers were paid a small amount for every day they showed up in addition to their base pay. Teacher absenteeism dropped by half and student test performance improved substantially, with the highest effect among young girls, who were 7% more likely to be able to write.”
“The fourth power of Big Data is that it makes randomized experiments, which can find truly causal effects, much, much easier to conduct. In 2011, Google engineers ran 7,000 AB tests.”
“The lesson of AB testing is to be wary of general lessons — test everything.”
“AB testing may play a role in making the internet more addictive: sites are optimizing to maximize how much time you spend on them.”
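For anyone curious what reading an AB test looks like in practice, here is a rough sketch of one common approach, a two-proportion z-test. This is my own illustration and not something the book spells out. The function name and the visitor and click counts are hypothetical, and real experimentation platforms may well use different statistics.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare conversion rates of variants A and B.

    Returns (z, p_value), where p_value is the two-sided p-value under
    the null hypothesis that both variants convert at the same rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate: the best single estimate if A and B are truly identical.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal at |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 1,000 visitors per variant, 100 vs. 150 clicks.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the gap between the two versions is unlikely to be chance alone. It says nothing about whether the same change would win on a different page or audience, which fits the "test everything" lesson.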
“People in cities of teams that qualify for the Super Bowl attend movies advertised during it at a significantly higher rate than those in cities of teams that just missed qualifying.”
“Prisoners assigned to harsher conditions were more likely to commit crimes when they got out than similar prisoners in less harsh conditions.”
“When two students, from similar backgrounds, both got into Harvard but one chose Penn State, these students ended up with more or less the same incomes in their careers.”
“The days of academics devoting months to recruiting a small number of undergrads to perform a single test will come to an end. Instead, academics will utilize digital data to test a few hundred or thousand ideas in just a few seconds. We’ll be able to learn a lot more in a lot less time.”