Everybody Lies: What the Internet Can Tell Us About Who We Really Are, Seth Stephens-Davidowitz
Everybody Lies is an enthusiastic defence of the premise that “big data” — such as aggregate data from the kind of things people search in Google — might tell us things about humans that we wouldn’t admit even on an anonymous survey, and which things like implicit association tests hope to dig out. My main feeling going in was that I’d expect such a dataset to have its own drawbacks, and that I’d be very sceptical if the author pretended that it did not.
Well, though the author writes enthusiastically and persuasively about the subject, he does mention some cautionary tales and drawbacks, and he makes very good points about things like sexuality. Someone in the closet in a homophobic country doesn’t have much incentive to admit to being gay on an anonymous survey, but they might still search for gay porn (and indeed searches for gay porn match reasonably well across the world, showing that there’s a background rate of people who are at least interested in it in principle).
(His data actually just shows where men are interested in men having sex with men, not where men are gay, which is something he doesn’t really notice. Bisexual men don’t exist for the purposes of his discussion here, even though he’d do much better to just talk about same-sex attraction, which allows for both homosexuality and bisexuality.)
The book is full of interesting examples and applications, with a sprinkling of the author’s personality (as in many pop-sci books). He’s excited about his work, but not too credulous, and it’s a reasonable introduction to the concept that has me… okay, not convinced that data science is necessarily going to produce the next great specialist in every subject (as he suggests), but hopeful that Google searches and similar bodies of data can indeed teach us things about ourselves.