Book Reviews

A Play on Words

Uncharted: Big Data as a Lens on Human Culture

Riverhead Books, New York, 2013, 288 pp., $27.95 (cloth).

There are many things that are uncharted in this book. But Erez Aiden and Jean-Baptiste Michel don’t mean by “uncharted” that things are left out—in fact, a more appropriate title might be “Charted.”

The book tells the story of collecting the billions of words in all the world’s books, words that were previously lost in the meaning of the text (“uncharted,” as it were) but can now be charted to our heart’s content. The authors hope that the charting process will lead us to discover interesting aspects of our culture, which they refer to as “culturomics.” Aiden and Michel collaborated with Google to make a powerful Web tool, but their claims about its usefulness are perhaps extravagant.

This is not to say that the book isn’t fun. That’s what you’d expect from the acknowledgments, in which Aiden thanks his three children and includes the middle name of a daughter: Banana. (At least he’s quirkily consistent; his son is Galileo.)

Now I’m all for fun. But Aiden and Michel are doing important scientific work, and they don’t do themselves any favors by giving “fun” examples. It doesn’t take big data to convince us that the word chupacabra (a blood-drinking creature reportedly sighted in Puerto Rico in 1995) is much rarer than Sasquatch or the Loch Ness Monster. It also seems silly to chart the changing usage of “argh” and “aargh” in books published sometime between the 1940s (it’s hard to tell the starting date from the chart reproduced in the book) and 2000. There’s a quote on the book jacket from Mother Jones that calls the Ngram Viewer “the greatest timewaster in the history of the Internet.” It was bold of the publisher to include that.

To document a cultural history by getting robots to read every word of every book ever published is ambitious. So what do I mean by saying there’s a lot left out of this effort? Aiden and Michel acknowledge that they are searching through a tiny sample of words, and although they say that Google has so far scanned some 30 million books (probably more by now), there are still some 100 million to go.

Further, if a word’s usage is a clue to our cultural history, many sources are ignored in this book: newspaper and magazine articles, letters, movies, TV and radio interviews, transcripts, lectures—in fact, everything written or spoken, but not published in a book. Besides, after books are...
