
Google Books “Science” Article February 26, 2011

Posted by Christopher Lemery in Google.

A while back I read the much-discussed Science article entitled “Quantitative Analysis of Culture Using Millions of Digitized Books” (thanks to Librarian.net, you can read it here), which describes some lexicographic and cultural analyses the authors performed on the Google Books database (or “corpus,” as the authors call it). It was a fascinating article, and the graphs the authors provide are great. One graph in particular illustrates that, of the roughly 1 million words they estimate are in the English lexicon, only about half appear in the Oxford English Dictionary. This reminded me how ridiculously time-consuming compiling a dictionary is (read the superb The Meaning of Everything for a good description of this) and of Erin McKean’s great TED talk about the evolution of the dictionary. But I digress. In any event, the Science article illustrates once again how useful and potentially revolutionary Google Books is, and my first thought was that this field of “culturomics” could be earth-shattering.

As usual, the excellent Geoff Nunberg put the Science article into the proper perspective. In an article in the Chronicle Review (which I just got around to reading), Nunberg notes that quantitative methods have been around for a long time and that this use of Google Books differs from previous efforts in scale rather than in kind. He also notes that the ways the data can be searched and manipulated leave a lot to be desired, particularly in comparison with similar tools such as the Corpus of Historical American English, which I’d never even heard of! Nunberg argues that culturomics will likely be subsumed into existing fields and won’t replace the need for literary criticism or for scholars to evaluate and understand the datasets that culturomics produces. I tend to agree with Nunberg’s conclusions, but it’s also true that there will be ways to use Google Books that we probably haven’t thought of yet, so the jury is still out on “culturomics.”

The Chronicle article also noted that metadata errors are a continuing problem with Google Books, again reminding me why the BIP project is so vital. As sophisticated as computers get, the maxim of “garbage in, garbage out” still holds, and even Watson gets things really wrong!


