Google Ngram Viewer Hints at Google Books Potential

December 20, 2010 | Ian | Comments (0)

 
Last week Google went live with an interesting tool called the Google Ngram Viewer. Most will recall that a number of years ago Google began digitising the collections of a number of university libraries, as well as soliciting contributions directly from publishers. At the time, the potential of the project seemed enormous. Unfortunately, little has come of it due to legal disputes over various copyright issues. The Ngram Viewer offers a tangible hint of that original promise.

According to a paper published in the journal Science on the creation of the Ngram Viewer, Google has scanned an estimated 11% of all the books ever published. To create the database for the viewer, Google pulled just over 5 million of the best-quality scans, particularly those for which publication information was known, amounting to about 4% of all the books ever published. Google then indexed each word to determine how often any given word or phrase appears, and exposed the index through one of its staple search boxes. Typing in a word or phrase produces a graph of how frequently it appeared in books published in each year over a given period. Google has also provided a number of options for limiting the search to specific date ranges and to books published in various languages or countries.
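To make the idea of a per-year frequency index concrete, here is a minimal Python sketch. It is only a toy illustration, not Google's actual pipeline: the function names (`build_index`, `relative_frequency`), the whitespace tokenisation, and the tiny sample "books" are all assumptions made up for this example.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(books, max_n=2):
    """Count n-grams per publication year.

    `books` is an iterable of (year, text) pairs (a hypothetical input format).
    Returns (counts, totals): counts[year][phrase] is how often the phrase
    occurs, and totals[year][n] is the number of n-grams seen for that year.
    """
    counts = defaultdict(Counter)
    totals = defaultdict(Counter)
    for year, text in books:
        tokens = text.lower().split()  # naive tokenisation, assumed for brevity
        for n in range(1, max_n + 1):
            grams = ngrams(tokens, n)
            counts[year].update(grams)
            totals[year][n] += len(grams)
    return counts, totals


def relative_frequency(phrase, counts, totals):
    """Map each year to the share of that year's n-grams matching `phrase`."""
    words = phrase.lower().split()
    key, n = " ".join(words), len(words)
    return {
        year: counts[year][key] / totals[year][n]
        for year in sorted(counts)
        if totals[year][n]  # skip years with no n-grams of this length
    }


# Made-up sample data: two one-line "books" from different years.
books = [(1900, "oil and gold and oil"), (1950, "gold gold gold oil")]
counts, totals = build_index(books)
oil_by_year = relative_frequency("oil", counts, totals)
```

Plotting `oil_by_year` against the year gives a miniature version of the graph the viewer draws. Dividing by the yearly total matters because far more books were published in some years than in others, so raw counts alone would mostly reflect the volume of publishing.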

Many have already devised interesting uses for the Ngram search, ranging from tracking the relative popularity of oil versus gold to comparing the popularity of various philosophers. The project's creators themselves came to some interesting conclusions regarding the fleeting nature of, and best path to, celebrity, among other things. These are likely just the tip of the iceberg of what will result from the Ngram project, and only a hint of the ultimate potential of the Google book scan effort.
