Google debuts text analysis tools

A new tool from Google allows users to search for phrases within 5.2 million digitized books

Google has introduced two tools that may help users discover new ways to parse the company's massive collections of public information.

One tool counts how often a chosen phrase shows up across 500 years' worth of digitized books, while another divides search results by reading difficulty.

The first service, called the Books Ngram Viewer, allows people to search for specific phrases within the company's massive collection of digitized books. In addition to links to the source material, the results provide a timeline showing when the phrase was most often used.

The tool runs searches against a database of 500 billion words found within 5.2 million books Google has digitized. The sampled books were all published between 1500 and 2008, in Chinese, English, German, French, Russian or Spanish.
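
The counting behind such a timeline can be illustrated with a minimal sketch. This is not Google's implementation: the `count_ngrams` and `phrase_timeline` helpers, the `SAMPLE_BOOKS` corpus and the per-1,000 scaling are all assumptions made up for illustration. The sketch tallies how often a phrase appears among all word sequences of the same length, grouped by publication year.

```python
from collections import Counter


def count_ngrams(text, n):
    """Count every run of n consecutive words in text (case-insensitive)."""
    words = text.lower().split()
    return Counter(
        " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
    )


def phrase_timeline(books, phrase):
    """Return {year: occurrences of phrase per 1,000 n-grams}.

    books is a hypothetical list of (year, text) pairs standing in
    for a digitized corpus.
    """
    n = len(phrase.split())
    hits = Counter()
    totals = Counter()
    for year, text in books:
        counts = count_ngrams(text, n)
        hits[year] += counts[phrase.lower()]
        totals[year] += sum(counts.values())
    # Skip years with no n-grams to avoid dividing by zero.
    return {
        year: 1000 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


# Made-up sample corpus, for illustration only.
SAMPLE_BOOKS = [
    (1920, "the great war changed europe and the great war ended"),
    (1950, "the second world war followed the great war"),
    (1990, "historians now call it world war one"),
]

timeline = phrase_timeline(SAMPLE_BOOKS, "the great war")
for year, rate in timeline.items():
    print(year, round(rate, 1))
```

Dividing by the total number of n-grams for each year, rather than reporting raw counts, keeps years with many surviving books from dominating the chart simply because more text exists for them.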

With this service, Google hopes to introduce a new form of quantitative analysis to academic fields, one that could provide insights into historical trends or the birth of new ideas by tracking the popularity of associated words and phrases. One group of researchers has coined the term "culturomics" to describe the approach.

Such metrics can show how phrases come into and move out of vogue, oftentimes due to historical events.

For instance, a search for the phrase "World War One" shows the term began to be used just prior to the outbreak of World War II. Not surprisingly, occurrences of the phrase "The Great War," which was what World War I was called before people realized there would be a sequel, dropped by the 1950s.

Google has also added another form of analysis to its regular search: The company has introduced a new advanced search feature that can divide up results by reading level. The search breaks results into basic, intermediate and advanced reading levels.

Although Google does not specify what attributes define each reading level, most readability tests analyze texts by looking at attributes such as the number of words in each sentence or the number of letters and syllables in each word, under the assumption that more complicated sentences would be more difficult to read.
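
As an illustration of the kind of test described above, the following sketch computes the Automated Readability Index, a published formula based on characters per word and words per sentence. It is offered only as an example of a readability test; nothing in Google's documentation indicates that the company uses this formula, and the sentence and word splitting here is a deliberately simple assumption.

```python
import re


def automated_readability_index(text):
    """Score text with the Automated Readability Index (ARI).

    ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
    Higher scores indicate text that is harder to read.
    """
    # Naive splitting: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    # Count only letters and digits, ignoring apostrophes.
    characters = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return (
        4.71 * characters / len(words)
        + 0.5 * len(words) / len(sentences)
        - 21.43
    )


simple = "The cat sat. The dog ran."
dense = (
    "Quantitative methodologies illuminate historiographical "
    "transformations across centuries."
)
print(automated_readability_index(simple) < automated_readability_index(dense))
```

Short words in short sentences pull the score down, while long words in long sentences push it up, which matches the assumption that more complicated sentences are harder to read.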

In one sample, 74 per cent of the material on the IDG site InfoWorld is classified as intermediate level, 21 per cent is basic and 3 per cent is advanced.

Google documentation explains that users might find the distinction of reading levels useful in helping to complete searches. A university professor might want only the advanced results, while a junior high school teacher might want to find more basic material for students.


Joab Jackson

IDG News Service