Alexa opens up Web search database and API

Alexa Internet is offering online computing capacity for US$1 an hour -- and throwing in access to the database of billions of Web pages that lurk behind its Alexa toolbar search service.

Programmers who register for the beta version of Alexa Web Search Platform, released Tuesday, can use it to create specialized search engines for vertical markets, drawing results from the database of 4 billion Web pages crawled by Alexa, the company said. Alexa is a subsidiary of Amazon.com.

Following in the footsteps of Google, Alexa is opening up the API (application programming interface) to parts of its search engine, but going one better by offering to host applications that build on its database -- for a fee. Programmers remixing Google's search utilities must organize their own application hosting.

Alexa Web Search Platform gives programmers a way to specify a subset of documents from the archive, develop an application to search those documents, and publish the results as an XML (Extensible Markup Language) feed or a specialized search engine.
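
To make the "publish the results as an XML feed" step concrete, here is a minimal sketch in Python of turning a list of search hits into an XML document. It does not use Alexa's API, which the article describes only in outline; the function name `results_to_xml`, the element names (`results`, `result`, `url`, `title`) and the sample hits are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def results_to_xml(query, results):
    """Serialize search hits as a small XML feed.

    The element and attribute names here are hypothetical; they are
    not taken from Alexa's actual feed format.
    """
    root = ET.Element("results", attrib={"query": query})
    for hit in results:
        item = ET.SubElement(root, "result")
        ET.SubElement(item, "url").text = hit["url"]
        ET.SubElement(item, "title").text = hit["title"]
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


# Made-up sample hits, for illustration only.
sample_hits = [
    {"url": "http://example.com/a.mid", "title": "Melody A"},
    {"url": "http://example.com/b.mid", "title": "Melody B"},
]
feed = results_to_xml("midi", sample_hits)
print(feed)
```

A real feed would presumably carry more fields per result, but the shape (one container element, one child per hit) would be similar.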

The results returned can include simple text or HTML (Hypertext Markup Language) documents, or graphics, audio or video files.

As an example of how to use the service, Alexa has built a photo search engine at http://photos.alexa.com/ that allows visitors to refine their search for photographs according to technical details such as the size of the image, the make and model of camera it was taken with, and even the aperture setting used.

While the photo search engine shows how the platform can be used to build a live service, a one-off search of the database content can also be used to seed another service. That's how Rainer Typke, a researcher at the University of Utrecht in the Netherlands, used the platform to expand his searchable melody directory, http://www.musipedia.org.

Typke used the platform to extract around 1,000 MIDI files from Alexa's database, converted them to a monophonic form and stored them on his own server to make them easier to search. Musipedia doesn't use Alexa for its live search service, Typke said in an e-mail response to questions.

Using the Alexa computer cluster, Typke plans to identify hundreds of thousands of MIDI files in the database and process them using an algorithm that extracts their characteristic melody. Those melody files will be used to expand the Musipedia directory. Later, he hopes to be able to process files containing audio recordings in the same way.

"For the more computationally expensive preprocessing that would be required, especially by audio, Alexa's fast and large computers will come in handy," he said.

Alexa will charge for hosting applications that use the platform. The charges include US$1 per processor per hour for computing capacity, US$1 per year for each gigabyte of storage, US$1 per 50 gigabytes of data processed by the system, US$1 per gigabyte of data transferred into or out of the system, and US$1 for every 4,000 search requests the system responds to from published search engines using the service.
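
As a rough illustration of how those charges add up, the Python sketch below totals the five quoted rates for a given amount of usage. The rates come from the figures above; the `Usage` class, the `estimate_cost` function and the sample workload are hypothetical. The sketch also assumes that charges are pro-rated linearly, with no minimums or rounding, which Alexa has not confirmed.

```python
from dataclasses import dataclass

# Rates as quoted in the article, in US dollars.
CPU_RATE_PER_HOUR = 1.0          # per processor per hour
STORAGE_RATE_PER_GB_YEAR = 1.0   # per gigabyte stored per year
PROCESSED_GB_PER_DOLLAR = 50.0   # gigabytes processed per dollar
TRANSFER_RATE_PER_GB = 1.0       # per gigabyte moved in or out
REQUESTS_PER_DOLLAR = 4000.0     # search requests served per dollar


@dataclass
class Usage:
    """Hypothetical usage record; the field names are illustrative."""
    cpu_hours: float = 0.0
    storage_gb_years: float = 0.0
    processed_gb: float = 0.0
    transferred_gb: float = 0.0
    search_requests: int = 0


def estimate_cost(usage: Usage) -> float:
    """Sum the five charges, assuming simple linear pro-rating."""
    return (
        usage.cpu_hours * CPU_RATE_PER_HOUR
        + usage.storage_gb_years * STORAGE_RATE_PER_GB_YEAR
        + usage.processed_gb / PROCESSED_GB_PER_DOLLAR
        + usage.transferred_gb * TRANSFER_RATE_PER_GB
        + usage.search_requests / REQUESTS_PER_DOLLAR
    )


# A made-up workload: 10 processor-hours, 5 GB stored for a year,
# 200 GB processed, 2 GB transferred, 8,000 search requests served.
job = Usage(cpu_hours=10, storage_gb_years=5, processed_gb=200,
            transferred_gb=2, search_requests=8000)
print(f"Estimated cost: US${estimate_cost(job):.2f}")  # 10 + 5 + 4 + 2 + 2 = 23
```

On these assumptions, computing time dominates the bill for a batch job, while a published search engine with heavy traffic would see the per-request and transfer charges grow instead.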

Typke said he expects the pricing will "be okay for people like me." He has identified a number of ways to control the cost of his melody search, including updating the core data less frequently or restricting the search to a smaller subset of Alexa's total data.

"I still need to get a feeling for how much I can do with one hour of computing power," he said. "Getting the 1,000 files for the prototype took just minutes."

The API is designed for the C programming language. It can be used to build "Web services" that can be integrated into other systems or published through Amazon.com's Web services platform, Alexa said.

Peter Sayer

IDG News Service