MongoDB competes on speed and flexibility

MongoDB's New York conference showed off a variety of use cases

While debate rages on over the value of nonrelational, or NoSQL, databases, two case studies presented at a New York conference this week point to the benefits of using the MongoDB non-SQL data store instead of a standard relational database.

Representatives from both The New York Times and social networking service Foursquare, speaking at the MongoNYC conference held Wednesday in New York, explained why they used MongoDB. They praised MongoDB's ability to scale up and ingest lots of data, as well as its ease of reconfiguration.

"SQL databases have grown into these weird monstrosities. They don't really map to the problems you actually have, so you try to work around their warts," said Harry Heymann, the Foursquare engineer who oversees the company's servers, during his presentation. "MongoDB is a practical database for problems that engineers in the real world have. It was developed by people who built large-scale Web apps."

For The New York Times, MongoDB has been "awesome for flexible research and development," said Jake Porway, a New York Times data scientist, in his talk. Porway works for the news organization's research and development group, which looks at ways digital technology can enhance the presentation of news.

Porway also praised MongoDB's ability to ingest large amounts of data. "Mongo eats this data up," he said.

The New York Times used MongoDB, the open-source NoSQL data store developed by 10Gen, for its experimental Cascade data visualization tool.

Cascade visually demonstrates how links to New York Times stories get copied by multiple Twitter users, showing how messages get passed from one user to the next. "This was an exploratory tool that helps us understand how people share" information, Porway said.

Cascade depicts the number of people who pass a story link on to others, as well as how long it takes to pass this data around.

The New York Times posts 600 pieces of content every day, often putting links to those pieces of content on Twitter. Links to these stories get rebroadcast across Twitter an average of about 25,000 times a day, Porway said. The Cascade system saves all the Twitter messages, as well as the number of times each story link was forwarded and clicked on. All told, it produces about 100 GB of data each month.

"This allows us to [answer] questions that are really big, like what is the best time of day to tweet? What kinds of tweets get people involved? Is it more important for our automated feeds to tweet, or for our journalists?" Porway said.

The three-dimensional visualizations can show huge spikes in activities, which the user can then dig into to find more details, such as the actual messages.

The three-dimensional visualizations use data that has been collected in MongoDB. One table stores the actual Twitter messages. Another stores the data on the number of times users clicked on a story link, which is provided by a link-shortening service. The data store also ingests user access log files from The New York Times' own servers.

Porway noted that the Labs is constantly looking at new ways to analyze the data. He appreciates the fact that it is easy to change database structures in MongoDB. For example, relational databases require that each field be associated with a particular data type, which can slow attempts to repurpose the data for new uses, Porway said. MongoDB does not have this requirement. "We are a research group, so we are constantly changing what we are looking for," Porway said.
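The point about field types can be illustrated with a minimal Python sketch. It uses plain dictionaries in place of a real MongoDB driver, and every field name and value in it is hypothetical rather than taken from Cascade. It shows only the general idea: records added later can carry new fields without the older records being altered.

```python
# Illustration only: plain Python dicts stand in for schema-free documents.
# All field names and values are hypothetical, not taken from Cascade.

def find(collection, **criteria):
    """Return the documents whose fields match every given key/value pair."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


tweets = []

# An early record shape with two fields.
tweets.append({"url": "example.com/story-1", "user": "alice"})

# A later research question needs extra fields. No migration step is run;
# the older document simply lacks the new fields.
tweets.append({
    "url": "example.com/story-1",
    "user": "bob",
    "retweet_of": "alice",
    "hour": 14,
})

# Both shapes can be queried side by side.
matches = find(tweets, url="example.com/story-1")
retweets = [doc for doc in tweets if "retweet_of" in doc]
```

In a relational table, the second record would typically require adding columns before it could be stored, which is the kind of friction Porway described.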

Speed was another factor in using MongoDB. MongoDB has a distributed architecture, so it can easily scale up a data store across multiple servers. "We're pulling data from a fair number of different sources, so we need someplace where we can really dump data quickly," Porway said.

In the case of Foursquare, MongoDB is now saving all the data generated by the service's users. Formerly, the company used PostgreSQL, but it is in the process of migrating its data off that relational database.

Foursquare is a location-based social network. As users travel about, they can post, or "check in," that they are at a certain location, such as a restaurant. It's designed to help people discover acquaintances who are nearby. Eventually, it will evolve into a city guide, one that can offer recommendations of nearby retail establishments, Heymann said.

Foursquare has 9 million users, who do 3 million "check-ins" per day. So far, the service has amassed about 750 million check-ins across 4 million places. Overall, Foursquare has 2.3 billion records and gets about 15,000 queries per second.

The biggest reason for switching to MongoDB, Heymann said, was its auto-sharding, or the ability to split a database across different servers. At first, Foursquare kept all its data on a single machine. Eventually, the collection got so big that two machines were needed. Now the service runs across 40 virtual machines, organized into eight clusters, on Amazon's Elastic Compute Cloud (EC2). Heymann noted that he could have written an automatic sharding feature for PostgreSQL, but that would have required a lot of work. It was simpler to take advantage of the capability already embedded in MongoDB.
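For readers unfamiliar with sharding, the following Python sketch shows the basic idea of routing each record to one of several servers by a key. It is not MongoDB's implementation, which manages the split and rebalancing itself; this is a simple hash-based scheme swapped in for illustration. The shard count, record fields and function name are all assumptions made for the example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash.

    hashlib is used instead of Python's built-in hash(), whose string
    results differ between interpreter runs.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


NUM_SHARDS = 8  # hypothetical value chosen only for this example
shards = [[] for _ in range(NUM_SHARDS)]

# Hypothetical check-in records keyed by user id.
checkins = [
    {"user_id": f"user{i}", "venue": f"venue{i % 5}"}
    for i in range(1000)
]

# Route every record to the shard chosen by its key.
for record in checkins:
    shards[shard_for(record["user_id"], NUM_SHARDS)].append(record)

sizes = [len(shard) for shard in shards]
```

Writing this routing by hand is the easy part. Moving existing data when servers are added, and keeping the shards balanced, is the work Heymann said he preferred to leave to the database.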

MongoDB also has some other features that Foursquare found handy. One is that MongoDB makes the data accessible in a manner that is more easily understandable for object-oriented programmers, when compared to the syntax required by SQL. SQL "is not the way most programmers think these days," Heymann said.

Another good feature is automatic failover, so that when a node fails for some reason, operations are redirected to the backup node. "This is something we could have done with SQL databases, but again, it is something we didn't have to do" with MongoDB, Heymann said.
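The failover behavior Heymann described can be sketched in a few lines of Python. This is a toy model under stated assumptions, not MongoDB's mechanism: the `Node` and `FailoverClient` classes are hypothetical, the redirection is done by the client, and the sketch omits the replication that keeps a real backup node's data current.

```python
class Node:
    """A stand-in for a database server that can be marked as down."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")
        self.data[key] = value
        return self.name  # report which node handled the write


class FailoverClient:
    """Send each write to the first node in the list that accepts it."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def write(self, key, value):
        for node in self.nodes:
            try:
                return node.write(key, value)
            except ConnectionError:
                continue  # this node is down, so try the next one
        raise ConnectionError("no nodes available")


primary, backup = Node("primary"), Node("backup")
client = FailoverClient([primary, backup])

first = client.write("checkin:1", "restaurant")  # handled by the primary
primary.up = False                               # simulate a node failure
second = client.write("checkin:2", "cafe")       # redirected to the backup
```

The application code calls `client.write` the same way before and after the failure, which is the convenience being described: the redirection happens without the caller having to handle it.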

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson.
