MongoDB competes on speed and flexibility

MongoDB's New York conference showed off a variety of use cases

While debate rages on over the value of nonrelational, or NoSQL, databases, two case studies presented at a New York conference this week point to the benefits of using the MongoDB NoSQL data store instead of a standard relational database.

Representatives from both The New York Times and social networking service Foursquare, speaking at the MongoNYC conference held Wednesday in New York, explained why they used MongoDB. They praised MongoDB's ability to scale up and ingest lots of data, as well as its ease of reconfiguration.

"SQL databases have grown into these weird monstrosities. They don't really map to the problems you actually have, so you try to work around their warts," said Harry Heymann, the Foursquare engineer who oversees the company's servers, during his presentation. "MongoDB is a practical database for problems that engineers in the real world have. It was developed by people who built large-scale Web apps."

For The New York Times, MongoDB has been "awesome for flexible research and development," said Jake Porway, a New York Times data scientist, in his talk. Porway works for the news organization's research and development group, which looks at ways digital technology can enhance the presentation of news.

Porway also praised MongoDB's ability to ingest large amounts of data. "Mongo eats this data up," he said.

The New York Times used MongoDB, the open-source NoSQL data store developed by 10gen, for its experimental Cascade data visualization tool.

Cascade visually demonstrates how links to New York Times stories get copied by multiple Twitter users, showing how messages get passed from one user to the next. "This was an exploratory tool that helps us understand how people share" information, Porway said.

Cascade depicts the number of people who pass a story link on to others, as well as how long it takes to pass this data around.

The New York Times posts 600 pieces of content every day, often putting links to those pieces of content on Twitter. Links to these stories get rebroadcast across Twitter an average of about 25,000 times a day, Porway said. The Cascade system saves all the Twitter messages, as well as the number of times each story link was forwarded and clicked on. All told, it produces about 100 GB of data each month.

"This allows us to [answer] questions that are really big, like what is the best time of day to tweet? What kinds of tweets get people involved? Is it more important for our automated feeds to tweet, or for our journalists?" Porway said.

The three-dimensional visualizations can show huge spikes in activities, which the user can then dig into to find more details, such as the actual messages.

The three-dimensional visualizations use data that has been collected in MongoDB. One table stores the actual Twitter messages. Another stores the data on the number of times users clicked on a story link, which is provided by a link-shortening service. The data store also ingests user access log files from The New York Times' own servers.

Porway noted that the research group is constantly looking at new ways to analyze the data. He appreciates that it is easy to change database structures in MongoDB. For example, relational databases require that each field be associated with a particular data type, which can slow attempts to repurpose the data for new uses, Porway said. MongoDB does not have this requirement. "We are a research group, so we are constantly changing what we are looking for," Porway said.
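The flexibility Porway describes can be illustrated with a minimal sketch. It uses plain Python dictionaries to stand in for documents in a schemaless store; it does not use MongoDB itself, and every field name and value below is hypothetical rather than taken from the Cascade system.

```python
# Illustrative only: a list of dicts stands in for a schemaless collection.
# All field names and values are hypothetical, not from the Cascade system.
tweets = []

# Early records carry only the fields the first analysis needed.
tweets.append({"user": "alice", "story": "story-1", "text": "Worth a read"})

# A later analysis wants retweet lineage. New records simply include the
# extra fields; the older records are left untouched, with no migration.
tweets.append({
    "user": "bob",
    "story": "story-1",
    "text": "RT @alice Worth a read",
    "retweet_of": "alice",
    "seconds_since_original": 95,
})


def retweets_of(collection, user):
    """Return the documents that record `user` as the original poster."""
    # dict.get returns None for older documents that lack the field.
    return [doc for doc in collection if doc.get("retweet_of") == user]


print(len(retweets_of(tweets, "alice")))
```

In a relational table, adding the two new fields would typically mean altering the table definition first; here the query only has to tolerate documents that lack them.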

Speed was another factor in using MongoDB. MongoDB has a distributed architecture, so it can easily scale a data store out across multiple servers. "We're pulling data from a fair number of different sources, so we need someplace where we can really dump data quickly," Porway said.

In the case of Foursquare, MongoDB is now saving all the data generated by the service's users. Formerly, the company used PostgreSQL, but it is in the process of migrating its data off that relational database.

Foursquare is a location-based social network. As users travel about, they can post, or "check in," that they are at a certain location, such as a restaurant. It's designed to help people discover acquaintances who are nearby. Eventually, it will evolve into a city guide, one that can offer recommendations of nearby retail establishments, Heymann said.

Foursquare has 9 million users, who do 3 million "check-ins" per day. So far, the service has amassed about 750 million check-ins across 4 million places. Overall, Foursquare has 2.3 billion records and gets about 15,000 queries per second.

The biggest reason for switching to MongoDB, Heymann said, was its auto-sharding, or the ability to split a database across different servers. At first, Foursquare kept all its data on a single machine. Eventually, the collection got so big that two machines were needed. Now the service runs across 40 virtual machines, organized into eight clusters, on Amazon's Elastic Compute Cloud (EC2). Heymann noted that he could have written an automatic sharding feature for PostgreSQL but that would have required a lot of work. It was simpler to take advantage of the capability already embedded in MongoDB.
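To give a sense of what splitting a database across servers involves, here is a toy sketch of range-based routing, in which each server owns a contiguous range of keys. It is not MongoDB's implementation and not Foursquare's configuration: the class name, shard names, and split points are all assumptions made for illustration, and a real system also has to rebalance data as it grows.

```python
import bisect


class ShardRouter:
    """Route a key to a shard by key range (a toy model, not MongoDB's code)."""

    def __init__(self, split_points, shard_names):
        # n split points divide the key space into n + 1 contiguous ranges.
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = sorted(split_points)
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # bisect_right counts the split points that are <= key, which is
        # the index of the range (and therefore the shard) holding the key.
        index = bisect.bisect_right(self.split_points, key)
        return self.shard_names[index]


# Hypothetical layout: user IDs split across three shards.
router = ShardRouter(
    split_points=[3_000_000, 6_000_000],
    shard_names=["shard-a", "shard-b", "shard-c"],
)

print(router.shard_for(42))
print(router.shard_for(7_500_000))
```

Writing this lookup is the easy part. The work Heymann alludes to lies in moving data between servers and keeping the routing table correct while the application keeps running, which is what a built-in feature spares the application team.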

MongoDB also has some other features that Foursquare found handy. One is that MongoDB makes the data accessible in a manner that is more easily understandable for object-oriented programmers, when compared to the syntax required by SQL. SQL "is not the way most programmers think these days," Heymann said.

Another good feature is automatic failover, so that when a node fails for some reason, operations are redirected to the backup node. "This is something we could have done with SQL databases, but again, it is something we didn't have to do" with MongoDB, Heymann said.
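The general idea of failover can be sketched from the caller's side as follows. This is a simplified illustration in plain Python, not how MongoDB or its drivers implement it; the `Node` class, the `NodeDown` exception, and the `read_with_failover` helper are hypothetical names introduced only for this example.

```python
class NodeDown(Exception):
    """Raised when a (simulated) database node cannot serve a request."""


class Node:
    """A stand-in for one database server; `healthy` simulates an outage."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def read(self, key):
        if not self.healthy:
            raise NodeDown(self.name)
        return f"{key} served by {self.name}"


def read_with_failover(nodes, key):
    """Try each node in order and return the first successful read."""
    last_error = None
    for node in nodes:
        try:
            return node.read(key)
        except NodeDown as exc:
            # Remember the failure and move on to the next node.
            last_error = exc
    raise RuntimeError("no healthy node available") from last_error


primary = Node("primary", healthy=False)  # simulate a failed primary
backup = Node("backup")

print(read_with_failover([primary, backup], "checkin:1"))
```

When the database handles this redirection itself, application code does not need a wrapper like `read_with_failover` at all, which is the convenience Heymann is pointing to.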

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson.
