MongoDB competes on speed and flexibility

MongoDB's New York conference showed off a variety of use cases

While debate rages on over the value of nonrelational, or NoSQL, databases, two case studies presented at a New York conference this week point to the benefits of using the MongoDB non-SQL data store instead of a standard relational database.

Representatives from both The New York Times and social networking service Foursquare, speaking at the MongoNYC conference held Wednesday in New York, explained why they used MongoDB. They praised MongoDB's ability to scale up and ingest lots of data, as well as its ease of reconfiguration.

"SQL databases have grown into these weird monstrosities. They don't really map to the problems you actually have, so you try to work around their warts," said Harry Heymann, the Foursquare engineer who oversees the company's servers, during his presentation. "MongoDB is a practical database for problems that engineers in the real world have. It was developed by people who built large-scale Web apps."

For The New York Times, MongoDB has been "awesome for flexible research and development," said Jake Porway, a New York Times data scientist, in his talk. Porway works for the news organization's research and development group, which looks at ways digital technology can enhance the presentation of news.

Porway also praised MongoDB's ability to ingest large amounts of data. "Mongo eats this data up," he said.

The New York Times used MongoDB, the open-source NoSQL data store developed by 10gen, for its experimental Cascade data visualization tool.

Cascade visually demonstrates how links to New York Times stories get copied by multiple Twitter users, showing how messages get passed from one user to the next. "This was an exploratory tool that helps us understand how people share" information, Porway said.

Cascade depicts the number of people who pass a story link on to others, as well as how long it takes to pass this data around.

The New York Times posts 600 pieces of content every day, often putting links to those pieces of content on Twitter. Links to these stories get rebroadcast across Twitter an average of about 25,000 times a day, Porway said. The Cascade system saves all the Twitter messages, as well as the number of times each story link was forwarded and clicked on. All told, it produces about 100 GB of data each month.

"This allows us to [answer] questions that are really big, like what is the best time of day to tweet? What kinds of tweets get people involved? Is it more important for our automated feeds to tweet, or for our journalists?" Porway said.

The three-dimensional visualizations can show huge spikes in activities, which the user can then dig into to find more details, such as the actual messages.

The visualizations draw on data collected in MongoDB. One table stores the actual Twitter messages. Another stores the number of times users clicked on a story link, data that is provided by a link-shortening service. The data store also ingests user access log files from The New York Times' own servers.

Porway noted that the research and development group is constantly looking at new ways to analyze the data. He appreciates that it is easy to change database structures in MongoDB. For example, relational databases require that each field be associated with a particular data type, which can slow attempts to repurpose the data for new uses, Porway said. MongoDB does not have this requirement. "We are a research group, so we are constantly changing what we are looking for," Porway said.
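The flexibility Porway describes can be illustrated with a minimal sketch in plain Python. It is not the Times' code and does not use MongoDB's API: the list stands in for a document collection, and every field name is invented for illustration. The point is only that documents in the same collection need not share a fixed set of typed fields.

```python
# Hypothetical stand-in for a document collection: a plain list of dicts.
# All field names below are invented for illustration.
collection = []

# An early document records only the basics of a tweet.
collection.append({"tweet_id": 1, "text": "Story link", "url": "http://example.com/a"})

# Later the researchers decide to track extra fields. No schema change is
# declared anywhere; the new document simply carries additional keys.
collection.append({
    "tweet_id": 2,
    "text": "Another link",
    "url": "http://example.com/b",
    "retweet_of": 1,
    "clicks": 42,
})

def find(docs, **criteria):
    """Return the documents whose fields match every key/value in criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

# Older documents that lack the new field are simply not matched.
retweets = find(collection, retweet_of=1)
```

In a relational table, adding the `retweet_of` and `clicks` columns would typically mean altering the table definition first.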

Speed was another factor in using MongoDB. MongoDB has a distributed architecture, so it can easily spread a data store across multiple servers. "We're pulling data from a fair number of different sources, so we need someplace where we can really dump data quickly," Porway said.

In the case of Foursquare, MongoDB is now saving all the data generated by the service's users. Formerly, the company used PostgreSQL, but it is in the process of migrating its data off that relational database.

Foursquare is a location-based social network. As users travel about, they can post, or "check in," that they are at a certain location, such as a restaurant. It's designed to help people discover acquaintances who are nearby. Eventually, it will evolve into a city guide, one that can offer recommendations of nearby retail establishments, Heymann said.

Foursquare has 9 million users, who do 3 million "check-ins" per day. So far, the service has amassed about 750 million check-ins across 4 million places. Overall, Foursquare has 2.3 billion records and gets about 15,000 queries per second.

The biggest reason for switching to MongoDB, Heymann said, was its auto-sharding, or the ability to split a database across different servers. At first, Foursquare kept all its data on a single machine. Eventually, the collection got so big that two machines were needed. Now the service runs across 40 virtual machines, organized into eight clusters, on Amazon's Elastic Compute Cloud (EC2). Heymann noted that he could have written an automatic sharding feature for PostgreSQL, but that would have required a lot of work. It was simpler to take advantage of the capability already embedded in MongoDB.
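The basic idea of sharding, routing each record to one of several servers according to a key, can be sketched in a few lines of Python. This is a simplified illustration, not Foursquare's setup or MongoDB's own algorithm: MongoDB partitions data by ranges of a shard key and rebalances automatically, while the sketch below uses a fixed hash. The `user_id` key, the function names, and the use of eight shards are assumptions made for the example.

```python
import hashlib

NUM_SHARDS = 8  # illustrative only; echoes the eight clusters mentioned above
shards = [[] for _ in range(NUM_SHARDS)]  # each list stands in for one server

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def insert_checkin(checkin: dict) -> int:
    """Store a check-in on the shard chosen by its user_id; return that index."""
    idx = shard_for(checkin["user_id"], NUM_SHARDS)
    shards[idx].append(checkin)
    return idx

def checkins_for(user_id: str) -> list:
    """Look up a user's check-ins by visiting only the shard that owns the key."""
    idx = shard_for(user_id, NUM_SHARDS)
    return [c for c in shards[idx] if c["user_id"] == user_id]

insert_checkin({"user_id": "alice", "venue": "restaurant"})
insert_checkin({"user_id": "bob", "venue": "cafe"})
insert_checkin({"user_id": "alice", "venue": "bookstore"})
alice_checkins = checkins_for("alice")
```

Writing this routing by hand is the easy part. The work Heymann alludes to lies in moving data when servers are added and keeping the shards balanced, which the sketch does not attempt.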

MongoDB also has some other features that Foursquare found handy. One is that MongoDB makes the data accessible in a manner that is more easily understandable for object-oriented programmers, when compared to the syntax required by SQL. SQL "is not the way most programmers think these days," Heymann said.

Another good feature is automatic failover, so that when a node fails for some reason, operations are redirected to the backup node. "This is something we could have done with SQL databases, but again, it is something we didn't have to do" with MongoDB, Heymann said.
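As a rough picture of what failover involves, the following Python sketch retries a write against a backup when the primary is unavailable. It is a toy model under stated assumptions: the `Node` class and `write_with_failover` function are hypothetical, the retry logic sits in the client for brevity, and no data is copied between nodes. In MongoDB, by contrast, the members of a replica set replicate data and elect a new primary among themselves.

```python
class Node:
    """Hypothetical database node that can be marked healthy or down."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}

    def write(self, key, value):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value
        return self.name

def write_with_failover(nodes, key, value):
    """Try each node in order; return the name of the node that took the write."""
    for node in nodes:
        try:
            return node.write(key, value)
        except ConnectionError:
            continue  # this node is down, so move on to the next one
    raise RuntimeError("no healthy node available")

primary = Node("primary")
backup = Node("backup")

first = write_with_failover([primary, backup], "checkin:1", "restaurant")
primary.healthy = False  # simulate the primary failing
second = write_with_failover([primary, backup], "checkin:2", "cafe")
```

Here the first write lands on the primary and the second is redirected to the backup without the caller changing anything.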

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson.
