Facebook open-source cache squeezes more from flash disks

Facebook's homebuilt Flashcache can reduce wear and tear on SSDs while boosting data throughput to users

Facebook found hotspots in its cache, where frequently consulted data could cluster in small areas, causing bottlenecks

Facebook continues to push the boundaries of storage and server technology in order to more quickly serve its billion users, and the results are being offered as open-source technology that can also benefit other companies.

Recently, Facebook updated its internally developed caching software, called Flashcache, to more efficiently use the thousands of solid-state drives (SSDs) that the social networking giant deploys to store frequently consulted data.

The newly released Flashcache 3.0 is able to make better decisions about what data to cache, while reducing the amount of wear and tear on expensive flash disks.

"With these improvements, Flashcache has become a building block in the Facebook stack," wrote Domas Mituzas, a Facebook database engineer who authored a blog post explaining the updates to the open-source software.

The work aims to improve overall Facebook performance without unduly driving up operating costs.

"While the cost per GB for flash is coming down, it's still not where it needs to be," Mituzas wrote. Given the premium prices that SSDs command, Facebook doesn't want to wear out these disks too quickly. "SSDs have limited write cycles, so we have to make sure that we're not writing too much."

Flashcache is one of a number of software projects that Facebook developed in house and later released as open source. Earlier this year, for instance, the company released a virtual machine, called HipHop, that speeds the processing of PHP code.

The company hopes that other organizations will reuse programs such as HipHop and Flashcache and eventually contribute to their further development. Like other open-source caching software such as Memcached and Redis, Flashcache can be used to speed the responsiveness of a heavily visited website or popular Web application.

Facebook originally created Flashcache to boost the responsiveness of the MySQL databases that store user data. The software can be loaded into the Linux kernel as a module without making any changes to the kernel itself.

The idea behind Flashcache is to use SSDs to hold the material that users request most. SSDs tend to be faster than traditional rotating-platter hard drives, though they are also more expensive per gigabyte. So it would not be cost-effective for Facebook to store all of its data on SSDs, especially if the vast majority of Facebook user data is rarely consulted.

Although designed to work with MySQL and the MySQL InnoDB database storage engine, Flashcache can be used as a general caching mechanism for Linux systems.

Flashcache can also shorten the time it takes to write data to disk, from the user's perspective, by saving newly updated data on the SSD first and then writing it to the hard drives later.
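The write path described above is commonly called write-back caching. The following is a minimal Python sketch of the idea, not Flashcache's kernel code: the class `WriteBackCache`, its method names and the dictionary standing in for the hard drives are all hypothetical, chosen only for illustration.

```python
class WriteBackCache:
    """Toy write-back cache: writes land on the fast tier first
    and reach the slow tier only when flush() is called."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stands in for the hard drives
        self.cache = {}                     # stands in for the SSD
        self.dirty = set()                  # blocks not yet copied to the backing store

    def write(self, block, data):
        # The caller sees the write finish as soon as the fast tier holds it.
        self.cache[block] = data
        self.dirty.add(block)

    def read(self, block):
        # Serve from the fast tier when possible, else fetch and keep a copy.
        if block in self.cache:
            return self.cache[block]
        data = self.backing_store[block]
        self.cache[block] = data
        return data

    def flush(self):
        # Later, copy every dirty block to the slow tier.
        flushed = len(self.dirty)
        for block in sorted(self.dirty):
            self.backing_store[block] = self.cache[block]
        self.dirty.clear()
        return flushed


disk = {}
cache = WriteBackCache(disk)
cache.write(7, b"status update")
print(7 in disk)      # False: the slow tier has not been touched yet
print(cache.flush())  # 1: one dirty block copied to the slow tier
print(disk[7])        # b'status update'
```

The trade-off in any scheme of this kind is that data exists only on the fast tier until it is flushed, so a real implementation has to track dirty blocks durably.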

The updated Flashcache module improves performance in read-write distribution, cache eviction and write efficiency.

Analyzing Flashcache performance, Facebook found that most of its caches hold a small subset of data that is read much more frequently than the rest.

With the previous version of Flashcache, 50 percent of a cache's contents accounted for 80 percent of disk operations. Such a concentration of frequently consulted material could cause performance bottlenecks.

To improve Flashcache's read-write distribution, the engineers developed a number of techniques to automatically position the data so that cache reads are distributed more evenly across the SSD. Now 50 percent of the cache accounts for 50 percent of the disk operations.
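One way to express the "50 percent of the cache accounts for X percent of operations" figures is to count operations per cache region and ask what share falls on the busiest half. The Python sketch below assumes such per-region counts are available; the function name and the sample counts are made up for illustration and are not Facebook's measurements.

```python
def share_in_busiest_half(ops_per_region):
    """Return the fraction of all operations that hit the busiest
    half of the cache regions (by operation count)."""
    total = sum(ops_per_region)
    if total == 0:
        return 0.0
    ranked = sorted(ops_per_region, reverse=True)
    busiest_half = ranked[: len(ranked) // 2]
    return sum(busiest_half) / total


# Hypothetical counts: a skewed cache versus an evenly loaded one.
skewed = [40, 40, 10, 10]
even = [25, 25, 25, 25]
print(share_in_busiest_half(skewed))  # 0.8
print(share_in_busiest_half(even))    # 0.5
```

A result near 0.5 means the load is spread evenly; the further the value climbs above 0.5, the more the traffic is concentrated in hotspots.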

To improve cache eviction, the process of deciding which data to move off the cache, Flashcache switched from a FIFO (first in, first out) algorithm, in which the oldest data in the cache is removed first to make room for new data, to an LRU (least recently used) algorithm, which discards the data that has gone unrequested for the longest time.
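The difference between the two eviction policies can be seen in a small simulation. This Python sketch is a toy model, not Flashcache's implementation: the function `simulate`, the cache capacity and the request sequence are all hypothetical. The only difference between the policies is whether a cache hit moves the entry to the back of the eviction queue.

```python
from collections import OrderedDict


def simulate(requests, capacity, policy):
    """Return the hit rate of a toy cache under 'fifo' or 'lru' eviction."""
    if policy not in ("fifo", "lru"):
        raise ValueError("policy must be 'fifo' or 'lru'")
    if not requests:
        return 0.0
    cache = OrderedDict()  # front = next entry to evict
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            if policy == "lru":
                # LRU: a hit makes this entry the most recently used.
                cache.move_to_end(key)
        else:
            if len(cache) >= capacity:
                # Evict the front entry: oldest inserted (FIFO)
                # or least recently used (LRU).
                cache.popitem(last=False)
            cache[key] = True
    return hits / len(requests)


# "a" is requested over and over, mixed with one-off items.
requests = ["a", "b", "a", "c", "a", "d", "a", "e", "a"]
print(simulate(requests, capacity=2, policy="fifo"))
print(simulate(requests, capacity=2, policy="lru"))
```

With this sequence FIFO scores 2 hits out of 9 requests, because it keeps evicting the popular item "a" simply for being old, while LRU scores 4 out of 9 because every request for "a" protects it from eviction.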

Improvements were also made in write efficiency.

Previously, the software would write to disk only when it had accumulated a certain amount of data that was ready to be written. This resulted in uneven performance across different caches, so Facebook engineers developed an approach that would write the cached data to disk whenever a copy of that data was requested by a user, which resulted in a smoother flow of write operations.

Thanks to these improvements, the updated caching mechanism has an average hit rate (the share of user requests served from data already in the cache) of 80 percent, up from 60 percent in the previous version. This means more data is served more quickly.

Updating the software has also slashed server I/O (input/output) required to read data by 40 percent, and reduced the I/O required to write data by 75 percent. For a company that is running thousands of servers, such a reduction in traffic can help make more efficient use of servers and keep hardware costs manageable.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
