Facebook open-source cache squeezes more from flash disks

Facebook's homebuilt Flashcache can reduce wear and tear on SSDs while boosting data throughput to users

Facebook found hotspots in its cache, where frequently consulted data could cluster in small areas, causing bottlenecks

Facebook found hotspots in its cache, where frequently consulted data could cluster in small areas, causing bottlenecks

Facebook continues to push the boundaries of storage and server technology in order to more quickly serve its billion users, and the results are being offered as open-source technology that can also benefit other companies.

Recently, Facebook updated its internally developed caching software, called Flashcache, to more efficiently use the thousands of solid-state drives (SSDs) that the social networking giant deploys to store frequently consulted data.

The newly released Flashcache 3.0 is able to make better decisions about what data to cache, while reducing the amount of wear and tear on expensive flash disks.

"With these improvements, Flashcache has become a building block in the Facebook stack," wrote Domas Mituzas, a Facebook database engineer who authored a blog post explaining the updates to the open-source software.

The work aims to improve overall Facebook performance without unduly driving up operating costs.

"While the cost per GB for flash is coming down, it's still not where it needs to be," Mituzas wrote. Given the premium prices commanded for SSDs, Facebook doesn't want to wear out these disks too quickly. "SSDs have limited write cycles, so we have to make sure that we're not writing too much."

Flashcache is one of a number of software projects that Facebook originally developed in house that the company has also released as open source. Earlier this year, for instance, the company also released a virtual machine, called HipHop, that speeds the processing of PHP code.

The company hopes that other organizations could reuse such programs as HipHop and Flashcache and eventually contribute to their further development. Like other open-source caching software such as memcache and Redis, Flashcache can be used to speed the responsiveness of a heavily visited website or popular Web application.

Facebook originally created Flashcache to boost the responsiveness of the MySQL databases that store user data. The software can be loaded onto the Linux kernel as a module without making any changes to the kernel itself.

The idea behind Flashcache is to use SSDs to hold the material that is most requested by users. SSDs tend to be faster than traditional rotating platter hard drives, though they are also more expensive by the GB when compared to hard drives. So it would not be cost-effective for Facebook to store all of its data on SSDs, especially if the vast majority of Facebook user data is rarely consulted.

Although designed to work with MySQL and the MySQL InnoDB database storage engine, Flashcache can be used as a general caching mechanism for Linux systems.

Flashcache can also speed times it takes to write data to disk, from the user's perspective, by saving newly updated data on SSD first and then writing it to the hard drives later.

The updated Flashcache module improves performance in read-write distribution, cache eviction and write efficiency.

Analyzing Flashcache performance, Facebook had found that most of its caches have a small subset of data that is read much more frequently than most of the other data.

With the previous version of Flashcache, 50 percent of a cache's contents accounted for 80 percent of disk operations. Such a concentration of frequently consulted material could cause performance bottlenecks.

To improve Flashcache's read-write distribution, the engineers developed a number of techniques to automatically position the data so that cache reads are distributed more evenly across the SSD. Now 50 percent of the cache accounts for 50 percent of the disk operations.

To improve the process of determining which data to move off the cache, a process called cache eviction, Flashcache switched from using the FIFO (first in first out) algorithm--in which the oldest data in the cache is removed first to make room for new data--to a LRU (least recently used) algorithm, which discards the data that hasn't been requested for the longest period of time.

Improvements were also made in write efficiency.

Previously the software would write to disk only when it had a certain amount of data that was ready to be written. This resulted in uneven performance across different caches, however. So, Facebook engineers developed an approach that would write the cached data to disk whenever a copy of that data was requested by a user, which resulted in a smoother flow of write operations.

Thanks to these improvements, the updated caching mechanism has an average hit rate -- or information that is requested by users that resides in cache--of 80 percent, up from 60 percent in the previous version. This means more data is served more quickly.

Updating the software has also slashed server I/O (input/output) required to read data by 40 percent, and reduced the I/O required to write data by 75 percent. For a company that is running thousands of servers, such a reduction in traffic can help make more efficient use of servers and keep hardware costs manageable.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com

Join the PC World newsletter!

Error: Please check your email address.
Rocket to Success - Your 10 Tips for Smarter ERP System Selection

Tags open sourcemiddlewaresoftwareFacebook

Keep up with the latest tech news, reviews and previews by subscribing to the Good Gear Guide newsletter.

Joab Jackson

IDG News Service
Show Comments

Most Popular Reviews

Latest Articles


PCW Evaluation Team

Matthew Stivala

HP OfficeJet 250 Mobile Printer

The HP OfficeJet 250 Mobile Printer is a great device that fits perfectly into my fast paced and mobile lifestyle. My first impression of the printer itself was how incredibly compact and sleek the device was.

Armand Abogado

HP OfficeJet 250 Mobile Printer

Wireless printing from my iPhone was also a handy feature, the whole experience was quick and seamless with no setup requirements - accessed through the default iOS printing menu options.

Azadeh Williams

HP OfficeJet Pro 8730

A smarter way to print for busy small business owners, combining speedy printing with scanning and copying, making it easier to produce high quality documents and images at a touch of a button.

Andrew Grant

HP OfficeJet Pro 8730

I've had a multifunction printer in the office going on 10 years now. It was a neat bit of kit back in the day -- print, copy, scan, fax -- when printing over WiFi felt a bit like magic. It’s seen better days though and an upgrade’s well overdue. This HP OfficeJet Pro 8730 looks like it ticks all the same boxes: print, copy, scan, and fax. (Really? Does anyone fax anything any more? I guess it's good to know the facility’s there, just in case.) Printing over WiFi is more-or- less standard these days.

Ed Dawson

HP OfficeJet Pro 8730

As a freelance writer who is always on the go, I like my technology to be both efficient and effective so I can do my job well. The HP OfficeJet Pro 8730 Inkjet Printer ticks all the boxes in terms of form factor, performance and user interface.

Michael Hargreaves

Windows 10 for Business / Dell XPS 13

I’d happily recommend this touchscreen laptop and Windows 10 as a great way to get serious work done at a desk or on the road.

Featured Content

Latest Jobs

Don’t have an account? Sign up here

Don't have an account? Sign up now

Forgot password?