Four companies rethink databases for the cloud
24 June, 2011 10:42
Several companies are developing new database technologies to solve what they see as the shortcomings of traditional, relational database management systems in a cloud environment. Four of them described the approaches they're taking during a panel at the GigaOm Structure conference on Thursday.
The basic problem they're trying to solve is the difficulty of scaling today's RDBMS systems across potentially massive clusters of commodity x86 servers, and doing so in a way that's "elastic," so that an organization can scale its infrastructure up and down as demand requires.
"The essential problem, as I see it, is that existing relational database management systems just flat-out don't scale," said Jim Starkey, a former senior architect at MySQL and one of the original developers of relational databases.
Starkey is founder and CTO of NimbusDB, which is trying to address those problems with a "radical restart" of relational database technology. Its software has "nothing in common with pre-existing systems," according to Starkey, except that developers can still use the standard SQL query language.
NimbusDB aims to provide database software that can scale simply by "plugging in" new hardware, and that allows a large number of databases to be managed "automatically" in a distributed environment, he said. Developers should be able to start small, developing an application on a local machine, and then transfer their database to a public cloud without having to take it offline, he said.
"One of the big advantages of cloud computing is you don't have to make all the decisions up front. You start with what's easy and transition into another environment without having to go offline," he said.
NimbusDB's software is still at an "early alpha" stage, and Starkey didn't provide a delivery date Thursday. The company expects to give the software away free "for the first couple of nodes," and customers can pay for additional capacity, he said. Its product is delivered as software, rather than a service, but not open-source software, Starkey said.
Xeround aims to solve problems similar to those NimbusDB is tackling, but with a hosted MySQL service that was in beta with about 2,000 customers and went into general availability last week, said CEO Razi Sharir. It, too, wants to offer the elasticity of the cloud with the familiarity of SQL coding.
"We're a distributed database that runs in-memory, that splits across multiple virtual nodes and multiple data centers and serves many customers at the same time," he said. "The scaling and the elasticity are handled by our service automatically."
Xeround is designed for transactional workloads, and the "sweet spot" for its database is between 2GB and 50GB, Sharir said.
Its service is available in Europe and the U.S., hosted by cloud providers including Amazon and Rackspace. While Xeround is "cloud agnostic," cloud database customers in general need to run their applications and database in the same data center, or close to each other, for performance reasons.
"If your app is running on Amazon East or Amazon Europe, you'd better be close to where we're at. The payload [the data] needs to be in the same place" as the application, he said.
Unlike Xeround, ParAccel's software is designed to run analytics workloads, and the sweet spot for its distributed database system is "around the 25TB range," said CTO Barry Zane.
"We're the epitome of big data," he said. ParAccel's customers are businesses that rely on analyzing large amounts of data, including financial services, retail and online advertising companies.
One customer, interclick, uses ParAccel to analyze demographic and click-through data to let online advertising firms know which ads to display to end users, he said. It has to work in near real-time, so interclick runs an in-memory database of about 2TB on a 32-node cluster, Zane said. Other customers with larger data sets use a disk-based architecture.
ParAccel also lets developers write SQL queries, but with extensions so they can use the MapReduce framework for big-data analytics.
"SQL is a really powerful language, it's very easy to use for amazingly sophisticated stuff, but there's a class of things SQL can't do," he said. "So what you've seen occurring at ParAccel, and frankly at our competitors, is the extensibility to do MapReduce-type functions directly in the database, rather than try to move terabytes of data in and out to server clusters."
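For readers unfamiliar with the term, a "MapReduce-type function" can be sketched in a few lines of plain Python. This is only an illustration of the general map, group and reduce pattern, not ParAccel's SQL extension or any vendor's API; the click-log rows and the function names are invented for the example.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical click-log rows: (ad_id, clicked) pairs, invented for illustration.
ROWS = [
    ("ad-1", 1),
    ("ad-2", 0),
    ("ad-1", 0),
    ("ad-2", 1),
    ("ad-1", 1),
]


def map_phase(rows):
    """Emit one (key, value) pair per row: value is (clicks, impressions)."""
    for ad_id, clicked in rows:
        yield ad_id, (clicked, 1)


def shuffle(pairs):
    """Group the emitted values by key, as a MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Fold each key's values into one (clicks, impressions) total."""
    return {
        key: reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
        for key, values in groups.items()
    }


def click_totals(rows):
    """Run the three phases in sequence over an in-memory list of rows."""
    return reduce_phase(shuffle(map_phase(rows)))
```

Calling `click_totals(ROWS)` yields per-ad click and impression totals. In a distributed system the map and reduce steps would run in parallel across nodes, next to the data, which is the point Zane makes about not moving terabytes in and out of the database.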
Cloudant, which makes software for use on-premises or in a public cloud, was the only company on the panel that has developed a "noSQL" database. It was designed to manage both structured and unstructured data, and to shorten the "application lifecycle," said co-founder and chief scientist Mike Miller.
"Applications don't have to go through a complex data modelling phase," he said. The programming interface is HTTP, Miller said. "That means you can sign up and just start talking to the database from a browser if you wanted to, and build apps that way. So, we're really trying to lower the bar and make it easier to deploy."
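To give a sense of what an HTTP programming interface looks like in practice, here is a minimal sketch that assumes a JSON-over-HTTP document store. The host, database name, URL layout and document fields are all hypothetical and are not taken from Cloudant's documentation; the code only builds the requests and does not send them.

```python
import json
from urllib.request import Request

# Hypothetical endpoint: the host and database name are placeholders.
BASE_URL = "https://example.invalid/demo-db"


def build_create_request(doc):
    """Build a POST request that would store `doc` as a JSON document."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_fetch_request(doc_id):
    """Build a GET request that would read a document back by its id."""
    return Request(f"{BASE_URL}/{doc_id}", method="GET")


# Sending either request would be a call to urllib.request.urlopen(req);
# it is left out here because the endpoint above does not exist.
```

Because each operation is an ordinary HTTP request, any client that can speak HTTP, including a browser, could issue it, which is the low barrier to entry Miller describes.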
"We also have integrated search and real-time analytics, so we're trying to bring concepts from the warehouse into the database itself," he said.
The company's software is hosting "tens of thousands of applications" on public clouds run by Amazon EC2 and SoftLayer Technologies, according to Miller.
Cloudant databases vary from a gigabyte all the way to 100TB, he said. Customers are running applications for advertising analytics, "datamart-type applications," and "understanding the connections in a social graph -- not in an [extract, transform and load] workflow kind of way using Hadoop, but in real time," he said.
While cloud databases can solve scaling problems, they also present new challenges, the panelists acknowledged. The quality of server hardware in the public cloud is "often a notch down," said Zane, so companies for whom high-speed analytics are critical may still want to buy and manage their own hardware, he said.
And while many service providers claim to be "cloud agnostic," the reality is often different, Miller said. Cloud software vendors need to do "a lot of reverse engineering" to figure out what the architectures at services like Amazon EC2 look like "behind the curtain," in order to get maximum performance from their database software.
Still, Sharir and Zane were both optimistic that "big data analytics" would be the "killer application" for their products. For Starkey it is simply "the Web."
"Everyone on the Web has the same problem, this very thin pipe trying to get into database systems," he said. "Databases don't scale, and it shows up in a thousand places."