Providing protection and ensuring timely recoverability of databases has always represented a unique challenge for IT. Unlike file system data, databases usually appear to backup applications as large monolithic containers, and as data volumes have increased, so has the problem.
Not too many years ago, a database of 50GB to 100GB was considered huge. Today, by contrast, it is no longer rare to see terabyte-scale transactional databases and data warehouses in the tens of terabytes. So how do you protect such environments effectively and efficiently? Traditional approaches, such as nightly backups, are simply inadequate. The currently accepted best practice of maintaining multiple split mirrors (e.g., BCVs) can be too costly at today's capacity levels.
The first step in developing any protection strategy is, of course, to understand recovery requirements: the recovery time objective (RTO) and the recovery point objective (RPO). With very large databases, it is also particularly important to consider retention requirements seriously. We routinely encounter environments that keep copies of databases that are 30 days old or more. Often, on closer investigation, it turns out that these older copies have little or no real business value. Maintaining such copies of multiterabyte databases can clearly become absurdly wasteful.
As strange as it may sound, in some cases it may be possible to avoid recovery entirely. Consider a data warehouse whose original source data remain readily available. In that case it may be feasible, and more cost-effective, to re-create the warehouse rather than recover it.
In other situations, a hybrid of several well-established technologies, such as database log shipping combined with snapshots (which are far less storage-intensive than mirrors), can provide adequate protection against both physical and logical data loss. Newer approaches, such as continuous data protection, are also well worth investigating.
On another level, one might question the wisdom of building enormous monolithic databases. A growing number of application environments use an architecture of multiple smaller databases behind a common parallel query layer, which makes both protection and scaling easier. Unfortunately, these design decisions are typically made several levels above the lowly individuals tasked with data protection, who must still be able to recover, bad design or not.
Jim Damoulakis is chief technology officer at GlassHouse Technologies, a provider of independent storage services. He can be reached at firstname.lastname@example.org.