Bush's exit to put new e-records system to the test
- 24 November, 2008 09:06
For members of the Bush administration, January 20, 2009, marks the end of a job. However, for the staff of the US National Archives and Records Administration (NARA), it's just the beginning of a project unprecedented in size and scope: sorting, indexing, preserving and ensuring access to all the records, both paper and electronic, created by the administration over the past eight years.
In some ways, this is nothing new. Since 1978, when the Presidential Records Act was established, NARA has been tasked with taking custody of, controlling, preserving and providing access to all presidential and vice presidential records that have administrative, historical, informational or evidentiary value. The act requires that the day the president leaves office, presidential records become the legal responsibility of the archivist of the US.
However, given the rise in electronic communications, the volume of electronic records has exploded. NARA received only a few hundred thousand e-mail messages from the first Bush presidency and 32 million from the Clinton White House, according to Ken Thibodeau, director of NARA's Electronic Records Archives (ERA) Program. The program's mission is to meet the many challenges stemming from the increasing use of computers in government, including building a new archiving system, scheduled for completion in 2011. By comparison, NARA expects a whopping 140TB of data from the current Bush administration, more than 50 times what it received from the Clinton years. About 20TB of that is e-mail, Thibodeau says.
It hasn't helped that the Bush administration has been slow in providing NARA with needed information about the types and volume of data that will need to be archived. It wasn't until this summer that an intensive effort began to share information, Thibodeau says.
Much of the discussion has centered on how the White House will provide records in a format that is reasonably easy to use, since some of the systems are highly proprietary. "There's still some risk that some of it may not work exactly right, but we have a contingency plan: If that happens, we'll re-create the systems they have and access the records that way," he says.
Adding to the drama, questions have been raised about millions of missing e-mails from between March 2003 and October 2006. In early November, a lawsuit challenging the White House's failure to properly store and recover millions of e-mails, brought by Citizens for Responsibility and Ethics in Washington and the National Security Archive, was upheld. In 2002, the Executive Office of the President stopped using the Automated Records Management System, which had been in place since 1994 and automatically backed up all e-mails, but failed to install any other backup program.
But despite the controversy and opinions to the contrary, Thibodeau says NARA is prepared. In 1998, NARA began the process of building a system to preserve all types of electronic records created anywhere in the US government, enable online transactions and collaboration with other agencies over the life cycle of government records, and provide access to these records to the public and government officials. The system, scheduled to be built in five increments, is slated for completion in 2011. The first increment, just completed in June, provides functional archives to preserve electronic data in its original format, enables disposition of agreements and scheduling, and receives unclassified and sensitive data from federal agencies.
By December 5, the second increment that will handle the presidential records portion of the ERA system will be ready for the onslaught -- or as ready as it can be "when you're staring at 100TB of data bearing down on you," Thibodeau says. Even in this increment, however, the system will be used just by NARA staff and four pilot agencies, with public access slated for a later release.
The US Government Accountability Office (GAO) has questioned the ERA's readiness, especially since the project has endured some bumps along the way, including delays and cost overruns estimated at US$16.3 million. The life-cycle cost for the complete ERA system, scheduled to be completed in 2011, has been estimated at US$453 million, including development contract costs, program management, research and development, and program office support.
As recently as September, after studying the system's progress, the GAO urged NARA to create a mitigation plan in case it could not process the incoming records by January 20, 2009. In a report to the congressional committees, the GAO said, "If it cannot ingest the electronic records from the Bush administration in a way that supports the search, processing and retrieval of records immediately after the presidential transition, it will not be able to meet the requirements of the Congress, the former and incumbent presidents, and the courts for information in these records in a timely fashion."
Thibodeau says there is no noteworthy risk that the system will not be ready. If there are data formats the system can't ingest and index in a reasonable amount of time, he says, the short-term solution will be to recreate the applications used for those records, and to preserve and provide access to the records that way.
The human element
But as Deb Logan, an analyst at Gartner, points out, system readiness is one thing; human limitation is another. According to Logan, the onerous task will be sorting through the unclassified and unprocessed data that the Bush administration will leave behind. The fact is, she says, the federal government itself has insufficient records management practices and systems in place, which means they'll basically be dumping raw data on NARA.
"It would be one thing if the stuff had to be moved seamlessly to a records repository, but it's just eight years of stuff," she says. "It will be nearly impossible to get it under control without a massive expenditure of human resources because the technology is not there."
According to NARA, it took about 400 days to process just the 2TB of data it received from the Clinton administration. Since it had no system at the time, it archived this data by recreating the Clinton administration's computer systems that originally held the records -- 17 in all -- and developed simple search interfaces that NARA personnel could use to access requested information.
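For a rough sense of scale, the figures above can be run through a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that processing time grows linearly with data volume at the Clinton-era rate. Only the 400 days, 2TB and 140TB figures come from NARA; the linear scaling is an assumption, not a NARA estimate, and avoiding that kind of scaling is the point of building the ERA.

```python
# Figures reported in this article.
CLINTON_TB = 2        # data received from the Clinton administration
CLINTON_DAYS = 400    # approximate days NARA needed to process it
BUSH_TB = 140         # data expected from the Bush administration


def naive_days(volume_tb, days_per_tb):
    """Days needed if processing time scaled linearly with volume.

    Linear scaling is an illustrative assumption, not a NARA projection.
    """
    return volume_tb * days_per_tb


days_per_tb = CLINTON_DAYS / CLINTON_TB        # 200 days per terabyte
bush_days = naive_days(BUSH_TB, days_per_tb)   # 28,000 days
growth = BUSH_TB / CLINTON_TB                  # 70 times the Clinton volume

print(f"Volume growth: {growth:.0f}x")
print(f"Naive processing time: {bush_days:.0f} days "
      f"(about {bush_days / 365:.0f} years)")
```

At the old rate, the Bush-era records would take roughly 77 years to process, which is why recreating each source system by hand is treated only as a fallback.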
Logan says part of the blame lies with federal agencies themselves, pointing to a GAO survey that concluded federal agencies have failed across the board to fulfill their records management obligations, "not out of malice or neglect but out of the nature of the volume of electronic communications and the time frame in which they have to do it," she says. "Anyone who's putting an optimistic face on the job is not being realistic."
Optimism may be relevant from a technology point of view, she acknowledges, but not from an information management point of view. "From my side of NARA, I don't deal with what's in the records, just whether we can get them into the system," Thibodeau notes. "We allow the library staff to deal with the content."
An unprecedented effort
The system itself had its challenges, which Thibodeau says are a natural outcome of creating a system the scope and scale of the ERA. After all, the system is not just intended to preserve presidential records.
Under the Federal Records Act, it also works with federal agencies to preserve all of their relevant records, which amounts to about 2% of all the records they create. These records are submitted, appraised and archived continuously, not in batch modes at the end of each term, as presidential records are.
The system is charged with the following:
- Ingesting electronic records from federal agencies.
- Managing records storage in a way that guarantees their integrity and availability.
- Enabling users to search descriptions and business data about all types of records and to search and retrieve their contents.
- Supporting records management functions such as scheduling, appraisal, description and requests to transfer custody.
- Preserving records in the formats in which they were received, as well as creating backup copies for off-site storage.
To that end, the system is a mix of off-the-shelf and custom-built components, based on a service-oriented architecture and incorporating Oracle's database technology, EMC's Documentum for records management, search technology and a Web-based front end. It also incorporates a hierarchical storage system from Hitachi that blends servers from EMC, Hitachi and Sun, as well as the Hitachi Content Archive Platform, which automatically indexes records as they enter the system, enabling immediate search capability.
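Several of these duties (keeping records in their original format, guaranteeing integrity and indexing records as they enter the system) can be illustrated with a toy example. The sketch below is not how the ERA or any of the named products work internally. The class `TinyArchive`, its methods and the sample records are all hypothetical. The example stores each record's bytes unchanged, keeps a SHA-256 checksum so later corruption can be detected, and indexes the description at ingest time so the record is searchable immediately.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredRecord:
    record_id: str
    content: bytes      # original bytes, stored without conversion
    description: str
    sha256: str         # checksum computed at ingest time


class TinyArchive:
    """Hypothetical in-memory archive; illustrative only."""

    def __init__(self):
        self._records = {}
        self._index = defaultdict(set)   # word -> set of record ids

    def ingest(self, record_id, content, description):
        # Record a checksum now so integrity can be verified later.
        digest = hashlib.sha256(content).hexdigest()
        record = StoredRecord(record_id, content, description, digest)
        self._records[record_id] = record
        # Index the description as the record enters the archive.
        for word in description.lower().split():
            self._index[word].add(record_id)
        return record

    def verify(self, record_id):
        # True if the stored bytes still match the ingest-time checksum.
        record = self._records[record_id]
        return hashlib.sha256(record.content).hexdigest() == record.sha256

    def search(self, word):
        return sorted(self._index.get(word.lower(), set()))


archive = TinyArchive()
archive.ingest("r1", b"\x00binary payload", "Budget memo 2005")
archive.ingest("r2", b"plain text body", "Travel schedule memo")

print(archive.search("memo"))   # both records are searchable right away
print(archive.verify("r1"))     # checksum still matches the stored bytes
```

A production archive would add persistent storage, off-site copies and full-text indexing of record contents. The sketch only shows why checksums and index-on-ingest go together.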
The first glitch was a deadline missed in September 2007 by Lockheed Martin, which NARA had contracted to build the system (see timeline at the end of this story). Thibodeau says this occurred in part because shortly after awarding the contract to Lockheed, NARA discovered it needed to cut the budget in half, which resulted in rescoping the system's initial capabilities. This effort took the better part of a year, according to Thibodeau, as well as the time and attention of Lockheed engineering management.
To speed things up, NARA and Lockheed also decided to use a two-pronged approach to developing the system. In this approach, the first prong -- or the base system, which was completed in June -- manages record schedules, requests record transfers and stores records. NARA plans to beta-test this system for a year, working with just four agencies from which it accepts records. So far, Thibodeau says, there have been 16 records transfers. Other functionalities, such as the ability to automatically inspect and appraise records, were delayed for later increments.
The second prong is the system dedicated to the presidential records, originally called the Executive Office of the President (EOP) system, and now referred to as Search and Access ERA. This system is being developed in parallel with the base system, and the two will be merged as originally envisioned by 2011.
Testing was completed in early November, Thibodeau says, although security testing is still ongoing. "There were no show-stoppers, so we're optimistic that we'll turn it on in December before the onslaught in January," he says. Other functionality will continue to be built through 2011. If NARA is awarded its next appropriation of money, it expects to build the public-access capability within a year, Thibodeau says.
Other slowdown factors
Adam Jansen, president of Dkives Consulting, agrees that the phased approach is the way to go. Formerly the digital archivist for the state of Washington, Jansen built an electronic records archiving system for the state that serves 750 users and stores 75 million records -- from 150-year-old census books to e-mails accumulated over the most recent governor's eight-year term. The system stores a million Web pages from 400-plus agencies, and the state is about to release several hundred hours' worth of searchable full text, digital audio and tape of legislative committee hearings.
He calls NARA's effort "a hugely ambitious project, and it's very difficult to bite off that big of a chunk all at once." While with Washington's state government, Jansen says, his team started with a few types of records and expanded from there. In four years, he says, the system went through three distinct iterations, with tweaking and reinventing along the way, especially when it came to ingesting records.
As for the ERA's bumpy history, Jansen also faults government bureaucracy and NARA's failure to seek out advice from others who had implemented such systems. "There were people who'd done research, and I'm not sure the lessons learned were researched and taken to heart," he says. "Having run a program similar to this for five years in Washington, I got almost no interaction with them despite efforts to do so."
But Thibodeau says when the project began in 1998, there was little information available. At that time, it took the agency two years just to research the feasibility of developing such a system, and it created a program management office to support it. "The biggest system we'd acquired before this was under $10 million," he says. "When you're doing something over $100 million, it's much more complicated, so we wanted to make sure we were competent to do it."
NARA also dedicated three years to eliciting and validating requirements, which culminated in inviting both the IT industry and the general public to comment on the requirements, he says. During this phase, NARA also organized two conferences, one for prospective users and another for industry, to discuss its plans and get feedback. "We wanted there to be no question of what we were building," Thibodeau says, claiming there have been no changes to the requirements over the course of the system build.
Countdown to January
Logan says the problem of managing electronic records won't be resolved until the government agencies themselves do a better job of electronic records management, including classifying, de-duplicating and purging data through the use of systems such as archiving, records and policy management, content monitoring/filtering, and content analytics tools.
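De-duplication, one of the practices Logan lists, is simple to illustrate in its most basic form. The sketch below assumes messages are plain strings and treats two messages as duplicates only when their contents are byte-for-byte identical. The function name and sample data are hypothetical, and commercial archiving tools do considerably more, such as matching near-duplicates and attachments.

```python
import hashlib


def deduplicate(messages):
    """Keep the first copy of each exact-duplicate message, in order."""
    seen = set()
    unique = []
    for message in messages:
        # Identical contents always produce the same digest.
        digest = hashlib.sha256(message.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(message)
    return unique


sample = ["Meeting at 3pm", "Lunch?", "Meeting at 3pm"]
kept = deduplicate(sample)
print(f"Kept {len(kept)} of {len(sample)} messages")
```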
Right now, she says, it's too easy to just keep buying more storage and keeping everything, and what's important to keep is intertwined with what's trivial. Not to mention that with no clear guidance or policy on data handling, she says, there's the risk of political appointees in outgoing administrations shredding data rather than turning it over.
"We've created a huge volume of stuff, and it's going to be impossible to sort it with any level of precision," she says. "The longer it sits around, the more you lose context and run the chance that the data formats will become extinct. I think the result will be a great loss of information for the future." Logan apologizes for seeming so pessimistic, "but I've been covering this for nine years, and the progress has been minimal."