Your everyday office applications store more hidden information in the documents you create and work on than you may be aware of. Here's how to make sure you don't make data public that should remain private.
Hidden Data in Your Documents
Why You Should Care: Embarrassing comments or proprietary information may reach unintended audiences.
Scenario: You're the accountant for a company whose board of directors has asked you to provide an executive summary of expenditures and revenue--a task that requires you to collaborate with several people. The draft of your report--written in Microsoft Word--changes hands for several days as various participants insert notes, questions, and maybe even a few impolitic gripes.
You clean up the document before submitting it to the board, but the comment "Do they really expect people to buy this crap?" (along with the name of the person who added it, and the time and date of its creation) hasn't vanished--it's simply hiding.
Among the types of metadata that spreadsheets, Word documents, and PowerPoint presentations may retain in hidden (but readily discoverable) form are the name and initials of anyone who has ever written or edited the document; template data; document revisions; editing comments; the name of your computer; the company whose name appears in the license information for your copy of Microsoft Office; and the name of the hard drive or server where you saved the document.
Embedded information such as file server names or user names--which often get recycled as log-in credentials--makes data theft that much easier for a corporate spy. If you wouldn't publish the details of your company's internal network for the whole world to see, why would you give away chunks of that information embedded in the documents you produce for the Web? Some of this information is readily visible, while extracting other parts from the document may entail using binary-level file editors.
Fix: For Office XP and 2003, you can download Microsoft's Remove Hidden Data tool. For Office 2007 documents, you can use the Document Inspector command to view and (optionally) delete unwanted metadata remnants from Word, Excel, and PowerPoint files.
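To see for yourself what a file gives away, you can peek inside it before sending it out. The sketch below is only an illustration and assumes a Word 2007 (.docx) file, which is an ordinary ZIP package that usually keeps author names in docProps/core.xml and reviewer comments in word/comments.xml. The function name list_hidden_data is made up for this example, and the script does not cover older binary .doc files or every kind of metadata listed above.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office 2007 file format.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
}


def list_hidden_data(path):
    """Return author fields and reviewer comments found in a .docx file.

    Hypothetical helper for illustration; it reads only two parts of the
    package and ignores everything else.
    """
    found = {"creator": None, "last_modified_by": None, "comments": []}
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())

        # Core properties: who created the file and who saved it last.
        if "docProps/core.xml" in names:
            core = ET.fromstring(package.read("docProps/core.xml"))
            found["creator"] = core.findtext("dc:creator", namespaces=NS)
            found["last_modified_by"] = core.findtext(
                "cp:lastModifiedBy", namespaces=NS
            )

        # Reviewer comments, each stored with its author and timestamp.
        if "word/comments.xml" in names:
            root = ET.fromstring(package.read("word/comments.xml"))
            w = "{%s}" % NS["w"]
            for comment in root.iter(w + "comment"):
                text = "".join(t.text or "" for t in comment.iter(w + "t"))
                found["comments"].append(
                    {
                        "author": comment.get(w + "author"),
                        "date": comment.get(w + "date"),
                        "text": text,
                    }
                )
    return found
```

Calling list_hidden_data("report.docx") on a copy of your report (the file name is a placeholder) returns a dictionary you can print. If it shows names or comments you didn't expect, run Document Inspector and check again.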
Why You Should Care: Allowing sensitive private data to reach the public through carelessness reflects badly on you.
Scenario: Academics, medical researchers, and courts of law sometimes need to publish versions of their private research notes, data, or reports for public consumption. When these documents contain sensitive personal information--names, Social Security numbers, street addresses, and phone numbers--it's often easier to redact, or cover up, those parts of the document, rather than revising the original file.
The black bars that cover the names of people, locations, and dates in printed documents are adequate for a paper report, but digital documents are another matter. There have been high-profile instances in which third parties easily uncovered supposedly redacted information because the person working with the document didn't know that anyone equipped with the full version of Adobe Acrobat can remove a black bar painted over text in a PDF file. If you need to release information publicly without disclosing sensitive details, you must take care that the redactions you make are irreversible.
Fix: NASA has the clearest online instructions explaining the process. In Word documents, the job is easy: Save a new copy of the file that you plan to redact, make sure that Revisions Mode is turned off, and then type replacement text over the text you need to redact. To redact content in PDF files, you must either use a third-party plug-in (such as Appligent's US$249 Redax utility) or cover the text with black bars in the PDF file, convert the PDF to a TIFF image, and then reconvert the TIFF to a PDF. The only downside is that readers lose the ability to search text in a PDF that has been converted in this way.
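For Word files saved in the Office 2007 (.docx) format, one way to double-check a redaction before you publish is to search every part of the file for the strings you meant to remove. The following is a rough sketch under that assumption, not a guarantee: the function name find_leftover_text is invented for this example, it looks only at the XML parts of the package, and it can miss text stored in embedded objects, in images, or with special characters encoded differently. An empty result is a good sign, not proof.

```python
import re
import zipfile


def find_leftover_text(path, sensitive_terms):
    """Map each XML part of a .docx package to the terms still found in it.

    Hypothetical helper for illustration only.
    """
    hits = {}
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            # Only the XML parts are searched in this sketch.
            if not name.endswith((".xml", ".rels")):
                continue
            raw = package.read(name).decode("utf-8", errors="replace")
            # Word may split one phrase across several tags, so also search
            # a copy with the tags stripped out.
            stripped = re.sub(r"<[^>]+>", "", raw)
            haystacks = (raw.lower(), stripped.lower())
            leftover = [
                term
                for term in sensitive_terms
                if any(term.lower() in hay for hay in haystacks)
            ]
            if leftover:
                hits[name] = leftover
    return hits
```

For example, find_leftover_text("redacted.docx", ["123-45-6789", "Jane Doe"]) (placeholder file name and terms) returns an empty dictionary when none of the terms turn up, or names the parts of the file, such as comments or the main document text, where they still appear.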