Conrad Jacoby’s E-Discovery Update: Minimizing E-Mail Archive Data Conversion Issues

Working with electronic documents is messy even under the best circumstances. Documents sometimes get corrupted as they get e-mailed from one person to another. Different versions of the same program may use different formatting information, scrambling documents in unexpected ways when conflicting pieces of header information decide to wage war. Within e-discovery, electronic document management is further complicated by the fact that most discovery document review platforms and ESI [electronically stored information] processing systems are optimized to work with specific types of digital information—to the exclusion of other types. Put more bluntly, many systems in common use today require electronic source files to be converted from one data format to another before they can be indexed, searched, or further processed.

Although some conflicts arise when transitioning between Corel WordPerfect and OpenOffice files, data conversion problems are most common when working with electronic mail archives. Novell GroupWise, Lotus Notes, Eudora, cc:Mail, and Mozilla Thunderbird are all e-mail systems or clients that have grown less common over the past five years (or were never particularly common), and relatively few ESI review and processing systems contain full functionality to work with all these e-mail archives in their original formats. Instead, as a work-around, these e-mail archives are converted into the much more common Microsoft-based .PST format and processed that way.

Round Pegs in Square Holes

E-mail conversion is done without a second thought in many e-discovery projects, and the results are often satisfactory to both producing and requesting parties. However, each major e-mail archive architecture uses a fundamentally different method for storing information about e-mail messages, and sometimes the only way to fit the round e-mail peg into a square hole is to bang it really hard with a conversion hammer and accept that some collateral damage will occur. Converting Lotus Notes .NSF files into Outlook-friendly .PST files inevitably entails the loss of some of the unique information those files contain, though most information remains intact. Conversely, when Lotus Notes “views” are converted into Outlook-style folders, e-mail messages that exist only once in a Notes .NSF file may be repeatedly duplicated into multiple folders, creating double, triple, or quadruple the number of actual messages to review and potentially produce. While it’s easy to fault the conversion tool that creates this duplicative e-mail, some of this problem is fundamental to the different ways in which each system stores messages. Absent the ability to work with all source documents in their original application environment, some amount of data confusion is highly likely, if not inevitable.
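The duplication effect described above can be measured once a converted archive has been inventoried. The following is a minimal sketch, not a description of any particular conversion tool. It assumes a hypothetical inventory in which each record carries a `message_id` (the message's Message-ID header) and the `folder` it landed in after conversion; those field names and the sample values are illustrative only.

```python
from collections import defaultdict


def folders_by_message(records):
    """Map each Message-ID to the sorted list of folders it appears in."""
    folders = defaultdict(set)
    for record in records:
        folders[record["message_id"]].add(record["folder"])
    return {mid: sorted(names) for mid, names in folders.items()}


def duplication_summary(records):
    """Count total copies versus unique messages and list the duplicates."""
    grouped = folders_by_message(records)
    return {
        "total_copies": len(records),
        "unique_messages": len(grouped),
        # Messages that the conversion placed in more than one folder.
        "duplicated": {
            mid: names for mid, names in grouped.items() if len(names) > 1
        },
    }


# Hypothetical inventory of a converted archive (assumed data, not real output).
sample = [
    {"message_id": "<a1@example.com>", "folder": "Inbox"},
    {"message_id": "<a1@example.com>", "folder": "Project X"},
    {"message_id": "<a1@example.com>", "folder": "By Sender"},
    {"message_id": "<b2@example.com>", "folder": "Inbox"},
]
summary = duplication_summary(sample)
```

Keeping the folder list for each duplicated message, rather than simply discarding the extra copies, preserves the information about how the custodian's views were organized.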

Avoiding Conversion Confusion

Legal teams working with ESI that requires some type of conversion would benefit from taking a number of steps to minimize the complications that arise when data must be transformed from one format to another. These strategies do not address the underlying technical issues that sometimes cause data loss—those can only be addressed through technology. Rather, they suggest ways to confront and compensate for potential issues before they become critical problems.

A. Understand whether your ESI processing includes format conversion

One misstep a legal team may take, whether it’s producing or requesting ESI, is not researching whether relevant ESI has been converted from its original format. A legal team that does not understand what artifacts are injected into the ESI during its processing is a legal team that will have trouble explaining the significance (or insignificance) of how the data is presented. For example, e-mail archives converted into Microsoft .PST format may show all mail items in a single inbox folder, even if these materials were separated into multiple categories in the original system. On production or on review, this may make it appear that the producing party selectively produced only portions of a custodian’s total e-mail, even though this is only an artifact of the conversion process and all e-mail messages and header information are present. Similarly, transforming Lotus Notes mail “view” information into separate folders within a Microsoft .PST file may make it appear that the producing party is trying to bury the requesting party with unnecessarily duplicative information, when the actual objective is to preserve the insight a reviewer might get by seeing how a custodian organized his or her data.

Asking about data conversion is also a good way to validate the service bureau with which the legal team is working. A reputable vendor should be able to explain exactly which, if any, electronic source files they cannot process in native format. Moreover, a conscientious vendor will be able to explain the potential data loss resulting from the processes they use, either directly or by bringing in a support representative from the developer of the conversion utility that they use. Practitioners should be leery of service bureaus that blithely state that they process all ESI in native format without providing supporting details. Answers like that suggest that the vendor doesn’t fully understand what its tools are doing.
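One way a legal team might spot-check the artifacts discussed in this section is to compare a simple inventory of the source archive against an inventory of the converted output. The sketch below is an illustration under stated assumptions: each inventory is a hypothetical list of `(folder, message_id)` pairs that a vendor or in-house tool would have to supply, and the function names are invented for this example.

```python
from collections import Counter


def folder_counts(inventory):
    """Count how many messages each folder holds."""
    return Counter(folder for folder, _ in inventory)


def compare_inventories(source, converted):
    """Report messages missing after conversion and any folder collapse."""
    source_ids = {message_id for _, message_id in source}
    converted_ids = {message_id for _, message_id in converted}
    source_folders = folder_counts(source)
    converted_folders = folder_counts(converted)
    return {
        "missing_after_conversion": sorted(source_ids - converted_ids),
        "source_folders": dict(source_folders),
        "converted_folders": dict(converted_folders),
        # Fewer folders after conversion suggests the structure was flattened.
        "folders_collapsed": len(converted_folders) < len(source_folders),
    }


# Hypothetical inventories (assumed data): three folders flattened into one.
source = [
    ("Inbox", "<m1@example.com>"),
    ("Contracts", "<m2@example.com>"),
    ("Personal", "<m3@example.com>"),
]
converted = [
    ("Inbox", "<m1@example.com>"),
    ("Inbox", "<m2@example.com>"),
    ("Inbox", "<m3@example.com>"),
]
report = compare_inventories(source, converted)
```

In this sample every message survives the conversion but the folder structure does not, which is the situation that can look like selective production when it is only a conversion artifact.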

B. Keep the requesting party updated

Miscommunication is the root of most e-discovery disputes, and disputes over ESI format conversion are no different. A producing party should inform the requesting party as soon as it becomes aware that ESI being prepared for production will be passed through a conversion utility. The disclosure need not be full of technical jargon; it may be enough to reveal that the source files are being sent to a specific vendor for processing. However, it can only benefit the producing party to also mention if source files in less-common formats have been collected and are being prepared for production. Creating a clear record of disclosure will help support any showing of reasonable and defensible measures that were taken. In addition, mentioning that some of the source files are in less-universal formats like Novell GroupWise and Lotus Notes may prompt the requesting party to explicitly request that these materials be produced in an alternate format, reinforcing the need to convert the materials from their source format and removing a potential disputed issue.

C. Preserve a copy of the original source data

Most ESI conversions are relatively automated and rarely challenged. However, as a litigation tactic, it’s always possible that the authenticity and accuracy of a key piece of digital evidence will be challenged. A producing party can fend off such challenges by retaining a copy of the digital source materials in the exact format in which they were harvested. Keeping such an archive makes it possible to retrace steps that were taken to process key documents from their origins to the point of production. Repeating a process with a fresh copy of the source data will easily confirm or dismiss the possibility that extraneous data was introduced or that data was lost as the result of format conversion.

One problem with this suggestion is that ESI harvested from large organizations may be voluminous, and it may seem wasteful to some clients or legal teams to keep this data around when it is not actively accessed and may be subject to additional legal holds. However, the relatively modest expense of purchasing hard drives to hold these files is a fairly reasonable insurance policy against the catastrophe of having key evidence unavailable for substantive use in filings and at trial.
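A preserved copy is easier to defend if it can be shown to be unchanged since collection. One common approach, sketched here as an assumption rather than a prescribed procedure, is to record a cryptographic hash of every preserved file at harvest time and recompute it later. The function names and directory layout below are hypothetical; only Python's standard `hashlib` and `pathlib` modules are used.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(manifest, root):
    """List files added, removed, or altered since the manifest was built."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

A team could call `build_manifest` on the preservation directory at collection time, store the result alongside the chain-of-custody records, and later call `changed_files` with that stored manifest; an empty list indicates that the preserved files still match their recorded hashes.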


Technology continues to advance, and ESI processing tools are gaining sophistication in working with more and more native formats at the same time that many organizations are phasing out their less-mainstream software. Data conversion will be an ongoing issue as long as companies develop their own systems or litigation requires digging back into ancient (four-year-old) backup tapes. However, by recognizing these issues, lawyers can proactively work to keep them from disrupting the development of their legal case.
