Greg Osinoff, Vice President of Sales for DolphinSearch
Lynn Frances, Director of Technical Communication for DolphinSearch
Discovery, as most people know it, is becoming a thing of the past. When we think of discovery, we often think of boxes and boxes of paper and a room full of people reading documents, but with the advent of the electronic office, discovery today involves hard drives, PST files (email mailboxes), and OCR (optical character recognition) of paper documents.
According to every major research report, the average number of email messages received by end-users is rising dramatically. Ferris Research estimates that end-users receive around 34 email messages per day on average. The average size of these messages has increased 192% to 286 kilobytes. Research firm IDC predicts that email volume will nearly quadruple to 35 billion messages per day by 2005. The growth trends are explosive!
With the stark reality of a digital future staring us in the face, attorneys are forced, in ever increasing numbers, to turn to technology to assist them in the control, organization and review of this information. Yet generally, attorneys do not fully understand the new challenges that are facing them.
The size of these digital collections can be paralyzing. A single gigabyte of Outlook email files (known as PST files) can be the equivalent of 150,000 pieces of paper. It is no longer unheard of for a law firm to be faced with 10, 20 or more gigabytes of electronic data in a single litigation.
As electronic discovery becomes increasingly common and the volume of data continues to climb, attorneys will be forced to turn to technology to assist them in the review process, just to get through the collection. It seems logical that the data mining tools available to them will need to become smarter as well.
As the litigation support industry has gone electronic, many firms have used software like Concordance or Summation to create and organize databases of the discovery documents. To accomplish this, dozens of coders must read each document, determine what they consider to be the relevant issues contained in the document, then input that information into the database, along with basic document metadata (date, author, document type).
When asked about the efficacy of these kinds of products when searching for relevant information, Clayton Morehead, Manager of Litigation Support and Client Technology for Carlton Fields, LLP, lamented that “you can’t search the text of the documents unless they’ve been OCR’d. These technologies rely on the quality of the coders. At some point, you’re going to lose important information.”
Recognizing this problem, LexisNexis partnered with a technology firm called DolphinSearch to bring a comprehensive solution to the litigation support industry. LexisNexis Litigation Support by DolphinSearch provides full text concept search for electronically assisted subjective coding (issue, relevance, privilege coding).
The addition of metadata extraction and TIFF generation makes this offering complete. “What’s most attractive is the concept search,” said Morehead. “It’s so phenomenal. It’s taking the legal team some time to adjust to how powerful it is and how well it works.”
To illustrate this, he described a training session in which he asked the Legal Assistants to throw out possible searches. Someone suggested that they search for a Senate bill related to a particular case.
Morehead was amazed to discover that the system returned documents that didn’t mention the bill specifically, but did contain the name of the legislator who had proposed it. “Although the name of the bill did not appear anywhere in the document, DolphinSearch technology was able to learn that this legislator was integrally tied to the subject matter we were looking for. This technology is able to pick out information that, until now, would have been painstakingly found,” said Morehead.
Carlton Fields used LexisNexis Litigation Support by DolphinSearch for a large case and saved their client more than $125,000. The firm was also able to cut the coding time from seven months to two months. The technology, however, is equally suited for both large and small cases. “Small clients that don’t want to spend the money on traditional coding can have DolphinSearch do the coding for them,” said Morehead. “It’s also very appropriate for large cases that would require months of coding. The client might not be averse to the costs of manual coding, but might be time averse.”
Until recently, searching for relevant information in discovery document sets has been a slow and deliberate process. But the application of advanced technology is taking litigation support to new levels.
The system is capable of performing four different types of search: Concept Search delivers documents containing the query term and related terms; Word Search delivers documents containing or excluding specific words; Field Search searches by specific metadata; and Combination Search combines any of the other three search types.
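To make the distinction between the four search types concrete, here is a minimal sketch over a toy collection. It is not how DolphinSearch is implemented: the `Document` class, the function names, and the hand-written `RELATED_TERMS` map are all assumptions made for illustration (a real concept search would learn related terms from the collection), and "combination" is assumed here to mean documents returned by every one of the combined searches.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)

    def words(self):
        # Lowercased set of whitespace-separated words in the text.
        return set(self.text.lower().split())


# Hypothetical, hand-written related terms. A real concept search
# would learn these relationships from the documents themselves.
RELATED_TERMS = {"bill": {"legislation", "senator"}}


def concept_search(docs, term):
    """Documents containing the query term or any related term."""
    wanted = {term} | RELATED_TERMS.get(term, set())
    return {d.doc_id for d in docs if d.words() & wanted}


def word_search(docs, include=(), exclude=()):
    """Documents containing every `include` word and no `exclude` word."""
    return {
        d.doc_id for d in docs
        if set(include) <= d.words() and not (set(exclude) & d.words())
    }


def field_search(docs, **criteria):
    """Documents whose metadata matches every given field value."""
    return {
        d.doc_id for d in docs
        if all(d.metadata.get(k) == v for k, v in criteria.items())
    }


def combination_search(*result_sets):
    """Documents returned by all of the given searches (assumed AND)."""
    return set.intersection(*result_sets)


DOCS = [
    Document("d1", "the senator introduced new legislation",
             {"author": "smith", "type": "memo"}),
    Document("d2", "the bill passed the committee",
             {"author": "jones", "type": "email"}),
    Document("d3", "lunch on friday",
             {"author": "smith", "type": "email"}),
]

# Concept search for "bill" also finds d1, which never uses the word.
concept_hits = concept_search(DOCS, "bill")
# Narrow the concept hits to documents authored by "smith".
combined_hits = combination_search(concept_hits, field_search(DOCS, author="smith"))
```

In this toy collection the concept search returns both `d1` and `d2`, while a plain word search for "bill" would return only `d2`.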
The results list is customizable. The user decides what document data is displayed. The system always shows the title, document relevance ranking (its level of relevance to the search query) and the number of query terms appearing in the document (not the number of occurrences of the terms). From this screen, individual documents and groups of documents can be assigned a tag (issue, privilege, non-responsive) or added to a folder.
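The difference between the number of query terms appearing in a document and the number of occurrences of those terms is easy to miss, so a small sketch may help. The function name `term_stats` and the whitespace tokenization are assumptions for illustration only.

```python
def term_stats(text, query_terms):
    """Return (distinct query terms present, total occurrences of them)."""
    words = text.lower().split()
    # Count each query term once if it appears at all.
    terms_present = sum(1 for term in query_terms if term in words)
    # Count every appearance of every query term.
    occurrences = sum(words.count(term) for term in query_terms)
    return terms_present, occurrences


stats = term_stats(
    "strike vote delayed as strike continues",
    ["strike", "vote", "union"],
)
```

Here two of the three query terms appear in the text, even though together they occur three times; the results list described above reports the first figure.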
The exploded document is a tool used to find the most relevant section. While keeping the document order intact, it allows the user to navigate through the relevant sections of the document. This dramatically saves review time for issue and privilege coders.
Context is King
Ultimately, the quality of a search depends directly on the context in which the question is being asked.
To illustrate this point, think of the properties of a basketball. Now, think of the properties of a basketball if you were on the Titanic and it was sinking. The fact that the basketball is a flotation device probably never entered your mind when the question was originally asked. While the question was exactly the same, the results of a successful search are really dependent on the context of the question itself.
The Human Mind – A More Powerful Database Model
The human mind is a much more sophisticated model than the databases we currently employ. A good portion of its power is derived from how it is designed. This design is known as a neural network.
First, the human mind disambiguates the meaning of a word by looking at it in relation to the words that surround it. In other words, meaning is derived by patterns of words or context – not by individual words.
For example, the word “strike” by itself is ambiguous. Without context, you do not know if the speaker is talking about baseball or a labor issue. However, if he said, “Roger Clemens was throwing strikes,” you are able to understand the meaning of the word by the pattern of words around it (the context).
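The idea that surrounding words settle the meaning of "strike" can be sketched in a few lines. This is a deliberately simple word-overlap illustration, not a neural network and not DolphinSearch's method; the `SENSE_CONTEXT` word lists are hypothetical and hand-picked, whereas a learning system would derive them from data.

```python
# Hypothetical, hand-picked context words for two senses of "strike".
SENSE_CONTEXT = {
    "baseball": {"pitcher", "throwing", "inning", "batter", "clemens"},
    "labor": {"union", "workers", "picket", "wages", "walkout"},
}


def disambiguate(sentence):
    """Pick the sense whose context words overlap the sentence most.

    Returns None when no context word appears, i.e. the word is
    still ambiguous.
    """
    # Strip basic punctuation and lowercase each word.
    words = {w.strip(".,!?\"'").lower() for w in sentence.split()}
    scores = {sense: len(words & ctx) for sense, ctx in SENSE_CONTEXT.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


sense = disambiguate("Roger Clemens was throwing strikes.")
```

With the sentence about Roger Clemens the function picks the baseball sense, and with the bare word "strike" it returns `None`, mirroring the point that the word alone is ambiguous.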
Second, the brain processes information very differently from a database, in that a neural network is designed as an interconnected system of processing elements. These elements are self-organizing and learn from the data itself.
Rather than being pre-programmed, these systems learn to recognize patterns. The patterns represent context and meaning.
Therefore, we have the ability to learn relationships between words, so that most people will know that you are speaking of Kofi Annan when you refer to the UN Secretary General.
A neural network’s primary advantage is that it can solve problems that are too complex for conventional technologies … problems that do not have an algorithmic solution, or for which an algorithmic solution is too complex to be defined.
What a Litigator Needs
It seems that what litigators need, in order to truly automate the data mining process, is a tool with the same kind of pattern recognition capacity and the ability to learn ad hoc relationships that the human mind possesses.
Imagine being able to ask your search system to learn everything there is to know about George Bush. The system learns, on the fly, that he is sometimes called the President or “W”. More importantly, the system delivers those documents back to you even though the document does not refer to him as George Bush.
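One very rough way to picture learning such relationships on the fly is to count which words share documents with the query term and then search for those words too. The sketch below does only that; it is plain co-occurrence counting, not a neural network, and the function names, stopword list, and sample documents are all assumptions for illustration.

```python
from collections import Counter

STOPWORDS = frozenset({"the", "a", "of", "by"})


def learn_associations(docs, target, top_n=1):
    """Words that most often share a document with `target`."""
    counts = Counter()
    for text in docs:
        words = set(text.lower().split())
        if target in words:
            # Count each co-occurring word once per document.
            counts.update(words - {target} - STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def expanded_search(docs, target, top_n=1):
    """Indexes of documents containing the target or a learned associate."""
    terms = {target, *learn_associations(docs, target, top_n)}
    return [i for i, text in enumerate(docs)
            if terms & set(text.lower().split())]


SAMPLE_DOCS = [
    "president bush signed the order",
    "president bush addressed congress",
    "the president vetoed the measure",
    "quarterly sales report",
]

associates = learn_associations(SAMPLE_DOCS, "bush")
hits = expanded_search(SAMPLE_DOCS, "bush")
```

In this sample, "president" is the word most often seen alongside "bush", so the expanded search also returns the third document even though it never mentions the name.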
This type of natural intelligence is the next “holy grail” in search. It will be these types of technologies that allow attorneys to manage the ever-increasing volumes of data confronting them.
The promise of neural networks, coupled with pattern matching that is integrated into full text searching, offers us the potential of artificial learning modeled on the human brain.
This translates into great benefits for attorneys. “With DolphinSearch, our legal assistants and attorneys can be more concerned with legal aspects of the case rather than reading each and every document and recording it,” Morehead concluded. “This enables our legal team to focus more on the details of the case and less on the details of the documents.”