Overview - BCS Technology Assisted Review

Moving Litigation Forward

Technology Assisted Review: Don’t Worry About the Software, Keep Your Eye on the Process

By Joe Utsler, BlueStar Case Solutions

Technology Assisted Review (TAR) has become widely accepted in the world of litigation support because it promises to make eDiscovery less burdensome for litigation teams and less costly for litigants.

The loudest buzz around TAR concerns the software breakthroughs of recent years, such as the emergence of predictive coding tools. Yet while these tools have certainly propelled TAR toward the zenith of its utility, TAR is about much more than predictive coding.

In fact, as these new software tools come to market, they are likely to become increasingly similar and perhaps even somewhat interchangeable. This is not a criticism of the software vendors in the legal services marketplace; it’s just a reflection of the fact that our industry is full of very smart people who know where the technology is headed and are able to quickly develop cutting-edge software tools that meet their customers’ changing needs.

While the technology will inevitably become quite similar from one eDiscovery services provider to the next, technology is only one side of a triangle. The other two sides, People and Process, will not become interchangeable. Indeed, when it comes to successfully implementing TAR in an eDiscovery workflow, the key is having the right experts and the most effective process in place.


The Technology

Magistrate Judge James C. Francis of the Southern District of New York recently offered an important perspective on why it matters to understand the technology that makes eDiscovery possible: “eDiscovery is pervasive. It’s like understanding civil procedure. You’re not going to be a civil litigator without understanding the rules of civil procedure. Similarly, you’re no longer going to be able to conduct litigation of any complexity without understanding eDiscovery. The absence of technical knowledge is a distinct competitive disadvantage.”

It’s certainly not necessary—or even possible—for all members of a litigation team to be conversant in the arcane particulars of how software tools allow eDiscovery professionals to collect, cull, filter, review, produce and archive electronic evidence. But a basic understanding of the technology in TAR is important.

TAR is an eDiscovery process that involves alternating human and computer review of electronic documents. “The value, accuracy and precision promised by TAR are highly contingent on the superb skills of the lawyers training the software on a sample set of documents,” according to Jeremy Schaper at BlueStar Case Solutions. “Once the sample set is analyzed by attorneys, predictive coding [software] utilizes algorithms that mirror complex thinking and continuous fine-tuning to provide more accurate and consistent results than traditional manual review.”

A number of software tools on the market power TAR. Although each one has slightly different capabilities, there are five core technology functions that litigation professionals should understand in order to grasp the technology behind TAR:

1. Clustering – This analytics technology places the documents into groups or categories of related materials. Clustering allows for similar documents to be reviewed together and for content-based prioritization of the review.

2. Email Threading – An email thread contains all of the emails exchanged between correspondents. Threads are very useful for documentation purposes but become a burden in eDiscovery, because large numbers of replies can quickly become overwhelming. Email threading technology groups emails and attachments into sequential conversations, allowing reviewers to read the entire exchange together. Threading technologies typically allow reviewers to suppress duplicate copies of the same email while still tracking every instance of the document. Some email threading tools further reduce the number of documents a reviewer needs to look at by identifying the minimum set of emails that must be read to see the entire conversation.

3. Near-Dupes – As the name suggests, this technology examines a set of documents and identifies those that are “near duplicates” of each other, saving reviewers the time of examining each one independently. Reviewers sometimes decide to “bulk tag” documents that fall within a certain level of similarity (say, 98% or greater), but more typically they still review every document, grouping highly similar documents together to ensure consistent coding. Similarity may be scored against either the first document that occurs in the collection or the median document identified by the software.

4. Concept Search – Two different approaches are referred to as “concept search.” The first uses straight math to identify patterns, which is essentially the “clustering” described above. The second is the semantic approach, which tries to determine the intent and context of a document (or text block) and then uses linguistic tools, such as a thesaurus, to identify other documents that contain the same ideas. This allows users to find related results even if the matching document doesn’t contain any of the same terms as the example text (e.g., because of abbreviations, misspellings, acronyms or synonyms). For example, query expansion takes a keyword or technical term and creates a list of related terms so researchers can broaden their search to documents containing those terms as well.

5. Machine Learning – This is the crucial method underpinning what is often referred to as “predictive coding,” the most-hyped aspect of TAR. Predictive coding uses human expertise to define case-specific classifications (e.g., privileged, responsive or non-responsive) within a subset of documents, which are then used as examples to “train” the software. The software then applies the resulting classification model to assess and categorize the entire document collection. Because the most relevant documents surface as early as possible, this can save a tremendous amount of time.
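To make the near-duplicate idea more concrete, here is a minimal Python sketch that scores similarity with word-shingle Jaccard overlap and places each document in the first group whose pivot (the first document placed in that group) meets a threshold. This follows the spirit of the 98% example in the Near-Dupes item and of scoring against the first document that occurs in the collection. Real near-duplicate engines use their own, often proprietary, similarity measures; the shingle approach, the three-word shingle size, the sample documents and the function names are all assumptions made for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two shingle sets (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold=0.98, k=3):
    """Group documents whose similarity to a group's pivot meets the threshold.

    The pivot is the first document placed in each group, so every document
    is scored against the earliest matching document in collection order.
    """
    groups = []  # list of (pivot_shingles, [doc_ids])
    for doc_id, text in docs.items():
        doc_shingles = shingles(text, k)
        for pivot, members in groups:
            if jaccard(doc_shingles, pivot) >= threshold:
                members.append(doc_id)
                break
        else:  # no existing group was similar enough: start a new one
            groups.append((doc_shingles, [doc_id]))
    return [members for _, members in groups]


docs = {  # hypothetical sample documents
    "DOC-001": "The quarterly sales report shows revenue growth in the northeast region",
    "DOC-002": "The quarterly sales report shows revenue growth in the northeast region and the midwest",
    "DOC-003": "Lunch order for the team offsite next week",
}
print(group_near_duplicates(docs))                 # default 0.98 threshold
print(group_near_duplicates(docs, threshold=0.7))
```

With the default 0.98 threshold the two sales-report drafts stay in separate groups. Lowering the threshold to 0.7 groups them, because 9 of their 12 distinct three-word shingles are shared (a Jaccard score of 0.75).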

The People

By its very nature, TAR is fueled by innovative technology. As previously noted, some TAR processes revolve around the use of computerized systems that “learn” from small document sets and then extrapolate those rules to a large document collection. Other TAR processes incorporate statistical models and sampling techniques to group and cull the document population. The common thread is the reliance on state-of-the-art software and technology solutions.

Given the central role that technology plays in the brave new world of TAR, it’s easy for us to overlook a simple truth: the human role in TAR is the most important one.

The fact is that any TAR process is only as good as the human beings who build, develop and operate the review. As with any tool, it is crucial that the TAR software is deployed correctly and appropriately. All statistical methods entail assumptions that must inform their use. Algorithms and models can mirror human decisions, but a machine can’t decide on its own what is relevant to a particular matter and what is not; that requires subject matter expertise and experts capable of nuanced human judgment. Only then can a TAR model be applied to the larger universe of documents to generate useful results.

This human training of the TAR model includes an understanding of the case issues, cultivation of an accurate “training set” for the review, making adjustments to the TAR model on the fly, and ensuring thorough quality control on the back end of the workflow. Human reviewers are also necessary to assess the outputs from the TAR system and to make informed decisions about what to do with documents. Machines may be faster and more consistent than humans, but they don’t have the creativity and discernment necessary to make narrow judgments about the critical question of relevance of a document.


Despite the impressive development of TAR, it cannot replace attorneys. Rather, it is a set of tools that enhances the efficiency of a great practitioner when used properly, and wastes resources when used incompetently.

The name says it all: Technology Assisted Review. The technology is crucial, but it’s just an assistant. The real secret to the success of TAR in eDiscovery is the role played by the human beings who make it happen.

The Process

The standard BlueStar workflow behind a predictive analytics TAR review couples leading technology with expert attorney reviewers who have an in-depth understanding of the underlying issues in a case. It follows the standards set out in the Computer Assisted Review Reference Model (CARRM). In this model, a sample of the documents to be reviewed is collected as the basis for training the TAR system.

To be effective, the sample must accurately represent the entire collection. It is assembled through a combination of random selection and the use of keywords and concepts. It should be large enough that observations about the sample hold for the whole: if 25% of the sample set is responsive, you would expect the entire collection to contain about 25% responsive documents, give or take a few percentage points. Preparing this statistically significant “seed set” is critical, since it is the basis for the decisions that the TAR project will rely upon.
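The “give or take a few percentage points” idea can be made concrete with the textbook sample-size formula for estimating a proportion, n₀ = z²·p(1 − p)/e², followed by the finite-population correction n = n₀ / (1 + (n₀ − 1)/N). The Python sketch below assumes simple random sampling, a 95% confidence level (z ≈ 1.96), the worst-case proportion p = 0.5 and a ±2-point margin. These defaults and the function names are illustrative assumptions, not part of CARRM or the BlueStar workflow.

```python
import math


def sample_size(population, margin=0.02, z=1.96, p=0.5):
    """Documents to sample to estimate a proportion within +/- margin.

    Uses the normal-approximation formula n0 = z^2 * p * (1 - p) / margin^2,
    then applies the finite-population correction for a collection of
    `population` documents. p = 0.5 is the most conservative choice.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def margin_of_error(p_hat, n, population=None, z=1.96):
    """Half-width of the normal-approximation interval around p_hat."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if population is not None:
        # Finite-population correction shrinks the error for small collections
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


n = sample_size(1_000_000)          # 95% confidence, +/- 2 points
print(n)
print(round(margin_of_error(0.25, n), 3))
```

For a collection of one million documents these defaults call for a sample of 2,396 documents, and if 25% of that sample is coded responsive, the estimate for the whole collection is roughly 25% ± 1.7 percentage points. Because the correction term barely matters for large collections, the required sample grows very slowly with collection size; for 10,000 documents the same settings give 1,937.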

Once the sample is prepared, an experienced human review team codes the seed set for responsiveness to an issue. The TAR technology then uses those coding decisions to analyze the entire collection and “predict” how the review team would code the remainder of the collection. Next, the human team conducts a second “validation” round of review, and their results are compared against the TAR decisions to assess the accuracy of the process.
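The code-then-predict step described above can be sketched in a few lines. Commercial predictive coding engines use their own, generally proprietary, algorithms. As a stand-in, the Python sketch below assumes a multinomial naive Bayes classifier trained on the attorney-coded seed set, which then scores each uncoded document by its estimated probability of responsiveness so the collection can be ranked. The classifier choice, the sample documents and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(seed_set):
    """Fit a multinomial naive Bayes model on human-coded seed documents.

    seed_set is a list of (text, is_responsive) pairs; both classes must appear.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in seed_set:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    if doc_counts[True] == 0 or doc_counts[False] == 0:
        raise ValueError("seed set needs both responsive and non-responsive examples")
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def prob_responsive(model, text):
    """Estimated probability that a document would be coded responsive."""
    word_counts, doc_counts, vocab = model
    total_docs = doc_counts[True] + doc_counts[False]
    log_score = {}
    for label in (True, False):
        score = math.log(doc_counts[label] / total_docs)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids log(0) for unseen pairs
                score += math.log((word_counts[label][token] + 1) / denom)
        log_score[label] = score
    # Normalize the two log scores into a probability without overflow
    top = max(log_score.values())
    yes = math.exp(log_score[True] - top)
    no = math.exp(log_score[False] - top)
    return yes / (yes + no)


def rank_collection(model, collection):
    """Return document IDs ordered from most to least likely responsive."""
    return sorted(collection,
                  key=lambda doc_id: prob_responsive(model, collection[doc_id]),
                  reverse=True)


seed_set = [  # hypothetical attorney coding decisions
    ("merger pricing discussion with acme", True),
    ("acme merger term sheet draft", True),
    ("holiday party rsvp", False),
    ("fantasy football league standings", False),
]
collection = {
    "DOC-101": "revised acme merger pricing",
    "DOC-102": "office party catering menu",
    "DOC-103": "football tickets for sunday",
}
model = train(seed_set)
for doc_id in rank_collection(model, collection):
    print(doc_id, round(prob_responsive(model, collection[doc_id]), 2))
```

Ranking by probability rather than issuing hard yes/no calls supports a prioritized review in which the likeliest responsive documents surface first; any cutoff used for automatic coding would still need to be validated against a human-reviewed sample.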

If the system’s results do not match those of the human review team closely enough (typically measured at a 95% confidence level), additional validation rounds are performed. The TAR system learns from each subsequent round, fine-tuning its responses until it can reliably predict how the expert attorney reviewers would code a document. Once the desired confidence level is reached, the documents may move on to a final prioritized review of the entire collection. Alternatively, if the TAR coding is going to be used for production, one more statistical sample is drawn for a final review to confirm that the results are consistent and reliable. When the process is set up properly, the TAR system will have learned to code the documents as well as the original human reviewers, often with significant savings of both time and money.
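One way to make “closely enough” measurable is to compare the human and TAR coding calls on a random validation sample, compute the agreement rate and put a 95% confidence interval around it. The Python sketch below assumes the Wilson score interval and a hypothetical target agreement rate of 90%. Neither the interval method, the target, the sample data nor the function names come from this paper, and real protocols often track recall and precision rather than raw agreement.

```python
import math


def agreement(human, tar):
    """Count validation documents the TAR system coded the same way as the
    human reviewers; returns (matches, sample_size)."""
    if not human or len(human) != len(tar):
        raise ValueError("need two equal-length, non-empty lists of coding calls")
    matches = sum(h == t for h, t in zip(human, tar))
    return matches, len(human)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives ~95% confidence)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


def needs_another_round(human, tar, target=0.90, z=1.96):
    """True if the interval's lower bound falls below the target agreement rate."""
    matches, n = agreement(human, tar)
    low, _ = wilson_interval(matches, n, z)
    return low < target


# Hypothetical validation sample: 500 documents, TAR disagrees on 30 of them
human = [True] * 300 + [False] * 200
tar = human.copy()
for i in range(30):
    tar[i] = not tar[i]

matches, n = agreement(human, tar)
low, high = wilson_interval(matches, n)
print(matches, n, round(low, 3), round(high, 3))
print(needs_another_round(human, tar, target=0.90))
print(needs_another_round(human, tar, target=0.95))
```

Checking the lower bound rather than the observed rate keeps a small validation sample from passing on luck: 94% observed agreement on 500 documents clears a 90% target but not a 95% one.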


1. Set Goals

The process of deciding the outcome of the Computer Assisted Review process for a specific case. Some of the outcomes may be reduction and culling of not-relevant documents, prioritization of the most substantive documents and quality control of the human reviewers.

2. Set Protocol

The process of building the human coding rules that take into account the use of CAR technology. CAR technology must be taught about the document collection by having the human reviewers submit documents to be used as examples of a particular category, e.g. relevant documents. Creating a coding protocol that can properly incorporate the fact pattern of the case and the training requirements of the CAR system takes place at this stage. An example of a protocol determination is to decide how to treat the coding of family documents during the CAR training process.

3. Educate Reviewer

The process of transferring the review protocol information to the human reviewers prior to the start of the CAR Review.

4. Code Documents

The process of human reviewers applying subjective coding decisions to documents in an effort to adequately train the CAR system to “understand” the boundaries of a category, e.g. relevancy.

5. Predict Results

The process of the CAR system applying the information “learned” from the human reviewers and classifying a selected document corpus with pre-determined labels.

6. Test Results

The process of human reviewers using a validation process, typically statistical sampling, in an effort to create a meaningful metric of CAR performance. The metrics can take many forms; they may include estimates of defect counts in the classified population or use information retrieval metrics like Precision, Recall and F1.

7. Evaluate Results

The process of the review team deciding if the CAR system has achieved the goals anticipated by the review team.

8. Achieve Goals

The process of ending the CAR workflow and moving to the next phase in the review lifecycle, e.g. privilege review.

The Major Steps in the Computer Assisted Review Reference Model (CARRM) Process

Source: www.edrm.net/resources/carrm
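The Test Results step names Precision, Recall and F1. Treating the human reviewers’ calls on a validation sample as the reference, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The minimal Python sketch below computes them; the function names and the counts are hypothetical.

```python
def confusion_counts(human, tar):
    """Tally TAR calls against human calls (True = relevant).

    Returns (true_pos, false_pos, false_neg, true_neg).
    """
    tp = sum(h and t for h, t in zip(human, tar))
    fp = sum((not h) and t for h, t in zip(human, tar))
    fn = sum(h and (not t) for h, t in zip(human, tar))
    tn = sum((not h) and (not t) for h, t in zip(human, tar))
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1; each is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical validation counts: 80 correct relevant calls,
# 20 documents wrongly coded relevant, 10 relevant documents missed
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

With these counts the system finds most of the relevant documents (recall of about 0.889), but one in five of its relevant calls is wrong (precision of 0.800), giving an F1 of about 0.842. In eDiscovery, recall is often the figure that matters most, since it estimates how much relevant material the process would leave behind.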


Conclusion

When TAR burst onto the litigation support scene, it was primarily viewed as an option only for large-scale complex litigation. There was truth to this at the time and, indeed, some of the early use cases for TAR were built around commercial litigation with huge volumes of electronic data that needed to be collected and reviewed.

That initial premise isn’t true anymore. TAR is becoming increasingly appropriate for a broader scope of cases. There are a few key drivers for this evolutionary trend:

• Cost – the software needed to power TAR has become less expensive as more vendors have entered the space and pricing has become more competitive.

• Regulations – the new FRCP guidelines for eDiscovery make it possible to use smaller document samples, which allows litigation teams to obtain more accurate results with data sets that are appropriate for smaller cases.

• Scale – a variety of new efficiencies are now possible as litigation support teams learn how to deploy a lean TAR-fueled workflow, unlocking important economies of scale that make it feasible to take TAR “downstream” to a wider range of matters.

There are a couple of important implications in this trend. First of all, it has the potential to equalize the playing field between larger and smaller law firms. There is no question that solo practitioners and small law firms—the firms that tend to handle the smaller litigation matters—are the least equipped to deal with the explosion of electronic data. This is pure economics: larger firms have the depth of resources to invest in eDiscovery technologies and professionals that smaller firms just don’t have. However, as TAR becomes more viable for a broader range of commercial litigation, it will likely become an extremely potent weapon in smaller firms’ efforts to hold their own when it comes to electronic discovery.

Second, it may also introduce tremendous efficiencies into smaller-scale business litigation. These disputes, which are just as critical to the business success of the plaintiffs and defendants involved as big-ticket litigation is to a Fortune 500 company, have previously not been appropriate for TAR in the eDiscovery workflow. As the circumstances change and TAR goes downstream to smaller matters, litigation parties in smaller disputes may be able to take advantage of the same review technologies as litigants in complex commercial litigation.

TAR isn’t just for the huge and complicated matters anymore; it’s becoming increasingly valuable and cost-effective for a wide range of cases. But no matter how large the case or complex the matter, the technology used in TAR is just one piece of the puzzle. In order for it to work, the right people must deploy it using the right process.

BlueStar Case Solutions is a leader in bringing together the best technology, people and process to make TAR work. With BlueStar as your partner, you will have the best eDiscovery tools and personnel on the case, including dedicated project managers, experienced eDiscovery analysts, and high-capacity processing facilities with 24/7 operations.

For more information, please go to www.bluestarcs.com

7226 South Wabash Avenue, Suite 500 • Chicago, IL 60604 P: 866.287.9496 • F: 312.939.3220 • [email protected] • www.bluestarcs.com


About the Author

Joe Utsler is BlueStar’s head of business development and strategy. Mr. Utsler regularly consults with law firms, corporations and government agencies across the nation on matters ranging from workflow optimization to information governance. He has spent 23 years in the discovery and legal technology space and is considered an industry expert.

Mr. Utsler has been a driving force in the development and direction of litigation review technology. He spent several years as the Product Manager and Evangelist for Concordance with LexisNexis and Dataflight Software, and he served as Vice President of Product Strategy for IPRO and as Director of North American Product Marketing for Nuix. Mr. Utsler began his legal technology career with the litigation support team at Sidley & Austin, where he administered all of the Los Angeles discovery databases and oversaw document reviews for a number of large clients, including AT&T and Suzuki America.

BlueStar Case Solutions is here for you. Contact BlueStar today for a free litigation spend analysis, eDiscovery consultation, or simply for more information.

Call 1.800.471.8571
Email [email protected]

Visit our website for additional resources, case studies and testimonials: www.bluestarcs.com

Contact BlueStar online and we will get back to you within 24 hours: www.bluestarcs.com/contact