Everson On eDiscovery: Understanding Predictive Coding
Author: Eric Everson, MBA, MSIT-SE
(About the Author: Eric Everson is a software engineer
turned law scholar. He is currently a 3L
at Florida A&M University College of Law where he serves as the President
of the Electronic Discovery Law Student Association.)
eDiscovery is rapidly emerging as a specialization loaded with acronyms and unique terminology. Among practitioners, terms like TAR (technology-assisted review), CAR (computer-assisted review), and intelligent review get tossed around interchangeably for a process also known as predictive coding. Don’t let this technical-sounding term scare you: understanding at least the basics of predictive coding can have a major impact on your litigation costs.
First, what is predictive coding? Broadly defined, predictive coding is a machine-learning process in which software takes in a sample of documents to build a training set, then models that data to establish patterns. These patterns are used to assign prediction scores to documents, and those scores allow us to gauge the computer’s statistical accuracy. Here is a great way to think of it:
In the midst of discovery between two corporations, 800,000 documents are produced for review. You know you don’t need all 800,000 of them; in fact, there are probably only about 80 files that are truly going to be valuable as evidence in your matter. Predictive coding is the process that lets you put a computer to the task of accurately narrowing these documents to a more manageable volume. You do not load all 800,000 documents at the outset. Instead, you start by training the software with what are called training sets. These training sets are very important because this is where we get the accuracy percentage that we will use to narrow the volume of documents. Once you have your training sets established, you decide which percentage (also known as the prediction score) to use as your cutoff to get closer to the 80 documents you want from the mountain of 800,000. A good measure to start with might be a prediction score of 75%; in this example, that might narrow your 800,000 documents to 8,000, and from there you can retrain the software to narrow even further as you work your way toward the 80 documents you need.
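For readers who like to see an idea in code, here is a minimal sketch of the train, score, and narrow steps described above. It assumes a very simple naive Bayes word-count model written in plain Python, which is only one of many possible approaches and is not meant to represent how any particular eDiscovery product works. The sample documents, the function names (tokenize, train, prediction_score), and the 0.75 cutoff are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return text.lower().split()


def train(training_set):
    """Build per-label word counts from (text, is_relevant) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in training_set:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocabulary


def prediction_score(model, text):
    """Estimate the probability (0.0 to 1.0) that a document is relevant."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in (True, False):
        # Prior: share of training documents carrying this label.
        log_prob = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen word/label pairs above zero.
            numerator = word_counts[label][word] + 1
            log_prob += math.log(numerator / (total_words + len(vocabulary)))
        log_scores[label] = log_prob
    # Turn the two log scores into a probability of relevance.
    return 1.0 / (1.0 + math.exp(log_scores[False] - log_scores[True]))


# Hypothetical training set: documents a reviewer has already coded.
training_set = [
    ("merger pricing agreement draft attached", True),
    ("revised pricing terms for the merger", True),
    ("confidential agreement on pricing", True),
    ("lunch menu for friday", False),
    ("office parking reminder", False),
    ("friday holiday party reminder", False),
]

# Hypothetical unreviewed documents standing in for the 800,000.
unreviewed = [
    "updated merger agreement and pricing schedule",
    "reminder about the friday lunch",
    "parking garage closed for the holiday",
]

THRESHOLD = 0.75  # the 75% prediction score used as a cutoff

model = train(training_set)
shortlist = [doc for doc in unreviewed
             if prediction_score(model, doc) >= THRESHOLD]

for doc in unreviewed:
    print(f"{prediction_score(model, doc):.2f}  {doc}")
print("Kept for review:", shortlist)
```

With this toy data, only the merger document clears the 75% cutoff. In practice you would code more documents, retrain, and repeat, which is the narrowing loop described in the example above.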
Next, why do you care about predictive coding? The bottom line is drawn from that age-old adage, “time is money.” Why would you spend time (or money) on having lawyers or paralegals try to work their way through a mountain of 800,000 documents? With predictive coding software, we divert these resources from this part of the discovery process and put the computer to the task. A single successful lawsuit that utilizes predictive coding software can pay for the software many times over. It is an investment that law firms cannot afford to miss in an environment where the volume of digital data is predicted (pun intended) to multiply 50 times over the next ten years.
Finally, when do you need predictive coding software? Predictive coding software is a tool that can be extremely valuable under the right circumstances, but it is not always necessary in a lawsuit. There may not be a huge number of files subject to the litigation, or a keyword search may be sufficient given the context of discovery. What matters is knowing that predictive coding software is available when the right circumstances arise.
In conclusion, predictive coding offers a technology-based approach to reducing the costs associated with complex litigation. We are at a time in history when terms like cloud computing and big data are increasingly used. Additionally, given new mediums of content generation (email, text, images, video, tweets, likes, +1’s, shares, etc.), the volume of data is exploding. Learning to use tools like predictive coding software today will prepare you to litigate in the face of big data tomorrow.
---------------------------------------------------------------------------------
About the Author: Eric
Everson is a 3L law student at Florida A&M University – College of Law
where he will graduate in May 2013.
Prior to law school, he earned an MBA and a Master’s in Software Engineering
while working for ten years in executive leadership within the U.S. telecommunications
industry. The views and opinions
presented in this blog are his own and are not to be construed as legal
advice. Eric Everson currently serves on
the Board of Governors for The Florida Bar Young Lawyers Division Law Student
Division and is the President of the Electronic Discovery Law Student
Association at Florida A&M University – College of Law. Follow @IntleDiscovery