As technology continues to evolve, companies have had to become more creative in managing vast amounts of data. The same technology, in the form of analytical applications and tools, can also help them deal with this issue.

Trustpoint.One takes this challenge seriously, and we’ve built our technology and software around data analytics. In practice, this approach improves eDiscovery processes by weeding out irrelevant data and keeping the most important data for litigation.

Here are four analytics strategies for eDiscovery you can implement to effectively manage data.

Email Threading

Email threading segments emails by gathering forwards, replies and reply-alls together. This makes it easy to track related emails and reconstruct email threads. Trustpoint.One’s agnostic approach to technology deployment allows for organization and management of emails using:

  • Email threads
  • People involved in an email conversation
  • Email attachments (if the Parent ID is provided along with the attachment item)
  • Duplicate emails

Text analysis of email threads can identify redundant messages, which reduces the number of documents that need to be reviewed and saves money on personnel and resources.
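As a rough illustration of how replies and forwards can be gathered into threads, the sketch below follows reply links back to the first message of each conversation. The field names `id` and `in_reply_to` and the function `build_threads` are assumptions made for this example. They do not describe how any particular eDiscovery platform stores or threads email.

```python
from collections import defaultdict

def build_threads(emails):
    """Group email ids into threads by following reply links to a root message."""
    # Map each message id to the id it replies to (None for a new conversation).
    parent = {e["id"]: e.get("in_reply_to") for e in emails}

    def root_of(msg_id):
        seen = set()
        # Walk up while the parent is a known message (stop on a cycle).
        while parent.get(msg_id) in parent and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent[msg_id]
        return msg_id

    threads = defaultdict(list)
    for e in emails:
        threads[root_of(e["id"])].append(e["id"])
    return dict(threads)

# Hypothetical sample data for illustration only.
emails = [
    {"id": "m1", "in_reply_to": None, "subject": "Q3 contract"},
    {"id": "m2", "in_reply_to": "m1", "subject": "RE: Q3 contract"},
    {"id": "m3", "in_reply_to": "m2", "subject": "FW: Q3 contract"},
    {"id": "m4", "in_reply_to": None, "subject": "Lunch"},
]
print(build_threads(emails))
# {'m1': ['m1', 'm2', 'm3'], 'm4': ['m4']}
```

Once messages are grouped this way, a reviewer could look at one thread at a time instead of four separate documents.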

Near-Duplicate Identification

Another useful analytical tool that’s closely related to email threading is near-duplicate identification. Like email threading, near-duplicate identification is a common textual analytics tool. It calculates how similar documents are based on their textual content.

For example, this tool could identify two different types of documents, such as an email and a PDF attachment containing the same, or nearly the same, content. The value of this approach is that you can tag these documents and either group them together or exclude them based on your needs.
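To make the idea of text-based similarity concrete, here is a minimal sketch that scores two documents by how many three-word sequences ("shingles") they share, using Jaccard similarity. This is one common measure chosen for illustration; the similarity method used by any specific review tool may differ. The function names, the sample texts and the 0.5 threshold are all assumptions.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word sequences in the text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Shared shingles divided by all shingles; 1.0 means identical text."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)

def near_duplicates(docs, threshold=0.5):
    """Return pairs of document ids whose similarity meets the threshold."""
    ids = sorted(docs)
    return [
        (x, y)
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
        if jaccard(docs[x], docs[y]) >= threshold
    ]

# Hypothetical sample data: an email body and the text of a PDF attachment.
docs = {
    "email": "The merger agreement will be signed on Friday by both parties",
    "pdf": "The merger agreement will be signed on Monday by both parties",
    "other": "Lunch menu for the office party next week",
}
print(near_duplicates(docs))
# [('email', 'pdf')]
```

The pairs returned could then be tagged, grouped or excluded as the review requires. The threshold is a tuning choice: a higher value flags only closer matches.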

Near-duplicate identification can also serve as an important quality control measure. For instance, there could be a situation where reviewers have found a cache of privileged documents in the data set. But what if some of these privileged documents were missed? You could easily find near-duplicates of the documents already collected and re-review the new set for inconsistencies.
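The quality control check described above could be sketched as follows: compare each not-yet-flagged document against the documents already tagged as privileged, and list the close matches for re-review. This example uses Python's standard `difflib.SequenceMatcher` as a simple stand-in for a review platform's similarity engine. The function name `find_missed_candidates`, the document ids and the 0.8 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher

def find_missed_candidates(privileged, unreviewed, threshold=0.8):
    """Return ids of unreviewed documents that closely resemble any
    document already tagged as privileged."""
    flagged = []
    for doc_id, text in unreviewed.items():
        if any(
            SequenceMatcher(None, text, p).ratio() >= threshold
            for p in privileged.values()
        ):
            flagged.append(doc_id)
    return flagged

# Hypothetical sample data for illustration only.
privileged = {"d1": "Attorney-client privileged: advice on merger terms"}
unreviewed = {
    "d7": "Attorney-client privileged: advice on merger terms (draft)",
    "d8": "Lunch menu for Friday",
}
print(find_missed_candidates(privileged, unreviewed))
# ['d7']
```

The flagged documents are only candidates. A reviewer would still need to confirm whether each one is actually privileged.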

Keyword Expansion

Keyword expansion is a simple analytics tool that can help you during eDiscovery. This common practice takes your set of defined keywords and identifies conceptually similar keywords based on your data set. Note that it finds conceptually similar terms, not just synonyms of your keyword list.

This helps build on your initial keyword list, and it should give you a better understanding of the important issues related to the case. Additionally, it helps you collect documents you would have otherwise missed—for example, those containing project code names.
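A very simplified way to picture keyword expansion is to look for terms that keep appearing in the same documents as your seed keywords. The sketch below does exactly that with plain co-occurrence counts. Production tools typically rely on more sophisticated conceptual models, so treat this as an assumption-laden stand-in. The function name, stopword list and sample documents are hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "for", "to", "and", "on", "in", "is"}

def expand_keywords(docs, seeds, top_n=3):
    """Suggest terms that co-occur most often with the seed keywords."""
    seeds = {s.lower() for s in seeds}
    counts = Counter()
    for text in docs:
        words = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        if words & seeds:
            # Count every non-seed term found alongside a seed keyword.
            counts.update(words - seeds)
    # Sort by count (descending), then alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]

# Hypothetical sample data: "Bluebird" plays the role of a project code name.
docs = [
    "Project Bluebird acquisition timeline attached",
    "Acquisition of Northwind: Bluebird budget review",
    "Bluebird kickoff moved to Tuesday",
    "Cafeteria menu for Tuesday",
]
print(expand_keywords(docs, ["acquisition"], top_n=1))
# ['bluebird']
```

Here the code name surfaces because it appears in both documents that mention the seed keyword. Adding it to the keyword list would then pull in the third document, which never mentions the acquisition directly.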

Along with cluster visualization, this is an expedient way to help organize your data set.

Cluster Visualization

Cluster visualization is just what it sounds like. It’s the process of displaying data in a visual graph, often using word or subject clusters.

There’s only so much information a person can interpret when looking at a table view list of documents. Cluster visualization solves that problem. The advantage of this strategy is that it lets reviewers analyze data and gain insights at a glance, which makes review much more efficient.

Additionally, it’s helpful for sorting information during eDiscovery—especially data with conceptual content instead of numerical data. You can quickly see which subjects in the data set appear most frequently, and you can also group conceptually similar documents.

Reviewers can further sort data by overlaying keywords to identify and flag the clusters that correspond to relevant terms. Overall, this will facilitate a better understanding of a case.
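The following sketch shows the general shape of this workflow in text form: group documents with overlapping vocabulary, display each group as a bar labeled with its most frequent terms, and flag groups that contain an overlaid keyword. The greedy grouping rule, the 0.2 threshold, the function names and the sample documents are assumptions for illustration. A real review platform would use its own clustering method and a graphical display.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "for", "to", "and", "on", "in", "is"}

def terms(text):
    """Return the set of lowercase words in the text, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def cluster_docs(docs, threshold=0.2):
    """Greedy single-pass grouping: a document joins the first cluster
    whose vocabulary overlaps enough (Jaccard), else starts a new one."""
    clusters = []  # each cluster: {"vocab": set of terms, "docs": [text, ...]}
    for text in docs:
        t = terms(text)
        for c in clusters:
            union = c["vocab"] | t
            if union and len(c["vocab"] & t) / len(union) >= threshold:
                c["docs"].append(text)
                c["vocab"] |= t
                break
        else:
            clusters.append({"vocab": set(t), "docs": [text]})
    return clusters

def render(clusters, keywords=()):
    """Draw one text bar per cluster, largest first, flagging keyword hits."""
    keywords = {k.lower() for k in keywords}
    lines = []
    for c in sorted(clusters, key=lambda c: -len(c["docs"])):
        counts = Counter(w for d in c["docs"] for w in terms(d))
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        label = ", ".join(w for w, _ in ranked[:2])
        flag = " <- keyword hit" if c["vocab"] & keywords else ""
        lines.append(f"{'#' * len(c['docs'])} {label}{flag}")
    return "\n".join(lines)

# Hypothetical sample data for illustration only.
docs = [
    "Bluebird acquisition budget",
    "Bluebird acquisition timeline",
    "Bluebird budget review",
    "Cafeteria lunch menu",
    "Lunch menu update",
]
print(render(cluster_docs(docs), keywords=["acquisition"]))
# ### bluebird, acquisition <- keyword hit
# ## lunch, menu
```

Even in this crude form, the output shows which subject dominates the set and which cluster a reviewer might open first.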

Are these four strategies part of your eDiscovery routine? If not, Trustpoint.One can handle your eDiscovery needs and more. Contact us today for more information.