Candice Chan-Glasgow, Director, Review Services and Counsel, and Catia Amorim, Associate


March 4, 2019


Even after de-duplication and some initial culling, it’s not unusual to still face a significant number of electronic records in a given litigation matter.


Tools that help lawyers to “make sense” of a plethora of data and that assist in identifying key records are thus increasingly crucial to efficient and effective eDiscovery.  Fortunately, it’s possible to organize your electronic data into smaller buckets of conceptually similar documents.  One of the analytic tools that makes this possible is called “concept clustering”.


The concept clustering tool leverages “concepts” by assessing the meaning and context of terms contained within documents and by looking for relationships among those terms.  Documents that are found to be conceptually similar share semantically similar content or ideas.  Importantly, they are not simply documents that share textual similarities.


To create clusters, the user simply selects the documents to be clustered.  The clustering tool then evaluates the data, finds the conceptual similarities, and organizes the data into clusters and sub-clusters.  The tool then labels each group appropriately, having regard to the conceptual content of the clustered documents.
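
For readers who want to see the select, group and label flow in concrete terms, the following is a minimal sketch only.  It is not how any particular eDiscovery platform works: commercial tools rely on semantic analysis, whereas this toy example substitutes simple word-overlap (cosine similarity on term counts) as a stand-in, and it does not build sub-clusters.  The sample documents, the stop-word list, the 0.3 threshold and the function names `cluster_documents` and `label` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical sample documents and stop-word list, for illustration only.
DOCS = [
    "Invoice for pipeline repair services payment due",
    "Payment reminder: invoice for repair services overdue",
    "Board meeting agenda and minutes for quarterly meeting",
    "Minutes of the board meeting, agenda attached",
]
STOPWORDS = {"for", "and", "of", "the", "a", "to"}


def tokenize(text):
    """Lower-case the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_documents(docs, threshold=0.3):
    """Greedy grouping: each document joins the most similar existing
    cluster if it clears the threshold, otherwise it starts a new one."""
    clusters = []
    for index, text in enumerate(docs):
        vec = Counter(tokenize(text))
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["docs"].append(index)
            best["centroid"].update(vec)  # add this document's term counts
        else:
            clusters.append({"docs": [index], "centroid": Counter(vec)})
    return clusters


def label(cluster, n=2):
    """Label a cluster with its n most frequent terms (ties broken alphabetically)."""
    ranked = sorted(cluster["centroid"].items(), key=lambda kv: (-kv[1], kv[0]))
    return " / ".join(term for term, _ in ranked[:n])


if __name__ == "__main__":
    for c in cluster_documents(DOCS):
        print(label(c), "->", c["docs"])
```

On this sample, the two billing records end up in one group and the two board records in another, each labelled with its most frequent terms.  A genuine concept clustering tool would also group documents that discuss the same idea in different words, which this word-overlap stand-in cannot do.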


The potential benefits of concept clustering to eDiscovery counsel are considerable and include:


  1. High-level overview of the concepts in the data, allowing for quick determination of which clusters are likely to contain key documents and, conversely, which clusters contain irrelevant information;
  2. Easier identification of relationships and context within the documents, which fosters a more accurate and efficient review;
  3. Increased consistency and reduced risk of error as similar documents are reviewed together; and
  4. Better-informed search queries.  For instance, a particular cluster may be used as a base for a further targeted search of key documents.  Moreover, concept clusters may highlight documents that were missed by a keyword search alone.
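
The last benefit in the list above can be illustrated with a short sketch.  It assumes the clusters have already been produced by a clustering tool.  Here the cluster assignments, document identifiers and the function names `keyword_search` and `missed_by_keywords` are hypothetical and hand-written for the example.  The idea is simply that when a keyword hit falls in a cluster, the other members of that cluster are worth a look even though the keyword did not return them.

```python
# Hypothetical documents and hand-assigned clusters, for illustration only.
DOCS = {
    "DOC-001": "Invoice for pipeline repair services",
    "DOC-002": "Reminder: the bill for the repair work remains unpaid",
    "DOC-003": "Quarterly board meeting agenda",
}
CLUSTERS = {
    "billing": ["DOC-001", "DOC-002"],
    "governance": ["DOC-003"],
}


def keyword_search(docs, keyword):
    """Return the IDs of documents whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return {doc_id for doc_id, text in docs.items() if needle in text.lower()}


def missed_by_keywords(clusters, keyword_hits):
    """For each cluster that contains at least one keyword hit, list the
    cluster members that the keyword search did not return."""
    leads = {}
    for cluster_label, doc_ids in clusters.items():
        members = set(doc_ids)
        if members & keyword_hits:            # the cluster contains a hit
            missed = members - keyword_hits   # same cluster, but no hit
            if missed:
                leads[cluster_label] = sorted(missed)
    return leads


if __name__ == "__main__":
    hits = keyword_search(DOCS, "invoice")
    print(missed_by_keywords(CLUSTERS, hits))
```

In this example a search for “invoice” returns only DOC-001, yet DOC-002 sits in the same cluster and concerns the same unpaid bill, so it surfaces as a lead for further review.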

Strategies that help reduce the cost of eDiscovery without compromising defensibility should be employed by eDiscovery counsel wherever possible.  Concept clustering goes a long way toward achieving that end, and is just one of the time-saving and cost-effective tools that Heuristica uses in its iterative review strategy.