What is FreeEed?

FreeEed™ is fun and cool software for eDiscovery. A slide overview, "FreeEed popcorn," is available, as is our video on how to download, install, and start using FreeEed (passcode 'freeeed').


It is based on an open source project published by SHMsoft and released under the Apache 2.0 License. It is built with Hadoop and other Big Data technologies. FreeEed is intended for use in eDiscovery, as an engine and kernel for a company's search application, or as an investigator's tool. It runs on a Windows, Mac, or Linux workstation and on a Hadoop cluster.


Scaling is an especially important aspect of FreeEed. Because it is built on Big Data technologies and runs across hordes of computers, you can fire up 100 servers and process the data roughly a hundred times faster for about the same price, since each server is needed for only a hundredth of the time. Hadoop management is a one-click operation provided in the FreeEed Player.
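The "same price" claim is simple arithmetic, and a small sketch makes it concrete. The function below is illustrative only and is not part of FreeEed: it assumes ideal linear scaling (no cluster startup or coordination overhead) and a flat hourly rate per server, and the figures of 200 hours and 0.5 per server-hour are made up for the example.

```python
def cluster_cost(serial_hours, servers, rate_per_server_hour):
    """Return (wall_clock_hours, total_cost) under ideal linear scaling.

    Assumption: the work divides evenly across servers with no overhead,
    which real clusters only approximate.
    """
    wall_clock_hours = serial_hours / servers
    total_cost = wall_clock_hours * servers * rate_per_server_hour
    return wall_clock_hours, total_cost


# Hypothetical job: 200 hours of processing on a single server.
one = cluster_cost(serial_hours=200, servers=1, rate_per_server_hour=0.5)
hundred = cluster_cost(serial_hours=200, servers=100, rate_per_server_hour=0.5)

print(one)      # wall-clock hours and total cost on 1 server
print(hundred)  # wall-clock hours and total cost on 100 servers
```

Under these assumptions the total cost is identical in both cases and only the wall-clock time changes. In practice, overhead and billing granularity make the match approximate.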

How it works

Processing is organized by the Hadoop framework. The input data is staged by zipping it into archives of a set size. During processing, each file is read from its archive, assigned a unique ID, and processed with Tika, which extracts text and metadata. The metadata, the text, and the file itself are delivered as the processed results.
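The staging and processing flow described above can be sketched in a few lines. This is a minimal single-machine illustration, not FreeEed's actual code: the function names are hypothetical, `extract` is a stand-in for Tika that simply decodes bytes as text, the unique ID is a random UUID (FreeEed's real ID scheme is not described here), and the archives are built in memory rather than on HDFS.

```python
import io
import uuid
import zipfile


def _zip(entries):
    """Build one in-memory zip archive from (name, data) pairs."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in entries:
            zf.writestr(name, data)
    return buf.getvalue()


def stage(files, max_bytes):
    """Pack files into archives holding at most max_bytes of input each.

    A single file larger than max_bytes still gets an archive of its own.
    """
    archives, current, size = [], [], 0
    for name, data in files:
        if current and size + len(data) > max_bytes:
            archives.append(_zip(current))
            current, size = [], 0
        current.append((name, data))
        size += len(data)
    if current:
        archives.append(_zip(current))
    return archives


def extract(name, data):
    """Hypothetical stand-in for Tika: returns (text, metadata)."""
    text = data.decode("utf-8", errors="replace")
    return text, {"file_name": name, "size_bytes": len(data)}


def process(archive_bytes):
    """Read each file from an archive, assign an ID, and extract text and metadata."""
    results = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for name in zf.namelist():
            data = zf.read(name)
            text, metadata = extract(name, data)
            metadata["id"] = uuid.uuid4().hex  # assumed ID scheme
            results.append({"metadata": metadata, "text": text, "native": data})
    return results


files = [("a.txt", b"hello"), ("b.txt", b"world!"), ("c.txt", b"eDiscovery")]
archives = stage(files, max_bytes=12)
results = [doc for archive in archives for doc in process(archive)]
print(len(archives), len(results))
```

Because each staged archive is independent of the others, the archives can be handed to separate workers, which is what makes this layout a natural fit for Hadoop.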


The primary building blocks of the system are HDFS, Hadoop, Tika, Lucene, and Hive.


The companion application, FreeEed Review, offers document review.

It is integrated with Elasticsearch, so users get the advantages of the ELK stack (Elasticsearch, Logstash, Kibana).


Each FreeEed project creates its own Lucene/Elasticsearch index for later searches.





Metadata results are output as a CSV file, while the native files and the extracted text are stored in one or more zip files. The end results can be used for culling and for producing native files for legal review. You can use FreeEedUI for review or load the results into Concordance.
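To make the delivery layout concrete, here is a rough sketch of writing one metadata CSV plus a zip holding native files and extracted text. The column names, the `native/` and `text/` folder layout, and the `deliver` function are assumptions made for illustration. FreeEed's actual load-file fields and archive structure may differ.

```python
import csv
import io
import zipfile


def deliver(results):
    """Return (csv_text, zip_bytes) for a list of processed-document dicts."""
    fields = ["id", "file_name", "size_bytes"]  # assumed metadata columns
    csv_buf = io.StringIO()
    writer = csv.DictWriter(csv_buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()

    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for doc in results:
            # One CSV row of metadata per document.
            writer.writerow({key: doc[key] for key in fields})
            # Native file and extracted text, both keyed by the document ID.
            zf.writestr(f"native/{doc['id']}_{doc['file_name']}", doc["native"])
            zf.writestr(f"text/{doc['id']}.txt", doc["text"])
    return csv_buf.getvalue(), zip_buf.getvalue()


docs = [
    {"id": "00001", "file_name": "memo.txt", "size_bytes": 5,
     "native": b"hello", "text": "hello"},
]
csv_text, zip_bytes = deliver(docs)
print(csv_text)
```

Keying both the CSV rows and the zip entries by the same document ID is what lets a review tool join metadata back to the native file and its text.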

With compiled releases and professional support available for enterprise use, FreeEed brings high performance, scalability, and reliability to data processing at a fraction of the cost of proprietary products.


Supported input file formats


Other capabilities

  • Text extraction
  • Data culling
  • Native/Text/Metadata results delivery
  • Optical Character Recognition (OCR)
  • Imaging (PDF creation)
  • Instant search
  • Deduplication (configurable for emails)
  • Text analytics
  • Social media analytics