What is it
FreeEed (TM) is an open-source project published by SHMsoft and released under the Apache 2.0 License (other licenses are available). It is built on Hadoop and other Big Data technologies. It is intended for use in eDiscovery, as the engine and kernel of a company's search application, or as an investigator's tool. It runs on a Windows, Mac, or Linux workstation or on a Hadoop cluster, and will shortly run on Amazon Web Services (AWS) EC2.
How it works
Processing is organized by the Hadoop framework. The input data is first staged by zipping it into archives of a set size. During processing, each file is read from its archive, assigned a unique ID, and passed to Tika, which extracts its text and metadata. The metadata, the extracted text, and the file itself are delivered as the processed results.
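The staging and processing steps described above can be sketched roughly as follows. This is a minimal, single-machine illustration in Python, not FreeEed's actual code: the function names (stage, process, extract_stub), the in-memory archives, and the record layout are all assumptions made for the example. In particular, extract_stub is only a placeholder standing in for the Tika extraction step, and the Hadoop distribution of work is left out entirely.

```python
import io
import uuid
import zipfile


def stage(files, max_bytes):
    """Group (name, bytes) pairs into zip archives of roughly max_bytes of input each.

    Hypothetical helper: assumes file names are unique across the input.
    """
    batches, current, size = [], [], 0
    for name, data in files:
        # Start a new batch when adding this file would exceed the set size.
        if current and size + len(data) > max_bytes:
            batches.append(current)
            current, size = [], 0
        current.append((name, data))
        size += len(data)
    if current:
        batches.append(current)

    archives = []
    for batch in batches:
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
            for name, data in batch:
                zf.writestr(name, data)
        archives.append(buf.getvalue())
    return archives


def extract_stub(name, data):
    """Placeholder for the Tika step (assumption): returns (text, metadata)."""
    text = data.decode("utf-8", errors="replace")
    metadata = {"file_name": name, "size_bytes": len(data)}
    return text, metadata


def process(archive_bytes):
    """Read each file from one staged archive, assign a unique ID, and extract."""
    results = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            data = zf.read(info)
            text, metadata = extract_stub(info.filename, data)
            results.append({
                "id": uuid.uuid4().hex,  # unique ID per processed file
                "metadata": metadata,
                "text": text,
                "native": data,          # the file itself is kept with the results
            })
    return results
```

Each staged archive is independent of the others, which is what would allow a framework such as Hadoop to hand archives to separate workers.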
The current and future building blocks of the system are HDFS, Hadoop, HBase, Tika, Lucene, Solr, Mahout, Hive, and Pig.
See more on our blog here.
Indexing
Each project will create its own Lucene/Solr index for later searches.
Output
Metadata results are output as a CSV file, while the native files and the extracted text are stored in one or more zip files. The end results can be used for culling and for producing native files for legal review.
With this combination of components, and with professional support available for enterprise use, FreeEed brings high performance, scalability, and reliability to data processing at a fraction of the cost of proprietary products.
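As an illustration of this output layout, the sketch below writes one CSV row of metadata per processed file and packs the native files and extracted text into a zip archive. It is a hedged example only: the write_output function, the record fields, and the native/ and text/ folder names inside the zip are assumptions for the example, not FreeEed's documented format.

```python
import csv
import io
import zipfile


def write_output(results):
    """Return (csv_text, zip_bytes) for a list of processed-file records.

    Each record is assumed (hypothetically) to look like:
    {"id": str, "metadata": dict, "text": str, "native": bytes},
    with a "file_name" entry in its metadata.
    """
    # One CSV column per metadata key seen in any record, plus the ID.
    fields = ["id"] + sorted({key for r in results for key in r["metadata"]})
    csv_buf = io.StringIO()
    writer = csv.DictWriter(csv_buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()

    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for r in results:
            writer.writerow({"id": r["id"], **r["metadata"]})
            # Assumed layout: native file and extracted text, keyed by ID.
            zf.writestr(f"native/{r['id']}_{r['metadata']['file_name']}", r["native"])
            zf.writestr(f"text/{r['id']}.txt", r["text"])
    return csv_buf.getvalue(), zip_buf.getvalue()
```

Keying both the CSV rows and the zip entries by the same ID lets a reviewer go from a metadata row to the matching native file and text.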
Supported file formats
MS Office and other formats
PST processing
In the works
GUI analytic searching (currently command line driven)
computer-suggested tags and relevance (linguistic analysis with Mahout and NLTK)
About SHMsoft, the publisher of FreeEed (TM)