Advanced Information Retrieval Engine (AIRE)

AIRE (Advanced Information Retrieval Engine) is a portable IR engine written in 100% pure Java™. It is designed to provide an indexing foundation for new IR techniques. The indexing system supports several retrieval strategies (Boolean, vector space, and probabilistic) and is designed to be flexible, so new document types can be added easily. Support for PDF, PS, HTML, SGML, and email is available or under development.

See AIRE in action at the Institute of Design search page.

AIRE Feature List
Portable IR indexer and query engine written in 100% pure Java™
Distributed indexing - indexing speed scales on multiprocessor or NOW (network of workstations) configurations
Boolean, vector space model (VSM), and probabilistic retrieval methods
Stemming:
  Porter stemming
  K-Stem equivalence classes
Automatic phrase detection and generation for better searching and indexing
N-Term phrase search abilities
Duplicate document detection
Web enabled Java Servlet Query Interface
Indexing and querying for millions of documents of different types
Fast retrieval data structures.
Flexible object oriented architecture designed to be extended.
Configurable indexing options for position information storage, stemming, stop words, etc.
Advanced ranking algorithms implemented, with hooks for customized algorithms.
TReC - Topic precision/recall (P/R) test system for evaluating new search strategies.
Indexed document types supported or under development:
  SGML - TReC document parsing
  HTML - HTML Document support
  Text - Text file support
  PDF - PDF support
  Email
  XML
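
To make the position information storage and N-term phrase search features above more concrete, here is a minimal sketch of a positional inverted index in Java. It is an illustration only and is not AIRE's code or API: the class name PositionalIndexSketch, its methods, and the simple tokenizer (lowercasing and splitting on non-alphanumeric characters, with no stemming or stop words) are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical illustration of a positional index; not part of AIRE. */
class PositionalIndexSketch {
    // term -> (docId -> positions of the term within that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Lowercases the text, splits on non-alphanumeric runs, and records each term's position. */
    public void addDocument(int docId, String text) {
        int position = 0;
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty first token
            }
            postings.computeIfAbsent(token, t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(position++);
        }
    }

    /** Returns the sorted ids of documents that contain the terms as one contiguous phrase. */
    public TreeSet<Integer> phraseSearch(String... terms) {
        TreeSet<Integer> hits = new TreeSet<>();
        if (terms.length == 0) {
            return hits;
        }
        Map<Integer, List<Integer>> first = postings.get(terms[0].toLowerCase());
        if (first == null) {
            return hits;
        }
        for (Map.Entry<Integer, List<Integer>> entry : first.entrySet()) {
            for (int start : entry.getValue()) {
                if (matchesAt(entry.getKey(), start, terms)) {
                    hits.add(entry.getKey());
                    break; // one occurrence is enough for this document
                }
            }
        }
        return hits;
    }

    /** Checks that term i occurs at position start + i in the given document. */
    private boolean matchesAt(int docId, int start, String[] terms) {
        for (int i = 1; i < terms.length; i++) {
            Map<Integer, List<Integer>> byDoc = postings.get(terms[i].toLowerCase());
            if (byDoc == null) {
                return false;
            }
            List<Integer> positions = byDoc.get(docId);
            if (positions == null || !positions.contains(start + i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PositionalIndexSketch index = new PositionalIndexSketch();
        index.addDocument(1, "Advanced information retrieval engine");
        index.addDocument(2, "Retrieval of advanced information");
        System.out.println(index.phraseSearch("information", "retrieval")); // prints [1]
        System.out.println(index.phraseSearch("advanced", "information"));  // prints [1, 2]
    }
}
```

Storing positions costs extra index space, which is presumably why a feature like this would be a configurable indexing option; without positions, the same structure can still answer Boolean queries but cannot distinguish a phrase from scattered occurrences of its terms.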