Dr. Nazli Goharian
CS429
Introduction to Information Retrieval

Course Description:
An overview of the fundamental issues of information retrieval and their theoretical foundations. The course covers information-retrieval techniques and theory, addressing both the effectiveness and the run-time performance of information-retrieval systems. The focus is on the algorithms and heuristics used to find documents relevant to a user request and to find them fast. The course also covers the architecture and components of a search engine, such as the parser, stemmer, index builder, and query processor. Students learn the material by building a prototype of such a search engine.
Course Goals - Students should be able to:
  • Explain the information retrieval storage methods (Inverted Index and Signature Files)
  • Explain the information retrieval evaluation metrics
  • Explain retrieval models, such as Boolean model, Vector Space model, Probabilistic model, and Language Models.
  • Explain retrieval utilities such as Parsing and Token Recognition, Stemming, Relevance Feedback, N-grams, Clustering, and Thesauri.
  • Design and implement a search engine prototype using the storage methods, retrieval models and utilities.
  • Apply research ideas in their experiments when building a search engine prototype.
Prerequisite:
CS 331 or CS 401 (Data Structures and Algorithms) and strong programming knowledge
Teaching Assistant:
Fall 2005: Alana Platt; Fall 2006: Saket Mengle; Fall 2007: Alana Platt (platala@iit.edu). Office hours: see Blackboard.
Text:
D. Grossman and O. Frieder, Information Retrieval: Algorithms and Heuristics, Second Edition 2004, Springer Publishers, ISBN 1-4020-3004-5 (paperback).
Handouts:
Course handouts are available on Blackboard for most topics covered in class.
Grading & Due Dates (tentative; will be finalized by the first day of class):
Project (35%): The project is divided into several parts. The percentage for each part will be announced. Access the project description at http://blackboard.iit.edu
Group Research Presentation (5%): Presentations will be given in class on the Main campus. See Blackboard for the date and details.
Exams (3 exams, 60% total): Dates are TBD and will be announced in class and on Blackboard. All Main campus students registered in the INTERNET or IITV section must take the exams on the same day and at the same time as the live section, in class on the Main campus. "Very remote" students must discuss this with me at the beginning of the semester to get permission to take the exams at a remote location. Unless this is arranged with me ahead of time, I will assume that remote students will come to class on the day and time of each exam. Remote students who are not very remote will be asked to come to class and take their exams at the same time as everyone else.
Project:
Students will implement an information retrieval search engine. Any programming language may be used. The project is partitioned into several assignments. A search engine prototype with some functionality will be given to the class. Students have the option of using this search engine to implement the functionality required for their assignments. They may, however, choose not to use this software and write their own engine from scratch. In either case, the assignments and due dates are the same for the whole class. The software can be accessed via Blackboard. Each student may be called to give a demo for each part of the project. Failing to appear for the demo, or failing to answer questions about the submitted assignment, will result in a zero for that part of the project.
Research Paper Presentation:
Students will present (20-30 minutes) an overview of a technical research paper in information retrieval. Students select their papers from a set of pre-selected papers, which will be provided to the class. The process and due date for paper selection will be announced. All presentations will be given on the Main campus.
Course Outline (Tentative!):
(All of the following topics can be found in the textbook listed above. The slide set for each topic is given in brackets.)
Introduction, Overview of IR [Slides: Introduction]
IR Utilities: Parser/Tokenizer, Phrase Recognition, Stemming, N-Grams [Slides: Parser & Stemmer]
Efficiency: Indexing - inverted index, memory-based and sort inversion; Signature Files [Slides: Efficiency: Indexing]
IR Strategies and Models: Boolean, Vector Space Model; Similarity Measures in Information Retrieval, Pivoted Normalizations [Slides: Boolean & Vector Space Models]
IR Evaluation [Slides: IR Evaluation]
IR Strategy: Probabilistic Model [Slides: Probabilistic Model]
IR Utility: Relevance Feedback and other Query Expansions [Slides: Relevance Feedback]
Efficiency: Compression [Slides: Efficiency: Compression]
Efficiency: Top Docs, Query Threshold [Slides: Efficiency: Index Pruning & Query Thresholding]
Clustering [Slides: Clustering]
IR Strategy: Language Models [Slides: Language Models]
World Wide Web [Slides: World Wide Web]
IR Utility: Passage Based Retrieval [Slides: Passage Based Retrieval]
Efficiency: Duplicate Document Detection
Relational Approach
Research Paper Presentations [Slides: Students Presentations]
Late Assignment Policy:
Assignments MUST be submitted on or before their due date. Late assignments will not be graded: the grade for a late submission is a zero! Please do not ask for an extension, as none will be granted. If you are not able to finish your assignment by the due date and time, simply submit whatever part of the assignment you have completed to get some points rather than a zero. Students are encouraged to rework incomplete assignments. This does not change the grade for that assignment, but it will be considered if the final grade is borderline. Because the server might be down or another problem may occur when you are planning to submit your work, it is strongly recommended that you submit before the due date and time to make sure your submission is successful. Do not forget that it is your responsibility to submit on time!
Academic Integrity:
Each member of this course bears responsibility for maintaining the highest standards of academic integrity. All breaches of academic integrity must be reported immediately. Under the departmental regulation, a student who cheats for the first time receives a zero, and in the case of a "second offense" the student receives an "E" for the course. A student who receives a zero as the result of cheating cannot receive an "A" for the course.
CS422
Introduction to Data Mining

CS425
Database Organization (Database Design and Applications)