Introduction to Information Retrieval
Course Description:
An overview of the fundamental issues of information retrieval and their theoretical foundations.
The course covers information-retrieval techniques and theory, addressing both the effectiveness
and the run-time performance of information-retrieval systems. The focus is on the algorithms and
heuristics used to find documents relevant to a user's request, and to find them quickly.
The course covers the architecture and components of a search engine, such as the parser,
stemmer, index builder, and query processor. Students learn the material by building a
prototype of such a search engine.
Course Goals - Students should be able to:
- Explain the information retrieval storage methods (Inverted Index and Signature Files).
- Explain the information retrieval evaluation metrics.
- Explain retrieval models, such as the Boolean model, the Vector Space model, the Probabilistic model, and Language Models.
- Explain retrieval utilities, such as Stemming, Relevance Feedback, N-grams, Clustering, Thesauri, and Parsing and Token Recognition.
- Design and implement a search engine prototype using the storage methods, retrieval models, and utilities.
- Apply research ideas in their own experiments while building a search engine prototype.
Prerequisites:
CS 331 or CS 401 (Data Structures and Algorithms), and strong programming knowledge.
Teaching Assistant:
Fall 2005: Alana Platt; Fall 2006: Saket Mengle; Fall 2007: Alana Platt (platala@iit.edu).
Office hours: see Blackboard.
Text:
D. Grossman and O. Frieder,
Information Retrieval: Algorithms and Heuristics,
Second Edition 2004, Springer Publishers, ISBN 1-4020-3004-5 (paperback).
Handouts:
Course handouts for most of the topics covered in class are available on Blackboard.
Grading & Due Dates (tentative; will be finalized by the first day of class):
- Project (35%): The project is divided into several parts; the percentage for each part will be announced. The project description is available on Blackboard at http://blackboard.iit.edu
- Group Research Presentation (5%): Presentations will be given in class on the Main campus. See Blackboard for the date and details.
- Exams (3 exams, 60% total): Dates are TBD and will be announced in class and on Blackboard. Three exams will be given. All Main campus students who are registered in the INTERNET or IITV section must take the exams in class on the Main campus, on the same day and at the same time as the live section. "Very remote" students must contact me at the beginning of the semester to get permission to take the exams at a remote location. Unless this has been arranged with me ahead of time, I assume that a remote student will come to class on the exam days to take the exams. Remote students who are not "very remote" will be asked to come to class and take their exams at the same time as everyone else.
Project:
Students will implement an Information Retrieval search engine. Any programming language
may be used. The project is divided into several assignments. A search engine prototype with some of its functionality already in place will be provided to the class. Students may use this prototype as the basis for implementing the functionality required in their assignments. However, they may instead choose to write their own engine from scratch. In either case, the assignments and due dates are the same for everyone. The prototype can be accessed via Blackboard. Each student may be asked to give a demo of any part of the project or assignments. Failing to appear for the demo, or failing to answer questions about the submitted assignment, will result in a zero for that project part or assignment.
Research Paper Presentation:
Students will give a 20-30 minute presentation summarizing a technical research paper in
information retrieval. Students select their papers from a set of pre-selected papers that will be provided to the class.
The process and due date for paper selection will be announced. All presentations
will be given on the Main campus.
Course Outline (Tentative!):
Late Assignment Policy:
Assignments MUST be submitted on or before their due date. Late assignments will not be graded; a late submission receives a zero. Please do not ask for an extension, as no extensions will be granted. If you cannot finish an assignment by the due date and time, submit whatever portion you have completed so that you earn some points rather than a zero. Students are encouraged to keep working on incomplete assignments. This does not change the grade for that assignment, but it will be taken into account if the final grade is on the borderline. Because the server might be down, or other problems may occur just when you plan to submit your work, it is strongly recommended that you submit before the due date and time to make sure your submission succeeds. Remember that submitting on time is your responsibility!
Academic Integrity:
Each member of this course bears responsibility for maintaining the highest standards of
academic integrity. All breaches of academic integrity must be reported immediately. Under
departmental regulations, a student who cheats for the first time receives a zero, and in the case
of a "second offense" the student receives an "E" for the course. A student who receives a zero
as the result of cheating cannot receive an "A" for the course.