Dr. Nazli Goharian
CS422
Introduction to Data Mining

Course Description:
This course provides an introductory look at concepts and techniques in the field of data mining. After an introduction to data mining and its terminology, the course covers techniques for exploring large quantities of data to discover meaningful rules and knowledge, such as market basket analysis, nearest neighbor, decision trees, neural networks, Naive Bayes, and clustering. Students learn the material by implementing various data preprocessing and mining techniques throughout the semester. This is a very project-intensive course. (3-0-3).
Prerequisite:
CS 331, CS 401, or CS 403, and VERY STRONG programming knowledge (see the project description!). The class is VERY project-intensive. If you are not VERY comfortable with programming, you may find this course overly difficult. You will need to extend a software package consisting of several thousand lines of Java code to implement the various algorithms given during the semester. For each of the 4-5 projects, you will have to run many time-consuming experiments. You should stay in this course ONLY IF you are interested in the course subject AND have either VERY STRONG programming knowledge OR LOTS of time to devote to it.
Text:
J. Han, M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann.
Teaching Assistant:
TBA; Office Hours: see Blackboard.
Course Announcements / Discussion Board:
http://blackboard.iit.edu
Grading & Due Dates (tentative; will be finalized by the first day of class):
Projects (Individual): 35%. There will be 4-5 individual projects. All students, including INTERNET and TV section students, are REQUIRED to come to the Main Campus to give a demo of each project. Failing to answer questions in a project demo results in a zero for that project, even if the project is fully implemented!
Research Paper Presentation (Group Work): 5%. The group size will be determined later (assume groups of two). Each member of the group, including INTERNET and TV section students, is REQUIRED to be present in class to give the paper presentation. You are also required to attend the other groups' presentations. Presentations are tentatively scheduled for the last two weeks of class, excluding final exam week.
Exams: 60%. IMPORTANT: All students, including INTERNET and TV section students, MUST take the exams IN CLASS, on the same day and at the same time, on the Main Campus. Do not register for another class that meets at the same time, as the exams may overlap! NO makeup exams will be given for missed exams unless there is a health-related emergency, in which case you must contact me ASAP and bring a doctor's note!
Projects:
The projects require implementing data preprocessing and data mining algorithms and performing experiments. Any programming language may be used. However, a data mining prototype engine written in Java is provided, and students are encouraged to use this prototype to implement the algorithms and approaches covered in class. Students may be called to give a demo for each part of the projects. Failing to appear for the demo, or failing to answer questions about the submitted project, will result in a zero for that project.
Group Research Paper Presentation:
Each group will give a 20-minute overview of a technical research paper, selected from a set of pre-selected papers. The process and due date for paper selection will be announced. Presentations must be extremely well rehearsed; failure to prepare properly will result in an extremely poor grade on the presentation. All presentations will be given on the Main Campus. All students must attend all presentation sessions and actively participate in evaluating each other's presentations. Students who fail to do so will lose points!
Course Outline:
Introduction to Data Mining
Data preprocessing
Classification & Cross Validation
Evaluation
Naive Bayes
Neural Networks
Decision Tree
Rule Based Classification
K-Nearest Neighbor
Ensemble Methods
Association Rules
Cluster Analysis
Student Presentations
Late Assignment Policy:
Assignments MUST be submitted on or before their due date. Late assignments will not be graded; in fact, a late submission receives a zero! Please do not ask for an extension, as no extensions will be granted. If you are unable to finish an assignment by the due date and time, submit whatever portion you have completed to earn some points rather than a zero. Students are encouraged to continue working on incomplete assignments and resubmit them within a week of the due date. This does not change the grade for that assignment; however, it will be considered if the final grade is borderline. Note that the resubmission must be a complete version of that project. Because the server might be down or other problems may occur when you plan to submit, it is strongly recommended that you submit before the due date and time to make sure your submission is successful. Remember that submitting on time is your responsibility!
Academic Integrity:
Each member of this course bears responsibility for maintaining the highest standards of academic integrity. All breaches of academic integrity will be dealt with severely and must be reported immediately. A student who cheats for the first time receives a zero, and a student who commits a "second offense" receives an "E" for the course, i.e., fails. A student who receives a zero as a result of cheating cannot receive an "A" for the course, regardless of the total class points earned.
CS425
Database Organization (Database Design and Applications)

CS429
Introduction to Information Retrieval