Data Mining (FSS 2022)
The course provides an introduction to advanced data analysis techniques as a basis for analyzing business data and providing input for decision support systems. The course will cover the following topics:
- The Data Mining Process
- Data Representation and Preprocessing
- Clustering
- Classification
- Regression
- Association Analysis
- Text Mining
The course consists of a lecture with accompanying practical exercises, as well as student team projects. In the exercises, participants gain initial experience in applying state-of-the-art data mining tools to realistic data sets. The team projects take place in the last third of the term. Within the projects, students carry out more sophisticated data mining projects of their own choice and report on the results in a written report and an oral presentation.
Exam Review
The exam review for FSS2022 will take place on Thursday, September 22nd 2022 at 9:00 in room B6 C1.01. Please contact Alexander Brinkmann if you would like to attend the exam review.
Instructors
Time and Location
- Lecture: Wednesday, 10.15 – 11.45, online ZOOM (Christian Bizer)
Due to the current Corona situation, the kickoff session, the team formation session, and the Q&A sessions will be held online via ZOOM. Video recordings will be provided for the other lectures (see Outline below).
- Exercises: Students should attend one of the three exercise groups. Two exercise groups will be held offline and have a restricted number of places, which will be assigned to interested students on a weekly basis. The third exercise group will be held online via ZOOM, and the number of places in it will not be restricted.
- Thursday, 10.15 – 11.45, Room A 104 (B6, Bauteil A)
- Thursday, 12.00 – 13.30, online ZOOM
- Thursday, 13.45 – 15.15, Room A 104 (B6, Bauteil A)
Final exam
- 75% written exam
- 25% project work (20% report, 5% presentation)
Registration
- To attend the course, please register for the lecture in Portal 2. The course is limited to 90 participants. Places are not assigned on a first-come, first-served basis: students in higher semesters and students who failed the course in HWS2021 will be given preference, and places among equally ranked students will be drawn by lot.
- You do not need to register separately for the exercises.
Outline
The lectures and question-and-answer sessions set in bold are held live via ZOOM. For the other lectures, video recordings will be provided.
Week | Wednesday | Thursday |
16.02.2022 | Lecture: Introduction to Data Mining | Exercise: Preprocessing/ |
23.02.2022 | Video Lecture: Cluster Analysis | Exercise: Cluster Analysis |
02.03.2022 | Video Lecture: Classification 1 | Exercise: Classification |
09.03.2022 | Video Lecture: Classification 2 | Exercise: Classification |
16.03.2022 | Video Lecture: Classification 3; Question and Answer Session 1 | Exercise: Classification |
23.03.2022 | Video Lecture: Regression | Exercise: Regression |
30.03.2022 | Video Lecture: Text Mining | Exercise: Text Mining |
06.04.2022 | Video Lecture: Association Analysis; Introduction to the Student Projects and Group Formation; Question and Answer Session 2 | Exercise: Association Analysis; Preparation of Project Outlines |
Easter Break | | |
27.04.2022 | Feedback on Project Outlines | Project Work |
04.05.2022 | Feedback on demand | Project Work |
11.05.2022 | Feedback on demand | Project Work |
18.05.2022 | Feedback on demand | Project Work |
25.05.2022 | Feedback on demand | Project Work |
29.05.2022 | Submission of project reports (Deadline: 23:59) | |
01.06.2022 | Presentation of project results (offline, room A5, B144) | Presentation of project results (offline, room Schloss O151) |
07.06.2022 | Final exam (offline, room B6 A001, 8:30) | |
For all students who are not familiar with Python/
Lecture Videos, Slides and Exercises
Lecture Videos and Slides:
- 16.02.2022: Lecture Video Introduction (Slideset Introduction and Organization FSS2022)
- 23.02.2022: Lecture Video Cluster Analysis (Slideset Cluster Analysis)
- 02.03.2022: Lecture Video Classification – Part 1 (Slideset Classification – Part 1)
- 09.03.2022: Lecture Video Classification – Part 2 (Slideset Classification – Part 2)
- 16.03.2022: Lecture Video Classification – Part 3 (Slideset Classification – Part 3)
- 23.03.2022: Lecture Video Regression (Slideset Regression)
- 30.03.2022: Lecture Video Text Mining (Slideset Text Mining)
- 06.04.2022: Lecture Video Association Analysis (Slideset Association Analysis)
- 06.04.2022: Introduction to the Student Projects (Slideset Introduction to the Student Projects)
Exercises:
- 16.02.2022 – Introduction to Python (Slides and Notebooks)
- 17.02.2022 – Simple Preprocessing and Visualization (Task and Notebooks | Solution)
- 24.02.2022 – Cluster Analysis (Task and Notebooks | Solution)
- 03.03.2022 – Classification Part 1 (Task and Notebooks | Solution)
- 10.03.2022 – Classification Part 2 (Task and Notebooks | Solution)
- 17.03.2022 – Classification Part 3 (Task and Notebooks | Solution)
- 24.03.2022 – Regression (Task and Notebooks | Solution)
- 31.03.2022 – Text Mining (Task and Notebooks | Solution)
- 07.04.2022 – Association Analysis (Task and Notebooks | Solution)
Additional material can be found in the ILIAS group of the course.
Literature
Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar: Introduction to Data Mining, 2nd Global Edition, Pearson.
Vijay Kotu, Bala Deshpande: Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner. Morgan Kaufmann.
Aurélien Géron: Hands-On Machine Learning with Scikit-Learn and TensorFlow. O'Reilly.
Software
Videos and Screen Casts
- Video recordings of the Data Mining I lectures and screen casts of the exercises are available here.
Course Evaluations