Data Mining (FSS 2021)
The course provides an introduction to advanced data analysis techniques as a basis for analyzing business data and providing input for decision support systems. The course will cover the following topics:
- The Data Mining Process
- Data Representation and Preprocessing
- Clustering
- Classification
- Regression
- Association Analysis
- Text Mining
The course consists of a lecture together with accompanying practical exercises as well as student team projects. In the exercises, participants gain initial experience in applying state-of-the-art data mining tools to realistic data sets. The team projects take place in the last third of the term. Within the projects, students carry out more sophisticated data mining projects of their own choice and report on the results in a written report as well as an oral presentation.
The webpage about the HWS 2020 edition of this course is found in the lecture archive.
Exam Review
The review for the FSS2021 exam will take place in person at the University in the week of July 12th. The exact place and time will be published here later. To attend the exam review, you need to register by email with Alexander Brinkmann by Friday, July 9th.
There is no second exam date for FSS2021. The next opportunity to retake the project and the exam is in HWS2021.
Instructors
Time and Location
- Lecture: Wednesday, 10.15 – 11.45, WIM-ZOOM Room 04 (Prof. Dr. Christian Bizer)
- Exercise 1: Thursday, 10.15 – 11.45, WIM-ZOOM Room 13 (RapidMiner, Anna Primpeli)
- Exercise 2: Thursday, 12.00 – 13.30, WIM-ZOOM Room 16 (Python, Ralph Peeters)
- Exercise 3: Thursday, 13.45 – 15.15, WIM-ZOOM Room 15 (Python, Alexander Brinkmann)
Note: There are three parallel exercise groups; you are supposed to attend only one of them.
Final exam
- 75% written exam
- 25 % project work (20% report, 5% presentation)
Registration
- To attend the course, please register for the lecture in Portal 2. The course is limited to 80 participants. Places are not allocated on a "first come, first served" basis: students in higher semesters and students who failed the course in HWS2020 are given priority, and places among equally ranked students are assigned by random draw.
- We offer three alternative times (Thursdays 10.15, 12.00, and 13.45) for the exercise session. Choose one and attend the exercise at the corresponding time (you don't have to register for it).
Lecture Videos, Slides and Exercises
Slides:
- 03.03.2021: Lecture Video Introduction (Slideset Introduction and Organization FSS2021)
- 10.03.2021: Lecture Video Cluster Analysis (Slideset Cluster Analysis)
- 17.03.2021: Lecture Video Classification – Part 1 (Slideset Classification – Part 1)
- 24.03.2021: Lecture Video Classification – Part 2 (Slideset Classification – Part 2)
- 14.04.2021: Lecture Video Classification – Part 3 (Slideset Classification – Part 3)
- 21.04.2021: Lecture Video Regression (Slideset Regression)
- 28.04.2021: Lecture Video Text Mining (Slideset Text Mining)
- 05.05.2021: Lecture Video Association Analysis (Slideset Association Analysis)
- 12.05.2021: Introduction to the Student Projects (Slideset Introduction to the Student Projects)
- 16./17.06.2021: Presentation of the Student Projects (Slideset Presentation of the Student Projects)
Exercises:
- 2021/03/03 – Introduction to Python (Slides and Notebooks | Solution)
- 2021/03/04 – Visualization: RapidMiner (Slides, Task, Data | Solution), Python (Notebooks, Task and Data | Solution)
- 2021/03/11 – Cluster Analysis: RapidMiner (Slides, Task, Data | Solution), Python (Notebooks, Task and Data | Solution)
- 2021/03/18 – Classification I: RapidMiner (Slides, Task, Data | Solution), Python (Notebooks, Task and Data | Solution)
- 2021/03/25 – Classification II: RapidMiner (Slides, Task, Data | Solution), Python (Notebooks, Task and Data | Solution)
- 2021/04/15 – Classification III: RapidMiner (Slides, Task, Data | Solution), Python (Notebooks, Task and Data | Solution)
- 2021/04/22 – Regression: RapidMiner (Slides, Task, Data | Solution), Python (Notebooks, Task and Data | Solution)
- 2021/04/29 – Text Mining: RapidMiner (Slides, Task, Data | Solution), Python (Notebooks, Task and Data | Solution | Quiz)
- 2021/05/06 – Association Analysis: RapidMiner (Slides, Task, Data | Solution), Python (Notebooks, Task and Data | Solution | Quiz)
Additional material can be found in the ILIAS group of the course.
Outline
Week | Wednesday | Thursday |
03.03.2021 | Introduction to Data Mining | Exercise Preprocessing/ |
10.03.2021 | Lecture Clustering | Exercise Clustering |
17.03.2021 | Lecture Classification 1 | Exercise Classification |
24.03.2021 | Lecture Classification 2 | Exercise Classification |
14.04.2021 | Lecture Classification 3 | Exercise Classification |
21.04.2021 | Video Lecture Regression | Exercise Regression |
28.04.2021 | Video Lecture Text Mining | Exercise Text Mining |
05.05.2021 | Video Lecture Association Analysis | Exercise Association Analysis |
12.05.2021 | Introduction to the Student Projects and Group Formation | Preparation of Project Outlines |
19.05.2021 | Feedback on Project Outlines | Project Work |
26.05.2021 | Project Work | Feedback on demand |
02.06.2021 | Feedback on demand | Project Work |
09.06.2021 | Submission of project report (Deadline: 13.06) | Preparation of presentation |
16.06.2021 | Presentation of project results | Presentation of project results |
23.06.2021 | Final exam (online) | |
For all students who are not familiar with Python/
Literature
Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar: Introduction to Data Mining, 2nd Global Edition. Pearson.
Vijay Kotu, Bala Deshpande: Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner. Morgan Kaufmann.
Aurélien Géron: Hands-On Machine Learning with Scikit-Learn and TensorFlow. O'Reilly.
Software
Videos and Screen Casts
- Video recordings of the Data Mining I lectures and screen casts of the exercises are available here.
Course Evaluations