Data Mining (HWS 2019)
The course provides an introduction to advanced data analysis techniques as a basis for analyzing business data and providing input for decision support systems. The course will cover the following topics:
- Goals and Principles of Data Mining
- Data Representation and Preprocessing
- Clustering
- Classification
- Regression
- Association Analysis
- Text Mining
- Systems and Applications (e.g. Retail, Finance, Web Analysis)
The course consists of a lecture with accompanying practical exercises, as well as student team projects. In the exercises, participants gain initial experience in applying state-of-the-art data mining tools to realistic data sets. The team projects take place in the last third of the term. In these projects, students carry out more sophisticated data mining projects of their own choice and present their results in a written report and an oral presentation.
Time and Location
- Lecture: Wednesday, 10.15 – 11.45, Room A5 6, C015
- Exercise 1: Thursday, 12.00 – 13.30, Room B6 26, A104 (RapidMiner, Nicolas Heist)
- Exercise 2: Thursday, 13.45 – 15.15, Room A5 6, C012 (Python, Sven Hertling)
- Exercise 3: Thursday, 15.30 – 17.00, Room A5 6, C012 (Python, Ralph Peeters)
Note: there are three parallel exercise groups; attend only one of them.
Instructors
Final grade
- 75% written exam
- 25% project work (20% report, 5% presentation)
Registration
- To attend the course, please register for the lecture in Portal 2. The course is limited to 80 participants. Places are not allocated on a first-come, first-served basis: students in higher semesters are given preference, and places among equally ranked students are drawn by lot.
- The exercise session is offered at three alternative times (Thursdays 12.00, 13.45 and 15.30). Choose one and attend the exercise at that time; no registration is required.
Slides and Exercises
Slides:
- 2019/09/04: Introduction and Organization (PDF, 5 MB)
- 2019/09/11: Cluster Analysis (PDF, 2 MB)
- 2019/09/18: Classification Part 1 (PDF, 2 MB)
- 2019/09/25: Classification Part 2 (PDF, 2 MB)
- 2019/10/02: Classification Part 3 (PDF, 2 MB)
- 2019/10/09: Regression (PDF, 2 MB)
- 2019/10/16: Text Mining (PDF, 2 MB)
- 2019/10/23: Association Analysis (PDF, 1 MB)
Exercises:
- 2019/09/05: RapidMiner (Slides (PDF, 2 MB) | Task (PDF, 207 kB) | Data); Python (Slide/Task Notebooks | Task (PDF, 136 kB) | Data)
- 2019/09/12: RapidMiner (Slides (PDF, 1 MB) | Task (PDF, 95 kB) | Data); Python (Slide/Task Notebooks | Task (PDF, 59 kB) | Data)
- 2019/09/19: RapidMiner (Slides (PDF, 2 MB) | Task (PDF, 92 kB)); Python (Slide/Task Notebooks | Task (PDF, 92 kB) | Data)
- 2019/09/26: RapidMiner (Slides (PDF, 1 MB) | Task (PDF, 89 kB) | Data); Python (Slide/Task Notebooks | Task (PDF, 82 kB) | Data)
- 2019/10/10: RapidMiner (Slides (PDF, 1 MB) | Task (PDF, 101 kB) | Data); Python (Slide/Task Notebooks | Task (PDF, 821 kB) | Data)
- 2019/10/17: RapidMiner (Slides (PDF, 862 kB) | Task (PDF, 96 kB) | Data); Python (Slide/Task Notebooks | Task (PDF, 51 kB) | Data)
- 2019/10/24: RapidMiner (Slides (PDF, 801 kB) | Task (PDF, 86 kB) | Data); Python (Slide/Task Notebooks | Task (PDF, 86 kB) | Data)
Solutions and additional material can be found in the ILIAS group of the course.
Outline
For all students who are not familiar with Python/
Week of | Wednesday | Thursday |
02.09.2019 | Introduction to Data Mining; Introduction to Python (see above) | Exercise Preprocessing/ |
09.09.2019 | Lecture Clustering | Exercise Clustering |
16.09.2019 | Lecture Classification 1 | Exercise Classification |
23.09.2019 | Lecture Classification 2 | Exercise Classification |
30.09.2019 | Lecture Classification 3 | Holiday (no exercise) |
07.10.2019 | Lecture Regression | Exercise Regression |
14.10.2019 | Lecture Text Mining | Exercise Text Mining |
21.10.2019 | Lecture Association Analysis | Exercise Association Analysis |
28.10.2019 | Introduction to Student Projects and Group Formation (attendance mandatory) | Preparation of Project Outlines |
04.11.2019 | Feedback on demand | Project Work |
11.11.2019 | Feedback on demand | Project Work |
18.11.2019 | Feedback on demand | Project Work |
25.11.2019 | Submission of project results | Presentation of project results |
02.12.2019 | Presentation of project results | |
Literature
Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining. Pearson.
Vijay Kotu, Bala Deshpande: Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner. Morgan Kaufmann.
Aurélien Géron: Hands-On Machine Learning with Scikit-Learn and TensorFlow. O'Reilly.
Software
Videos and Screen Casts
- Video recordings of the Data Mining I lectures and screen casts of the exercises are available here.
Course Evaluations
- Evaluation from HWS 2018 (PDF, 662 kB)
- Evaluation from HWS 2017 (PDF, 875 kB)
- Evaluation from FSS 2017 (PDF, 800 kB)
- Evaluation from HWS 2016 (PDF, 734 kB), Correction (PDF, 534 kB)
- Evaluation from FSS 2016 (PDF, 174 kB)
- Evaluation from HWS 2015 (PDF, 182 kB)
- Evaluation from FSS 2015 (PDF, 160 kB)
- Evaluation from HWS 2014 (PDF, 188 kB)
- Evaluation from FSS 2014 (PDF, 164 kB)
- Evaluation from HWS 2013 (PDF, 179 kB)
- Evaluation from FSS 2013 (PDF, 159 kB)