Data Mining II (FSS 2022)
Building on the Data Mining fundamentals course, this course deepens the theory and practice of advanced data mining topics, such as:
- Data Preprocessing
- Dimensionality Reduction
- Anomaly Detection
- Time Series Analysis and Forecasting
- Parameter Tuning
- Ensemble Methods
- Neural Networks and Deep Learning
- Model Validation
The course consists of a lecture with accompanying practical exercises, as well as student team projects. In the exercises, participants gain initial experience in applying state-of-the-art data mining tools to realistic data sets.
As in previous years, participants take part in the annual Data Mining Cup (DMC), an international student data mining competition, as part of the project work. In addition to the DMC submission, each team must compile its approach and results into a written project report and present them in a plenary session.
Exam Review
The exam review for the FSS2022 retake exam will take place on Wednesday, November 16th at 2pm in B6 C1.01. Please send Nico a short email if you plan to attend the review.
Time and Location
At the moment, we assume that the course can be held in person. However, we are closely monitoring the pandemic situation and are prepared to switch to an online or hybrid setting.
Lecture:
- Tuesday, 13.45 – 15.15, A5, 6, B144 (starts on February 22nd)
For students who cannot attend the lecture (e.g., due to visa problems or quarantine), we will provide lecture recordings from the previous year.
There are three alternative exercise sessions:
- Exercise: Monday, 10.15 – 11.45, online: ZOOM-LEHRE-101
- Exercise: Monday, 12.00 – 13.30, online: ZOOM-LEHRE-101
- Exercise: Monday, 13.45 – 15.15, A5, 6, C012
The exercises start on February 28th.
All three sessions are equivalent; you only need to attend one of them.
Instructors
Lecture Slides
- February 22nd: Organization (PDF, 1 MB), Data Preprocessing (PDF, 2 MB)
- March 1st: Ensembles (PDF, 2 MB)
- March 8th: Time Series (PDF, 3 MB)
- March 15th: Neural Networks and Deep Learning (PDF, 4 MB)
- March 22nd: Anomaly Detection (PDF, 3 MB)
- March 29th: Hyperparameter Tuning (PDF, 1 MB) (video only, no live lecture!)
- April 5th: Model Validation (PDF, 1 MB), Model Inspection (bonus episode) (PDF, 1021 kB) (video only, no live lecture!)
Participation
- The course is open to students of the Master Business Informatics, Master Data Science, and Lehramt Informatik.
- Registration is done via Portal2.
- If there are more registrations than places (96), places will be allocated automatically by Portal2.
Outline
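- Week of 14.2.: Lecture: --; Exercise: --
- Week of 21.2.: Lecture: Introduction & Data Preprocessing; Exercise: --
- Week of 28.2.: Lecture: Ensembles; Exercise: Introduction & Data Preprocessing
- Week of 7.3.: Lecture: Time Series; Exercise: Ensembles
- Week of 14.3.: Lecture: Neural Networks & Deep Learning; Exercise: Time Series
- Week of 21.3.: Lecture: Anomaly Detection; Exercise: Neural Networks & Deep Learning
- Week of 28.3.: Lecture: Hyperparameter Tuning; Exercise: Anomaly Detection
- Week of 4.4.: Lecture: Model Verification; Exercise: Hyperparameter Tuning
- Week of 11.4.: Easter Break
- Week of 18.4.: Easter Break
- Week of 25.4.: Lecture: DMC Session; Exercise: Model Verification
- Week of 2.5.: Lecture: DMC Session; Exercise: --
- Week of 9.5.: Lecture: DMC Session; Exercise: --
- Week of 16.5.: Lecture: DMC Session; Exercise: --
- Week of 23.5.: Lecture: DMC Session; Exercise: --
Data Mining Cup Timeline (see here)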
Literature
Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Pearson.
Ian H. Witten, Eibe Frank, Mark A. Hall: Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Morgan Kaufmann.
Bing Liu: Web Data Mining, 2nd Edition, Springer.
Further literature on specific topics will be announced in the lecture.
Software
- We will use Python with a number of different packages (scikit-learn, etc.), which will be announced during the exercises.
- You are welcome to use other tools (RapidMiner, R, etc.) if you prefer.
Lecture Videos
- Video recordings of the Data Mining II lectures are available here (accessible from within the university network or VPN).