Data Mining II
Building on the Data Mining fundamentals course, this course covers the theory and practice of advanced data mining topics in greater depth, such as:
- Data Preprocessing
- Regression and Forecasting
- Dimensionality Reduction
- Anomaly Detection
- Time Series Analysis
- Parameter Tuning
- Ensemble Methods
- Deep Learning
The course consists of a lecture with accompanying practical exercises and student team projects. In the exercises, participants will gain initial experience in applying state-of-the-art data mining tools to realistic data sets.
As in previous years, participants will take part in the annual Data Mining Cup (DMC), an international student competition in data mining, as part of the project work. In addition to the DMC submission, the approaches and results of the project have to be compiled into a written project report and presented in a plenary session.
Time and Location
Lecture:
- Tuesday, 13.45 – 15.15, A1.04 (subject to change)
There are two alternative slots for the exercise:
- Exercise: Monday, 10.15 – 11.45, A5, 6, C012
- Exercise: Monday, 12.00 – 13.30, A5, 6, C015
Both slots are offered, and you have to choose one of them.
Instructors
Final exam
- 60 % written exam
- 40 % project work
Exam Review
- The exam review for the first and second exams from FSS 2018 will take place on Thursday, 27 September, 9 am, in room C1.01 (building B6, 26).
Slides and Exercises
- Slides and exercises will be posted here. Exercise solutions will be made available via ILIAS.
Participation FSS 2018
- The course is open to students of the Master Business Informatics, Master Data Science, and Lehramt Informatik.
- Due to popular demand, we have doubled the capacity to 64 participants.
- Registration is done via the ILIAS group.
- Registration opens on Friday, 9 February, at 9:00 am via this link.
- Places are allocated on a first-come, first-served basis (limit: 64 students).
Outline
- 13.02.: Organization (slides), Preprocessing (slides)
- 16.02./19.02.: Exercise 1 – Data Preprocessing
- 20.02.: Regression (slides)
- 23.02./26.02.: Exercise 2 – Regression
- 27.02.: Anomaly Detection (slides)
- 02.03./05.03.: Exercise 3 – Anomaly Detection
- 06.03.: Ensembles (slides)
- 09.03./12.03.: Exercise 4 – Ensembles
- 13.03.: Time Series (slides)
- 16.03./19.03.: Exercise 5 – Time Series
- 20.03.: Neural Networks (slides)
- 23.03.: Exercise 6 – Neural Networks
- Easter Break
- 10.04.: Parameter Tuning (slides), DMC kick-off (slides)
- 17.04.: DMC intermediate presentation
- 24.04.: DMC intermediate presentation
- 08.05.: DMC intermediate presentation
- 15.05.: DMC intermediate presentation
- 22.05.: DMC final presentation
Literature
Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Pearson.
Ian H. Witten, Eibe Frank, Mark A. Hall: Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Morgan Kaufmann.
Bing Liu: Web Data Mining, 2nd Edition, Springer.
Further literature on specific topics will be announced in the lecture.
Software
- We will use the most recent version of RapidMiner. Licence key handling will be discussed in the first sessions of this course.
- You are invited to work with other tools (Python, R, etc.) if you like.
Lecture Videos
- Video recordings of the Data Mining II lectures are available here (accessible from within the university network or VPN).