Data Mining II (FSS 2024)

Building on the Data Mining fundamentals course, this course covers the theory and practice of advanced data mining topics in more depth, including:

Data Preprocessing
Dimensionality Reduction
Anomaly Detection
Time Series Analysis and Forecasting
Parameter Tuning
Ensemble Methods
Neural Networks and Deep Learning
Model Validation

The course consists of lectures, accompanying practical exercises, and student team projects. In the exercises, participants gain initial experience in applying state-of-the-art data mining tools to realistic data sets.

As in previous years, students enrolled in the course will participate in a larger data mining competition (details to be announced). In addition to submitting an entry to the competition, the approaches and results of the project must be compiled into a written project report and presented in a plenary session.

Time and Location

Lecture:

Tuesday, 13.45 – 15.15, A5, 6, C013 (starts on February 13th!)

We'll have two alternatives for the exercise:

Exercise: Monday, 12.00 – 13.30, A5, 6, C013
Exercise: Monday, 13.45 – 15.15, A5, 6, C013

Both exercises are equivalent; you are expected to attend one of them.

Instructors

Lecture Slides
Lecture slides will be made available here as the course progresses.
13.02.: Organization, Data Preprocessing
27.02.: Ensembles
05.03.: Time Series
18.03.: Neural Networks and Deep Learning
08.04.: Anomaly Detection (and challenge introduction)
23.04.: Hyperparameter Optimization
07.05.: Model Validation
Participation
The course is open to students of the Master Business Informatics, Master Data Science, and Lehramt Informatik.
Registration is done via Portal2.
If there are more registrations than places, places will be allocated automatically by Portal2.

Outline
Week	Exercise (Monday)	Lecture (Tuesday)
12.2.	--	Introduction & Data Preprocessing
19.2.	Introduction & Data Preprocessing	--
26.2.	--	Ensembles
4.3.	Ensembles	Time Series
11.3.	Time Series	--
18.3.	--	Neural Networks & Deep Learning
25.3.	--	Easter Break
1.4.	--	Easter Break
8.4.	Neural Networks & Deep Learning	Anomaly Detection & Challenge Kick-off
15.4.	Anomaly Detection	Challenge Session
22.4.	--	Hyperparameter Tuning
29.4.	Hyperparameter Tuning	Challenge Session
6.5.	--	Model Validation
13.5.	Model Validation	Challenge Session
20.5.	--	Challenge Session

Deadlines for the challenge (see here):

May 24th: submission of predictions
May 26th: submission of reports

Literature
Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Pearson.
Ian H. Witten, Eibe Frank, Mark A. Hall: Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Morgan Kaufmann.
Bing Liu: Web Data Mining, 2nd Edition, Springer.
Further literature on specific topics will be announced in the lecture.
Software
We will use Python with a number of different packages (scikit-learn, etc.), which will be announced during the exercises.
You are invited to work with other tools (RapidMiner, R, etc.) if you like.
Lecture Videos
Video recordings of the Data Mining II lectures are available here (accessible from within the university network or VPN).

Data and Web Science Group