Data Mining II
Important: This is the Web page of the course which took place in FSS 2019. The Web page of the most recent course can be found here.
Building on the Data Mining fundamentals course, this course deepens the theory and practice of advanced data mining topics, such as:
- Data Preprocessing
- Regression and Forecasting
- Dimensionality Reduction
- Anomaly Detection
- Time Series Analysis
- Parameter Tuning
- Ensemble Methods
- Deep Learning
The course consists of a lecture with accompanying practical exercises and student team projects. In the exercises, participants gain initial experience in applying state-of-the-art data mining tools to realistic data sets.
As in previous years, participants will take part in the annual Data Mining Cup (DMC), an international student competition in data mining, as part of the project work. In addition to the DMC submission, the approaches and results of the project have to be compiled into a written project report and presented in a plenary session.
Time and Location
Lecture:
- Tuesday, 13.45 – 15.15, EO 145 (Schloss Ehrenhof Ost / castle) <- changed!
Two alternative exercise slots are offered:
- Exercise: Monday, 10.15 – 11.45, A5, 6, C012
- Exercise: Monday, 12.00 – 13.30, A5, 6, C015
You have to choose one of the two slots.
Instructors
Final exam
Unlike in previous years (and unlike, e.g., Data Mining 1), the project is not graded. Your final grade will be based solely on the final exam.
The exam review for the first exam from FSS2019 will take place on Monday, 19 August, at 8am in B6 C1.01.
The exam review for the second exam from FSS2019 will take place on Thursday, 19 September, at 10am in B6 C1.01.
Slides and Exercises
Slides
- 19.02.: Organization, Data Preprocessing
- 26.02.: Regression
- 05.03.: Anomaly Detection
- 12.03.: Ensembles
- 19.03.: Time Series
- 26.03.: Neural Networks and Deep Learning
- 02.04.: Parameter Tuning, Introduction to the Data Mining Cup
Exercises
Participation
- The course is open to students of the Master Business Informatics, Master Data Science, and Lehramt Informatik.
- Registration is done via Portal2.
- If there are more registrations than places (64), places will be allocated automatically by Portal2.
Outline
Note: Since the introduction to team projects takes place on 12 February during the lecture slot, and some of you may want to attend, we'll start the lecture in the second week, i.e., on 18 February. The exercises will then begin on 25 February.
Week: Topic
- 11.2.: No Lecture (due to introduction of team projects)
- 18.2.: Organization, Preprocessing
- 25.2.: Regression
- 4.3.: Anomaly Detection
- 11.3.: Ensemble Methods
- 18.3.: Neural Networks
- 25.3.: Time Series
- 1.4.: Parameter Tuning
- 8.4.: DMC intermediate presentation
- 15.4.: Easter Break
- 22.4.: Easter Break
- 29.4.: DMC intermediate presentation
- 6.5.: DMC intermediate presentation
- 13.5.: DMC final selection
Timeline Data Mining Cup:
- Team registration: from 5 March 2019
- Task is announced: 4 April 2019
- Deadline for submissions: 16 May 2019
- Presentation & award ceremony: 3 July 2019
Literature
Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Pearson.
Ian H. Witten, Eibe Frank, Mark A. Hall: Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Morgan Kaufmann.
Bing Liu: Web Data Mining, 2nd Edition, Springer.
Further literature on specific topics will be announced in the lecture.
Software
- We will use Python with a number of different packages (scikit-learn, etc.), which will be announced during the exercises.
- You are invited to work with other tools (RapidMiner, R, etc.) if you like.
Lecture Videos
- Video recordings of the Data Mining II lectures are available here (accessible from within the university network or VPN).