Data Mining II (FSS 2020)

Retake Exam: The second exam will take place on 14 September 2020.

Corona Update:

The lecture will be conducted online via Cisco Webex during the lecture slots (Tuesdays, 1.45 pm). Please join the video conference here: https://heiko-317.my.webex.com/heiko-317.my/j.php?MTID=m11fc300a81473dba84be771e8f23a6fa

Exercises will be provided. We are working on a solution for Q&A sessions for the exercises.

The Data Mining Cup has been postponed; the task will be out on 14 April. Please see the updated schedule below!

The exam will take place at the announced time and date (i.e., 2 June, 10.30–11.30) as an online open book exam.


Building on the Data Mining fundamentals course, this course deepens the theory and practice of advanced data mining topics, such as:

  • Data Preprocessing
  • Dimensionality Reduction
  • Anomaly Detection
  • Time Series Analysis and Forecasting
  • Parameter Tuning
  • Ensemble Methods
  • Neural Networks and Deep Learning
  • Model Validation

The course consists of a lecture with accompanying practical exercises and student team projects. In the exercises, participants will gain initial experience in applying state-of-the-art data mining tools to realistic data sets.

As in previous years, participants will take part in the annual Data Mining Cup (DMC), an international student competition in data mining, as part of the project work. In addition to the DMC submission, the approaches and results of the project have to be compiled into a written project report and presented in a plenary session.

Time and Location

Lecture:

  • Tuesday, 13.45 – 15.15, B6 A1.01

We'll have two alternatives for the exercise:

  • Exercise: Monday, 10.15 – 11.45, A5, 6, C012
  • Exercise: Monday, 12.00 – 13.30, A5, 6, C015

Both slots are offered, and you have to choose one of them.

Final exam

Unlike in previous years (and unlike, e.g., Data Mining 1), the project is not graded. Your final grade will be based solely on the final exam.

  • Slides and Exercises

  • Participation

    • The course is open to students of the Master Business Informatics, Master Data Science, and Lehramt Informatik.
    • Registration is done via Portal2.
    • In case there are more registrations than places (64), places will be allocated automatically by Portal2.
  • Outline

    Note: The lecture starts in the second week, i.e., on 18 February. The exercises will then begin on 24 February.

    Date     Topic
    18.2.    Introduction & Data Preprocessing
    25.2.    Ensembles
    3.3.     Time Series
    10.3.    Neural Networks & Deep Learning
    17.3.    NO LECTURE
    24.3.    Hyperparameter Tuning (online lecture)
    31.3.    Anomaly Detection (online lecture)
    7.4.     Easter Break
    14.4.    Easter Break
    21.4.    DMC Task Brainstorming (most likely online)
    28.4.    Model Verification (most likely online)
    5.5.     DMC project work (tba)
    12.5.    DMC project work (tba)
    19.5.    DMC project work (tba)
    26.5.    DMC project work (tba)

    Data Mining Cup Timeline (see here):

    14.04.: Task Publication

    29.05.: Internal submission of reports and solutions (prerequisite for taking part in the exam)

    30.06.: Official submission of solutions

    Note: We will be available for consultation and feedback for those who still want to tune their solutions after the exam period. On 29 June, we will select the two solutions to submit to the DMC.

  • Literature

    1. Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Pearson.

    2. Ian H. Witten, Eibe Frank, Mark A. Hall: Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Morgan Kaufmann.

    3. Bing Liu: Web Data Mining, 2nd Edition, Springer.

    Further literature on specific topics will be announced in the lecture.

  • Software

    • We will use Python with a number of different packages (scikit-learn, etc.), which will be announced during the exercises.
    • You are invited to work with other tools (RapidMiner, R, etc.) if you like.
  • Lecture Videos

    • Video recordings of the Data Mining II lectures are available here (accessible from within the university network or VPN).