Data Mining II (FSS 2023)

Building on the Data Mining fundamentals course, this course deepens the theory and practice of advanced data mining topics, such as:

  • Data Preprocessing
  • Dimensionality Reduction
  • Anomaly Detection
  • Time Series Analysis and Forecasting
  • Parameter Tuning
  • Ensemble Methods
  • Neural Networks and Deep Learning
  • Model Validation

The course consists of a lecture with accompanying practical exercises, as well as student team projects. In the exercises, participants will gain initial expertise in applying state-of-the-art data mining tools to realistic data sets.

As in previous years, participants will take part in the annual Data Mining Cup (DMC), an international student data mining competition, as part of the project work. In addition to the DMC submission, the approaches and results of each project must be compiled into a written project report and presented in a plenary session.

Time and Location

Lecture:

  • Tuesday, 13.45 – 15.15, SN 163 (starts on February 21st)

There are two alternative exercise sessions:

  • Exercise: Monday, 12.00 – 13.30, A5, 6, C012
  • Exercise: Monday, 13.45 – 15.15, A5, 6, C012

The exercises start on February 27th.

Both exercise sessions are equivalent; you are expected to attend one of them.

  • Lecture Slides

  • Participation

    • The course is open to students of the Master Business Informatics, Master Data Science, and Lehramt Informatik.
    • Registration is done via Portal2.
    • If there are more registrations than available places, places will be allocated automatically by Portal2.
  • Outline

    Week    Lecture                              Exercise
    14.2.   --                                   --
    21.2.   Introduction & Data Preprocessing    --
    28.2.   Ensembles                            Introduction & Data Preprocessing
    7.3.    Time Series                          Ensembles
    14.3.   Neural Networks & Deep Learning      Time Series
    21.3.   New: KDD Cup Kick Off                Neural Networks & Deep Learning
    28.3.   Hyperparameter Tuning                --
    4.4.    Easter Break                         Easter Break
    11.4.   Easter Break                         Easter Break
    18.4.   KDD Cup                              Hyperparameter Tuning
    25.4.   Model Verification                   --
    2.5.    KDD Cup                              Holiday
    9.5.    Anomaly Detection                    Model Verification
    16.5.   KDD Cup                              Anomaly Detection
    23.5.   KDD Cup                              --

    KDD Cup Timeline: see here

  • Literature

    1. Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Pearson.

    2. Ian H. Witten, Eibe Frank, Mark A. Hall: Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Morgan Kaufmann.

    3. Bing Liu: Web Data Mining, 2nd Edition, Springer.

    Further literature on specific topics will be announced in the lecture.

  • Software

    • We will use Python with a number of different packages (scikit-learn, etc.), which will be announced during the exercises.
    • You are welcome to use other tools (RapidMiner, R, etc.) if you prefer.
  • Lecture Videos

    • Video recordings of the Data Mining II lectures are available here (accessible from within the university network or VPN).