
Data Mining (FSS 2020)

The course provides an introduction to advanced data analysis techniques as a basis for analyzing business data and providing input for decision support systems. The course will cover the following topics:

  • The Data Mining Process
  • Data Representation and Preprocessing
  • Clustering
  • Classification
  • Regression
  • Association Analysis
  • Text Mining

The course consists of a lecture with accompanying practical exercises, as well as student team projects. In the exercises, participants gain initial experience in applying state-of-the-art data mining tools to realistic data sets. The team projects take place in the last third of the term. In the projects, students carry out more sophisticated data mining projects of their own choosing and present their results in a written report as well as an oral presentation.

The webpage for the HWS 2019 edition of this course can be found in the lecture archive.

Corona-related Reorganization of the Course

The lectures and exercises of this course will continue online until Easter. Depending on whether on-site teaching at the University of Mannheim resumes after the Easter break, the student projects and project coaching will take place either on-site or online. For now, we plan to hold the project presentations and the exams on-site.

Since we still need to record a video for the Regression lecture, we have moved this online lecture to the last week before Easter. Next week, we will continue with the lecture and exercises on Association Analysis. See the updated schedule below.

 

  • Instructors

  • Time and Location

    • Lecture: Wednesday, 10.15 - 11.45, Room A5, 6 B144 (Prof. Dr. Christian Bizer)
    • Exercise 1: Thursday, 10.15 - 11.45, Room B6, 26 A104 (RapidMiner, Anna Primpeli)
    • Exercise 2: Thursday, 12.00 - 13.30, Room B6, 26 A104 (Python, Ralph Peeters)
    • Exercise 3: Thursday, 13.45 - 15.15, Room B6, 26 A104 (Python, Ralph Peeters)

    Note: there are three parallel exercise groups; please attend only one of them.

  • Final exam

    • 75% written exam
    • 25% project work (20% report, 5% presentation)

  • Registration

    • To attend the course, please register for the lecture in Portal 2. The course is limited to 80 participants. Places are not allocated on a first-come, first-served basis: students in higher semesters and students who failed the course in HWS 2019 will be given preference, and places among equally ranked students will be drawn randomly.
    • The exercise session is offered at three alternative times (Thursdays 12.00, 13.45 and 15.30). Choose one and attend the exercise at that time; you don't have to register for it.

Lecture Videos, Slides and Exercises

Slides:

Exercises:

Additional material is available in the ILIAS group of the course.

Outline

Week        Wednesday                                 Thursday
12.02.2020  Introduction to Data Mining               Exercise Preprocessing/Visualization
            Introduction to Python (see note below)
19.02.2020  Lecture Clustering                        Exercise Clustering
26.02.2020  Lecture Classification 1                  Exercise Classification
04.03.2020  Lecture Classification 2                  Exercise Classification
11.03.2020  Lecture Classification 3                  Online Exercise Classification
18.03.2020  Video Lecture Association Analysis        Online Exercise Association Analysis
25.03.2020  Video Lecture Text Mining                 Online Exercise Text Mining
01.04.2020  Video Lecture Regression                  Online Exercise Regression
22.04.2020  Introduction to Student Projects          Preparation of Project Outlines
            and Group Formation (hopefully on-site again)
29.04.2020  Feedback on Project Outlines              Project Work
06.05.2020  Feedback on demand                        Project Work
13.05.2020  Feedback on demand                        Project Work
20.05.2020  Feedback on demand                        Project Work
27.05.2020  Presentation of project results           Presentation of project results

For students who are not familiar with Python/Jupyter Notebooks, we offer an introduction on Wednesday, February 12th, 2020, between 15:30 and 17:00 in room A5, 6 C 013.