Data Mining (FSS 2023)
The course provides an introduction to advanced data analysis techniques as a basis for analyzing business data and providing input for decision support systems. The course will cover the following topics:
- The Data Mining Process
- Data Representation and Preprocessing
- Clustering
- Classification
- Regression
- Association Analysis
- Text Mining
The course consists of a lecture together with accompanying practical exercises as well as student team projects. In the exercises, the participants will gain initial experience in applying state-of-the-art data mining tools to realistic data sets. The team projects take place in the last third of the term. Within the projects, students carry out more sophisticated data mining projects of their own choice and report on the results in a written report as well as an oral presentation.
Exam Review
The exam review for FSS2023 will take place on Wednesday, 13 September 2023, starting at 11:00.
To take part in the exam review, you have to register by sending an email to Alexander Brinkmann by Tuesday, 6 September 2023.
Instructors
Time and Location
- Lecture: Wednesday, 10.15 – 11.45, Room A5 B1.44
- Exercises: Students should attend one of the three exercise groups. The contents are identical.
  - Thursday, 10.15 – 11.45, Room B6 A 1.04 (Alex)
  - Thursday, 12.00 – 13.30, Room B6 A 1.04 (Ralph)
  - Thursday, 13.45 – 15.15, Room B6 A 1.04 (Keti)
Grading
- 75 % written exam (we offer only a single exam and no re-take as the course is offered every semester)
- 25 % project work (20 % report, 5 % presentation)
Registration
- To attend the course, please register for the lecture in Portal 2. The course is limited to 90 participants. Places are not allocated on a first come, first served basis. Students in higher semesters and students who failed the course in HWS2022 will be given preference; places among equally ranked students will be drawn at random.
- You do not need to register for the exercises.
Outline
Week | Wednesday | Thursday |
15.02.2023 | Lecture: Introduction to Data Mining | Exercise: Preprocessing/ |
22.02.2023 | Lecture: Cluster Analysis | Exercise: Cluster Analysis |
01.03.2023 | Lecture: Classification 1 | Exercise: Classification |
08.03.2023 | Lecture: Classification 2 | Exercise: Classification |
15.03.2023 | Lecture: Classification 3 | Exercise: Classification |
22.03.2023 | Lecture: Regression | Exercise: Regression |
29.03.2023 | Lecture: Text Mining | Exercise: Text Mining |
- Easter Break - | | |
19.04.2023 | Introduction to the Student Projects and Group Formation | Preparation of project outline |
26.04.2023 | Lecture: Association Analysis | Exercise: Association Analysis |
03.05.2023 | Feedback on project outlines | Project Work |
10.05.2023 | Project Work | Feedback on demand |
17.05.2023 | Project Work | Feedback on demand |
24.05.2023 | Project Work | Feedback on demand |
28.05.2023 | Submission of project reports (Deadline: 23:59) | |
31.05.2023 | Presentation of project results | |
XX.06.2023 | Final exam | |
For all students who are not familiar with Python/
Lecture Slides and Exercises
Lecture Slides:
- 15.02.2023: Slideset Introduction and Organization
- 22.02.2023: Slideset Cluster Analysis
- 01.03.2023: Slideset Classification Part 1
- 08.03.2023: Slideset Classification Part 2
- 15.03.2023: Slideset Classification Part 3
- 22.03.2023: Slideset Regression
- 29.03.2023: Slideset Text Mining
- 19.04.2023: Slideset Introduction to the Student Projects
- 19.04.2023: Document Example Exam Questions
- 26.04.2023: Slideset Association Analysis
Exercises:
- 15.02.2023 – Introduction to Python (Slides and Notebooks | Solution)
- 16.02.2023 – Simple Preprocessing and Visualization (Task and Notebooks | Solution)
- 23.02.2023 – Cluster Analysis (Task and Notebooks | Solution)
- 02.03.2023 – Classification I (Task and Notebooks | Solution)
- 09.03.2023 – Classification II (Task and Notebooks | Solution)
- 16.03.2023 – Classification III (Task and Notebooks | Solution)
- 23.03.2023 – Regression (Task and Notebooks | Solution)
- 30.03.2023 – Text Mining (Task and Notebooks | Solution)
- 27.04.2023 – Association Analysis (Task and Notebooks | Solution)
Additional material can be found in the ILIAS group of the course.
Literature
Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar: Introduction to Data Mining, 2nd Global Edition, Pearson.
Aurélien Géron: Hands-On Machine Learning with Scikit-Learn and TensorFlow. O'Reilly.
Software
Videos and Screen Casts
- Video recordings of the Data Mining I lectures and screen casts of the exercises are available here.
Course Evaluations