Data Mining (HWS 2020)
The course provides an introduction to advanced data analysis techniques as a basis for analyzing business data and providing input for decision support systems. The course will cover the following topics:
- The Data Mining Process
- Data Representation and Preprocessing
- Clustering
- Classification
- Regression
- Association Analysis
- Text Mining
The course consists of lectures with accompanying practical exercises and student team projects. In the exercises, participants gain initial experience in applying state-of-the-art data mining tools to realistic data sets. The team projects take place in the last third of the term. In the projects, students carry out more sophisticated data mining projects of their own choice and report their results in a written report and an oral presentation.
The webpage about the FSS 2020 edition of this course is found in the lecture archive.
Corona Update
The lectures and exercises of this course, as well as the project presentations, will be conducted online via Zoom. If possible, we will provide lecture recordings. For the moment, the exam is planned to be conducted on campus.
Exam Review
The exam review for the FSS2020 exam will take place on Friday, 9 October 2020, from 11:00 to 12:00 in building B6, 26, room A1.04. Please email Anna Primpeli beforehand so that she can bring a copy of your exam.
There is no second exam for FSS2020. The next opportunity to retake the project and exam is in HWS2020.
Instructors
Time and Location
- Lecture: Wednesday, 10.15 – 11.45, WIM-ZOOM-02
- Exercise 1: Thursday, 12.00 – 13.30, WIM-ZOOM-02 (RapidMiner with Nicolas)
- Exercise 2: Thursday, 13.45 – 15.15, WIM-ZOOM-02 (Python with Sven)
- Exercise 3: Thursday, 15.30 – 17.00, WIM-ZOOM-02 (Python with Ralph)
Note: there are three parallel exercise groups; you are supposed to attend only one.
Final exam
- 75 % written exam
- 25 % project work (20% report, 5% presentation)
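As a rough illustration of how the weights above combine, the following sketch computes a weighted overall score. It assumes that every component is scored on a common 0 to 100 scale, which this page does not state; the function and key names are hypothetical, and the actual grade calculation may differ.

```python
# Weights taken from the grading breakdown above.
# Assumption: all components are scored on the same 0-100 scale.
WEIGHTS = {"exam": 0.75, "report": 0.20, "presentation": 0.05}


def overall_score(scores):
    """Return the weighted average of the component scores."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = {"exam": 80, "report": 90, "presentation": 100}
    print(overall_score(example))
```

With the hypothetical scores shown, the exam contributes 60 points, the report 18, and the presentation 5.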
Registration
- To attend the course, please register for the lecture in Portal 2. The course is limited to 80 participants. Places are not allocated on a first-come, first-served basis. Students in higher semesters and students who failed the course in FSS2020 will be given preference; places among equally ranked students will be drawn randomly.
- We offer three alternative times (Thursdays 12.00, 13.45 and 15.30) for the exercise session. Choose one and attend the exercise at the corresponding time (you don't have to register for it).
Lecture Videos, Slides and Exercises
Slides
- 30.09.: Introduction
- RapidMiner Exercise: Slides | Tasks | Data
- Python Exercise: Notebooks, Task and Data
- 07.10.: Clustering
- RapidMiner Exercise: Slides | Tasks | Data
- Python Exercise: Notebooks, Task and Data
- 14.10.: Classification Part 1
- RapidMiner Exercise: Slides | Tasks
- Python Exercise: Notebooks, Task and Data
- 21.10.: Classification Part 2
- RapidMiner Exercise: Slides | Tasks | Data
- Python Exercise: Notebooks, Task and Data
- 29.10.: Student Project Slides
- 04.11.: Regression
- RapidMiner Exercise: Slides | Tasks | Data
- Python Exercise: Notebooks, Task and Data
- 18.11.: Text Mining
- RapidMiner Exercise: Slides | Tasks | Data
- Python Exercise: Notebooks, Task and Data
- 25.11.: Association Analysis
Additional material (exercise solutions, lecture recordings) can be found in the ILIAS group of the course.
Outline
Since the autumn term 2020 starts later due to the Corona pandemic, we'll have a slightly condensed lecture period.
Week | Wednesday | Thursday |
28.09.2020 | Lecture: Introduction to Data Mining | Exercise: Introduction to Python / RapidMiner |
05.10.2020 | Lecture: Clustering | Exercise: Introduction |
12.10.2020 | Lecture: Classification 1 | Exercise: Clustering |
19.10.2020 | Lecture: Classification 2 | Exercise: Classification 1 |
26.10.2020 | Kick off group projects | Exercise: Classification 2 |
02.11.2020 | Lecture: Regression | Project feedback |
09.11.2020 | Project feedback | Exercise: Regression |
16.11.2020 | Lecture: Text Mining | Project feedback |
23.11.2020 | Lecture: Association Analysis (changed!) | Exercise: Text Mining |
30.11.2020 | Results Presentation (changed!) | Results Presentation |
Important dates for the group projects:
- Monday, 2 November, 23:59: Submission of project proposals
- Wednesday, 23 December, 23:59: Submission of final reports
Literature
Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar: Introduction to Data Mining, 2nd Global Edition, Pearson.
Vijay Kotu, Bala Deshpande: Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner. Morgan Kaufmann.
Aurélien Géron: Hands-On Machine Learning with Scikit-Learn and TensorFlow. O'Reilly.
Software
Videos and Screen Casts
- Video recordings of the Data Mining I lectures and screen casts of the exercises are available here.
Course Evaluations