SM445 / CS 707: Seminar “Machine Learning at Scale” (HWS 2025)
This term's data analytics seminar focuses on Machine Learning at Scale.
Efficiency and scale are key in many modern ML applications, which may involve large training datasets, large amounts of compute, large models, high inference costs, large context windows, retrieval components accessing large vector databases, low-latency high-throughput inference, deployment on edge devices, and so on. In this seminar, we will look at techniques that make this possible, including parallel processing and various approaches to cost reduction.
Schedule
Please find the schedule here (PDF, 132 kB).
Organization
- This seminar is organized by Prof. Dr. Rainer Gemulla, Simon Forbat, and Julie Naegelen.
- Available for up to 8 Master's students (4 ECTS) and up to 4 Bachelor's students (5 ECTS).
- Prerequisites: Solid background in machine learning (MSc students), Wirtschaftsinformatik IV (BSc students)
Goals
In this seminar, you will
- Read, understand, and explore scientific literature
- Summarize a current research topic in a concise report (10 single-column pages + references)
- Give two presentations about your topic (3-minute flash presentation, 15-minute final presentation)
- Moderate a scientific discussion about the topic of one of your fellow students
- Review (drafts of) reports of fellow students
Registration
Please register via Portal2 by September 1st.
If you are accepted into the seminar, attend the kickoff on September 9th and send at least 4 preferred topic areas (your own and/or the example topic areas; see below) via email to Julie Naegelen by September 14th.
The actual topic assignment takes place soon afterwards; we will notify you via email. Our goal is to assign one of your preferred topic areas to you.
Topic areas and topics
You will be assigned a larger topic area in an active, relevant field of machine learning based on your preferences. Your goals in this seminar are to:
- Provide a short, concise overview of this topic area (1/4). A good starting point may be a book chapter, survey paper, or recent research paper. Here you take a bird's-eye view and are expected to discuss the main goals, challenges, and relevance of your topic area. Topic areas are selected at the beginning of the seminar.
- Present a self-selected topic within this topic area in more detail (3/4). A good starting point is a recent or highly influential research paper. Here you dive deep into one particular topic and are expected to discuss and explain the concrete problem statement, the concrete solution or contribution, and your own thoughts. The actual topic is selected before the first tutor meeting.
You are generally free to propose your topic area of interest as long as it aligns with the overall theme and objectives of the seminar.
Potential topic areas include:
- Parallel training
- Parallel inference
- Data management
- Lifecycle management
- Cost savings (quantization, model distillation)
- Vector databases
- Systems for machine learning
- Sampling
- Federated learning
- Mixture of Experts (MoE)
Important Conferences and Journals
Find below a list of some of the most important conferences and journals:
- International Conference on Machine Learning (ICML)
- Conference on Neural Information Processing Systems (NeurIPS)
- International Conference on Learning Representations (ICLR)
- Annual Meeting of the Association for Computational Linguistics (ACL)
- International Conference on Computer Vision (ICCV)
- IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
- AAAI Conference on Artificial Intelligence (AAAI)
- International Joint Conference on Artificial Intelligence (IJCAI)
- International Conference on Knowledge Discovery and Data Mining (KDD)
- …
The websites of these venues typically let you browse their papers and provide search options.
Supplementary materials and references
- “Giving Conference Talks” (PDF, 1 MB) by Prof. Dr. Rainer Gemulla
- “Writing for Computer Science” by Justin Zobel, Springer, 2014