
SM445/CS 707: Seminar “Machine Learning at Scale” (HWS 2025)

This term's seminar: Machine Learning at Scale.

Scale is key to the effectiveness of most modern ML applications: large training datasets to ensure generalisation, scaling laws and large context windows for transformers, training on huge data streams, storing and retrieving from huge (vector) databases, and serving millions of requests at once when a service is shipped to production. In this seminar, we will look at the technical and implementation background that makes this possible.

Organization

  • This seminar is organized by Prof. Dr. Rainer Gemulla, Simon Forbat, and Julie Naegelen.
  • Available for up to 8 Master's students (4 ECTS) and up to 4 Bachelor's students (5 ECTS).
  • Prerequisites: Solid background in machine learning (MSc students), Einführung in Data Science (BSc students)

Goals

In this seminar, you will

  • Read, understand, and explore scientific literature
  • Summarize a current research topic in a concise report (10 single-column pages + references)
  • Give two presentations about your topic (3-minute flash presentation, 15-minute final presentation)
  • Moderate a scientific discussion about the topic of one of your fellow students
  • Review a (draft of a) report of a fellow student

Schedule

TBD

Registration

Please register via Portal 2 by TBD.

If you are accepted into the seminar, provide at least 4 preferred topic areas (your own and/or example topics; see below) by TBD via email to TBD. The actual topic assignment takes place soon afterwards; we will notify you via email. Our goal is to assign you one of your preferred topic areas.

Topic areas and topics

You will be assigned a topic area in an active, relevant field of machine learning based on your preferences. Your goals in this seminar are to

  1. Provide a short, concise overview of this topic area (1/4). A good starting point may be a book chapter, survey paper, or recent research paper. Here you take a bird's-eye view and are expected to discuss the main goals, challenges, and relevance of your topic area. Topic areas are selected at the beginning of the seminar.
  2. Present a self-selected topic within this area in more detail (3/4). A good starting point is a recent or highly influential research paper. Here you dive deep into one particular topic and are expected to discuss and explain the concrete problem statement, the concrete solution or contribution, and your own thoughts. The actual topic is selected before the first tutor meeting.

You are generally free to propose your topic area of interest as long as it aligns with the overall theme and objectives of the seminar.

Suggested topics, grouped by area: TBD; for example:

  1. Parallel training
  2. Parallel inference
  3. Data management  
  4. Lifecycle management
  5. Cost savings (quantization, model distillation)
  6. Vector databases
  7. Systems for machine learning  
  8. Sampling  
  9. Federated learning

Supplementary materials and references