[Photo: Four students in the entrance hall of the B6 building]

Seminar CS715: Solving Complex Tasks using Large Language Models (FSS 2026)

This semester's seminar focuses on LLM agents and retrieval-augmented generation (RAG). It covers mainly experimental topics plus a small selection of literature topics. The goal of the experimental topics is to verify the utility of specific approaches by applying them to tasks beyond those used for illustration and evaluation in the respective papers. The goal of the literature topics is to summarize the state of the art concerning a specific aspect of applying LLMs or LLM-based agents and to compare specific approaches in this area using a systematic set of criteria.

Organization

Goals

In this seminar, you will

  • read, understand, and explore scientific literature
  • critically summarize the state of the art concerning your topic
  • for experimental topics, experimentally verify the utility of prompt engineering or agent-based methods
  • give a presentation about your topic (before the submission of the report)

Schedule

  1. Please register for the seminar via the centrally coordinated seminar registration in Portal2.
  2. After you have been accepted into the seminar, please email your three preferred topics from the list below to Aaron. We will assign topics to students according to your preferences.
  3. Attend the kickoff meeting on February 24th, where we will discuss the general requirements for the reports and presentations and answer initial questions about the topics.
  4. You will be assigned a mentor, who provides guidance and one-to-one meetings over the course of the seminar.
  5. Work individually throughout the semester: explore the literature, perform experiments (for experimental topics; please note that we cannot reimburse any LLM API costs you incur), create a presentation, and write a report.
  6. Give your presentation in a block seminar on May 4th, 2026.
  7. Write and submit your seminar thesis by July 3rd, 2026.

Topics FSS2026

Retrieval-Augmented Generation (RAG)

1. Experimental Topic: Wikipedia Article Generation Using Web-RAG

  • Zhang, J. et al., 2025. WIKIGENBENCH: Exploring full-length Wikipedia generation under real-world scenario. In Proceedings of the 31st International Conference on Computational Linguistics, pp. 5191–5210.
  • Yang, Z., et al., 2025. WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation. arXiv preprint arXiv:2503.19065
  • Reeves, N. and Simperl, E., 2025. Machines in the Margins: A Systematic Review of Automated Content Generation for Wikipedia. Proceedings of the ACM on Human-Computer Interaction, 9(7), pp. 1–30.
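To make the experimental setup concrete, here is a minimal Web-RAG sketch: fetch a few web pages, reduce them to plain text, and prompt a model to draft one article section grounded in that evidence. It assumes an OpenAI-compatible client plus the requests and beautifulsoup4 packages; fetch_text and draft_section are illustrative helpers, and the URL list would normally come from a search API rather than being hard-coded.

```python
# Minimal Web-RAG sketch: draft one Wikipedia-style section from fetched pages.
# Assumes an OpenAI-compatible API key in the environment; helper names are
# illustrative, not taken from any of the papers above.
import requests
from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()

def fetch_text(url: str, max_chars: int = 4000) -> str:
    """Download a page and reduce it to plain-text evidence."""
    html = requests.get(url, timeout=10).text
    return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)[:max_chars]

def draft_section(subject: str, section: str, urls: list[str]) -> str:
    evidence = "\n\n".join(f"[{i + 1}] {fetch_text(u)}" for i, u in enumerate(urls))
    prompt = (
        f"Write the '{section}' section of a Wikipedia article about {subject}. "
        f"Use only the numbered evidence below and cite it as [1], [2], ...\n\n"
        f"{evidence}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Stand-in URL; a real run would use results from a web search API.
print(draft_section("Mannheim", "History", ["https://en.wikipedia.org/wiki/Mannheim"]))
```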

2. Experimental Topic: Verifying Scientific Claims using Web-RAG Agents

  • Wadden, D. et al., 2020. Fact or Fiction: Verifying Scientific Claims. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7534–7550.
  • Dmonte, A. et al., 2024. Claim Verification in the Age of Large Language Models: A Survey. arXiv preprint arXiv:2408.14317.
  • Asai, A., He, J., Shao, R. et al., 2026. Synthesizing scientific literature with retrieval-augmented language models. Nature.
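A starting point for experiments could look like the sketch below: given a claim and retrieved evidence snippets, the model assigns a SciFact-style verdict. It assumes an OpenAI-compatible client; verify_claim is an illustrative helper, and the retrieval of the snippets (e.g., via the Web-RAG pipeline sketched under Topic 1) is left out.

```python
# Claim-verification sketch in the SciFact style: verdict plus cited evidence.
# The label set and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def verify_claim(claim: str, snippets: list[str]) -> str:
    evidence = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    prompt = (
        "Classify the claim as SUPPORTED, REFUTED, or NOT ENOUGH INFO, based "
        "strictly on the evidence below, and cite the snippets you used.\n\n"
        f"Claim: {claim}\n\nEvidence:\n{evidence}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(verify_claim("<claim to check>",
                   ["<retrieved snippet from source A>",
                    "<retrieved snippet from source B>"]))
```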

3. Experimental Topic: Related Work Section Generation using Web-RAG

  • Shi, Z. et al., 2023. Towards a Unified Framework for Reference Retrieval and Related Work Generation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 5785–5799.
  • Zhang, Z. et al., 2025. Mixture of Knowledge Minigraph Agents for Literature Review Generation. In Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), pp. 26012–26020.
  • Luo, Z. et al., 2025. LLM4SR: A Survey on Large Language Models for Scientific Research. arXiv preprint arXiv:2501.04306.
  • Asai, A., He, J., Shao, R. et al., 2026. Synthesizing scientific literature with retrieval-augmented language models. Nature.
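One possible skeleton retrieves candidate references first and then drafts a cited paragraph. The sketch below queries the public Semantic Scholar search endpoint; treat the parameter and field names as assumptions to verify against the API documentation, and related_work as an illustrative helper.

```python
# Reference retrieval + related-work drafting sketch. The Semantic Scholar
# Graph API endpoint and field names are assumptions to double-check.
import requests
from openai import OpenAI

client = OpenAI()
S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"

def related_work(topic: str, n: int = 5) -> str:
    papers = requests.get(
        S2_SEARCH,
        params={"query": topic, "fields": "title,abstract", "limit": n},
        timeout=10,
    ).json()["data"]
    refs = "\n\n".join(
        f"[{i + 1}] {p['title']}. {p.get('abstract') or ''}"
        for i, p in enumerate(papers)
    )
    prompt = (f"Write a related-work paragraph on '{topic}'. Cite the papers "
              f"below as [1]..[{n}] and do not invent references.\n\n{refs}")
    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

print(related_work("entity matching with large language models"))
```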

4. Experimental Topic: RAG-Driven Data Cleaning with PyDI

  • Ahmad, M.S. et al., 2023. RetClean: Retrieval-based Data Cleaning using Foundation Models and Data Lakes. arXiv preprint arXiv:2303.16909.
  • Chen, M. et al., 2025. Empowering Tabular Data Preparation with Language Models: Why and How? arXiv preprint arXiv:2508.01556.
  • https://github.com/wbsg-uni-mannheim/PyDI
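The RetClean idea can be prototyped roughly as follows: for a suspicious cell, retrieve the most similar tuples from a clean reference table and let the model propose a repair. pandas stands in for PyDI's table handling here (PyDI's actual API is documented in the repository above); repair_cell and the overlap-based retrieval are deliberate simplifications.

```python
# Retrieval-based data-cleaning sketch: repair one cell using the most similar
# rows of a clean reference table as evidence. Helper names are illustrative.
import pandas as pd
from openai import OpenAI

client = OpenAI()

def repair_cell(dirty_row: dict, column: str, reference: pd.DataFrame) -> str:
    # Naive retrieval: score reference rows by how many non-target values match.
    overlap = reference.apply(
        lambda r: sum(r.get(k) == v for k, v in dirty_row.items() if k != column),
        axis=1,
    )
    candidates = reference.loc[overlap.nlargest(3).index]
    prompt = (f"The value of '{column}' in this row may be wrong: {dirty_row}\n"
              f"Similar clean rows:\n{candidates.to_string(index=False)}\n"
              f"Return only the corrected value for '{column}'.")
    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content.strip()

reference = pd.DataFrame([{"city": "Mannheim", "zip": "68159", "state": "BW"},
                          {"city": "Heidelberg", "zip": "69117", "state": "BW"}])
print(repair_cell({"city": "Mannheim", "zip": "68159", "state": "??"},
                  "state", reference))
```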

LLM Agents

5. Experimental Topic: LLMs (Agents) for Data Normalization

  • Brinkmann, A., Baumann, N. and Bizer, C., 2024. Using LLMs for the Extraction and Normalization of Product Attribute Values. In Advances in Databases and Information Systems (ADBIS 2024). Lecture Notes in Computer Science, vol 14918. Springer, Cham, pp. 217–230.
  • Chen, M. et al., 2025. Empowering Tabular Data Preparation with Language Models: Why and How? arXiv preprint arXiv:2508.01556.
  • https://github.com/wbsg-uni-mannheim/PyDI
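A minimal normalization experiment in the spirit of Brinkmann et al. (2024) prompts the model to map free-text product offers onto a fixed attribute schema with normalized units. The schema, the unit-conversion hint, and the normalize helper below are illustrative assumptions, not taken from the paper.

```python
# Attribute extraction + normalization sketch: free-text offer -> fixed schema.
# Schema and example are made up for illustration.
import json
from openai import OpenAI

client = OpenAI()
SCHEMA = {"brand": "string", "capacity_gb": "integer", "color": "string"}

def normalize(offer_title: str) -> dict:
    prompt = (
        "Extract the attributes below from the product title and normalize "
        f"units (e.g. '1TB' -> 1000). Return JSON matching this schema: {SCHEMA}\n\n"
        f"Title: {offer_title}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # constrain output to JSON
    )
    return json.loads(resp.choices[0].message.content)

print(normalize("SanDisk Ultra 1TB USB-C Flash Drive, black"))
```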

6. Experimental Topic: Small vs. Large LLMs for Training Data Generation for Entity Matching

  • Zhang, Z. et al., 2025. A Deep Dive Into Cross-Dataset Entity Matching with Large and Small Language Models. In Proceedings of the 28th International Conference on Extending Database Technology.
  • Tan, Z. et al., 2024. Large Language Models for Data Annotation and Synthesis: A Survey. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 930–957.
  • https://github.com/wbsg-uni-mannheim/MatchGPT/tree/main/LLMForEM
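A bare-bones version of the training-data loop asks an LLM to label candidate record pairs as match or non-match; the labels then become (noisy) training data for a smaller student model. Model choice, prompt wording, and label_pair are illustrative; the small-vs-large comparison would run this with different labeling models.

```python
# LLM-as-annotator sketch for entity matching: binary match labels that can be
# used to train a cheaper student model. Prompt and records are illustrative.
from openai import OpenAI

client = OpenAI()

def label_pair(record_a: str, record_b: str, model: str = "gpt-4o-mini") -> int:
    prompt = ("Do these two product records refer to the same real-world "
              "entity? Answer exactly 'yes' or 'no'.\n"
              f"Record A: {record_a}\nRecord B: {record_b}")
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}])
    return int(resp.choices[0].message.content.strip().lower().startswith("yes"))

pairs = [("dell xps 13 9310 i7 16gb", "Dell XPS 13 (9310), Core i7, 16 GB"),
         ("dell xps 13 9310 i7 16gb", "Lenovo ThinkPad X1 Carbon Gen 9")]
training_data = [(a, b, label_pair(a, b)) for a, b in pairs]
print(training_data)
```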

7. Literature Topic: Query Answering over Data Lakes containing Structured Data and Documents

  • Li, Z. et al., 2025. DocDB: A Database for Unstructured Document Analysis. Proceedings of the VLDB Endowment, 18(12), pp. 5387–5390.
  • Shankar, S. et al., 2025. DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing. Proceedings of the VLDB Endowment, 18(9), pp. 3035–3048.
  • Sun, Z. et al., 2025. QUEST: Query Optimization in Unstructured Document Analysis. Proceedings of the VLDB Endowment, 18(11), pp. 4560–4573.

8. Experimental Topic: Reducing the Resource Consumption of LLM Agents

  • Du, S. et al., 2025. A Survey on the Optimization of Large Language Model-based Agents. arXiv preprint arXiv:2503.12434.
  • Zhang, Q. et al., 2025. Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
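Any optimization experiment needs a cost baseline first. The sketch below wraps every model call so that token usage is accumulated and converted to a price; tracked_call is an illustrative helper and the per-token rates are placeholders to replace with your provider's current prices.

```python
# Cost-accounting sketch: record token usage per call to establish the baseline
# that caching, smaller models, or shorter contexts must beat.
from openai import OpenAI

client = OpenAI()
PRICE_PER_1M = {"prompt": 0.15, "completion": 0.60}  # placeholder rates
usage = {"prompt": 0, "completion": 0}

def tracked_call(messages: list[dict], model: str = "gpt-4o-mini") -> str:
    resp = client.chat.completions.create(model=model, messages=messages)
    usage["prompt"] += resp.usage.prompt_tokens
    usage["completion"] += resp.usage.completion_tokens
    return resp.choices[0].message.content

def total_cost_usd() -> float:
    return sum(usage[k] / 1e6 * PRICE_PER_1M[k] for k in usage)

tracked_call([{"role": "user", "content": "Say hello."}])
print(usage, total_cost_usd())
```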

9. Experimental Topic: Resource-efficient Agentic Plan Caching for Entity Matching

  • Zhang, Q. et al., 2025. Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
  • Peeters, R. et al., 2025. Entity Matching using Large Language Models. In Proceedings of the 28th International Conference on Extending Database Technology.
  • https://github.com/wbsg-uni-mannheim/PyDI
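A loose reading of the plan-caching idea applied to entity matching: the expensive planning call (which attributes to compare and how) is made once per schema pair, cached, and reused for every record pair with the same schemas, so only cheap execution calls remain. Cache key, prompt, and get_plan are illustrative, not the paper's implementation.

```python
# Plan-caching sketch for entity matching: one planning call per schema pair.
from openai import OpenAI

client = OpenAI()
plan_cache: dict[tuple, str] = {}

def get_plan(schema_a: tuple, schema_b: tuple) -> str:
    key = (schema_a, schema_b)
    if key not in plan_cache:  # cache miss: pay for planning once
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content":
                       "Give concise step-by-step instructions for deciding "
                       f"whether two records with schemas {schema_a} and "
                       f"{schema_b} describe the same entity."}])
        plan_cache[key] = resp.choices[0].message.content
    return plan_cache[key]  # cache hit: no planning cost

plan = get_plan(("title", "brand", "price"), ("name", "manufacturer", "cost"))
```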

10. Experimental Topic: Effectiveness and Efficiency of Data Serialization Formats for LLMs

  • Yang, J. et al., 2025. StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs. arXiv preprint arXiv:2505.20139.
  • TOON Format, 2024. Token-Oriented Object Notation. Available at: https://github.com/toon-format/toon
  • ZON Format, 2024. Zero Overhead Notation. Available at: https://github.com/ZON-Format/ZON
  • Other formats that should be compared: JSON, XML, Markdown tables, …
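The efficiency side of the comparison can start as simply as tokenizing the same records under different serializations, as in the sketch below using tiktoken with the cl100k_base encoding; TOON or ZON encoders would slot in the same way via their own libraries. Effectiveness then additionally requires a downstream task, such as question answering over the serialized table.

```python
# Token-count comparison sketch: same records, different serializations.
import csv, io, json
import tiktoken

rows = [{"id": 1, "name": "pen", "price": 0.99},
        {"id": 2, "name": "notebook", "price": 3.49}]

def to_csv(records: list[dict]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

enc = tiktoken.get_encoding("cl100k_base")
for name, text in {"json": json.dumps(rows), "csv": to_csv(rows)}.items():
    print(f"{name}: {len(enc.encode(text))} tokens")
```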

11. Experimental Topic: Descriptive Agent Trajectory Mining: What did the agent do?

  • Mohammadi, M. et al., 2025. Evaluation and Benchmarking of LLM Agents: A Survey. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2025).
  • Ou, T. et al., 2025. AgentDiagnose: An Open Toolkit for Diagnosing LLM Agent Trajectories. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.
  • Peeters, R. et al., 2025. WebMall: A Multi-Shop Benchmark for Evaluating Web Agents. arXiv preprint arXiv:2508.13024.
  • van der Aalst, W., 2016. Process Mining: Data Science in Action. Springer.
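Process-mining tooling maps onto trajectories quite directly if each agent run is treated as a case and each tool call as an activity. The sketch below uses pm4py to discover a directly-follows graph from a toy log; the trajectory format is made up, so map your own agent logs accordingly.

```python
# Trajectory mining sketch: agent runs as cases, tool calls as activities,
# mined into a directly-follows graph with pm4py. The log is a toy example.
import pandas as pd
import pm4py

log = pd.DataFrame([
    {"run": "r1", "action": "search",    "time": "2026-03-01 10:00:00"},
    {"run": "r1", "action": "open_page", "time": "2026-03-01 10:00:05"},
    {"run": "r1", "action": "answer",    "time": "2026-03-01 10:00:20"},
    {"run": "r2", "action": "search",    "time": "2026-03-01 11:00:00"},
    {"run": "r2", "action": "search",    "time": "2026-03-01 11:00:07"},
])
log["time"] = pd.to_datetime(log["time"])
log = pm4py.format_dataframe(log, case_id="run", activity_key="action",
                             timestamp_key="time")
dfg, start_activities, end_activities = pm4py.discover_dfg(log)
print(dfg)  # edge -> frequency, e.g. {('search', 'open_page'): 1, ...}
```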

12. Experimental Topic: Diagnostic Agent Trajectory Mining: How do agents fail?

  • Mohammadi, M. et al., 2025. Evaluation and Benchmarking of LLM Agents: A Survey. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2025).
  • Zhu et al., 2025. Where do LLM agents fail and how can they learn from failures? arXiv preprint arXiv:2509.25370.
  • Peeters, R. et al., 2025. WebMall: A Multi-Shop Benchmark for Evaluating Web Agents. arXiv preprint arXiv:2508.13024.
  • van der Aalst, W., 2016. Process Mining: Data Science in Action. Springer.
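A first diagnostic that needs no model at all is loop detection: trajectories that repeat the same action many times in a row are a common web-agent failure mode. The sketch below flags such runs; the trajectory format mirrors the Topic 11 sketch and the threshold is arbitrary.

```python
# Loop-detection sketch: flag runs that repeat an action >= threshold times
# in a row, a cheap first signal for diagnosing agent failures.
from itertools import groupby

def loop_runs(trajectories: dict[str, list[str]], threshold: int = 3) -> list[str]:
    flagged = []
    for run_id, actions in trajectories.items():
        longest = max(len(list(g)) for _, g in groupby(actions))
        if longest >= threshold:
            flagged.append(run_id)
    return flagged

print(loop_runs({"r1": ["search", "open_page", "answer"],
                 "r2": ["search", "search", "search", "give_up"]}))
# -> ['r2']
```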

Getting started

The following survey articles and tutorials are good starting points for getting an overview of the topics of the seminar: