Seminar CS715: Solving Complex Tasks using Large Language Models (HWS 2025)
This semester's seminar focuses on LLM-based agents and on using LLMs for information integration tasks. It offers mainly experimental topics plus a small selection of literature topics. The goal of the experimental topics is to verify the utility of specific approaches by applying them to tasks beyond those used in the respective papers for illustration and evaluation. The goal of the literature topics is to summarize the state of the art concerning a specific aspect of the application of LLMs or LLM-based agents, as well as to compare specific approaches in this area using a systematic set of criteria.
Organization
- This seminar is organized by Prof. Dr. Christian Bizer, Dr. Ralph Peeters, and Aaron Steiner.
- The seminar is open to master's and bachelor's students of the Data Science, Social Data Science, and Business Informatics programs.
Goals
In this seminar, you will
- read, understand, and explore scientific literature
- critically summarize the state of the art concerning your topic
- for experimental topics, experimentally verify the utility of prompt engineering or agent-based methods
- give a presentation about your topic (before the submission of the report)
Schedule
- Please register for the seminar via the centrally-coordinated seminar registration in Portal2.
- After you have been accepted into the seminar, please email us your three preferred topics from the list below. We will assign topics to students according to your preferences.
- Attend the kickoff meeting on September 18th at 10:15, in which we will discuss the general requirements for the reports and presentations and answer initial questions about the topics.
- You will be assigned a mentor, who will provide guidance and offer one-to-one meetings over the course of the seminar.
- Work individually throughout the semester: explore literature, perform experiments (if you are assigned an experimental topic), create a presentation, and write a report. Please note that we cannot reimburse you for any LLM API costs incurred.
- Give your presentation in a block seminar on November 10th, 2025.
- Write and submit your seminar thesis by January 20th, 2026.
Topics HWS 2025
Web Agents
1. Experimental Topic: Advanced Web Agents for Online Shopping
- Ning et al., A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models, in Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2025, pp. 6140–6150.
- He et al., WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models, in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024, pp. 6864–6890.
- Peeters et al., WebMall – A Multi-Shop Benchmark for Evaluating Web Agents, arXiv:2508.13024, 2025.
2. Experimental Topic: Token-Efficient Web Agents for Online Shopping
- Chezelles et al., The BrowserGym Ecosystem for Web Agent Research, Transactions on Machine Learning Research, 2024.
- Zhou et al., A Survey on Efficient Inference for Large Language Models, arXiv:2404.14294, 2024.
- Peeters et al., WebMall – A Multi-Shop Benchmark for Evaluating Web Agents, arXiv:2508.13024, 2025.
3. Experimental Topic: Evaluating Memory Design for Agents on Long-running Tasks
- Zhang et al., A Survey on the Memory Mechanism of Large Language Model based Agents, ACM Trans. Inf. Syst., 2025.
- Maharana et al., Evaluating Very Long-Term Conversational Memory of LLM Agents, in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024, pp. 13851–13870.
- Wang et al., Augmenting Language Models with Long-Term Memory, Advances in Neural Information Processing Systems, vol. 36, pp. 74530–74543, 2023.
4. Experimental Topic: Robustness of Web Agents under Adversarial Conditions
- Yang et al., GUI‑Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real‑World Anomalies. arXiv:2506.14477, 2025.
- Nitu et al., Machine‑Readable Ads: Accessibility and Trust Patterns for AI Web Agents interacting with Online Advertisements. arXiv:2507.12844, 2025.
- Abuelsaad et al., Agent‑E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems. arXiv:2407.13032, 2024.
5. Literature Topic: Comparison of Agent Benchmarks for Online Shopping
- Peeters et al., WebMall – A Multi-Shop Benchmark for Evaluating Web Agents, arXiv:2508.13024, 2025.
- Wang et al., ShoppingBench: A Real-World Intent-Grounded Shopping Benchmark for LLM-based Agents, arXiv:2508.04266, 2025.
- Lyu et al., DeepShop: A Benchmark for Deep Research Shopping Agents, arXiv:2506.02839, 2025.
6. Experimental Topic: Evaluating Agent-to-Agent Negotiation Interfaces (A2A-NI)
- https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/
- https://github.com/a2aproject/A2A
- Derouiche et al., Agentic AI Frameworks: Architectures, Protocols, and Design Challenges, arXiv:2508.10146, 2025.
Retrieval-Augmented Generation (RAG)
7. Experimental Topic: RAG over Heterogeneous Corporate Data (E-Mails + Attachments)
- Yu et al., TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning, arXiv:2506.10380, 2025.
- Choi et al., Reference-Aligned Retrieval-Augmented Question Answering over Heterogeneous Proprietary Documents, arXiv:2502.19596, 2025.
8. Experimental Topic: Optimizing RAG Pipelines with Preprocessing and Structured Representations for the WebMall Use Case
- Lyu et al., DeepShop: A Benchmark for Deep Research Shopping Agents, arXiv:2506.02839, 2025.
- Lewis et al., Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, in Advances in Neural Information Processing Systems, 2020, pp. 9459–9474.
9. Literature Topic: Fact Verification using RAG
- Dmonte et al., Claim Verification in the Age of Large Language Models: A Survey. arXiv:2408.14317, 2024.
- Ge et al., Resolving Conflicting Evidence in Automated Fact-Checking: A Study on Retrieval-Augmented LLMs. arXiv:2505.17762, 2025.
10. Experimental Topic: Using LLMs for Generating Dataset Descriptions for Dataset Search
- Zhang et al., AutoDDG: Automated Dataset Description Generation using Large Language Models. arXiv:2502.01050, 2025.
- Brickley et al., Google Dataset Search: Building a Search Engine for Datasets in an Open Web Ecosystem. In The World Wide Web Conference, 2019, pp. 1365–1375.
Entity Matching
11. Experimental Topic: Training Data Labeling Using Agents for Entity Matching
- Peeters et al., Entity Matching using Large Language Models. In Proceedings of the 28th International Conference on Extending Database Technology, 2025.
- Barlaug et al., Neural Networks for Entity Matching: A Survey, ACM Transactions on Knowledge Discovery from Data, vol. 15, no. 3, pp. 52:1–52:37, 2021.
- Tan et al., Large Language Models for Data Annotation and Synthesis: A Survey, in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024, pp. 930–957.
12. Experimental Topic: Training Data Generation Using Agents for Entity Matching
- Peeters et al., Entity Matching using Large Language Models. In Proceedings of the 28th International Conference on Extending Database Technology, 2025.
- Tan et al., Large Language Models for Data Annotation and Synthesis: A Survey, in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024, pp. 930–957.
- Nadas et al., Synthetic Data Generation Using Large Language Models: Advances in Text and Code, IEEE Access, vol. 13, pp. 134615–134633, 2025.
- Yu et al., Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias, Advances in Neural Information Processing Systems, vol. 36, pp. 55734–55784, 2023.
13. Experimental Topic: Group-wise Entity Matching using LLMs
- Wang et al., Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching. In Proceedings of the 31st International Conference on Computational Linguistics, 2025.
- Peeters et al., WDC Products: A Multi-Dimensional Entity Matching Benchmark. In Proceedings of the 27th International Conference on Extending Database Technology, 2024.
14. Experimental Topic: Variant Detection in Product Data
- Vidal et al., Learning Variant Product Relationship and Variation Attributes from E-Commerce Website Structures. In Generative AI for E-Commerce at CIKM, 2024.
- West et al., Interpretable Methods for Identifying Product Variants, in Companion Proceedings of the Web Conference 2020 (WWW ’20), 2020, pp. 448–453.
Getting started
The following survey and review articles are good starting points for getting an overview of the seminar's topics:
- Wang et al., A Survey on Large Language Model based Autonomous Agents, arXiv:2308.11432, 2024.
- Sager et al., AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants, arXiv:2501.16150, 2025.
- Zhao et al., A Survey of Large Language Models, arXiv:2303.18223, 2024.