Seminar CS715: Solving Complex Tasks using Large Language Models (FSS 2025)
The focus of this semester's seminar is on LLM-based agents. The seminar features literature as well as experimental topics. The goal of the literature topics is to summarize the state of the art concerning a specific aspect of LLM-based agents, as well as to compare specific approaches in this area using a systematic set of criteria. The goal of the experimental topics is to verify the utility of specific approaches by applying them to tasks beyond those used for illustration and evaluation in the respective papers.
Organization
- This seminar is organized by Prof. Dr. Christian Bizer, Dr. Ralph Peeters, and Aaron Steiner.
- The seminar is available for Master's and Bachelor's students of the Data Science, Social Data Science, and Business Informatics programs.
- Slides of the kick-off meeting
Goals
In this seminar, you will
- read, understand, and explore scientific literature
- critically summarize the state of the art concerning your topic
- for experimental topics, experimentally verify the utility of prompt engineering or agent-based methods
- give a presentation about your topic (before the submission of the report)
Schedule
- Please register for the seminar via the centrally coordinated seminar registration in Portal2.
- After you have been accepted into the seminar, please email us your three preferred topics from the list below. We will assign topics to students according to their preferences.
- Attend the kickoff meeting on February 25th at 15:30, in which we will discuss general requirements for the reports and presentations and answer initial questions about the topics.
- You will be assigned a mentor, who will provide guidance and hold one-to-one meetings with you.
- Work individually throughout the semester: explore literature, perform experiments (if you are assigned an experimental topic), create a presentation, and write a report
- Give your presentation in a block seminar on April 28th, 2025.
- Write and submit your seminar thesis by June 2025.
Topics
1. Literature Topic: Generalist Agents and Agent Benchmarks
- Sager et al.: AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants. arXiv:2501.16150, 2025.
- Fourney et al.: Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks. arXiv:2411.04468, 2024.
- Xu et al.: TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks. arXiv:2412.14161, 2024.
2. Literature Topic: Web Agents and Web Agent Benchmarks
- De Chezelles et al.: The BrowserGym Ecosystem for Web Agent Research. arXiv:2412.05467, 2024.
- Zhou et al.: WebArena: A Realistic Web Environment for Building Autonomous Agents. arXiv:2307.13854, 2024.
3. Experimental Topic: Agents for Computer Use
- DeepLearning.AI & Anthropic: Building Toward Computer Use with Anthropic – Lesson 1: Introduction. [Online Course]. Available at: https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/1/introduction
- Sager et al.: AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants. arXiv:2501.16150, 2025.
- Anthropic Documentation: Build with Claude for Computer Use. Available at: https://docs.anthropic.com/en/docs/build-with-claude/computer-use
- FranceDot: ACU – Agents for Computer Use GitHub Repository. Available at: https://github.com/francedot/acu
4. Experimental Topic: LLM Agent Development Frameworks
- Fourney et al.: Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks. arXiv:2411.04468, 2024.
- Wang et al.: OpenHands: An Open Platform for AI Software Developers as Generalist Agents. arXiv:2407.16741, 2024.
- Yang et al.: LLM-based Multi-Agent Systems: Techniques and Business Perspectives. arXiv:2411.14033v2, 2024.
5. Experimental Topic: Evaluating the Planning Capabilities of LLM Agents
- Huang et al.: Understanding the Planning of LLM Agents: A Survey. arXiv:2402.02716, 2024.
- Fourney et al.: Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks. arXiv:2411.04468, 2024.
6. Literature Topic: Safety of LLM Agents
- Gan et al.: Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents. arXiv:2411.09523, 2024.
- Zhang et al.: Agent-SafetyBench: Evaluating the Safety of LLM Agents. arXiv:2412.14470, 2024.
- Levy et al.: ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents. arXiv:2410.06703, 2024.
7. Literature Topic: Energy Efficiency of LLMs and LLM Agents
- Wu et al.: Addressing the Sustainable AI Trilemma: A Case Study on LLM Agents and RAG, arXiv:2501.08262, 2025.
- Zhou et al.: A Survey on Efficient Inference for Large Language Models, arXiv:2404.14294, 2024.
- Argerich et al.: Measuring and Improving the Energy Efficiency of Large Language Models Inference, IEEE Access, vol. 12, pp. 80194–80207, 2024.
8. Experimental Topic: Energy Efficiency of LLM Agents
- Wu et al.: Addressing the Sustainable AI Trilemma: A Case Study on LLM Agents and RAG, arXiv:2501.08262, 2025.
- Zhou et al.: A Survey on Efficient Inference for Large Language Models, arXiv:2404.14294, 2024.
- Argerich et al.: Measuring and Improving the Energy Efficiency of Large Language Models Inference, IEEE Access, vol. 12, pp. 80194–80207, 2024.
9. Experimental Topic: LLMs as Evaluators for LLM Agents
- Xu et al.: TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks. arXiv:2412.14161, 2024.
- Fourney et al.: Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks. arXiv:2411.04468, 2024.
- Zhang et al.: LLMEval: A Preliminary Study on How to Evaluate Large Language Models. arXiv:2312.07398v2, 2023.
10. Experimental Topic: Information Extraction from Web Pages using LLMs
- Brinkmann et al.: ExtractGPT: Exploring the Potential of Large Language Models for Product Attribute Value Extraction, Information Integration and Web Intelligence, Springer Nature Switzerland, pp. 38–52, 2025.
- Zou et al.: EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM, arXiv:2404.08886, 2024.
- Zhang et al.: Vision-Language Models for Vision Tasks: A Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 8, pp. 5625–5644, 2024.
11. Literature Topic: Vision LLMs and their Evaluation
- J. Zhang et al.: Vision-Language Models for Vision Tasks: A Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 8, pp. 5625–5644, 2024.
- J. Huang et al.: A Survey on Evaluation of Multimodal Large Language Models, arXiv:2408.15769, 2024.
- Z. Li et al.: Benchmark Evaluations, Applications, and Challenges of Large Vision Language Models: A Survey, arXiv:2501.02189, 2025.
12. Literature Topic: From Supervised to Reinforcement Fine-tuning of LLMs
- S. Minaee et al.: Large Language Models: A Survey, arXiv:2402.06196, 2024.
- S. Wang et al.: Reinforcement Learning Enhanced LLMs: A Survey, arXiv:2412.10400, 2024.
- DeepSeek-AI: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, arXiv:2501.12948, 2025.
13. Experimental Topic: WebRAG for Entity Matching
- N. Barlaug et al.: Neural Networks for Entity Matching: A Survey, ACM Transactions on Knowledge Discovery from Data, vol. 15, no. 3, pp. 52:1–52:37, 2021.
- Y. Gao et al.: Retrieval-Augmented Generation for Large Language Models: A Survey, arXiv:2312.10997, 2024.
- W. Xie et al.: WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs, arXiv:2408.07611, 2024.
Getting started
The following survey articles and tutorial are good starting points for getting an overview of the topics of the seminar:
- Wang et al.: A Survey on Large Language Model based Autonomous Agents. arXiv:2308.11432, 2024.
- Sager et al.: AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants. arXiv:2501.16150, 2025.
- Zhao et al.: A Survey of Large Language Models. arXiv:2303.18223, 2024.