Seminar CS715: Solving Complex Tasks using Large Language Models (FSS 2024)

The seminar explores prompt engineering techniques that enable LLMs to handle complex tasks, as well as the use of LLMs to evaluate complex outputs. The seminar features both literature and experimental topics. The literature topics aim to summarize the state of the art concerning the application and evaluation of LLMs. The experimental topics aim to verify the utility of advanced prompt engineering techniques by applying them to tasks beyond those used in the respective papers for illustration and evaluation.

Organization

Goals

In this seminar, you will

  • read, understand, and explore scientific literature
  • critically summarize the state of the art concerning your topic
  • experimentally verify the utility of advanced prompt engineering methods
  • give a presentation about your topic (before the submission of the report)

Schedule

  1. Please register for the seminar via the centrally-coordinated seminar registration in Portal2.
  2. After you have been accepted into the seminar, please email us your three preferred topics from the list below.
    We will assign topics to students according to your preferences.
  3. Attend the kickoff meeting on February 29th at 15:30, in which we will discuss general requirements for the reports and presentations and answer initial questions about the topics.
  4. You will be assigned a mentor, who will provide guidance in one-to-one meetings.
  5. Work individually throughout the semester: explore the literature, perform experiments (if you are assigned an experimental topic), create a presentation, and write a report.
  6. Give your presentation in a block seminar on April 29th.
  7. Write and submit your seminar thesis by June 2024.

Topics

Prompt Engineering

1. Experimental Topic: From Self-Consistency to MedPrompt: Improving Results by Ensembling LLMs

  • Wang, et al.: Self-Consistency Improves Chain of Thought Reasoning in Language Models. arXiv:2203.11171 (2022)
  • Nori, Harsha, et al. “Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine.” arXiv preprint arXiv:2311.16452 (2023).
  • Zhao, et al.: A Survey of Large Language Models. arXiv:2303.18223 (2023)
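
To give a concrete starting point for the experiments in this topic, the core idea of self-consistency can be sketched in a few lines: sample several chain-of-thought completions at a non-zero temperature, extract the final answer from each, and return the majority answer. The sketch below is a minimal illustration, not code from the papers above; `sample_completion(prompt)` is a placeholder for whatever LLM API you end up using.

    import re
    from collections import Counter

    def self_consistency(prompt, sample_completion, n_samples=10):
        """Sample several chain-of-thought completions and majority-vote on the answer.

        sample_completion: callable that sends `prompt` to an LLM with temperature > 0
        and returns the generated text (assumed to end with "Answer: <value>").
        """
        answers = []
        for _ in range(n_samples):
            completion = sample_completion(prompt)
            match = re.search(r"Answer:\s*(.+)", completion)
            if match:
                answers.append(match.group(1).strip())
        if not answers:
            return None
        # The most frequent final answer wins; ties are broken arbitrarily.
        return Counter(answers).most_common(1)[0][0]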

2. Experimental Topic: Prompt Search / Breeding

  • Fernando, Chrisantha, et al. “Promptbreeder: Self-referential self-improvement via prompt evolution.” arXiv preprint arXiv:2309.16797 (2023).
  • Liu, Pengfei, et al. “Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing.” ACM Computing Surveys 55.9 (2023): 1–35.
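
A minimal prompt-search loop, as a starting point for this topic: keep a population of candidate task prompts, score each on a small development set, retain the best, and generate variations of the survivors (for example, by asking an LLM to rephrase them). This is a simplified sketch of evolutionary prompt search; Promptbreeder additionally evolves the mutation prompts themselves. `score` and `mutate` are caller-supplied placeholders.

    import random

    def evolve_prompts(seed_prompts, score, mutate, generations=5, population_size=8):
        """Simplified evolutionary prompt search.

        score(prompt)  -> quality of the prompt on a small development set
        mutate(prompt) -> a variation of the prompt, e.g. an LLM-generated rephrasing
        """
        population = list(seed_prompts)
        for _ in range(generations):
            ranked = sorted(population, key=score, reverse=True)
            survivors = ranked[: max(2, population_size // 2)]  # keep the fittest half
            children = [mutate(random.choice(survivors))
                        for _ in range(population_size - len(survivors))]
            population = survivors + children
        return max(population, key=score)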

3. Experimental Topic: Active Prompt

  • Diao, Shizhe, et al. “Active Prompting with Chain-of-Thought for Large Language Models.” arXiv, May 23, 2023.
  • Mavromatis, Costas, et al. “Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection.” arXiv, October 30, 2023.
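
The selection step at the heart of active prompting can be sketched as follows: sample several answers per question, measure how much the model disagrees with itself, and hand the most uncertain questions to a human for chain-of-thought annotation. The sketch uses the number of distinct answers as a simple disagreement measure (the paper also studies other uncertainty metrics); `sample_answer` is a placeholder for your LLM call.

    def select_for_annotation(questions, sample_answer, k=5, budget=8):
        """Rank questions by self-disagreement and return the most uncertain ones.

        sample_answer: callable that returns the model's short answer to a question,
        sampled with temperature > 0 so that repeated calls can differ.
        """
        def disagreement(question):
            answers = [sample_answer(question) for _ in range(k)]
            # Fraction of distinct answers in k samples: higher means more uncertain.
            return len(set(answers)) / k

        return sorted(questions, key=disagreement, reverse=True)[:budget]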

4. Experimental Topic: Contrastive Prompting

  • Chia, Yew Ken, et al. “Contrastive Chain-of-Thought Prompting.” arXiv preprint arXiv:2311.09277 (2023).
  • Paranjape, Bhargavi, et al. “Prompting contrastive explanations for commonsense reasoning tasks.” arXiv preprint arXiv:2106.06823 (2021).
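
For orientation, a contrastive chain-of-thought prompt pairs a correct demonstration with an explicitly wrong one, so the model sees both the reasoning to imitate and the reasoning to avoid. The template below is a hand-written illustration in the spirit of Chia et al., not an example taken from the paper.

    # Hypothetical contrastive chain-of-thought template; fill in {question} at query time.
    CONTRASTIVE_COT_TEMPLATE = """\
    Question: A shop sells pens for 2 euros each. How much do 4 pens cost?
    Correct explanation: 4 pens at 2 euros each cost 4 * 2 = 8 euros. The answer is 8.
    Wrong explanation: Adding the numbers gives 4 + 2 = 6 euros. The answer is 6.

    Question: {question}
    Correct explanation:"""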

5. Experimental Topic: Limitations of LLMs

  • Berglund, Lukas, et al. “The Reversal Curse: LLMs Trained on ‘A Is B’ Fail to Learn ‘B Is A.’” arXiv, September 22, 2023.
  • Kaddour, Jean, et al. “Challenges and Applications of Large Language Models.” arXiv, July 19, 2023. https://doi.org/10.48550/arXiv.2307.10169.

6. Literature Topic: LLM Self-Evaluation during Fine-tuning

  • Deutsch, Daniel, et al. “On the Limitations of Reference-Free Evaluations of Generated Text.” In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 10960–77.
  • Ouyang, Long, et al. “Training Language Models to Follow Instructions with Human Feedback.” arXiv, March 4, 2022.
  • Rafailov, Rafael, et al. “Direct Preference Optimization: Your Language Model Is Secretly a Reward Model.” arXiv, December 13, 2023.

Evaluation

7. Literature Topic: LLMs as Evaluation Metrics

  • Kocmi, Tom, et al. “Large Language Models Are State-of-the-Art Evaluators of Translation Quality.” arXiv, May 31, 2023.
  • Leiter, Christoph, et al. “The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics.” arXiv, October 30, 2023.
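
As a starting point, LLM-based evaluation in the style of direct assessment can be sketched as a single scoring prompt whose numeric reply is parsed out. The sketch below is loosely modelled on the setup described by Kocmi et al., with `sample_completion` again standing in as a placeholder for your LLM API.

    import re

    EVAL_PROMPT = (
        "Score the following translation from {src_lang} to {tgt_lang} on a scale "
        "from 0 (no meaning preserved) to 100 (perfect meaning and grammar).\n"
        "Source: {source}\nTranslation: {translation}\nScore:"
    )

    def llm_score(source, translation, sample_completion,
                  src_lang="German", tgt_lang="English"):
        """Ask an LLM to rate a translation and parse the numeric score from its reply."""
        reply = sample_completion(EVAL_PROMPT.format(
            src_lang=src_lang, tgt_lang=tgt_lang,
            source=source, translation=translation))
        match = re.search(r"\d+", reply)
        return int(match.group()) if match else None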

8. Literature Topic: Can LLMs Evaluate Themselves?

  • Deutsch, Daniel, et al. “On the Limitations of Reference-Free Evaluations of Generated Text.” In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 10960–77.
  • Ouyang, Long, et al. “Training Language Models to Follow Instructions with Human Feedback.” arXiv, March 4, 2022.
  • Rafailov, Rafael, et al. “Direct Preference Optimization: Your Language Model Is Secretly a Reward Model.” arXiv, December 13, 2023.

9. Experimental Topic: LLMs with Tools as Evaluation Metrics

  • Fernandes, Patrick, et al. “The Devil Is in the Errors: Leveraging Large Language Models for Fine-Grained Machine Translation Evaluation.” arXiv, August 14, 2023.
  • Kocmi, Tom, et al. “GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4.” arXiv, October 21, 2023.
  • Shu, Lei, et al. “Fusion-Eval: Integrating Evaluators with LLMs.” arXiv, November 15, 2023.

10. Literature Topic: Task Contamination

  • Li, Changmao, et al. “Task Contamination: Language Models May Not Be Few-Shot Anymore.” arXiv preprint arXiv:2312.16337 (2023).
  • Roberts, Manley, et al. “Data Contamination Through the Lens of Time.” arXiv preprint arXiv:2310.10628 (2023).
  • Jiang, et al.: Investigating Data Contamination for Pre-training Language Models. arXiv preprint arXiv:2401.06059 (2024).

11. Literature Topic: Evaluation of Code Writing Ability of LLMs

  • Chen, Mark, et al. “Evaluating large language models trained on code.” arXiv preprint arXiv:2107.03374 (2021).
  • Le, Triet HM, et al. “Deep learning for source code modeling and generation: Models, applications, and challenges.” ACM Computing Surveys (CSUR) 53.3 (2020): 1–38.
  • https://paperswithcode.com/task/code-generation

12. Experimental Topic: Evaluation Benchmark for Scientific Text Generation Models

  • Belouadi, Jonas, et al. “AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ.” arXiv, January 23, 2024.
  • Zerroug, Aimen, et al. “A Benchmark for Compositional Visual Reasoning.” Advances in Neural Information Processing Systems 35 (December 6, 2022): 29776–88.

Applications

13. Experimental Topic: WebAPI Query Planning Using LLMs

  • Chen, Zui, et al. “Symphony: Towards natural language query answering over multi-modal data lakes.” Conference on Innovative Data Systems Research, CIDR. 2023.
  • Urban, Matthias, et al. “CAESURA: Language Models as Multi-Modal Query Planners.” arXiv preprint arXiv:2308.03424 (2023).
  • Wang, et al.: A Survey on Large Language Model based Autonomous Agents. arXiv preprint arXiv:2308.11432 (2023)
  • https://gorilla.cs.berkeley.edu/

14. Experimental Topic: Attribute Value Normalization Using LLMs

  • Jaimovitch-López, Gonzalo, et al. “Can language models automate data wrangling?.” Machine Learning 112.6 (2023): 2053–2082.
  • Bogatu, Alex, et al. “Towards automatic data format transformations: Data wrangling at scale.” Data Analytics: 31st British International Conference on Databases (BICOD 2017), 2017.

15. Experimental Topic: LLM for Literary Translation and Evaluation

  • Fonteyne, Margot, et al. “Literary Machine Translation under the Magnifying Glass: Assessing the Quality of an NMT-Translated Detective Novel on Document Level.” In Proceedings of the Twelfth Language Resources and Evaluation Conference, 3790–98. Marseille, France, 2020.
  • Karpinska, Marzena, et al. “Large Language Models Effectively Leverage Document-Level Context for Literary Translation, but Critical Errors Persist.” arXiv, May 22, 2023.
  • Wang, Longyue, et al. “Document-Level Machine Translation with Large Language Models.” In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 16646–61. Singapore, 2023.

16. Experimental Topic: LLMs for Synthetic Training Data Generation

  • Piedboeuf, Frédéric, et al. “Is ChatGPT the Ultimate Data Augmentation Algorithm?” In Findings of the Association for Computational Linguistics: EMNLP 2023, 2023.
  • Pal, Koyena, et al. “Generative Benchmark Creation for Table Union Search.” arXiv, August 7, 2023.

17. Experimental Topic: LLM-based Agents / OpenAI Assistants

18. Experimental Topic: Agent Cooperation

  • Park, Joon Sung, et al. “Generative agents: Interactive simulacra of human behavior.” Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 2023.
  • Zhuge, Mingchen, et al. “Mindstorms in Natural Language-Based Societies of Mind.” arXiv preprint arXiv:2305.17066 (2023).
  • Suzgun and Kalai: Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding. arXiv preprint arXiv:2401.12954 (2024).
  • Wang, et al.: A Survey on Large Language Model based Autonomous Agents. arXiv preprint arXiv:2308.11432 (2023)
  • https://www.promptingguide.ai/research/llm-agents

Getting started

The following survey articles and tutorials are good starting points for getting an overview of the seminar topics: