Seminar CS715: Solving Complex Tasks using Large Language Models (HWS 2023/2024)

The topic of the seminar is solving complex tasks using Large Language Models (LLMs). The seminar features both literature and experimental topics. The goal of the literature topics is to summarize the state of the art concerning the application and evaluation of LLMs. The goal of the experimental topics is to verify the utility of advanced prompt engineering techniques by applying them to tasks beyond those used for illustration and evaluation in the respective papers.

Organization

Goals

In this seminar, you will

  • read, understand, and explore scientific literature
  • critically summarize the state of the art concerning your topic
  • experimentally verify the utility of advanced prompt engineering methods
  • give a presentation about your topic (before the submission of the report)

Topics HWS 2023

1. Literature Topic: Explainability of LLMs

  • Yao et al., Tree of Thoughts: Deliberate Problem Solving with Large Language Models
  • Turpin et al., Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
  • Lanham et al., Measuring Faithfulness in Chain-of-Thought Reasoning
  • Radhakrishnan et al., Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

2. Literature Topic: Efficiency of LLMs

  • Lee et al., Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research
  • Touvron et al., LLaMA: Open and Efficient Foundation Language Models
  • Dettmers et al., QLoRA: Efficient Finetuning of Quantized LLMs
  • Hsieh et al., Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
  • Gu et al., Knowledge Distillation of Large Language Models

3. Literature Topic: Agent-Based Modeling via LLMs

  • Park et al., Generative Agents: Interactive Simulacra of Human Behavior
  • Li et al., CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society
  • Boiko et al., Emergent autonomous scientific research capabilities of large language models
  • Zhuge et al., Mindstorms in Natural Language-Based Societies of Mind
  • Wang et al., Interactive Natural Language Processing

4. Literature Topic: LLMs for the Social Sciences

  • Ziems et al., Can Large Language Models Transform Computational Social Science?
  • Feng et al., From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models
  • Hartmann et al., The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation

5. Literature Topic: Limitations of LLMs

  • Frieder et al., Mathematical Capabilities of ChatGPT
  • Borji, A Categorical Archive of ChatGPT Failures
  • Wang et al., Large Language Models are not Fair Evaluators
  • Schick et al., Toolformer: Language Models Can Teach Themselves to Use Tools
  • Bang et al., A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

6. Literature Topic: LLMs for Education+Science

  • Baidoo-Anu et al., Education in the Era of Generative Artificial Intelligence (AI): Understanding the Potential Benefits of ChatGPT in Promoting Teaching and Learning
  • Choi et al., ChatGPT Goes to Law School
  • Boiko et al., Emergent autonomous scientific research capabilities of large language models
  • Meyer et al., ChatGPT and large language models in academia: opportunities and challenges

7. Literature Topic: Multimodality and LLMs 

  • Liu et al., Visual instruction tuning
  • Zhang et al., Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
  • Dai et al., InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

8. Experimental Topic: Chain-of-Thought Prompting

  • Wei, Jason, et al. “Chain-of-thought prompting elicits reasoning in large language models.” Advances in Neural Information Processing Systems 35 (2022): 24824-24837.
  • Kojima, Takeshi, et al. “Large language models are zero-shot reasoners.” Advances in neural information processing systems 35 (2022): 22199-22213.
  • Zhang, Zhuosheng, et al. “Automatic chain of thought prompting in large language models.” arXiv preprint arXiv:2210.03493 (2022).
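
As a starting point for the experiments, the following minimal sketch contrasts few-shot chain-of-thought prompting (Wei et al.) with zero-shot chain-of-thought prompting (Kojima et al.). The llm() helper is a hypothetical placeholder for whatever completion API is used; only the prompt structure is the point here.

```python
# Illustrative sketch: few-shot CoT (Wei et al.) provides worked-out reasoning
# in the demonstrations, zero-shot CoT (Kojima et al.) only adds a trigger
# phrase. The llm() helper is a hypothetical placeholder, not a real API.

def llm(prompt: str) -> str:
    # Swap in a call to your actual model endpoint here.
    return f"<model response to: {prompt[:50]}...>"

# Few-shot CoT: the demonstration shows the reasoning, not just the answer.
FEW_SHOT_COT = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: {question}
A:"""

# Zero-shot CoT: a single trigger phrase elicits step-by-step reasoning.
ZERO_SHOT_COT = "Q: {question}\nA: Let's think step by step."

question = "A juggler has 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?"
print(llm(FEW_SHOT_COT.format(question=question)))
print(llm(ZERO_SHOT_COT.format(question=question)))
```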

9. Experimental Topic: Knowledge Generation Prompting

  • Liu, Jiacheng, et al. “Generated knowledge prompting for commonsense reasoning.” arXiv preprint arXiv:2110.08387 (2021).
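
A minimal sketch of the two-stage procedure from Liu et al.: the model first generates background knowledge for a question and then answers the question with that knowledge prepended. The llm() helper, prompt wording, and parameters are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of generated-knowledge prompting (Liu et al.): step 1
# samples background knowledge for the question, step 2 answers with that
# knowledge prepended. The llm() helper is a hypothetical placeholder.

def llm(prompt: str) -> str:
    # Swap in a call to your actual model endpoint here.
    return f"<model response to: {prompt[:50]}...>"

def answer_with_generated_knowledge(question: str, n_statements: int = 3) -> str:
    # Step 1: generate several knowledge statements relevant to the question.
    knowledge = [
        llm(f"Generate a fact that is relevant to answering the question.\n"
            f"Question: {question}\nFact:")
        for _ in range(n_statements)
    ]
    # Step 2: answer the question conditioned on the generated knowledge.
    context = "\n".join(knowledge)
    return llm(f"{context}\n\nQuestion: {question}\nAnswer:")

print(answer_with_generated_knowledge("Do penguins have wings?"))
```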

10. Experimental Topic: Tree of Thoughts Prompting

  • Yao, Shunyu, et al. “Tree of thoughts: Deliberate problem solving with large language models.” arXiv preprint arXiv:2305.10601 (2023).
  • Long, Jieyi. “Large Language Model Guided Tree-of-Thought.” arXiv preprint arXiv:2305.08291 (2023).
  • Besta et al. “Graph of Thoughts: Solving Elaborate Problems with Large Language Models.” arXiv preprint arXiv:2308.09687 (2023).
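
A minimal sketch of the search idea behind tree-of-thoughts prompting (Yao et al.): each partial solution is expanded into several candidate next steps, the model rates the candidates, and only the best ones are kept. The llm() helper, the prompts, and the search parameters are illustrative assumptions, not taken from the papers.

```python
# Illustrative sketch of tree-of-thoughts search (Yao et al.): propose several
# next steps per partial solution, let the model rate them, keep the best
# candidates (a simple beam search). All prompts and parameters are placeholders.

def llm(prompt: str) -> str:
    # Swap in a call to your actual model endpoint here.
    return f"<model response to: {prompt[:50]}...>"

def tree_of_thoughts(task: str, depth: int = 3, branch: int = 3, beam: int = 2) -> str:
    frontier = [""]  # partial solutions, starting from an empty thought
    for _ in range(depth):
        candidates = []
        for partial in frontier:
            for _ in range(branch):
                step = llm(f"Task: {task}\nSteps so far:\n{partial}\nPropose the next step:")
                candidates.append(partial + step + "\n")
        # Self-evaluation: assumes the model replies with a bare number 0-10.
        scored = [
            (float(llm(f"Task: {task}\nPartial solution:\n{c}\n"
                       f"Rate the progress towards a solution from 0 to 10:")), c)
            for c in candidates
        ]
        frontier = [c for _, c in sorted(scored, key=lambda s: s[0], reverse=True)[:beam]]
    return frontier[0]
```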

11. Experimental Topic: Plan-and-Solve Prompting
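
A minimal sketch of a plan-and-solve style zero-shot prompt: instead of the generic "Let's think step by step" trigger, the model is explicitly asked to first devise a plan and then carry it out. The llm() helper and the example question are illustrative assumptions.

```python
# Illustrative sketch of plan-and-solve prompting: the zero-shot trigger asks
# the model to first devise a plan and then execute it step by step.
# The llm() helper is a hypothetical placeholder, not a real API.

def llm(prompt: str) -> str:
    # Swap in a call to your actual model endpoint here.
    return f"<model response to: {prompt[:50]}...>"

PLAN_AND_SOLVE = (
    "Q: {question}\n"
    "A: Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)

print(llm(PLAN_AND_SOLVE.format(
    question="A train travels 60 km in 45 minutes. What is its average speed in km/h?")))
```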

12. Experimental Topic: Automatic Prompt Engineering

  • Zhou, Yongchao, et al. “Large language models are human-level prompt engineers.” arXiv preprint arXiv:2211.01910 (2022).
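
A minimal sketch in the spirit of automatic prompt engineering (Zhou et al.): candidate instructions are reverse-engineered from a few input-output demonstrations and then scored on held-out pairs. The llm() helper and the exact-match scoring scheme are illustrative assumptions.

```python
# Illustrative sketch of automatic prompt engineering (Zhou et al.): the model
# proposes candidate instructions from demonstrations, and each candidate is
# scored on held-out input-output pairs. The llm() helper is a placeholder.

def llm(prompt: str) -> str:
    # Swap in a call to your actual model endpoint here.
    return f"<model response to: {prompt[:50]}...>"

def propose_and_select(demos, eval_pairs, n_candidates: int = 5) -> str:
    demo_text = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    # Step 1: let the model guess which instruction produced the demonstrations.
    candidates = [
        llm(f"I gave a friend an instruction. Based on the input-output pairs\n"
            f"below, what was the instruction?\n{demo_text}\nInstruction:")
        for _ in range(n_candidates)
    ]
    # Step 2: score each candidate instruction by exact-match accuracy.
    def accuracy(instruction: str) -> float:
        hits = sum(
            llm(f"{instruction}\nInput: {x}\nOutput:").strip() == y
            for x, y in eval_pairs
        )
        return hits / len(eval_pairs)
    return max(candidates, key=accuracy)
```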

13. Experimental Topic: Data Fusion using LLMs

  • Ahmad, Mohammad Shahmeer, et al. “RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes.” arXiv preprint arXiv:2303.16909 (2023).
  • Jens Bleiholder and Felix Naumann. 2009. Data fusion. ACM Comput. Surv. 41, 1, Article 1 (January 2009), 41 pages. https://doi.org/10.1145/14
  • Narayan, Avanika, et al. 2022. Can Foundation Models Wrangle Your Data? In VLDB 2022 (4), 738–746.
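
A minimal sketch of LLM-based data fusion: conflicting records describing the same real-world entity are serialized into a prompt and the model is asked to resolve the conflicting attribute values. The llm() helper and the example records are made up for illustration and are not taken from the papers above.

```python
# Illustrative sketch of LLM-based data fusion: conflicting records for the
# same entity are serialized into a prompt and the model picks the most
# likely correct attribute value. The llm() helper is a placeholder.

def llm(prompt: str) -> str:
    # Swap in a call to your actual model endpoint here.
    return f"<model response to: {prompt[:50]}...>"

def fuse_attribute(records: list[dict], attribute: str) -> str:
    serialized = "\n".join(
        f"Source {i + 1}: " + ", ".join(f"{k}: {v}" for k, v in record.items())
        for i, record in enumerate(records)
    )
    return llm(
        f"The following records describe the same entity but disagree.\n"
        f"{serialized}\n"
        f"Which value of '{attribute}' is most likely correct? "
        f"Answer with the value only."
    )

records = [
    {"name": "ACME GmbH", "founded": "1998", "city": "Mannheim"},
    {"name": "ACME GmbH", "founded": "1989", "city": "Mannheim"},
]
print(fuse_attribute(records, "founded"))
```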

Getting started

The following survey articles and a tutorial are good starting points for getting an overview of the topics of the seminar: