Seminar CS715: Solving Complex Tasks using Large Language Models (HWS 2023/2024)

The topic of the seminar is solving complex tasks using Large Language Models (LLMs). The seminar features both literature and experimental topics. The goal of the literature topics is to summarize the state of the art concerning the application and evaluation of LLMs. The goal of the experimental topics is to verify the utility of advanced prompt engineering techniques by applying them to tasks beyond those used for illustration and evaluation in the respective papers.

Organization

Goals

In this seminar, you will

  • read, understand, and explore scientific literature
  • critically summarize the state of the art concerning your topic
  • experimentally verify the utility of advanced prompt engineering methods
  • give a presentation about your topic (before the submission of the report)

Topics HWS 2023

1. Literature Topic: Explainability of LLMs

  • Yao et al., Tree of Thoughts: Deliberate Problem Solving with Large Language Models
  • Turpin et al., Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
  • Lanham et al., Measuring Faithfulness in Chain-of-Thought Reasoning
  • Radhakrishnan et al., Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

2. Literature Topic: Efficiency of LLMs

  • Lee et al., Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research
  • Touvron et al., LLaMA: Open and Efficient Foundation Language Models
  • Dettmers et al., QLoRA: Efficient Finetuning of Quantized LLMs
  • Hsieh et al., Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
  • Gu et al., Knowledge Distillation of Large Language Models

3. Literature Topic: Agent-Based Modeling via LLMs

  • Park et al., Generative Agents: Interactive Simulacra of Human Behavior
  • Li et al., CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society
  • Boiko et al., Emergent autonomous scientific research capabilities of large language models
  • Zhuge et al., Mindstorms in Natural Language-Based Societies of Mind
  • Wang et al., Interactive Natural Language Processing

4. Literature Topic: LLMs for the Social Sciences

  • Ziems et al., Can Large Language Models Transform Computational Social Science?
  • Feng et al., From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models
  • Hartmann et al., The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation

5. Literature Topic: Limitations of LLMs

  • Frieder et al., Mathematical Capabilities of ChatGPT
  • Borji, A Categorical Archive of ChatGPT Failures
  • Wang et al., Large Language Models are not Fair Evaluators
  • Schick et al., Toolformer: Language Models Can Teach Themselves to Use Tools
  • Bang et al., A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

6. Literature Topic: LLMs for Education+Science

  • Baidoo-Anu et al., Education in the Era of Generative Artificial Intelligence (AI): Understanding the Potential Benefits of ChatGPT in Promoting Teaching and Learning
  • Choi et al., ChatGPT Goes to Law School
  • Boiko et al., Emergent autonomous scientific research capabilities of large language models
  • Meyer et al., ChatGPT and large language models in academia: opportunities and challenges

7. Literature Topic: Multimodality and LLMs 

  • Liu et al., Visual instruction tuning
  • Zhang et al., Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
  • Dai et al., InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

8. Experimental Topic: Chain-of-Thought Prompting

  • Wei, Jason, et al. “Chain-of-thought prompting elicits reasoning in large language models.” Advances in Neural Information Processing Systems 35 (2022): 24824-24837.
  • Kojima, Takeshi, et al. “Large language models are zero-shot reasoners.” Advances in neural information processing systems 35 (2022): 22199-22213.
  • Zhang, Zhuosheng, et al. “Automatic chain of thought prompting in large language models.” arXiv preprint arXiv:2210.03493 (2022).
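
As a starting point for the experiments, the following minimal sketch contrasts few-shot chain-of-thought prompting (Wei et al.) with zero-shot chain-of-thought prompting (Kojima et al.). The llm() helper is a hypothetical placeholder for whatever completion API is used; only the prompt structure is the point here.

```python
# Illustrative sketch: few-shot CoT (Wei et al.) provides worked-out reasoning
# in the demonstrations, zero-shot CoT (Kojima et al.) only adds a trigger
# phrase. The llm() helper is a hypothetical placeholder, not a real API.

def llm(prompt: str) -> str:
    # Swap in a call to your actual model endpoint here.
    return f"<model response to: {prompt[:50]}...>"

# Few-shot CoT: the demonstration shows the reasoning, not just the answer.
FEW_SHOT_COT = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: {question}
A:"""

# Zero-shot CoT: a single trigger phrase elicits step-by-step reasoning.
ZERO_SHOT_COT = "Q: {question}\nA: Let's think step by step."

question = "A juggler has 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?"
print(llm(FEW_SHOT_COT.format(question=question)))
print(llm(ZERO_SHOT_COT.format(question=question)))
```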

9. Experimental Topic: Knowledge Generation Prompting

  • Liu, Jiacheng, et al. “Generated knowledge prompting for commonsense reasoning.” arXiv preprint arXiv:2110.08387 (2021).
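
A minimal sketch of the two-stage procedure from Liu et al.: the model first generates background knowledge for a question and then answers the question with that knowledge prepended. The llm() helper, prompt wording, and parameters are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of generated-knowledge prompting (Liu et al.): step 1
# samples background knowledge for the question, step 2 answers with that
# knowledge prepended. The llm() helper is a hypothetical placeholder.

def llm(prompt: str) -> str:
    # Swap in a call to your actual model endpoint here.
    return f"<model response to: {prompt[:50]}...>"

def answer_with_generated_knowledge(question: str, n_statements: int = 3) -> str:
    # Step 1: generate several knowledge statements relevant to the question.
    knowledge = [
        llm(f"Generate a fact that is relevant to answering the question.\n"
            f"Question: {question}\nFact:")
        for _ in range(n_statements)
    ]
    # Step 2: answer the question conditioned on the generated knowledge.
    context = "\n".join(knowledge)
    return llm(f"{context}\n\nQuestion: {question}\nAnswer:")

print(answer_with_generated_knowledge("Do penguins have wings?"))
```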

10. Experimental Topic: Tree of Thoughts Prompting

  • Yao, Shunyu, et al. “Tree of thoughts: Deliberate problem solving with large language models.” arXiv preprint arXiv:2305.10601 (2023).
  • Long, Jieyi. “Large Language Model Guided Tree-of-Thought.” arXiv preprint arXiv:2305.08291 (2023).
  • Besta et al. “Graph of Thoughts: Solving Elaborate Problems with Large Language Models.” arXiv preprint arXiv:2308.09687 (2023).
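
A minimal sketch of the search idea behind tree-of-thoughts prompting (Yao et al.): each partial solution is expanded into several candidate next steps, the model rates the candidates, and only the best ones are kept. The llm() helper, the prompts, and the search parameters are illustrative assumptions, not taken from the papers.

```python
# Illustrative sketch of tree-of-thoughts search (Yao et al.): propose several
# next steps per partial solution, let the model rate them, keep the best
# candidates (a simple beam search). All prompts and parameters are placeholders.

def llm(prompt: str) -> str:
    # Swap in a call to your actual model endpoint here.
    return f"<model response to: {prompt[:50]}...>"

def tree_of_thoughts(task: str, depth: int = 3, branch: int = 3, beam: int = 2) -> str:
    frontier = [""]  # partial solutions, starting from an empty thought
    for _ in range(depth):
        candidates = []
        for partial in frontier:
            for _ in range(branch):
                step = llm(f"Task: {task}\nSteps so far:\n{partial}\nPropose the next step:")
                candidates.append(partial + step + "\n")
        # Self-evaluation: assumes the model replies with a bare number 0-10.
        scored = [
            (float(llm(f"Task: {task}\nPartial solution:\n{c}\n"
                       f"Rate the progress towards a solution from 0 to 10:")), c)
            for c in candidates
        ]
        frontier = [c for _, c in sorted(scored, key=lambda s: s[0], reverse=True)[:beam]]
    return frontier[0]
```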

11. Experimental Topic: Plan-and-Solve Prompting
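
A minimal sketch of a plan-and-solve style zero-shot prompt: instead of the generic "Let's think step by step" trigger, the model is explicitly asked to first devise a plan and then carry it out. The llm() helper and the example question are illustrative assumptions.

```python
# Illustrative sketch of plan-and-solve prompting: the zero-shot trigger asks
# the model to first devise a plan and then execute it step by step.
# The llm() helper is a hypothetical placeholder, not a real API.

def llm(prompt: str) -> str:
    # Swap in a call to your actual model endpoint here.
    return f"<model response to: {prompt[:50]}...>"

PLAN_AND_SOLVE = (
    "Q: {question}\n"
    "A: Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)

print(llm(PLAN_AND_SOLVE.format(
    question="A train travels 60 km in 45 minutes. What is its average speed in km/h?")))
```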

12. Experimental Topic: Automatic Prompt Engineering

  • Zhou, Yongchao, et al. “Large language models are human-level prompt engineers.” arXiv preprint arXiv:2211.01910 (2022).
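
A minimal sketch in the spirit of automatic prompt engineering (Zhou et al.): candidate instructions are reverse-engineered from a few input-output demonstrations and then scored on held-out pairs. The llm() helper and the exact-match scoring scheme are illustrative assumptions.

```python
# Illustrative sketch of automatic prompt engineering (Zhou et al.): the model
# proposes candidate instructions from demonstrations, and each candidate is
# scored on held-out input-output pairs. The llm() helper is a placeholder.

def llm(prompt: str) -> str:
    # Swap in a call to your actual model endpoint here.
    return f"<model response to: {prompt[:50]}...>"

def propose_and_select(demos, eval_pairs, n_candidates: int = 5) -> str:
    demo_text = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    # Step 1: let the model guess which instruction produced the demonstrations.
    candidates = [
        llm(f"I gave a friend an instruction. Based on the input-output pairs\n"
            f"below, what was the instruction?\n{demo_text}\nInstruction:")
        for _ in range(n_candidates)
    ]
    # Step 2: score each candidate instruction by exact-match accuracy.
    def accuracy(instruction: str) -> float:
        hits = sum(
            llm(f"{instruction}\nInput: {x}\nOutput:").strip() == y
            for x, y in eval_pairs
        )
        return hits / len(eval_pairs)
    return max(candidates, key=accuracy)
```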

13. Experimental Topic: Data Fusion using LLMs

  • Ahmad, Mohammad Shahmeer, et al. “RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes.” arXiv preprint arXiv:2303.16909 (2023).
  • Jens Bleiholder and Felix Naumann. 2009. Data fusion. ACM Comput. Surv. 41, 1, Article 1 (January 2009), 41 pages. https://doi.org/10.1145/14
  • Narayan, Avanika, et al. 2022. Can Foundation Models Wrangle Your Data? In VLDB 2022 (4), 738–746.
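
A minimal sketch of LLM-based data fusion: conflicting records describing the same real-world entity are serialized into a prompt and the model is asked to resolve the conflicting attribute values. The llm() helper and the example records are made up for illustration and are not taken from the papers above.

```python
# Illustrative sketch of LLM-based data fusion: conflicting records for the
# same entity are serialized into a prompt and the model picks the most
# likely correct attribute value. The llm() helper is a placeholder.

def llm(prompt: str) -> str:
    # Swap in a call to your actual model endpoint here.
    return f"<model response to: {prompt[:50]}...>"

def fuse_attribute(records: list[dict], attribute: str) -> str:
    serialized = "\n".join(
        f"Source {i + 1}: " + ", ".join(f"{k}: {v}" for k, v in record.items())
        for i, record in enumerate(records)
    )
    return llm(
        f"The following records describe the same entity but disagree.\n"
        f"{serialized}\n"
        f"Which value of '{attribute}' is most likely correct? "
        f"Answer with the value only."
    )

records = [
    {"name": "ACME GmbH", "founded": "1998", "city": "Mannheim"},
    {"name": "ACME GmbH", "founded": "1989", "city": "Mannheim"},
]
print(fuse_attribute(records, "founded"))
```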

Getting started

The following survey articles and a tutorial are good starting points for getting an overview of the topics of the seminar: