Prompt Engineering
1. Experimental Topic: From Self-Consistency to MedPrompt: Improving Results by Ensembling LLMs
- Wang, Xuezhi, et al. “Self-Consistency Improves Chain of Thought Reasoning in Language Models.” arXiv preprint arXiv:2203.11171 (2022).
- Nori, Harsha, et al. “Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine.” arXiv preprint arXiv:2311.16452 (2023).
- Zhao, Wayne Xin, et al. “A Survey of Large Language Models.” arXiv preprint arXiv:2303.18223 (2023).
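To make the ensembling idea concrete, here is a minimal sketch of self-consistency decoding: sample several chain-of-thought completions at nonzero temperature and majority-vote the extracted final answers. The `generate` function is a hypothetical stand-in for any LLM completion API.

```python
from collections import Counter
import re

def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

def self_consistency(question: str, n_samples: int = 5) -> str:
    """Sample diverse chain-of-thought completions, then majority-vote the final answers."""
    prompt = (f"{question}\nLet's think step by step, "
              "then state the final answer after 'Answer:'.")
    answers = []
    for _ in range(n_samples):
        # Nonzero temperature yields diverse reasoning paths
        completion = generate(prompt, temperature=0.7)
        match = re.search(r"Answer:\s*(.+)", completion)
        if match:
            answers.append(match.group(1).strip())
    if not answers:
        raise ValueError("no parsable answers sampled")
    return Counter(answers).most_common(1)[0][0]
```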
2. Experimental Topic: Prompt Search / Breeding
- Fernando, Chrisantha, et al. “Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution.” arXiv preprint arXiv:2309.16797 (2023).
- Liu, Pengfei, et al. “Pre-Train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing.” ACM Computing Surveys 55.9 (2023): 1–35.
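A minimal sketch of the evolutionary core of prompt breeding, again assuming a hypothetical `generate` helper and a small labeled dev set for fitness. Promptbreeder itself goes further and also evolves the mutation prompts (the self-referential part); this sketch shows only the basic mutate-evaluate-select loop.

```python
import random

def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

def fitness(task_prompt: str, dev_set: list[tuple[str, str]]) -> float:
    """Fraction of dev questions answered correctly when prefixed with task_prompt."""
    hits = sum(generate(f"{task_prompt}\n{q}").strip() == a for q, a in dev_set)
    return hits / len(dev_set)

def evolve_prompt(seeds: list[str], dev_set, generations: int = 10, pop_size: int = 8) -> str:
    population = list(seeds)
    for _ in range(generations):
        # LLM-driven mutation: ask the model to rewrite sampled parent prompts
        parents = random.sample(population, k=min(pop_size, len(population)))
        children = [generate(f"Improve this instruction while keeping its intent:\n{p}",
                             temperature=1.0) for p in parents]
        # Fitness-based selection for the next generation
        population = sorted(set(population + children),
                            key=lambda p: fitness(p, dev_set), reverse=True)[:pop_size]
    return population[0]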
3. Experimental Topic: Active Prompt
- Diao, Shizhe, et al. “Active Prompting with Chain-of-Thought for Large Language Models.” arXiv, May 23, 2023.
- Mavromatis, Costas, et al. “Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection.” arXiv, October 30, 2023.
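The selection step of active prompting can be sketched as follows, with a hypothetical `generate` helper; disagreement among sampled answers stands in here for the paper's family of uncertainty metrics (disagreement, entropy, variance).

```python
from collections import Counter

def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

def disagreement(question: str, k: int = 5) -> float:
    """Uncertainty as disagreement among k sampled answers (more distinct answers = more uncertain)."""
    answers = [generate(question, temperature=0.7).strip() for _ in range(k)]
    return len(Counter(answers)) / k

def select_for_annotation(pool: list[str], budget: int = 8) -> list[str]:
    """Send the most uncertain questions to humans for chain-of-thought annotation."""
    return sorted(pool, key=disagreement, reverse=True)[:budget]
```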
4. Experimental Topic: Contrastive Prompting
- Chia, Yew Ken, et al. “Contrastive Chain-of-Thought Prompting.” arXiv preprint arXiv:2311.09277 (2023).
- Paranjape, Bhargavi, et al. “Prompting contrastive explanations for commonsense reasoning tasks.” arXiv preprint arXiv:2106.06823 (2021).
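A contrastive chain-of-thought prompt pairs a demonstration with both a valid and an explicitly flawed rationale, so the model sees what not to do. A minimal sketch with one invented arithmetic demonstration (hypothetical `generate` helper):

```python
def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

# One demonstration pairs a valid rationale with an explicitly flawed one.
CONTRASTIVE_PROMPT = """\
Question: If 3 pens cost 6 dollars, how much do 5 pens cost?
Correct explanation: One pen costs 6 / 3 = 2 dollars, so 5 pens cost 5 * 2 = 10 dollars.
Wrong explanation: 5 pens is 2 more than 3 pens, so the price is 6 + 2 = 8 dollars.
Answer: 10

Question: {question}
Correct explanation:"""

def contrastive_cot(question: str) -> str:
    return generate(CONTRASTIVE_PROMPT.format(question=question))
```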
5. Experimental Topic: Limitations of LLMs
- Berglund, Lukas, et al. “The Reversal Curse: LLMs Trained on ‘A Is B’ Fail to Learn ‘B Is A.’” arXiv, September 22, 2023.
- Kaddour, Jean, et al. “Challenges and Applications of Large Language Models.” arXiv, July 19, 2023. https://doi.org/10.48550/arXiv.2307.10169.
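A minimal probe for the reversal curse, assuming a hypothetical `generate` helper: query the same fact in both directions and compare. Berglund et al.'s running example asks who Tom Cruise's mother is (usually answered) versus whose son Mary Lee Pfeiffer is (usually missed).

```python
def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

def reversal_probe(name: str, description: str) -> tuple[str, str]:
    """Query a fact in the trained 'A is B' direction and the reversed 'B is A' direction."""
    forward = generate(f"Question: Who is {name}? Answer:")
    backward = generate(f"Question: Who is {description}? Answer:")
    return forward, backward
```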
6. Literature Topic: LLM Self-Evaluation during Fine-tuning
- Deutsch, Daniel, et al. “On the Limitations of Reference-Free Evaluations of Generated Text.” In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 10960–77.
- Ouyang, Long, et al. “Training Language Models to Follow Instructions with Human Feedback.” arXiv, March 4, 2022.
- Rafailov, Rafael, et al. “Direct Preference Optimization: Your Language Model Is Secretly a Reward Model.” arXiv, December 13, 2023.
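As a concrete anchor for the fine-tuning side, a sketch of the DPO objective from Rafailov et al., written for PyTorch tensors of summed per-response log-probabilities:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss (Rafailov et al., 2023). Each argument is the summed log-probability
    of a whole response under the policy or the frozen reference model, for the
    preferred (chosen) and dispreferred (rejected) response in each pair."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin) pushes the policy to widen the preference margin
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```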
Evaluation
7. Literature Topic: LLMs as Evaluation Metrics
- Kocmi, Tom, et al. “Large Language Models Are State-of-the-Art Evaluators of Translation Quality.” arXiv, May 31, 2023.
- Leiter, Christoph, et al. “The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics.” arXiv, October 30, 2023.
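A minimal direct-assessment sketch, loosely paraphrasing the GEMBA-style scoring prompt from Kocmi et al. The `generate` helper is hypothetical, and a real setup needs more robust score parsing than shown here.

```python
def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

DA_PROMPT = """\
Score the following translation from {src_lang} to {tgt_lang} on a continuous
scale from 0 to 100, where 0 means "no meaning preserved" and 100 means
"perfect meaning and grammar".
{src_lang} source: "{source}"
{tgt_lang} translation: "{translation}"
Score:"""

def da_score(source: str, translation: str, src_lang: str, tgt_lang: str) -> float:
    reply = generate(DA_PROMPT.format(src_lang=src_lang, tgt_lang=tgt_lang,
                                      source=source, translation=translation))
    return float(reply.strip().split()[0])
```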
8. Literature Topic: Can LLMs Evaluate Themselves?
- Deutsch, Daniel, et al. “On the Limitations of Reference-Free Evaluations of Generated Text.” In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 10960–77.
- Ouyang, Long, et al. “Training Language Models to Follow Instructions with Human Feedback.” arXiv, March 4, 2022.
- Rafailov, Rafael, et al. “Direct Preference Optimization: Your Language Model Is Secretly a Reward Model.” arXiv, December 13, 2023.
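A minimal self-evaluation sketch (hypothetical `generate` helper). Note the caveat from Deutsch et al.: reference-free judges tend to favor outputs resembling what they would generate themselves, so such self-assigned scores should not be taken at face value.

```python
def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

SELF_EVAL_PROMPT = """\
Question: {question}
Proposed answer: {answer}
Without consulting any reference, rate how correct and complete the proposed
answer is on a scale from 1 (unusable) to 5 (flawless). Reply with the number only.
Rating:"""

def self_evaluate(question: str, answer: str) -> int:
    reply = generate(SELF_EVAL_PROMPT.format(question=question, answer=answer))
    return int(reply.strip())
```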
9. Experimental Topic: LLMs with Tools as Evaluation Metrics
- Fernandes, Patrick, et al. “The Devil Is in the Errors: Leveraging Large Language Models for Fine-Grained Machine Translation Evaluation.” arXiv, August 14, 2023.
- Kocmi, Tom, et al. “GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4.” arXiv, October 21, 2023.
- Shu, Lei, et al. “Fusion-Eval: Integrating Evaluators with LLMs.” arXiv, November 15, 2023.
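A sketch of error-span-based evaluation in the spirit of GEMBA-MQM: ask for error spans with severities, then aggregate them into a penalty score. The severity weights below are illustrative, not the official MQM scheme, and the `generate` helper is hypothetical.

```python
import json

def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}  # illustrative weights only

def mqm_style_score(source: str, candidate: str) -> float:
    """Ask for error spans with severities, then aggregate into a penalty score."""
    prompt = ("List the translation errors in the candidate. Return a JSON list of "
              'objects with keys "span", "category" and "severity" '
              '("minor", "major" or "critical").\n'
              f"Source: {source}\nCandidate: {candidate}\nErrors:")
    errors = json.loads(generate(prompt))
    return -sum(SEVERITY_WEIGHTS[e["severity"]] for e in errors)
```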
10. Literature Topic: Task Contamination
- Li, Changmao, et al. “Task Contamination: Language Models May Not Be Few-Shot Anymore.” arXiv preprint arXiv:2312.16337 (2023).
- Roberts, Manley, et al. “Data Contamination Through the Lens of Time.” arXiv preprint arXiv:2310.10628 (2023).
- Jiang, Minhao, et al. “Investigating Data Contamination for Pre-training Language Models.” arXiv preprint arXiv:2401.06059 (2024).
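One contamination heuristic in this line of work can be sketched directly (hypothetical `generate` helper): check whether the model reproduces the tail of a benchmark item verbatim from its prefix. This is only one signal among several that the papers above study.

```python
def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

def memorization_probe(example: str, split: float = 0.5) -> float:
    """Feed the model the first half of a benchmark item and measure how much of
    the second half it reproduces verbatim; values near 1.0 hint at contamination."""
    cut = int(len(example) * split)
    prefix, tail = example[:cut], example[cut:]
    if not tail:
        return 0.0
    guess = generate(prefix)[: len(tail)]
    return sum(a == b for a, b in zip(guess, tail)) / len(tail)
```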
11. Literature Topic: Evaluation of Code Writing Ability of LLMs
- Chen, Mark, et al. “Evaluating Large Language Models Trained on Code.” arXiv preprint arXiv:2107.03374 (2021).
- Le, Triet H. M., et al. “Deep Learning for Source Code Modeling and Generation: Models, Applications, and Challenges.” ACM Computing Surveys (CSUR) 53.3 (2020): 1–38.
- https://paperswithcode.com/task/code-generation
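The standard metric in this area is pass@k from Chen et al. (2021); their unbiased estimator is small enough to quote in full:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from Chen et al. (2021).
    n = number of samples generated per problem, c = number passing the unit tests."""
    if n - c < k:
        return 1.0  # fewer failures than k: some sampled k-subset must contain a pass
    # 1 minus the probability that all k drawn samples fail
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)
```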
12. Experimental Topic: Evaluation Benchmark for Scientific Text Generation Models
- Belouadi, Jonas, et al. “AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ.” arXiv, January 23, 2024.
- Zerroug, Aimen, et al. “A Benchmark for Compositional Visual Reasoning.” Advances in Neural Information Processing Systems 35 (December 6, 2022): 29776–88.
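For TikZ-style scientific graphics, one simple automatic benchmark signal is whether the generated code compiles at all; a minimal check, assuming `pdflatex` is installed and on the PATH:

```python
import pathlib
import subprocess
import tempfile

def compiles(tikz_code: str) -> bool:
    """Crude benchmark signal: does the generated TikZ figure compile at all?"""
    doc = ("\\documentclass{standalone}\n\\usepackage{tikz}\n"
           "\\begin{document}\n" + tikz_code + "\n\\end{document}\n")
    with tempfile.TemporaryDirectory() as tmp:
        tex = pathlib.Path(tmp) / "figure.tex"
        tex.write_text(doc)
        result = subprocess.run(
            ["pdflatex", "-interaction=batchmode", "-output-directory", tmp, str(tex)],
            capture_output=True,
        )
    return result.returncode == 0
```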
Applications
13. Experimental Topic: WebAPI Query Planning Using LLMs
- Chen, Zui, et al. “Symphony: Towards Natural Language Query Answering over Multi-Modal Data Lakes.” Conference on Innovative Data Systems Research (CIDR), 2023.
- Urban, Matthias, et al. “CAESURA: Language Models as Multi-Modal Query Planners.” arXiv preprint arXiv:2308.03424 (2023).
- Wang, Lei, et al. “A Survey on Large Language Model Based Autonomous Agents.” arXiv preprint arXiv:2308.11432 (2023).
- https://gorilla.cs.berkeley.edu/
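A minimal sketch of LLM-based API selection: show the model the available endpoints and ask it to emit exactly one call as JSON. The tool registry and the `generate` helper are both hypothetical.

```python
import json

def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

TOOLS = {  # hypothetical endpoint registry
    "weather.get": "Current weather for a city. Args: {city: str}",
    "stocks.quote": "Latest quote for a ticker. Args: {ticker: str}",
}

def plan_call(request: str) -> dict:
    """Ask the model to pick one API and its arguments, returned as JSON."""
    tool_list = "\n".join(f"- {name}: {desc}" for name, desc in TOOLS.items())
    prompt = (f"Available APIs:\n{tool_list}\n"
              f"User request: {request}\n"
              'Reply with JSON of the form {"api": ..., "args": {...}} and nothing else.')
    return json.loads(generate(prompt))
```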
14. Experimental Topic: Attribute Value Normalization Using LLMs
- Jaimovitch-López, Gonzalo, et al. “Can Language Models Automate Data Wrangling?” Machine Learning 112.6 (2023): 2053–2082.
- Bogatu, Alex, et al. “Towards Automatic Data Format Transformations: Data Wrangling at Scale.” Data Analytics: 31st British International Conference on Databases (BICOD 2017), 2017.
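A minimal few-shot normalization sketch (hypothetical `generate` helper; the country-code mapping is just one example of an attribute to normalize):

```python
def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

FEW_SHOT = """\
Normalize each value to an ISO 3166 country code.
Input: U.S.A. -> Output: US
Input: Deutschland -> Output: DE
Input: {value} -> Output:"""

def normalize(value: str) -> str:
    return generate(FEW_SHOT.format(value=value)).strip()
```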
15. Experimental Topic: LLM for Literary Translation and Evaluation
- Fonteyne, Margot, et al. “Literary Machine Translation under the Magnifying Glass: Assessing the Quality of an NMT-Translated Detective Novel on Document Level.” In Proceedings of the Twelfth Language Resources and Evaluation Conference, 3790–98. Marseille, France, 2020.
- Karpinska, Marzena, et al. “Large Language Models Effectively Leverage Document-Level Context for Literary Translation, but Critical Errors Persist.” arXiv, May 22, 2023.
- Wang, Longyue, et al. “Document-Level Machine Translation with Large Language Models.” In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 16646–61. Singapore, 2023.
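A sketch of document-level translation with a rolling context window, in the spirit of the document-level setups studied above (hypothetical `generate` helper):

```python
def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

def translate_document(paragraphs: list[str], src: str = "English",
                       tgt: str = "German", context: int = 2) -> list[str]:
    """Translate paragraph by paragraph, feeding back recent output so names,
    register, and pronouns stay consistent across the document."""
    output: list[str] = []
    for para in paragraphs:
        prior = "\n".join(output[-context:])
        prompt = (f"You are translating a novel from {src} to {tgt}.\n"
                  f"Translation so far (for consistency):\n{prior}\n"
                  f"Translate the next paragraph:\n{para}\nTranslation:")
        output.append(generate(prompt, temperature=0.3))
    return output
```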
16. Experimental Topic: LLMs for Synthetic Training Data Generation
- Piedboeuf, Frédéric, et al. “Is ChatGPT the Ultimate Data Augmentation Algorithm?” In Findings of the Association for Computational Linguistics: EMNLP 2023, 2023.
- Pal, Koyena, et al. “Generative Benchmark Creation for Table Union Search.” arXiv, August 7, 2023.
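A minimal paraphrase-based augmentation sketch in the spirit of Piedboeuf et al. (hypothetical `generate` helper): ask the model for label-preserving paraphrases of a labeled example.

```python
def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

def augment(text: str, label: str, n: int = 3) -> list[tuple[str, str]]:
    """Label-preserving paraphrase augmentation: n paraphrases keep the original label."""
    prompt = (f"Paraphrase the following text {n} times, one paraphrase per line, "
              f"preserving its meaning:\n{text}")
    paraphrases = generate(prompt, temperature=0.9).strip().splitlines()
    return [(p.strip(), label) for p in paraphrases if p.strip()]
```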
17. Experimental Topic: LLM-based Agents / OpenAI Assistants
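As a starting point for this topic, a minimal reason-act agent loop with a hypothetical tool registry and `generate` helper; frameworks like the OpenAI Assistants API package a similar loop as a managed service.

```python
def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

TOOLS = {  # hypothetical tool stubs
    "search": lambda query: "stubbed search results",
    "calculator": lambda expr: str(eval(expr)),  # illustration only; never eval untrusted input
}

def run_agent(task: str, max_steps: int = 5) -> str:
    """Minimal reason-act loop: the model either calls a tool or gives a final answer."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = generate(transcript + "Respond with 'CALL <tool> <input>' "
                                     "or 'ANSWER <final answer>'.")
        if step.startswith("ANSWER"):
            return step.removeprefix("ANSWER").strip()
        _, tool, arg = step.split(maxsplit=2)  # assumes well-formed CALL lines
        transcript += f"{step}\nObservation: {TOOLS[tool](arg)}\n"
    return "No answer within the step budget."
```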
18. Experimental Topic: Agent Cooperation
- Park, Joon Sung, et al. “Generative Agents: Interactive Simulacra of Human Behavior.” Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023.
- Zhuge, Mingchen, et al. “Mindstorms in Natural Language-Based Societies of Mind.” arXiv preprint arXiv:2305.17066 (2023).
- Suzgun, Mirac, and Adam Tauman Kalai. “Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding.” arXiv preprint arXiv:2401.12954 (2024).
- Wang, Lei, et al. “A Survey on Large Language Model Based Autonomous Agents.” arXiv preprint arXiv:2308.11432 (2023).
- https://www.promptingguide.ai/research/llm-agents
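A minimal two-agent cooperation sketch (solver and critic roles, hypothetical `generate` helper), illustrating the round-based exchange that the readings above scale up to whole societies of agents:

```python
def generate(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError  # placeholder for any LLM completion API

def cooperative_solve(task: str, rounds: int = 2) -> str:
    """Solver drafts, critic reviews, solver revises; stop early on approval."""
    draft = generate(f"Solve this task:\n{task}", temperature=0.7)
    for _ in range(rounds):
        critique = generate(f"Task: {task}\nProposed solution:\n{draft}\n"
                            "List concrete flaws, or reply APPROVED if there are none.")
        if "APPROVED" in critique:
            break
        draft = generate(f"Task: {task}\nPrevious attempt:\n{draft}\n"
                         f"Critique:\n{critique}\nWrite an improved solution.",
                         temperature=0.7)
    return draft
```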