Marlene Lutz, Chair for Data Science in the Economic and Social Sciences: Natural Language Processing (February 2023)

Marlene Lutz has been a doctoral student and research assistant at the Chair of Data Science in Economics and Social Sciences since 2022. Previously, she studied computer science at RWTH Aachen University. Her research interests include Responsible Machine Learning, Fair Algorithmic Ranking and Interpretable Word Embeddings.

What is your current research topic?

My research belongs to the field of natural language processing (NLP). NLP is concerned with how computers can process and understand human language. Recently, large machine-learning-based language models like BERT and ChatGPT have shown remarkable performance on a range of NLP tasks. These models are trained on huge amounts of text to learn an understanding of human language. However, as these models get larger, it becomes more difficult to understand why they behave in a certain way. Because large language models are exposed to vast amounts of text, they also learn unwanted associations and discriminatory patterns that are encoded in the data. I want to understand what these models learn and how we can develop strategies for addressing bias. I think promoting fairness and transparency in NLP applications is critical for ensuring that these technologies are deployed ethically and responsibly.

For those who have not yet delved deeply into the topic of Data Science: How would you explain to a child what you are working on?

I'm studying how computers understand and use language. Maybe you have already asked a computer or a phone to do something for you, like translating something into another language, or asked Siri or Alexa to tell you a joke. The programs that make this possible are called “language models”. But just like people, these language models can make mistakes and sometimes don’t treat everyone the same way. So, they might not always give you the right answer, or they might give you an answer that makes you or other people feel like they're not good enough, even though that's not true. I’m working on making these models better and easier to understand and use for everyone. I also want to make sure that they are used in a good way that’s fair and helpful for everyone and doesn’t cause any harm.

Everyone talks about Data Science – how would you describe the importance of the topic for yourself in three words?

Insightful, diverse, fast

What points of contact with Data Science does your work have? Which methods do you already use, and which would be interesting for you in the future?

In my work, Data Science is a part of every step. To build a language model, we must first collect, clean and preprocess text data. Then, we use this data and Machine Learning to teach the model an understanding of language. Finally, I try to understand what patterns the language model has learned from the data and how we can refine it to be better and fairer. Unfortunately, a lot of state-of-the-art language models are not open source right now. Given that I am interested in transparency and ethics, it would be very exciting for me to be able to work with these models.
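
The "collect, clean and preprocess" step mentioned above can be illustrated with a small Python sketch. This is only a minimal example under stated assumptions, not the pipeline used in this research: the function names (`clean_text`, `tokenize`, `preprocess`) and the specific cleaning rules (removing URLs, lowercasing, dropping exact duplicates) are hypothetical choices made for illustration.

```python
import re
import unicodedata
from typing import Iterable, List

# Illustrative patterns; real pipelines usually need more careful rules.
URL_PATTERN = re.compile(r"https?://\S+")
WHITESPACE_PATTERN = re.compile(r"\s+")
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")  # words, or single punctuation marks


def clean_text(raw: str) -> str:
    """Normalize Unicode, remove URLs, collapse whitespace, and lowercase."""
    text = unicodedata.normalize("NFKC", raw)
    text = URL_PATTERN.sub(" ", text)
    text = WHITESPACE_PATTERN.sub(" ", text)
    return text.strip().lower()


def tokenize(text: str) -> List[str]:
    """Split cleaned text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def preprocess(documents: Iterable[str]) -> List[List[str]]:
    """Clean each document, skip empty ones and exact duplicates, then tokenize."""
    seen = set()
    corpus = []
    for raw in documents:
        cleaned = clean_text(raw)
        if not cleaned or cleaned in seen:
            continue
        seen.add(cleaned)
        corpus.append(tokenize(cleaned))
    return corpus


if __name__ == "__main__":
    docs = ["Hello,  World! https://example.com", "hello, world!", "   "]
    print(preprocess(docs))
```

Even simple choices like these (what counts as a duplicate, what gets removed) shape what a model later learns from the data, which is one reason why documenting the preprocessing matters for transparency.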

How high is the value of Data Science for your work? Would your research even be possible without Data Science?

Data Science is crucial for my work. In fact, the existence of Data Science is the very reason why my research is necessary. I’m working on making sure that we can use data-driven models and methods in an ethical and responsible way, and that we can better understand why they make certain predictions.

What development opportunities do you see for the topic of Data Science in relation to your field?

Data Science brings together researchers and practitioners with diverse backgrounds and skills. This is especially true for research and work on text data. In my opinion, there is a need for more and better interdisciplinary collaboration between researchers in fields such as computer science, linguistics, sociology, and psychology. Such collaboration would help us better understand the social and cultural contexts in which language is used, and how these factors can affect the development and deployment of NLP technologies. In particular, sustainable development and the ethical use of language technology are issues that unite all these fields and make interdisciplinary work indispensable.