Lea Cohausz, Chair of Artificial Intelligence: Causal Modelling (January 2023)

Lea Cohausz is a doctoral student specializing in artificial intelligence at the University of Mannheim. She previously completed a Bachelor's degree in Sociology and the Mannheim Master in Data Science at the same university. Her research interests include data mining in education, human behavior, and goal and plan recognition.

What is your current research topic?

Currently, I am mainly working on the connection between causal and black-box models. Black-box models are, for example, neural networks, which are immensely popular in the Data Science community right now. They are very good at identifying relationships in data and making accurate predictions without expert knowledge. Unfortunately, they are not very good at telling us why they made a prediction. Causal models, on the other hand, allow us to understand exactly which factors act on a variable and how strong each effect is. Their drawback is that constructing them usually requires a lot of expert knowledge and that, when used for predictions, they are unfortunately not as accurate as black-box models. Causal models, and the related Bayes nets or Directed Acyclic Graphs, are certainly a topic in computer science and in the Data Science community within it, but at the moment they are rather marginal. Because I believe they bring advantages that black-box models simply don't have, even with post-hoc Explainable Artificial Intelligence techniques, I'm looking for ways to combine the advantages of the two approaches. Incidentally, a good “side” effect of this is that it becomes quite clear whether you are using problematic variables (e.g., ethnicity) for predictions; fairness is also one of my areas of interest.

For those who have not yet delved deeply into the topic of Data Science: How would you explain to a child what you are working on?

I'm looking for methods to better predict events, but also to better understand them. Let's say you're taking a test at school in a few weeks. I want to find out now how well you will do. But I don't just want to know how you'll do, because that alone wouldn't do much good. I'm also trying to figure out why you're likely to perform that way. Let's say my computer predicts that you'll probably only get a mediocre score. It then determines that you haven't quite understood a topic yet, and that a certain additional exercise might help you understand it better. I would then recommend that you do that exercise. Hopefully that will help you. In fact, I work a lot with data from education; that's a really important topic, after all.

Everyone talks about Data Science – how would you describe the importance of the topic for yourself in three words?

Exciting, occasionally useful.

What points of contact with Data Science does your work have? Which methods do you already use, and which would be interesting for you in the future?

My work clearly belongs to the field of Data Science. Some of my methods (e.g. Directed Acyclic Graphs and causal inference) come from classical statistics, but the line between statistics and Data Science is not always clear-cut. Most of my methods, e.g. neural networks, belong directly to Data Science. The methods that will probably interest me in the future are those that lie between classical statistics and Data Science. Beyond that, anything that takes fairness into account interests me.

How high is the value of Data Science for your work? Would your research even be possible without Data Science?

Very high. Well, no.

What development opportunities do you see for the topic of Data Science in relation to your field?

Very big ones. I have to say that because my field is Data Science and I really hope that it will not stagnate forever. But on a more serious note, up to now it has been relatively common for computer scientists to develop fairly general methods and then for various disciplines to find applications for them. This has worked well so far, because there was a lot to do at first. For example, you can use a mixture of computer vision and natural language processing to digitize and translate medieval text corpora. Totally helpful and great. Or you can crawl huge amounts of real network data from social media, which gives social scientists new data to work with. I think Data Science has given a lot of disciplines whole new options. Slowly, though, the other disciplines (and business sectors, too) are running into more specific problems that general methods don't directly help with, or are wondering how exactly they can link their long-established methods to Data Science. For me, this also includes approaches from the “human in the loop” area, i.e. applications in which Data Science methods and human expert knowledge are continuously linked. In addition, the fairness of applications is becoming much more important. All of this will give the Data Science community new impetus and will hopefully lead to completely new methods. I also assume that the community will become more interdisciplinary. In any case, I am excited and looking forward to it.