John Collins, MZES: Prediction-based Adaptive Designs for Panel Surveys (PrADePS) (November 2024)
What is your current research topic?
I research how to use machine learning (ML) to predict who will not respond in longitudinal surveys.
For those who have not yet delved deeply into the topic of Data Science: How would you explain to a child what you are working on?
I think there are at least two kinds of data science: one is about the management of information, and the other is about discovering patterns in raw data. Almost every enterprise in the world needs to manage information, and data scientists are trained in processing data a little more deeply than computer scientists, which suits us well for tasks like designing databases or integrations between databases. The other part of data science is advanced data analysis: ML and AI are extremely complex topics, but enterprises desire simple answers derived from their data. Data scientists are therefore tasked with handling that complexity on their client/
Everyone talks about Data Science – how would you describe the importance of the topic for yourself in three words?
Sophisticated Harnessing of Information.
What points of contact with Data Science does your work have? Which methods do you already use, and which would be interesting for you in the future?
Two parts: firstly, I preprocess raw survey data into a format that is suitable for machine learning. This is an example of data transformation, which is an essential part of data science: it is extremely easy to corrupt data as you transform it, and data scientists should be experts in avoiding those errors. Secondly, I execute ML models, which require deep knowledge of the process to properly troubleshoot and evaluate.
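As a rough illustration of the preprocessing step described above, here is a minimal Python sketch that turns long-format participation records into one feature row per respondent. The record layout, the field names (`respondent_id`, `past_response_rate`, `responded_last_wave`, `nonresponse`) and the function `build_features` are assumptions made for this example; they are not the project's actual pipeline. The sketch guards against two easy ways to corrupt data during transformation: duplicate respondent-wave rows, and features that leak information from the wave being predicted.

```python
from collections import defaultdict

# Hypothetical long-format records: (respondent_id, wave, responded)
records = [
    ("r1", 1, True), ("r1", 2, True), ("r1", 3, False),
    ("r2", 1, True), ("r2", 2, False), ("r2", 3, False),
    ("r3", 1, True), ("r3", 2, True), ("r3", 3, True),
]

def build_features(records, target_wave):
    """Return one feature row per respondent for predicting nonresponse in target_wave."""
    history = defaultdict(dict)
    for rid, wave, responded in records:
        # Guard against silent corruption: duplicate respondent-wave rows.
        if wave in history[rid]:
            raise ValueError(f"duplicate record for {rid}, wave {wave}")
        history[rid][wave] = responded

    rows = []
    for rid, waves in sorted(history.items()):
        if target_wave not in waves:
            continue  # no label available for this respondent
        # Use only waves strictly before the target wave, to avoid leakage.
        past = [waves[w] for w in sorted(waves) if w < target_wave]
        if not past:
            continue  # no earlier waves to derive features from
        rows.append({
            "respondent_id": rid,
            "past_response_rate": sum(past) / len(past),
            "responded_last_wave": past[-1],
            "nonresponse": not waves[target_wave],  # label to predict
        })
    return rows

rows = build_features(records, target_wave=3)
```

The resulting rows could then be passed to any classifier; the choice of model is left open here.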
How high is the value of Data Science for your work? Would your research even be possible without Data Science?
My work requires an ML component, so it is essential.
What development opportunities do you see for the topic of Data Science in relation to your field?
Survey methodologists are already adopting data science techniques quickly. However, there is always more to do: AI, particularly LLMs, is advancing fast, and the applications of that technology have yet to filter into this sector (as is the case in many sectors). I expect a wave of research on this as practitioners work out what does and does not work in this space.