Treasure Hunting in the Data Jungle

We all leave countless data traces behind every day. We give stars to products, leave heart emojis under the nicest photos we see, and stream music and videos. Datafication, as scholars call this process, is part and parcel of all of our everyday lives. These volumes of data represent an unprecedented treasure trove for researchers, albeit one that is still well hidden. The research consortium BERD@NFDI (Business, Economic and Related Data@National Research Data Infrastructure) is now working to make this unstructured data usable for the scientific community. Professor Florian Stahl, spokesperson of BERD’s Steering Committee and holder of the Chair of Quantitative Marketing and Consumer Analytics at the University of Mannheim, explains why this is so important.

“Alexa, what time is it?” I murmur sleepily at 6:24 a.m. in the general direction of a small round speaker on my bedside table. Before hopping into the shower, I hastily click on my favorite Spotify playlist and turn up the volume. At breakfast, I scroll through Instagram on my phone. My gaze is drawn to a few ads, and I leave a few ‘likes’ on some nice snapshots of a colleague’s vacation. Before I even head out the door in the morning, I have already left a vast quantity of information about my preferences and habits on the Internet — a little personal data trail all of my very own.

Taken together, these data traces from all of us add up to a mind-bogglingly large data flow that occupies the attention of researchers like Florian Stahl: a tangled jungle of data that also holds buried treasure waiting to be unearthed. “The growing volume of data being generated creates a huge opportunity for research in economics and the social sciences. Projects like BERD are needed so that all this data, which is initially completely unstructured, can become usable for researchers,” the Mannheim marketing expert explains. BERD is a time-limited alliance coordinated by the University of Mannheim. Since October 2021, its six member institutions have been working tirelessly to create a digital platform that all researchers in business administration, economics, and the social sciences can use. Researchers from the Leibniz Information Centre for Economics and Mannheim University Library are working alongside researchers from the universities of Mannheim, Hamburg, Cologne, and Munich. Their common objective is to make unstructured data accessible and usable for scientists.

What do people like and dislike, and what are their habits? Why do people consume specific products? Why do they behave in specific ways in particular situations? “We used to try to collect all this information via surveys, and that sometimes demanded considerable resources. But once the data had been gathered, visualizing and evaluating it was straightforward,” Stahl says. Now the Internet yields ample data from a host of sources: websites, apps, IoT applications, digital business reports, and social networks. And this data comes in a range of formats, including audio, video, image, and text files. “We can now approach important research questions in a completely fresh way,” BERD spokesperson Stahl reports, adding that “this represents a fantastic opportunity, especially for behavioral research.” But before the research community can exploit this opportunity, the data needs to be preprocessed, and suitable AI methods that make it easier for researchers to work with need to be identified.

This is where the BERD platform comes into play: it provides a research infrastructure that enables researchers to access and use data and to gain an overview of existing AI algorithms that can be repurposed for new research questions. “We see ourselves as a kind of technical inspection service, like the one that checks your car’s roadworthiness: we can show you which data sets have already been used for which research purposes and which AI applications proved especially suitable. We aim to give the community the tools researchers need to work well with these often intimidatingly large quantities of data,” Stahl explains. For the time being, selected researchers are testing whether the new platform works as intended, and their feedback will be valuable to the BERD team. According to Professor Stahl, the digital BERD platform is due to become accessible to the wider public as early as November.

Text: Jule Leger / December 2023