João Areal, Professorship for Social Data Science and Methodology: Content Analysis of web-track data (May 2023)

João Areal received an MSc (Research Master Social Sciences) from the University of Amsterdam in August 2020. Since May 2021, João has been a Doctoral Researcher at the Professorship for Social Data Science and Methodology (Prof. Florian Keusch) at the University of Mannheim, where he works with Dr. Ruben Bach in the project “Filter Bubbles, Alternative News and Political Polarisation”.

What is your current research topic?

My main research topic is the concept of negative partisanship, that is, people’s rejection of a particular party. I am interested in how to measure and conceptualise negative partisanship, as well as in investigating some of its consequences for individual attitudes and behaviour. For example, I try to measure negative partisanship as a type of social identity, and study whether this type of negative identification (e.g. “I am anti-...”) leads to more hostile attitudes towards other voters, and whether this identity shapes people’s information-seeking behaviour.

For those who have not yet delved deeply into the topic of Data Science: How would you explain to a child what you are working on?

Currently I am working on classifying news articles as portraying political parties in a positive, negative, or neutral light, using the title and headline of the respective article. I would probably say to a child that I am trying to guess whether hundreds of thousands of articles are good news or bad news based only on what I know about a dozen articles.

Everyone talks about Data Science – how would you describe the importance of the topic for yourself in three words?

Independence, opportunity, creativity.

What points of contact with Data Science does your work have? Which methods do you already use, and which would be interesting for you in the future?

At the moment I am involved in a few projects that use web-tracking data, records of individuals’ browsing behaviour, to better understand what information people consume online and what the sources and effects of this consumption are. This is mostly done through methods of text analysis, such as topic modelling and different methods of classifying content (such as transformer-based models and few-shot learning). I would like to take this type of content analysis to the next level and also analyse images and sounds, especially when it comes to political content such as campaign material and misinformation.

How high is the value of Data Science for your work? Would your research even be possible without Data Science?

Given that a lot of my work is based on web-tracking data, Data Science is crucial. Without it, it would be impossible to gain insights at this level of granularity about what people tend to consume online. Whilst this type of data is far from perfect, it allows researchers to consider the content people consume rather than just which websites they visited. This is only possible through computational methods of data analysis and collection.

What development opportunities do you see for the topic of Data Science in relation to your field?

I am excited about how large language models can help with classification tasks, which may reduce costs, and thus inequalities, in research sub-fields that rely on human coders. I think this can be hugely important to any field.