Team Projects

The presentation of the new team projects will probably again take place in the first week of lectures.

Ongoing and Upcoming Team Projects (HWS2024)

Web2Table: Using LLMs to Generate Tables about Arbitrary Topics from the Web (HWS2024)
The project, “Web2Table: Using LLMs to Generate Tables from the Web,” addresses the growing need to organize scattered online information into structured, user-friendly formats. As the web holds vast amounts of data on a wide range of topics—from job postings to events—transforming this unstructured content into clear and useful tables can significantly enhance decision-making and data accessibility. This project explores how effectively LLMs can automate the table creation process, combining information from multiple sources into a comprehensive table format that can be readily used for analysis or recommendations. The goal is to assess the potential of LLMs to simplify complex data retrieval and organization tasks in real-world applications.
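The merging step described above, combining records from several sources into one table, can be sketched in Python. This is a minimal sketch under stated assumptions: a hypothetical upstream step has already prompted an LLM to return each web page's content as a JSON list of records, and the field names and the merge key (`title` plus `company`) are illustrative only, not part of the project description.

```python
import json

def merge_sources(llm_outputs, key_fields):
    """Merge per-source record lists into one table.

    llm_outputs: JSON strings, each assumed to be an LLM's answer listing the
    records it extracted from one web page (hypothetical upstream step).
    key_fields: fields that identify the same real-world item across sources.
    """
    rows, columns = {}, []
    for raw in llm_outputs:
        try:
            records = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip sources where the LLM did not return valid JSON
        if not isinstance(records, list):
            continue
        for record in records:
            if not isinstance(record, dict):
                continue
            key = tuple(str(record.get(f, "")).strip().lower() for f in key_fields)
            row = rows.setdefault(key, {})
            for col, value in record.items():
                if col not in columns:
                    columns.append(col)
                # keep the first non-empty value; later sources only fill gaps
                if value not in (None, "") and row.get(col) in (None, ""):
                    row[col] = value
    return columns, [[row.get(c, "") for c in columns] for row in rows.values()]

pages = [
    '[{"title": "Data Engineer", "company": "ACME", "location": "Mannheim"}]',
    '[{"title": "data engineer", "company": "acme", "salary": "60k"},'
    ' {"title": "Analyst", "company": "Foo"}]',
    "not json",
]
columns, table = merge_sources(pages, ["title", "company"])
print(columns)  # ['title', 'company', 'location', 'salary']
print(table)    # [['Data Engineer', 'ACME', 'Mannheim', '60k'], ['Analyst', 'Foo', '', '']]
```

Using a lower-cased key lets the same job posting from two sites collapse into one row, while each source can still contribute columns the others lack.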
Supervisors:
Alexander Brinkmann
Christian Bizer
Slides:
Slides from the kick-off meeting (PDF, 1020 kB)

Past Projects

Benchmarking Function-Calling Enabled LLMs (HWS2023)
Large Language Models (LLMs) have received increasing attention in recent months. Owing to their large number of parameters, they exhibit abilities that were not observed in earlier pre-trained language models: they perform well using only in-context learning instead of fine-tuning, and they appear to follow instructions and step-by-step reasoning. However, LLMs struggle with advanced reasoning such as mathematical reasoning, often hallucinate factual information, and the knowledge they contain may be outdated or incorrect. The team project will focus on building a benchmark to test the new function-calling ability, which can help LLMs overcome the problems mentioned above. The benchmark will investigate how function descriptions and different query formulations affect the function-calling abilities of ChatGPT and GPT-4.
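To make the benchmark idea concrete, the sketch below scores one predicted function call against an expected call. It assumes the JSON-Schema-style function description used by OpenAI's function-calling interface (a `name`, a `description`, and a `parameters` object) and that the model's answer has already been parsed into a function name plus a JSON string of arguments. The `get_weather` function and the four checks are hypothetical choices for illustration, not the project's actual metrics.

```python
import json

# Hypothetical function description in the JSON-Schema style used for function calling.
GET_WEATHER = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def score_call(pred_name, pred_args_json, gold_name, gold_args, spec):
    """Compare one predicted function call with the expected (gold) call."""
    result = {"name_ok": pred_name == gold_name, "valid_json": False,
              "required_ok": False, "args_ok": False}
    try:
        args = json.loads(pred_args_json)
    except json.JSONDecodeError:
        return result  # the model emitted a malformed argument string
    if not isinstance(args, dict):
        return result  # arguments must be a JSON object
    result["valid_json"] = True
    required = spec["parameters"].get("required", [])
    result["required_ok"] = all(p in args for p in required)
    result["args_ok"] = result["name_ok"] and args == gold_args
    return result

print(score_call("get_weather", '{"city": "Mannheim"}',
                 "get_weather", {"city": "Mannheim"}, GET_WEATHER))
```

Averaging these flags over many paraphrased queries and alternative function descriptions would yield per-variant accuracies, which is the kind of comparison the benchmark aims at.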
Supervisors:
Keti Korini
Christian Bizer
Slides:
Slides from the kick-off meeting (PDF, 2 MB)
Project Report
Report summarizing the results of the student project
Table Annotation using Deep Learning (HWS2022)
Table annotation is the task of annotating a table with terms or concepts from a knowledge graph or database schema. Table annotation is a key prerequisite for understanding the content of large data corpora and an enabler for applications such as data search or knowledge graph completion. The topic has recently received considerable attention in the research community, and there are active benchmark competitions on it. The team project will investigate Transformer-based deep learning methods for table annotation. The participants will experiment with different pre-training, fine-tuning, and data augmentation (PDF) techniques. Experiments will be conducted using Python and Hugging Face.
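Transformer-based annotators usually receive a table column as one flat text sequence. The sketch below shows one possible serialization and one simple augmentation (shuffling and subsampling cell values). The `[COL]`/`[VAL]` marker tokens, the `max_values` cut-off, and the `keep_ratio` are assumptions for illustration, not the specific techniques used in the project.

```python
import random

def serialize_column(header, values, max_values=10):
    """Turn a column into one text sequence for a Transformer classifier."""
    cells = [str(v) for v in values if str(v).strip()][:max_values]  # drop empty cells
    parts = ["[COL]", header] + [f"[VAL] {c}" for c in cells]
    return " ".join(parts)

def augment_column(values, rng, keep_ratio=0.8):
    """Create a training variant by shuffling and subsampling the non-empty cells."""
    cells = [v for v in values if str(v).strip()]
    rng.shuffle(cells)
    k = max(1, int(len(cells) * keep_ratio))
    return cells[:k]

rng = random.Random(0)
column = ["Mannheim", "Heidelberg", "", "Karlsruhe"]
print(serialize_column("city", column))
# [COL] city [VAL] Mannheim [VAL] Heidelberg [VAL] Karlsruhe
print(serialize_column("city", augment_column(column, rng)))
```

Because column semantics rarely depend on row order, shuffled and subsampled variants keep the same label while giving the model more diverse inputs.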
Supervisors:
Keti Korini
Christian Bizer
Slides:
Slides from the kick-off meeting (PDF, 2 MB)
Project Report
Report summarizing the results of the student project
References:
Further references on the topic can be found at Papers with Code: Table Annotation
Data Search using Deep Learning (FSS2021)
The project will explore the benefits of Deep Learning techniques (such as transformers) for table search and table augmentation.
The students will use a large table corpus extracted from the web for their experiments.
Tools used in the project include Elasticsearch and Hugging Face Transformers.
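Elasticsearch ranks documents with BM25 by default, which makes lexical BM25 retrieval a natural baseline before Transformer-based re-ranking is added. The pure-Python sketch below ranks tables with the classic BM25 formula over a simple serialization (caption plus header row). The serialization and the toy tables are assumptions; `k1=1.2` and `b=0.75` are Elasticsearch's default BM25 parameters.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Rank documents for a query with classic BM25; returns (doc index, score) pairs."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs or 1.0  # average document length
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    query_terms = tokenize(query)
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical table serializations: caption | header row
tables = [
    "Cities of Germany | city | population | state",
    "Football clubs | club | stadium | city",
    "Car models | manufacturer | model | horsepower",
]
print(bm25_rank("german city population", tables))
```

The example also shows BM25's limitation for table search: "german" does not match "Germany" without stemming or semantic matching, which is where Transformer-based methods can help.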
Short Project Presentation (PDF, 113 kB)
Supervisors:
Christian Bizer
Alexander Brinkmann
Social Simulation (FSS2021)
Summary
In this project, you'll use geospatial data of the city of Mannheim to create a simulation model of car traffic using NetLogo. You'll be able to build upon a prototype developed by the participants of our 2020 Master's seminar CS 704 Social Simulation.
You can find the slides of the team project presentation on 2 March 2021 here (PDF, 552 kB) and the slides of the kick-off on 23 March 2021 here (PDF, 820 kB).
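NetLogo traffic models are agent-based: each car follows simple local rules, and traffic patterns such as jams emerge from their interaction. As an illustration in Python, the sketch below implements the textbook Nagel-Schreckenberg cellular automaton on a single-lane ring road. This is a swapped-in standard model, not the project's NetLogo prototype; it ignores the Mannheim geospatial data, and all parameter values are arbitrary.

```python
import random

def nasch_step(positions, speeds, road_length, v_max, p_slow, rng):
    """One parallel update of the Nagel-Schreckenberg model on a single-lane ring road.

    positions: sorted, distinct cell indices of the cars; speeds: matching speeds.
    """
    n = len(positions)
    new_speeds = []
    for i in range(n):
        # empty cells between this car and the car ahead (wrapping around the ring)
        gap = (positions[(i + 1) % n] - positions[i] - 1) % road_length
        v = min(speeds[i] + 1, v_max)        # 1. accelerate
        v = min(v, gap)                      # 2. brake to avoid a collision
        if v > 0 and rng.random() < p_slow:  # 3. random slowdown (driver behaviour)
            v -= 1
        new_speeds.append(v)
    moved = [(p + v) % road_length for p, v in zip(positions, new_speeds)]  # 4. move
    order = sorted(range(n), key=lambda i: moved[i])  # re-sort after wrapping around
    return [moved[i] for i in order], [new_speeds[i] for i in order]

def simulate(road_length=100, n_cars=30, v_max=5, p_slow=0.3, steps=50, seed=42):
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(road_length), n_cars))
    speeds = [0] * n_cars
    for _ in range(steps):
        positions, speeds = nasch_step(positions, speeds, road_length, v_max, p_slow, rng)
    return positions, speeds

positions, speeds = simulate()
print("mean speed (cells per step):", sum(speeds) / len(speeds))
```

Varying the car density (`n_cars / road_length`) and recording the mean speed is a simple way to reproduce the transition from free flow to stop-and-go traffic that richer models also aim to capture.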
Modalities
The project will be supervised by Kilian Theil and Heiner Stuckenschmidt of the Chair of Artificial Intelligence. Intermediate knowledge of Python or Java is required. Prior knowledge of agent-based modeling and NetLogo is optional but highly welcome.
The project lasts one term, and the preferred team size is 5 to 10 people. All sessions will be held online and in English. The course is especially suitable for MMDS students, as it requires an interdisciplinary understanding of computational and social science.
The final grade is calculated as follows: 50% individual grade + 50% collective grade. The individual grade is determined by three brief individual reports (1 to 2 pages) due on 20 May, 15 July, and 9 September. The collective grade is based on a final group report due on 9 September, a group presentation held on the same day, and the code of your implemented model.
Cross-lingual Product Matching using Transformers (HWS2020)
The project will explore the benefits of using deep learning techniques (such as multilingual transformers) for product matching in cross-lingual settings. The students will use a large corpus of product offers and potentially also a large corpus of web tables for their experiments. Tools used in the project include DeepMatcher and Hugging Face Transformers.
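Transformer-based matchers are typically trained as pair classifiers on labelled offer pairs. One common way to derive such pairs, assumed here, is to group offers that share a product identifier into clusters: offers in the same cluster form positive pairs, offers from different clusters form negatives. The sketch keeps only cross-lingual pairs; the offer fields (`cluster_id`, `lang`, `title`) and the negative-sampling ratio are hypothetical. The resulting `(text_a, text_b, label)` triples are the kind of input one would feed to a sequence-pair classifier.

```python
import itertools
import random

def build_pairs(offers, rng, negatives_per_positive=1):
    """Build (title_a, title_b, label) pairs from offers in different languages."""
    positives, negatives = [], []
    for a, b in itertools.combinations(offers, 2):
        if a["lang"] == b["lang"]:
            continue  # keep only cross-lingual pairs
        pair = (a["title"], b["title"])
        if a["cluster_id"] == b["cluster_id"]:
            positives.append(pair + (1,))  # same product: match
        else:
            negatives.append(pair + (0,))  # different products: non-match
    rng.shuffle(negatives)
    k = min(len(negatives), negatives_per_positive * len(positives))
    return positives + negatives[:k]

offers = [
    {"cluster_id": 1, "lang": "en", "title": "Apple iPhone 12 64GB black"},
    {"cluster_id": 1, "lang": "de", "title": "Apple iPhone 12 64GB schwarz"},
    {"cluster_id": 2, "lang": "en", "title": "Samsung Galaxy S21 128GB"},
    {"cluster_id": 2, "lang": "de", "title": "Samsung Galaxy S21 128 GB Smartphone"},
]
for title_a, title_b, label in build_pairs(offers, random.Random(0)):
    print(label, title_a, "<->", title_b)
```

Capping the number of negatives keeps the training set from being dominated by trivially easy non-matches; harder negatives (similar products from the same brand) would usually be preferred in practice.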
Schedule: The project starts 22 October 2020 and has a duration of 6 months
Organization: Kickoff: 22.10.2020 at 10:00 (slides (PDF, 1 MB))
Project Results: The final results of the project and their discussion can be found here. The code is available in this GitHub repository.
Supervisors:
Christian Bizer
Ralph Peeters
Integrating Product Specifications from the Web (HWS2019)
E-commerce websites often contain specification tables providing detailed product data. This data can be used to build comprehensive product catalogues, to provide sophisticated product recommendations, and to enable advanced exploration features such as faceted browsing. Deriving product catalogues from the Web poses a twofold challenge: first, offers from different shops need to be matched; afterwards, product features need to be extracted, aligned, and normalized from clusters of offers for the same product. Building on matching results from previous work, the goal of this project is to extract key-value pairs from the clusters, align the schemata used on different websites, and learn how to normalize feature values in order to build a comprehensive product catalogue for different product categories using millions of web offers. The students will use a large-scale corpus of product data and experiment with different feature extraction, data normalization, and data fusion methods.
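The alignment, normalization, and fusion steps can be illustrated on a single cluster of offers for the same product. In the sketch below, the attribute synonym table, the conversion of storage sizes to gigabytes, and majority voting as the fusion rule are all hypothetical choices for illustration; the project itself compares several such methods and would learn mappings rather than hard-code them.

```python
import re
from collections import Counter

# Hypothetical mapping from shop-specific attribute names to a shared schema.
ATTRIBUTE_SYNONYMS = {
    "memory": "ram", "ram": "ram", "arbeitsspeicher": "ram",
    "storage": "storage", "capacity": "storage", "internal memory": "storage",
}
UNIT_TO_GB = {"mb": 1 / 1024, "gb": 1, "tb": 1024}

def normalize_value(value):
    """Convert strings like '8 GB' or '8192MB' to a number of GB; otherwise lower-case text."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(mb|gb|tb)\s*", value.lower())
    if m:
        return round(float(m.group(1)) * UNIT_TO_GB[m.group(2)], 2)
    return value.strip().lower()

def fuse_cluster(spec_tables):
    """Align attribute names, normalize values, and fuse them by majority vote."""
    votes = {}
    for table in spec_tables:
        for name, value in table.items():
            attr = ATTRIBUTE_SYNONYMS.get(name.strip().lower())
            if attr is None:
                continue  # unknown attribute: would need schema matching
            votes.setdefault(attr, Counter())[normalize_value(value)] += 1
    return {attr: counter.most_common(1)[0][0] for attr, counter in votes.items()}

cluster = [
    {"Memory": "8 GB", "Storage": "256 GB"},
    {"Arbeitsspeicher": "8192MB", "Capacity": "0.25 TB"},
    {"RAM": "8GB", "Colour": "black"},
]
print(fuse_cluster(cluster))  # {'ram': 8.0, 'storage': 256.0}
```

Normalizing before voting matters: without unit conversion, "8 GB", "8192MB", and "8GB" would count as three different values and no majority would emerge.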
Schedule: The project starts 27 September (Duration 6 months)
Organization: Slides of the kickoff meeting (PDF, 2 MB)
Supervisors:
Christian Bizer
Anna Primpeli
Integrating Product Data from the Web (HWS2018)
The Web is a rich source of product information which, if integrated, can help to construct comprehensive product catalogues, compare product prices on a global level, and understand market structure and consumer preferences. The key challenge that needs to be solved before these applications can be realized is to determine, at web scale, which shops sell a specific product. The difficulty of web-scale identity resolution for product data lies in the nature of the data: some product categories are fairly structured and may include product identifiers and a schema that most vendors use, while others are less structured and use vendor-specific schemata. The goal of this project is to determine which combinations of information extraction and identity resolution methods work best for integrating product data from a large number of websites. The students will experiment with different identity resolution and feature extraction methods and analyze their suitability with respect to the nature of the data.
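How the nature of the data affects identity resolution can be illustrated with a simple two-stage matcher: offers that carry a product identifier are matched on it, and offers without one fall back to title similarity. The sketch assumes a hypothetical `gtin` field, uses token-level Jaccard similarity, and picks an arbitrary threshold of 0.5; the methods studied in the project may differ substantially.

```python
import re

def tokens(title):
    return set(re.findall(r"\w+", title.lower()))

def jaccard(a, b):
    """Token-level Jaccard similarity of two titles."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def is_match(offer_a, offer_b, threshold=0.5):
    """Decide whether two offers describe the same product."""
    gtin_a, gtin_b = offer_a.get("gtin"), offer_b.get("gtin")
    if gtin_a and gtin_b:
        # structured case: compare identifiers, ignoring leading zero padding
        return gtin_a.lstrip("0") == gtin_b.lstrip("0")
    # unstructured case: fall back to title similarity
    return jaccard(offer_a["title"], offer_b["title"]) >= threshold

a = {"title": "Sony WH-1000XM4 Wireless Headphones Black", "gtin": "4548736112117"}
b = {"title": "Kopfhörer Sony WH1000XM4 schwarz", "gtin": "04548736112117"}
c = {"title": "Sony WH-1000XM4 Headphones Black"}
d = {"title": "Bose QuietComfort 45 Headphones"}
print(is_match(a, b), is_match(a, c), is_match(c, d))  # True True False
```

The pair `a`/`b` shows why identifiers are valuable: the titles share almost no tokens, yet the offers are matched correctly, whereas offers without identifiers depend entirely on how the title is written.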
Schedule: The project starts 28 September (Duration 6 months)
Organization:
Slides of the kickoff meeting (PDF, 2 MB)
Slides phases 1–4 (PDF, 2 MB)
Supervisors:
Christian Bizer
Anna Primpeli
Mining Product Data from the Web (HWS2017)
The Web contains a wealth of product and review data which can be used for comparing product prices on a global scale as well as for advanced mining tasks, such as determining how product features influence product prices or how specific features influence the perception of products in customer reviews. Gathering product and review data from the Web is made easier by the increasing adoption of schema.org annotations by many e-shops. This team project will gather a corpus of product data from the Web and then use this corpus as the basis for advanced product data mining. Gathering the corpus will involve focused web crawling, product feature extraction, schema matching, multi-level classification, and sentiment analysis.
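Many e-shops embed their schema.org annotations as JSON-LD inside `<script type="application/ld+json">` elements. The standard-library sketch below collects `Product` objects from such a page. It deliberately ignores Microdata and RDFa (which are also used for schema.org markup), `@graph` containers, and list-valued `@type`; the example HTML is made up.

```python
import json
from html.parser import HTMLParser

class JsonLdCollector(HTMLParser):
    """Collect the text content of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # script text may arrive in several chunks

def extract_products(html):
    """Return all top-level schema.org Product objects found in JSON-LD blocks."""
    parser = JsonLdCollector()
    parser.feed(html)
    products = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # broken markup is common on real pages
        items = data if isinstance(data, list) else [data]
        products += [i for i in items if isinstance(i, dict) and i.get("@type") == "Product"]
    return products

page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Phone",
 "offers": {"@type": "Offer", "price": "299.00", "priceCurrency": "EUR"}}
</script></head><body><p>Example Phone</p></body></html>"""
for product in extract_products(page):
    print(product["name"], product["offers"]["price"], product["offers"]["priceCurrency"])
```

The extracted objects keep schema.org property names such as `name` and `offers`, which gives a shared vocabulary as a starting point before the schema matching step.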
Schedule: The project starts 2 March (Duration 6 months)
Organization:
Slides of the kickoff meeting (PDF, 2 MB)
Slides for Phase 2 (PDF, 699 kB)
Slides for Phase 3 (PDF, 706 kB)
Supervisors:
Christian Bizer
Anna Primpeli
Active and Online Learning – Your Personal Assistant for Lifestyle Improvement
Wearable devices such as smartphones feature a variety of sensors and provide new opportunities for continuous monitoring and support. In this project, we focus on inertial sensors (e.g. accelerometer, gyroscope) to recognize common physical activities such as climbing stairs, walking, and standing. We want to develop an application that learns and updates a classification model online to recognize these physical activities. A further important aspect is querying the user about uncertain results of the classification model; in particular, the feasibility of querying the user is a central question of this project. The goal is an Android app that automatically records the performed activities but also interacts with the user in a comfortable way.
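The interplay of online learning and user queries can be sketched with a deliberately simple model: a nearest-centroid classifier whose class means are updated incrementally, combined with uncertainty sampling that asks the user whenever the two closest classes are almost equally close. The two-dimensional feature vectors (imagine, e.g., mean and variance of accelerometer magnitude per window), the margin threshold, and the `ask_user` callback are hypothetical; the project may use entirely different models.

```python
import math

class OnlineCentroidClassifier:
    """Nearest-centroid classifier whose class means are updated one sample at a time."""

    def __init__(self):
        self.centroids = {}  # label -> (mean feature vector, number of samples)

    def update(self, x, label):
        mean, n = self.centroids.get(label, ([0.0] * len(x), 0))
        n += 1
        # running mean: m_new = m_old + (x - m_old) / n
        mean = [m + (xi - m) / n for m, xi in zip(mean, x)]
        self.centroids[label] = (mean, n)

    def predict(self, x):
        """Return (closest label, distance margin to the second-closest class)."""
        if not self.centroids:
            return None, 0.0  # nothing learned yet: treat as maximally uncertain
        dists = sorted((math.dist(x, mean), label)
                       for label, (mean, _) in self.centroids.items())
        if len(dists) == 1:
            return dists[0][1], math.inf
        return dists[0][1], dists[1][0] - dists[0][0]

def process_window(clf, x, ask_user, margin_threshold=0.5):
    """Classify one feature vector; query the user when the prediction is uncertain."""
    label, margin = clf.predict(x)
    if margin < margin_threshold:
        label = ask_user(x, label)  # active learning: ask for the true activity
        clf.update(x, label)        # online learning: update the model immediately
    return label

clf = OnlineCentroidClassifier()
for x, y in [([0.1, 0.0], "standing"), ([1.0, 0.5], "walking"), ([1.8, 1.2], "climbing stairs")]:
    clf.update(x, y)

def ask_user(x, guess):
    # stand-in for a notification in the app; this "user" always answers "walking"
    return "walking"

print(process_window(clf, [0.12, 0.01], ask_user))  # clearly closest to "standing": no query
print(process_window(clf, [1.4, 0.85], ask_user))   # halfway between two classes: query
```

The margin threshold directly controls how often the user is interrupted, which is exactly the trade-off behind the question of how feasible it is to query the user.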
More Details (PDF, 316 kB)
Supervisor:
Prof. Dr. Heiner Stuckenschmidt (responsible)
Timo Sztyler (contact)
Topic Monitoring in the Pharmaceutical Industry
Companies are interested in understanding how they and their products and services are perceived by the public. In this project, we use social media, such as Facebook and Twitter, to address the following questions: (1) What topics related to a company are currently discussed in social media? (2) Which topics are trending, recurring, or declining? (3) Are there any geographical differences in topic coverage?
The project is carried out together with the pharmaceutical company AbbVie.
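Question (2) can be approached by counting how often each topic is mentioned per time period and looking at the trend of those counts. The sketch below fits a least-squares slope to weekly mention counts and labels each topic as trending, declining, or stable. The topic names, the counts, and the slope threshold are made up, and recurring topics (which would need periodicity detection) are not covered.

```python
def slope(counts):
    """Least-squares slope of counts over time steps 0, 1, ..., n-1."""
    n = len(counts)
    if n < 2:
        return 0.0
    mean_t = (n - 1) / 2
    mean_c = sum(counts) / n
    cov = sum((t - mean_t) * (c - mean_c) for t, c in enumerate(counts))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var

def classify_topics(weekly_counts, threshold=1.0):
    """Label each topic as trending, declining, or stable from its weekly mention counts."""
    labels = {}
    for topic, counts in weekly_counts.items():
        s = slope(counts)
        if s > threshold:
            labels[topic] = "trending"
        elif s < -threshold:
            labels[topic] = "declining"
        else:
            labels[topic] = "stable"
    return labels

# Hypothetical weekly mention counts per topic (five weeks)
weekly = {
    "side effects": [3, 5, 9, 14, 20],
    "pricing": [12, 10, 9, 6, 4],
    "study results": [7, 8, 6, 7, 8],
}
print(classify_topics(weekly))
```

A fixed threshold on the raw slope favours frequently discussed topics; normalizing counts by a topic's average volume would be one way to compare small and large topics fairly.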
Supervisors:
Heiko Paulheim
Sven Hertling
Knowledge Extraction from WikiFarms
Large-scale public knowledge graphs, such as DBpedia or YAGO, are usually created from only a small set of sources, typically parts of Wikipedia. They have good coverage of well-known entities (such as big cities or famous athletes) but poor coverage of less well-known entities (such as small villages or minor league athletes). Smaller, specialized wikis, on the other hand, such as fan-created wikis hosted on WikiFarms, contain detailed information about very specific topics.
In this project, we aim to fill the long tail of DBpedia with information from thousands of small-scale wikis. We investigate the potential of a large number of small-scale wikis, as well as the challenges they pose for data quality and data integration.
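Much of the structured content in MediaWiki-based wikis sits in infobox templates, which are also a main source for DBpedia's extraction. The regex-based sketch below pulls key-value pairs from a single, flat infobox. It assumes the template name starts with `Infobox` and does not handle nested templates, piped links such as `[[Target|text]]`, or the many other markup variants a real extraction framework must deal with. The wiki text is invented.

```python
import re

# {{Infobox ...}} without nested templates; group 1 holds the template body
INFOBOX = re.compile(r"\{\{\s*Infobox([^{}]*)\}\}", re.IGNORECASE)

def extract_infobox(wikitext):
    """Return the key-value pairs of the first flat {{Infobox ...}} template."""
    m = INFOBOX.search(wikitext)
    if not m:
        return {}
    pairs = {}
    for part in m.group(1).split("|")[1:]:  # element 0 is the rest of the template name
        key, sep, value = part.partition("=")
        if sep and key.strip():
            # replace [[Target]] links by their text; piped links are not handled
            pairs[key.strip()] = re.sub(r"\[\[([^\]]*)\]\]", r"\1", value).strip()
    return pairs

page = """'''Musterdorf''' is a small village.
{{Infobox village
| name = Musterdorf
| state = [[Baden-Württemberg]]
| population = 412
}}
Musterdorf has a church."""
print(extract_infobox(page))
```

The extracted keys are still wiki-specific; mapping them to DBpedia ontology properties, and linking the entity to existing DBpedia resources, is where the data quality and integration challenges mentioned above arise.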
Supervisors:
Heiko Paulheim
Sven Hertling
Exploring the Future Warehouse
Recognizing, validating, and optimizing the activities of workers in logistics is increasingly aided by smart devices such as glasses, gloves, and sensor-enhanced wristbands. This project focuses on developing a system that recognizes the movements and actions of warehouse employees by processing video and sensor data from smart glasses. Finally, a mobile application for the smart glasses will be developed to support logistics workers in their tasks.
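Activity recognition from wearable sensor streams commonly starts by cutting the signal into overlapping windows and computing features per window, which a classifier then labels. The sketch below shows only this segmentation step for a single accelerometer axis of the glasses' sensor data. The window length, the 50% overlap, and the chosen features (mean, standard deviation, range) are assumptions, and the video side of the project is not covered.

```python
import statistics

def sliding_windows(signal, size, step):
    """Yield consecutive windows of `size` samples, advancing by `step` samples."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]

def window_features(signal, size, step):
    """Compute (mean, standard deviation, range) for each window."""
    features = []
    for w in sliding_windows(signal, size, step):
        features.append((statistics.fmean(w), statistics.pstdev(w), max(w) - min(w)))
    return features

# Toy accelerometer trace: resting, then oscillating movement
signal = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]
for f in window_features(signal, size=4, step=2):  # 50% overlap
    print(f)
```

Overlapping windows mean that an action starting in the middle of one window is still fully captured by the next, at the cost of computing features twice per sample.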
Supervisor:
Prof. Dr. Heiner Stuckenschmidt (responsible)
Lydia Weiland (contact)
Alexander Diete (contact)
FitFor4: Semantics-based Integration of Cyber-Physical Systems in the Industrial Internet
Many industrial companies are currently adapting, or plan to adapt, their production processes to Industry 4.0 standards. Because the lifecycles of production machines are traditionally rather long (years to decades), companies face several conceptual and technical problems, such as high integration effort and incompatible interfaces. This master's team project therefore tackles these problems by constructing an Industry 4.0 compliant infrastructure based on semantic interface descriptions. The results will then be used by big data tools and evaluated on a real-world industry demonstrator at a large IT summit.
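The idea of semantic interface descriptions can be illustrated with a toy example: each machine interface is annotated with the physical quantity and unit it exchanges, so that compatible producer and consumer interfaces can be matched automatically and unit conversions inserted where needed. The vocabulary below (quantity names, units, conversion table, machine names) is hypothetical and is not taken from any Industry 4.0 standard such as OPC UA or the Asset Administration Shell.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interface:
    machine: str
    name: str
    quantity: str   # semantic annotation, e.g. "temperature" (hypothetical vocabulary)
    unit: str       # e.g. "celsius"
    direction: str  # "out" (produces data) or "in" (consumes data)

# Hypothetical conversions between units of the same quantity.
CONVERSIONS = {
    ("celsius", "kelvin"): lambda v: v + 273.15,
    ("kelvin", "celsius"): lambda v: v - 273.15,
}

def match_interfaces(interfaces):
    """Link each output to every input of another machine carrying the same quantity."""
    outs = [i for i in interfaces if i.direction == "out"]
    ins = [i for i in interfaces if i.direction == "in"]
    links = []
    for o in outs:
        for i in ins:
            if o.quantity != i.quantity or o.machine == i.machine:
                continue
            if o.unit == i.unit:
                links.append((o, i, None))  # directly compatible
            elif (o.unit, i.unit) in CONVERSIONS:
                links.append((o, i, CONVERSIONS[(o.unit, i.unit)]))  # needs conversion
    return links

interfaces = [
    Interface("legacy_press", "temp_out", "temperature", "celsius", "out"),
    Interface("monitoring", "temp_in", "temperature", "kelvin", "in"),
    Interface("monitoring", "pressure_in", "pressure", "bar", "in"),
]
for out_if, in_if, convert in match_interfaces(interfaces):
    print(out_if.machine, out_if.name, "->", in_if.machine, in_if.name,
          "value 20.0 becomes", convert(20.0) if convert else 20.0)
```

Because matching relies on the annotations rather than on interface names, a decades-old machine can be connected to a new system by describing its interfaces once instead of writing a point-to-point adapter for every pairing.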
Supervisor:
Prof. Dr. Simone Paolo Ponzetto (responsible)
Dr. Christian Bartelt (contact)
Fabian Burzlaff (contact)
