The new team projects will probably again be presented in the first week of lectures.
The project will explore the benefits of using Deep Learning techniques (e.g. relational transformers) for data integration tasks such as identity resolution, schema matching (attribute and relation annotation), and slot filling (value prediction). The participants will experiment with different relational pre-training as well as fine-tuning techniques. Experiments will be conducted using Python and Huggingface Transformers.
The project will explore the benefits of Deep Learning techniques (such as transformers) for table search and table augmentation.
The students will use a large table corpus extracted from the web for their experiments.
Tools that will be used in the project include Elasticsearch as well as Huggingface Transformers.
In this project, you'll use geospatial data of the city of Mannheim to create a simulation model of car traffic using NetLogo. You'll be able to build upon a prototype developed by the participants of the 2020 Master's seminar CS 704 Social Simulation.
The project will be supervised by Kilian Theil and Heiner Stuckenschmidt of the Chair of Artificial Intelligence. Intermediate knowledge of Python or Java is required. Prior knowledge of agent-based modeling and NetLogo is optional, but highly welcome.
The duration is one term and the preferred team size is 5 to 10 people. All sessions will be held online and in English. The course is especially suitable for MMDS students as it requires an interdisciplinary understanding of computational and social science.
The final grade is calculated as follows: 50% individual grade + 50% collective grade. The individual grade is determined via three brief individual reports (1 to 2 pages each) that are due on 20 May, 15 July and 9 September. The collective grade is determined via a final group report due on 9 September, a group presentation held on the same day, and the code of your implemented model.
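As a rough illustration of the weighting above, the final grade can be computed as in the following sketch. The function name and the example values are hypothetical; how the three reports combine into the individual grade, and how the report, presentation and code combine into the collective grade, is not specified here, so both components are simply taken as inputs.

```python
def final_grade(individual: float, collective: float) -> float:
    """Combine the two components with the 50/50 weighting stated above."""
    return 0.5 * individual + 0.5 * collective


# Hypothetical component grades, for illustration only.
example = final_grade(individual=1.7, collective=2.3)
print(round(example, 1))
```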
The project will explore the benefits of using Deep Learning techniques (such as multi-lingual transformers) for product matching in cross-lingual settings. The students will use a large corpus of product offers and potentially also a large corpus of web tables for their experiments. Tools that will be used in the project include DeepMatcher as well as Huggingface Transformers.
Schedule: The project starts 22 October 2020 and has a duration of 6 months
Organization: Kickoff: 22.10.2020 at 10:00 (slides)
E-commerce websites often contain specification tables providing detailed product data. This data can be used to build comprehensive product catalogues, provide sophisticated product recommendations, and offer advanced exploration features such as faceted browsing. There is a twofold challenge in the task of deriving product catalogues from the Web: First, offers from different shops need to be matched. Afterwards, product features need to be extracted, aligned, and normalized from clusters of offers for the same product. Building on matching results from previous work, the goal of this project is to extract key-value pairs from the clusters, align the schemata used on different websites, and learn how to normalize feature values in order to build a comprehensive product catalogue for different product categories using millions of web offers. The students will use a large-scale corpus of product data and experiment with different feature extraction, data normalization and data fusion methods.
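To give an idea of the alignment, normalization and fusion steps described above, here is a minimal sketch in Python. It assumes the key-value pairs have already been extracted from the specification tables; the synonym table, the unit factors, the majority-vote fusion and all names are hypothetical stand-ins, not the methods the project will actually use.

```python
import re
from collections import Counter

# Hypothetical synonym table mapping shop-specific attribute names to a shared schema.
ATTRIBUTE_SYNONYMS = {
    "ram": "memory",
    "memory": "memory",
    "arbeitsspeicher": "memory",
    "weight": "weight",
    "item weight": "weight",
}

# Hypothetical unit conversions into one base unit per attribute.
UNIT_FACTORS = {
    "memory": {"gb": 1.0, "mb": 1 / 1024},  # base unit: GB
    "weight": {"kg": 1.0, "g": 1 / 1000},   # base unit: kg
}

VALUE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")


def align(attribute):
    """Map a raw attribute name to the shared schema, or None if unknown."""
    return ATTRIBUTE_SYNONYMS.get(attribute.strip().lower())


def normalize(attribute, raw_value):
    """Convert a raw value such as '8192 MB' into the attribute's base unit."""
    match = VALUE_PATTERN.match(raw_value)
    if match is None:
        return None
    number, unit = match.groups()
    factor = UNIT_FACTORS[attribute].get(unit.lower())
    if factor is None:
        return None
    return round(float(number) * factor, 3)


def fuse(cluster):
    """Fuse the offers of one product cluster by majority vote per attribute."""
    votes = {}
    for offer in cluster:
        for key, raw_value in offer.items():
            attribute = align(key)
            if attribute is None:
                continue
            value = normalize(attribute, raw_value)
            if value is None:
                continue
            votes.setdefault(attribute, Counter())[value] += 1
    return {attr: counter.most_common(1)[0][0] for attr, counter in votes.items()}


# Three hypothetical offers that were matched to the same product.
cluster = [
    {"RAM": "8 GB", "Item Weight": "1.5 kg"},
    {"Memory": "8192 MB", "Weight": "1500 g"},
    {"Arbeitsspeicher": "16 GB"},
]
catalog_entry = fuse(cluster)
print(catalog_entry)
```

Hand-written synonym and unit tables do not scale to millions of offers, which is why the project is about learning these mappings rather than listing them by hand.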
Schedule: The project starts 27 September (Duration 6 months)
Organization: Slides of the kickoff meeting
The Web is a rich source of product information which, if integrated, can help to construct comprehensive product catalogues, compare product prices on a global level, and understand market structure and consumer preferences. The key challenge that needs to be solved before these applications can be realized is to determine at web scale which shops are selling a specific product. The difficulty of web-scale identity resolution for product data lies in the nature of the data: some product categories are rather structured and might include product identifiers and a specified schema that most vendors use, while others are more unstructured and have vendor-specific schemata. The goal of this project is to determine which combinations of information extraction and identity resolution methods work best for integrating product data from a large number of websites. The students will experiment with different identity resolution and feature extraction methods and analyze their suitability with respect to the nature of the data.
Schedule: The project starts 28 September (Duration 6 months)
The Web contains a wealth of product and review data which can be used for comparing product prices on a global scale as well as for advanced mining tasks such as determining how product features influence product prices or how specific features influence the perception of products in customer reviews. The gathering of product and review data from the Web is eased by the increased adoption of schema.org annotations by many e-shops. This team project will gather a corpus of product data from the Web and will afterwards use this corpus as a basis for advanced product data mining. Gathering the corpus will involve focused web crawling, product feature extraction, schema matching, multi-level classification, as well as sentiment analysis.
Schedule: The project starts 2 March (Duration 6 months)
Wearable devices such as smartphones feature a variety of sensors and provide new opportunities for continuous monitoring and support. In this project, we focus on inertial sensors (e.g. accelerometer, gyroscope) to recognize common physical activities such as climbing stairs, walking, and standing. We want to develop an application that learns and updates a classification model online in order to recognize these physical activities. A further important aspect of this project is the feasibility of querying the user about uncertain results of the classification model. The goal is an Android app that automatically records the performed activities but also interacts with the user in a comfortable way.
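The idea of updating a model online and asking the user only when the model is unsure can be sketched as follows. This is a toy illustration in Python rather than Android code; the nearest-centroid classifier, the margin threshold, the two-dimensional features and all names are assumptions chosen for brevity, not the approach the project prescribes.

```python
import math


class OnlineCentroidClassifier:
    """Toy online classifier: one running-mean centroid per activity label."""

    def __init__(self, margin=0.2):
        self.centroids = {}   # label -> running mean of the feature vectors
        self.counts = {}      # label -> number of samples seen so far
        self.margin = margin  # hypothetical uncertainty threshold

    def update(self, features, label):
        """Incorporate one labeled sample without retraining from scratch."""
        n = self.counts.get(label, 0) + 1
        self.counts[label] = n
        if label not in self.centroids:
            self.centroids[label] = list(features)
            return
        centroid = self.centroids[label]
        for i, x in enumerate(features):
            centroid[i] += (x - centroid[i]) / n

    def predict(self, features):
        """Return (label, needs_query); needs_query means: ask the user."""
        if len(self.centroids) < 2:
            # With fewer than two known activities there is nothing to compare.
            label = next(iter(self.centroids), None)
            return label, True
        ranked = sorted(
            (math.dist(features, centroid), label)
            for label, centroid in self.centroids.items()
        )
        (best, label), (second, _) = ranked[0], ranked[1]
        if second == 0:
            return label, True
        # A small relative gap between the two closest centroids is treated as uncertain.
        needs_query = (second - best) / second < self.margin
        return label, needs_query


# Hypothetical features: (mean, standard deviation) of the acceleration magnitude.
clf = OnlineCentroidClassifier()
clf.update((9.8, 0.1), "standing")
clf.update((9.8, 0.3), "standing")
clf.update((10.5, 2.0), "walking")
clf.update((10.7, 2.4), "walking")

confident = clf.predict((9.8, 0.15))
ambiguous = clf.predict((10.2, 1.2))
print(confident, ambiguous)
```

When `needs_query` is true, the app would ask the user for the correct activity and pass the answer to `update`, so the model improves exactly where it was unsure.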
Companies are interested in understanding how they and their products and services are perceived by the public. In this project, we use social media, such as Facebook and Twitter, to address the following questions: (1) what topics related to a company are currently discussed in social media? (2) what topics are trending, recurring, or declining? (3) are there any geographical differences in the topic coverage?
The project is carried out together with the pharmaceutical company AbbVie.
Large-scale public knowledge graphs, such as DBpedia or YAGO, are most often created using only a small set of sources, usually parts of Wikipedia. They have good coverage of well-known entities (such as big cities or famous athletes), but poor coverage of less well-known entities (such as small villages or minor league athletes). On the other hand, smaller, specialized Wikis, such as fan-created Wikis at WikiFarms, contain detailed information about very specific topics.
In this project, we aim at filling the long tail of DBpedia from thousands of small-scale Wikis. We investigate the potential of a large number of small-scale Wikis, as well as challenges in data quality and data integration.
Recognizing, validating, and optimizing activities of workers in logistics is increasingly aided by smart devices like glasses, gloves, and sensor-enhanced wristbands. This project focuses on developing a system that recognizes the movements and actions of warehouse employees by processing video and sensor data from smart glasses. In the end, a mobile application for the smart glasses will be developed that can aid logistics workers in their tasks.
Many industrial companies are currently adapting or plan to adapt their production processes according to Industry 4.0 standards. Because the lifecycles of production machines are traditionally rather long (from years to decades), companies are facing several conceptual as well as technical problems such as high integration effort and incompatibility between interfaces. This master team project therefore tries to tackle the aforementioned problems by constructing an Industry 4.0 compliant infrastructure based on semantic interface descriptions. The results will then be used by Big Data tools and evaluated on a real-world industry demonstrator as part of a large IT summit.