The project will explore the benefits of using deep learning techniques (such as term and entity embeddings, RNNs, and transformers like BERT) for product matching as well as product feature extraction. The students will use a large corpus of product offers and potentially also a large corpus of web tables for their experiments. Tools that will be used in the project include DeepMatcher as well as Huggingface Transformers.
Schedule: The project starts October 15th, 2020 and has a duration of 6 months
Organization: Slides of the kickoff meeting will be published here
E-commerce websites often contain specification tables providing detailed product data. This data can be used to build comprehensive product catalogues, provide sophisticated product recommendations, and offer advanced exploration features such as faceted browsing. There is a twofold challenge in the task of deriving product catalogues from the Web: First, offers from different shops need to be matched. Afterwards, product features need to be extracted, aligned, and normalized from clusters of offers for the same product. Building on matching results from previous work, the goal of this project is to extract key-value pairs from the clusters, align the schemata used on different websites, and learn how to normalize feature values in order to build a comprehensive product catalogue for different product categories using millions of web offers. The students will use a large-scale corpus of product data and experiment with different feature extraction, data normalization, and data fusion methods.
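The extract, align, normalize, and fuse steps described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not part of the project: the alias table `KEY_ALIASES`, the target attributes `weight_g` and `color`, the unit table, and the majority-vote fusion rule. A real pipeline would learn the schema alignment and normalization rules from data rather than hard-code them.

```python
import re
from collections import Counter, defaultdict

# Hypothetical alias table: maps shop-specific attribute names to one target schema.
KEY_ALIASES = {
    "weight": "weight_g",
    "item weight": "weight_g",
    "gewicht": "weight_g",
    "colour": "color",
    "color": "color",
}

# Conversion factors from a unit to grams (illustrative subset).
UNIT_TO_GRAMS = {"g": 1.0, "kg": 1000.0, "lb": 453.592, "oz": 28.3495}


def normalize_weight(raw):
    """Parse strings such as '1.5 kg' or '1,5kg' and return whole grams, else None."""
    match = re.fullmatch(r"\s*([\d.,]+)\s*([a-zA-Z]+)\s*", raw)
    if match is None:
        return None
    # Assumption: a comma is a decimal separator, not a thousands separator.
    number = float(match.group(1).replace(",", "."))
    factor = UNIT_TO_GRAMS.get(match.group(2).lower())
    if factor is None:
        return None
    return round(number * factor)


def fuse_cluster(offers):
    """Align keys, normalize values, and fuse one cluster of offers by majority vote."""
    votes = defaultdict(Counter)
    for offer in offers:
        for key, value in offer.items():
            target = KEY_ALIASES.get(key.strip().lower())
            if target is None:
                continue  # attribute not covered by the target schema
            if target == "weight_g":
                value = normalize_weight(value)
            else:
                value = value.strip().lower()
            if value is not None:
                votes[target][value] += 1
    # Keep the most frequent normalized value per attribute.
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}


cluster = [
    {"Weight": "1.5 kg", "Colour": "Black"},
    {"Gewicht": "1500 g", "color": "black "},
    {"Item Weight": "3.3 lb", "color": "blue"},
]
catalogue_entry = fuse_cluster(cluster)
```

Majority voting is only one possible fusion strategy; weighting sources by their reliability would be an alternative when some shops are known to be more accurate than others.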
Schedule: The project starts September 27th (Duration 6 months)
Organization: Slides of the kickoff meeting
The Web is a rich source of product information which, if integrated, can help to construct comprehensive product catalogues, compare product prices on a global level, and understand market structure and consumer preferences. The key challenge that needs to be solved before these applications can be realized is to determine at web scale which shops are selling a specific product. The difficulty of web-scale identity resolution for product data lies in the nature of the data: some product categories are rather structured and might include product identifiers and a specified schema that most vendors use, while others are more unstructured and have vendor-specific schemata. The goal of this project is to determine which combinations of information extraction and identity resolution methods work best for integrating product data from a large number of websites. The students will experiment with different identity resolution and feature extraction methods and analyze their suitability with respect to the nature of the data.
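The contrast between structured and unstructured categories can be made concrete with a small sketch of a matching rule that uses a product identifier when both offers carry one and falls back to title similarity otherwise. The field names `gtin` and `title`, the Jaccard measure, and the threshold of 0.6 are assumptions chosen for illustration, not methods prescribed by the project.

```python
import re


def tokens(title):
    """Lowercase a title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_match(offer_a, offer_b, threshold=0.6):
    """Decide whether two offers describe the same product.

    Assumption: offers are dicts with a 'title' and an optional 'gtin'.
    """
    id_a, id_b = offer_a.get("gtin"), offer_b.get("gtin")
    if id_a and id_b:
        # Structured case: both shops publish an identifier, so compare it directly.
        return id_a == id_b
    # Unstructured case: fall back to token overlap of the titles.
    return jaccard(tokens(offer_a["title"]), tokens(offer_b["title"])) >= threshold


with_id_a = {"gtin": "0123", "title": "Apple iPhone 11 64GB black"}
with_id_b = {"gtin": "0123", "title": "iPhone 11"}
no_id_a = {"title": "Apple iPhone 11 64GB Black"}
no_id_b = {"title": "iPhone 11 64GB black Apple smartphone"}
other = {"title": "Samsung Galaxy S10 128GB"}
```

Here `is_match(with_id_a, with_id_b)` relies on the identifier alone, while `is_match(no_id_a, no_id_b)` depends on the title overlap, which is where learned matchers would be expected to outperform such a fixed rule.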
Schedule: The project starts September 28th (Duration 6 months)
The Web contains a wealth of product and review data which can be used for comparing product prices on a global scale as well as for advanced mining tasks such as determining how product features influence product prices or how specific features influence the perception of products in customer reviews. The gathering of product and review data from the Web is eased by the increased adoption of schema.org annotations by many e-shops. This team project will gather a corpus of product data from the Web and will then use this corpus as a basis for advanced product data mining. Gathering the corpus will involve focused web crawling, product feature extraction, schema matching, multi-level classification, as well as sentiment analysis.
Schedule: The project starts March 2nd (Duration 6 months)
Wearable devices such as smartphones feature a variety of sensors and provide new opportunities for continuous monitoring and support. In this project, we focus on inertial sensors (e.g. accelerometer, gyroscope) to recognize common physical activities such as climbing stairs, walking, and standing. We want to develop an application that learns and updates a classification model online to recognize these physical activities. Another important aspect is querying the user about uncertain results of the classification model; in particular, the feasibility of such queries is a central question of this project. The goal is an Android app that automatically records the performed activities but also interacts with the user in a comfortable way.
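The combination of online model updates and user queries for uncertain predictions can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's method: the nearest-centroid model, the two-dimensional feature vectors (imagined here as summary statistics of an accelerometer window), the margin-based uncertainty measure, and the `min_margin` threshold are all hypothetical choices. The sketch is in Python for brevity, although the project targets Android.

```python
import math


class OnlineCentroidClassifier:
    """Nearest-centroid classifier whose class means are updated one sample at a time."""

    def __init__(self):
        self.centroids = {}  # label -> running mean of the feature vectors
        self.counts = {}     # label -> number of samples seen

    def update(self, features, label):
        """Incorporate one labelled sample using an incremental mean update."""
        n = self.counts.get(label, 0) + 1
        old = self.centroids.get(label, [0.0] * len(features))
        self.centroids[label] = [m + (x - m) / n for m, x in zip(old, features)]
        self.counts[label] = n

    def predict(self, features):
        """Return (label, margin); the margin is the distance gap between the two closest classes."""
        dists = sorted(
            (math.dist(features, centroid), label)
            for label, centroid in self.centroids.items()
        )
        if not dists:
            return None, 0.0
        if len(dists) == 1:
            return dists[0][1], float("inf")
        return dists[0][1], dists[1][0] - dists[0][0]


def classify_or_ask(model, features, ask_user, min_margin=0.5):
    """Predict an activity; if the model is uncertain, query the user and learn from the answer."""
    label, margin = model.predict(features)
    if label is None or margin < min_margin:
        label = ask_user(features)    # e.g. a notification in the app
        model.update(features, label)  # online update with the confirmed label
    return label


model = OnlineCentroidClassifier()
model.update([0.0, 0.1], "standing")
model.update([1.0, 1.2], "walking")

queries = []


def ask_user(features):
    # Stand-in for a real user prompt; records the query and answers "walking".
    queries.append(features)
    return "walking"


confident = classify_or_ask(model, [0.1, 0.0], ask_user)
uncertain = classify_or_ask(model, [0.5, 0.65], ask_user)
```

The threshold controls the trade-off the project description raises: a higher `min_margin` yields more labelled data but interrupts the user more often.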
Companies are interested in understanding how they and their products and services are perceived by the public. In this project, we use social media, such as Facebook and Twitter, to address the following questions: (1) what topics related to a company are currently discussed in social media? (2) what topics are trending, recurring, or declining? (3) are there any geographical differences in the topic coverage?
The project is carried out together with the pharmaceutical company AbbVie.
Large-scale public knowledge graphs, such as DBpedia or YAGO, are most often created from only a small set of sources, usually parts of Wikipedia. They have good coverage of well-known entities (such as big cities or famous athletes), but poor coverage of less well-known entities (such as small villages or minor league athletes). On the other hand, smaller, specialized Wikis, such as fan-created Wikis at WikiFarms, contain detailed information about very specific topics.
In this project, we aim at filling the long tail of DBpedia from thousands of small-scale Wikis. We investigate the potential of a large number of small-scale Wikis, as well as the challenges in data quality and data integration.
Recognizing, validating, and optimizing activities of workers in logistics is increasingly aided by smart devices like glasses, gloves, and sensor-enhanced wristbands. This project focuses on developing a system that recognizes the movements and actions of warehouse employees by processing video and sensor data from smart glasses. In the end, a mobile application for the smart glasses will be developed that can aid logistics workers in their tasks.
Many industrial companies are currently adapting or plan to adapt their production processes to Industry 4.0 standards. Because the lifecycles of production machines traditionally tend to be rather long (from years to decades), companies are facing several conceptual as well as technical problems, such as high integration effort and incompatibility between interfaces. This master team project therefore tries to tackle these problems by constructing an Industry 4.0 compliant infrastructure based on semantic interface descriptions. The results will then be used by Big Data tools and evaluated on a real-world industry demonstrator as part of a large IT summit.