The article „A Unified Framework for Frequent Sequence Mining with Subsequence Constraints“ by Kaustubh Beedkar, Rainer Gemulla und Wim Martens has been accepted for publication in ACM Transactions on Database Systems (TODS). Abstract:
Frequent sequence mining methods often make use of constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this article, we show that many subsequence constraints---including and beyond those considered in the literature---can be unified in a single framework. A unified treatment allows researchers to study jointly many types of subsequence constraints (instead of each one individually) and helps to improve usability of pattern mining systems for practitioners. In more detail, we propose a set of simple and intuitive „pattern expressions to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. Our algorithms translate pattern expressions to succinct finite state transducers, which we use as computational model, and simulate these transducers in a way suitable for frequent sequence mining. Our experimental study on real-world datasets indicates that our algorithms---although more general---are efficient and, when used for sequence mining with prior constraints studied in literature, competitive to (and in some cases superior to) state-of-the-art specialized methods.
The paper „Scalable Frequent Sequence Mining With Flexible Subsequence Constraints“ by Alexander Renz-Wieland, Matthias Bertsch, and Rainer Gemulla has been accepted at the 2019 IEEE International Conference on Data Engineering (ICDE). Abstract:
We study scalable algorithms for frequent sequence mining under flexible subsequence constraints. Such constraints enable applications to specify concisely which patterns are of interest and which are not. We focus on the bulk synchronous parallel model with one round of communication; this model is suitable for platforms such as MapReduce or Spark. We derive a general framework for frequent sequence mining under this model and propose the D-SEQ and D-CAND algorithms within this framework. The algorithms differ in what data are communicated and how computation is split up among workers. To the best of our knowledge, D-SEQ and D-CAND are the first scalable algorithms for frequent sequence mining with flexible constraints. We conducted an experimental study on multiple real-world datasets that suggests that our algorithms scale nearly linearly, outperform common baselines, and offer acceptable generalization overhead over existing, less general mining algorithms.
The Data and Web Science Group is hosting the Data Science Conference LWDA 2018 in Mannheim on August 22-24, 2018.
LWDA, which expands to „Lernen, Wissen, Daten, Analysen“ („Learning, Knowledge, Data, Analytics“), covers recent research in areas such as knowledge discovery, machine learning & data mining, knowledge management, database management & information systems, information retrieval.
The LWDA conference is organized by and brings together the various special interest groups of the Gesellschaft für Informatik (German Computer Science Society) in this area. The program comprises of joint research sessions and keynotes as well as of workshops organized by each special interest group.
Further information can be found on the conference website: https://www.uni-mannheim.de/lwda-2018/.
Download the conference poster.
The paper „Fine-grained Evaluation of Rule- and Embedding-based Systems for Knowledge Graph Completion“ by Christian Meilicke, Manuel Fink, Yanjie Wang, Daniel Ruffinelli, Rainer Gemulla, and Heiner Stuckenschmidt has been accepted at the 2018 International Semantic Web Conference (ISWC). Abstract: Over the recent years, embedding methods have attracted increasing focus as a means for knowledge graph completion. Similarly, rule-based systems have been studied for this task in the past. What is missing so far is a common evaluation that includes more than one type of method. We close this gap by comparing representatives of both types of systems in a frequently used evaluation protocol. Leveraging the explanatory qualities of rule-based systems, we present a fine-grained evaluation that gives insight into characteristics of the most popular datasets and points out the different strengths and shortcomings of the examined approaches. Our results show that models such as TransE, RESCAL or HolE have problems in solving certain types of completion tasks that can be solved by a rule-based approach with high precision. At the same time, there are other completion tasks that are difficult for rule-based systems. Motivated by these insights, we combine both families of approaches via ensemble learning. The results support our assumption that the two methods complement each other in a beneficial way.