Prof. Dr. Rainer Gemulla

Chair of Practical Computer Science I: Data Analytics

Universität Mannheim
B6, 26, Room B 016
D-68159 Mannheim

Tel.: +49 621 181 2480

rgemullamail-uni-mannheim.de
My PGP public key (id 0x81405E0B30302532)

I head the Chair of Practical Computer Science I: Data Analytics at the University of Mannheim. The chair is part of the Data and Web Science Group.




If you consider applying to our lab, please read:

  • Applications must include a CV, transcripts, and a short (!) cover letter; incomplete applications will be ignored.
  • We generally do not offer short internships (3 months or less). Applications for such internships will be ignored.
  • If you are a BSc or MSc student here, have very good transcripts, and are interested in a student job with us, contact me directly.

Research Interests

My research interests are mainly in machine learning and data processing. In particular:

  • Machine learning with structured data (such as relational data)
  • Machine learning with semi-structured data (such as multi-relational graphs)
  • Combining the above with unstructured knowledge (such as text)
  • Efficient and scalable methods and systems for data-intensive processing

News

Paper accepted at EMNLP Findings 2023: A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs
The paper “A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs” by Adrian Kochsiek and Rainer Gemulla has been accepted at the Findings of the Association for Computational Linguistics: EMNLP 2023. Abstract: Semi-inductive link prediction (LP) in knowledge graphs (KG) is the task ...
Paper accepted at CIKM 2023: Good Intentions: Adaptive Parameter Management via Intent Signaling
The paper “Good Intentions: Adaptive Parameter Management via Intent Signaling” by Alexander Renz-Wieland, Andreas Kieslinger, Robert Gericke, Rainer Gemulla, Zoi Kaoudi, and Volker Markl has been accepted at the 2023 CIKM Conference on Information and Knowledge Management. Abstract: Model ...
Paper accepted at Repl4NLP 2023: Friendly Neighbors: Contextualized Sequence-to-Sequence Link Prediction
The paper “Friendly Neighbors: Contextualized Sequence-to-Sequence Link Prediction” by Adrian Kochsiek, Apoorv Saxena, Inderjeet Nair, and Rainer Gemulla has been accepted at the 2023 Repl4NLP Workshop on Representation Learning for NLP, hosted by ACL 2023. Abstract: We propose KGT5-context, a ...
Paper accepted at ECML-PKDD 2022: Start Small, Think Big: On Hyperparameter Optimization for Large-Scale Knowledge Graph Embeddings
The paper “Start Small, Think Big: On Hyperparameter Optimization for Large-Scale Knowledge Graph Embeddings” by Adrian Kochsiek, Fritz Niesel, and Rainer Gemulla has been accepted at the 2022 ECML-PKDD European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in ...

Curriculum Vitae

Since 2014   W3-Professor for Practical Computer Science I, Universität Mannheim, Germany
2010 – 2014   Senior researcher / group leader, Max-Planck-Institut für Informatik, Saarbrücken, Germany
2008 – 2010   Postdoctoral researcher, IBM Almaden Research Center, San Jose, CA, USA
2004 – 2008   PhD in Computer Science, Technische Universität Dresden, Germany

PhD Students

Former PhD students

Kaustubh Beedkar, Luciano del Corro, Kiril Gashteovski, Stefan Kain, Faraz Makari Manshadi, Alexander Renz-Wieland, Christina Teflioudi, Yanjie Wang

Teaching

If you are interested in writing a seminar paper or a Bachelor's or Master's thesis with us, please read the following guidelines.

If you are within the intranet of the University of Mannheim, you can access lecture videos / materials here. If not, ask me.

Current semester (FSS 2025)

Previous semester (HWS 2024)

Previous courses (not taught anymore)

      Awards

      • Distinguished PC Member Award at EDBT, 2023
      • Outstanding Reviewer Award at NeurIPS, 2021
      • Named Distinguished PC Member for SIGMOD, 2017
      • Junior-Fellow of the Gesellschaft für Informatik (GI), 2013
      • AWS in Education Research Grant Award, 2013
      • Busy Beaver teaching award (winter term 2012/2013)
        for Non-Traditional Data Management (NoSQL and more)
      • IBM's 2011 Pat Goldberg Memorial best paper award in CS, EE and Math
        (for “Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent” with P. J. Haas, E. Nijkamp, and Y. Sismanis; KDD 2011)
      • Best paper of NIPS 2011 Biglearn workshop
        (for “Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent” with P. J. Haas, Y. Sismanis, C. Teflioudi, and F. Makari)
      • Google Focused Research Award 2011: Robust and Scalable Fact Discovery from Web Sources
        (with G. Weikum and M. Theobald)
      • Research Highlight in Communications of the ACM
        (for “Distinct-Value Synopses for Multiset Operations” with K. Beyer, P.J. Haas, B. Reinwald, Y. Sismanis)
      • The VLDB Journal, Special Issue: Best Papers of VLDB 2006
        (for “A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets” with W. Lehner and P.J. Haas)

      Professional Activities

      Administration

      • Head of examination board: MSc Business Informatics (since 2017)
      • Member of examination board: BSc Business Informatics (since 2017 or earlier), Mannheim Master in Data Science (since 2017)
      • Member of selection committee: BSc and MSc Business Informatics (since 2017 or earlier), Mannheim Master in Data Science (since 2024)
      • Information officer of the WIM faculty: since 2022
      • Ambassador of the German Informatics Society (GI): since 2017
      • CIO of University of Mannheim: 2022–2024
      • Study dean of the WIM faculty: 2016–2019
      • Member of the selection committee: Max Planck International Research School (2010–2014)

      Organizer

      Young academics

      • One-day workshop “Machine Learning Systems” at Jugendforum BW 2024
      • Mentor of Junior Professors: Roland Leißa (2021–2025), Margret Keuper (2017–2021), Goran Glavaš (2017–2021)
      • Mentor in the mentoring program of the German Informatics Society (GI, since 2020)
      • Member of task committee and final-round jury of BWINF (since 2010)
      • Member of the board of BWINF (2014–2022)

      Associate editor / area chair

      • VLDB: 2021, 2019, 2015
      • TKDE: 2015–2018
      • DASP: 2018
      • EDBT: 2017
      • JISA: 2016
      • CIKM: 2014

      PC member / reviewer (since 2011)

      • 2025: EDBT, ICLR, SIGMOD, STACS, TMLR, VLDB
      • 2024: ARR, DAMI, DEEM, ICLR, IJCAI, LLM+KG, NeurIPS, SIGMOD, TMLR, VLDBJ
      • 2023: ARR, Artificial Intelligence, BTW, DEEM, EDBT, ICLR, IJCAI, Neural Networks, Repl4NLP, SIGMOD, TMLR, VLDB, VLDBJ
      • 2022: Artificial Intelligence, DEEM, EDBT, ICLR, IJCAI, KDML, Machine Learning, Pattern Recognition, SDM, SIGMOD, VLDBJ
      • 2021: DEEM, EDBT, ICDM, IJCAI, KDML, NeurIPS, SIGMOD (demo), Repl4NLP
      • 2020: AKBC, EDBT, IJCAI, LWDA, Repl4NLP, SUM, TPDS
      • 2019: AKBC, BTW, DEEM, IJCAI, INFORMATIK, LWDA, SUM
      • 2018: VLDB, EDBT, DEEM
      • 2017: SIGMOD, BTW, DEEM, TALG, VLDB, ECML-PKDD
      • 2016: SIGMOD, ECML-PKDD
      • 2015: BTW, DAMI, GvDB, IS, JODS, JWS, PODS
      • 2014: BDDC, BigData, Buda, DMC, JMLR, SIGMOD, VLDB
      • 2013: AKBC, BigData, BTW, CIKM, DMC, ICALP, TKDE, TML, VLDB
      • 2012: KDD, JMLR, TODS
      • 2011: BTW, IS, VLDB, VLDBJ

        Data and Software

        • GraSH: Multi-fidelity HPO for graph learning
        • DistKGE: A knowledge graph embedding library for multi-GPU and multi-machine training
        • AdaPM: A fully adaptive parameter manager
        • LibKGE: A knowledge graph embedding library
        • Lapse: A parameter server with dynamic parameter allocation
        • OPIEC: An open information extraction corpus
        • MinIE: Open information extractor (spiritual successor to ClausIE)
        • DSGDpp: Various parallel algorithms for matrix factorization (including DSGD++)
        • DESQ: Frequent sequence mining with subsequence constraints
        • Rounding rank: algorithms for computing rounding-rank decompositions
        • CORE: Context-aware open relation extraction with factorization machines
        • FINET: Context-aware fine-grained named entity typing
        • Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning
        • ClausIE: Clause-Based Open Information Extraction
        • LEMP: Fast Retrieval of Large Entries in a Matrix Product
        • LASH: Large-Scale Sequence Mining with Hierarchies
        • MG-FSM: Large-Scale Frequent Sequence Mining

        Publications

        See also Google Scholar and DBLP.

        2023   A. Kochsiek, R. Gemulla
        A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs [pdf, resources]
        In EMNLP Findings, 2023
         A. Kochsiek, A. Saxena, I. Nair, R. Gemulla
        Friendly Neighbors: Contextualized Sequence-to-Sequence Link Prediction [pdf, resources]
        In Repl4NLP workshop, 2023
         A. Renz-Wieland, A. Kieslinger, R. Gericke, R. Gemulla, Z. Kaoudi, V. Markl
        Good Intentions: Adaptive Parameter Management via Intent Signaling [pdf, resources]
        In CIKM, 2023
        2022   A. Kochsiek, F. Niesel, R. Gemulla
        Start Small, Think Big: On Hyperparameter Optimization for Large-Scale Knowledge Graph Embeddings [pdf, resources]
        In ECML-PKDD, 2022
         A. Saxena, A. Kochsiek, R. Gemulla
        Sequence-to-Sequence Knowledge Graph Completion and Question Answering [pdf, video resources]
        In ACL, pp. 2814-2828, 2022
         A. Renz-Wieland, R. Gemulla, Z. Kaoudi, V. Markl
        NuPS: A Parameter Server for Machine Learning with Non-Uniform Parameter Access [pdf, source code]
        In SIGMOD, pp. 481–495, 2022
        2021   A. Kochsiek, R. Gemulla
        Parallel Training of Knowledge Graph Embedding Models: A Comparison of Techniques [pdf, resources]
        In PVLDB, 15(3), 2021
         A. Renz-Wieland, T. Drobisch, R. Gemulla, Z. Kaoudi, V. Markl
        Just Move It! Dynamic Parameter Allocation in Action [pdf, demo]
        In PVLDB (demo), 14(12), 2021
        2020   A. Renz-Wieland, R. Gemulla, S. Zeuch, V. Markl
        Dynamic Parameter Allocation in Parameter Servers [pdf, source code]
        In PVLDB, 13(12), pp. 1877-1890, 2020
         S. Broscheit, K. Gashteovski, Y. Wang, R. Gemulla
        Can We Predict New Facts with Open Knowledge Graph Embeddings? A Benchmark for Open Link Prediction [pdf, resources]
        In ACL, 2020
         D. Ruffinelli, S. Broscheit, R. Gemulla
        You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings [pdf, video, resources, OpenReview]
        In ICLR, 2020
         S. Broscheit, D. Ruffinelli, A. Kochsiek, P. Betz, R. Gemulla
        LibKGE – A knowledge graph embedding library for reproducible research [pdf, source]
        In EMNLP (demo), 2020
         K. Gashteovski, R. Gemulla, B. Kotnis, S. Hertling, C. Meilicke
        On Aligning OpenIE Extractions with Knowledge Bases: A Case Study [pdf, slides, resources]
        In Eval4NLP, 2020
        2019   Y. Wang, D. Ruffinelli, R. Gemulla, S. Broscheit, C. Meilicke
        On Evaluating Embedding Models for Knowledge Base Completion [pdf]
        In RepL4NLP workshop, 2019
         K. Beedkar, R. Gemulla, W. Martens
        A Unified Framework for Frequent Sequence Mining with Subsequence Constraints [pdf (journal version), pdf (author version), resources]
        In TODS, 2019
         K. Gashteovski, S. Wanner, S. Hertling, S. Broscheit, R. Gemulla
        OPIEC: An Open Information Extraction Corpus [pdf, poster, resources, OpenReview]
        In AKBC, 2019
         A. Renz-Wieland, M. Bertsch, R. Gemulla
        Scalable Frequent Sequence Mining With Flexible Subsequence Constraints [pdf, poster]
        In ICDE, 2019
        Preprints
        (2019)
           
        Y. Wang, S. Broscheit, R. Gemulla
        A Relational Tucker Decomposition for Multi-Relational Link Prediction [arXiv]
        2019
        2018   C. Meilicke, M. Fink, Y. Wang, D. Ruffinelli, R. Gemulla, and H. Stuckenschmidt
        Fine-grained Evaluation of Rule- and Embedding-based Systems for Knowledge Graph Completion [pdf, resources]
        In ISWC, 2018
         J. Pfeiffer, S. Broscheit, R. Gemulla, M. Göschl
        A Neural Autoencoder Approach for Document Ranking and Query Refinement in Pharmacogenomic Information Retrieval [pdf]
        In BioNLP workshop, 2018
         S. Broscheit, R. Gemulla, M. Keuper
        Learning Distributional Token Representations from Visual Features [pdf]
        In RepL4NLP workshop, 2018
         Y. Wang, R. Gemulla, H. Li
        On Multi-Relational Link Prediction with Bilinear Models [pdf, resources]
        In AAAI, 2018
        2017   K. Gashteovski, R. Gemulla, L. del Corro
        MinIE: Minimizing Facts in Open Information Extraction [pdf, poster, resources]
        In EMNLP, pp. 2620-2630, 2017
         C. Teflioudi, R. Gemulla
        Exact and Approximate Maximum Inner Product Search with LEMP [pdf (journal version), pdf (author version), resources]
        In TODS, 42(1) Art. 5, 2017
        2016   S. Neumann, R. Gemulla, P. Miettinen
        What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank [pdf, tech report, resources]
        In ICDM, pp. 380–389, 2016
         K. Beedkar, R. Gemulla
        DESQ: Frequent Sequence Mining with Subsequence Constraints [pdf, tech report, resources]
        In ICDM (short paper), pp. 793–798, 2016
        2015   L. Del Corro, A. Abujabal, R. Gemulla, G. Weikum
        FINET: Context-Aware Fine-Grained Named Entity Typing [pdf, slides, resources]
        In EMNLP, pp. 868–878, 2015
         F. Petroni, L. Del Corro, R. Gemulla
        CORE: Context-Aware Open Relation Extraction with Factorization Machines [pdf, slides, resources]
        In EMNLP, pp. 1763-1773, 2015
         K. Beedkar, K. Berberich, R. Gemulla, I. Miliaraki
        Closing the Gap: Sequence Mining at Scale [pdf (journal version), pdf (author version), resources]
        In TODS, 40(2) Art. 8, 2015
         C. Teflioudi, R. Gemulla, O. Mykytiuk
        LEMP: Fast Retrieval of Large Entries in a Matrix Product [pdf, slides, resources]
        In SIGMOD, pp. 107–122, 2015
         K. Beedkar, R. Gemulla
        LASH: Large-Scale Sequence Mining with Hierarchies [pdf, slides, source code]
        In SIGMOD, pp. 491–503, 2015
         R. Gemulla
        A Self-Portrayal of GI Junior Fellow Rainer Gemulla: Data Analysis at Scale [pdf (journal version), pdf (author version)]
        it – Information Technology 57(2), pp. 130–132, 2015
        2014   L. Del Corro, R. Gemulla, G. Weikum
        Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning [pdf, resources]
        In EMNLP, pp. 374–385, 2014
         P. Roy, J. Teubner, R. Gemulla
        Low-Latency Handshake Join [pdf]
        In PVLDB, 7(9), pp. 709–720, 2014
         L. Qu, Y. Zhang, R. Wang, L. Jiang, R. Gemulla, G. Weikum
        Senti-LSSVM: Sentiment-Oriented Multi-Relation Extraction with Latent Structural SVM [pdf]
        In TACL, 2, pp. 155–168, 2014
         D. Erdös, R. Gemulla, E. Terzi
        Reconstructing Graphs from Neighborhood Data [pdf (author version), pdf (journal version)]
        In TKDD, 8(4), 2014
        2013   F. Makari, C. Teflioudi, R. Gemulla, P. J. Haas, Y. Sismanis
        Shared-Memory and Shared-Nothing Stochastic Gradient Descent Algorithms for Matrix Completion [pdf (author version), pdf (journal version), source code]
        In KAIS (special issue: best papers of ICDM 2012), pp. 1–31, 2013
         F. Makari, R. Gemulla
        A Distributed Approximation Algorithm for Mixed Packing-Covering Linear Programs [pdf]
        In NIPS 2013 Biglearn workshop (poster), 2013
         F. Makari, B. Awerbuch, R. Gemulla, R. Khandekar, J. Mestre, M. Sozio
        A Distributed Algorithm for Large-Scale Generalized Matching [pdf, slides]
        The analysis of the number of binary search steps (Lemma 2) contains a bug; see our Biglearn paper for a corrected version.
        In PVLDB, 6(9), pp. 613–624, 2013
         I. Miliaraki, K. Berberich, R. Gemulla, S. Zoupanos
        Mind the Gap: Large-Scale Frequent Sequence Mining [pdf, slides, resources]
        In SIGMOD, pp. 797–808, 2013
         L. Del Corro, R. Gemulla
        ClausIE: Clause-Based Open Information Extraction [pdf, slides, resources]
        In WWW, pp. 355–366, 2013
         R. Gemulla, P. J. Haas, W. Lehner
        Non-Uniformity Issues and Workarounds in Bounded-Size Sampling [pdf (author version), pdf (journal version), source code]
        In The VLDB Journal, 22(6), pp. 753–772, 2013
         K. Beedkar, L. Del Corro, R. Gemulla
        Fully Parallel Inference in Markov Logic Networks [pdf]
        In BTW, pp. 205–224, 2013
        2012   D. Erdös, R. Gemulla, E. Terzi
        Reconstructing Graphs from Neighborhood Data [pdf, slides]
        In ICDM, pp. 231–240, 2012
         C. Teflioudi, F. Makari, R. Gemulla
        Distributed Matrix Completion [pdf, slides, source code]
        In ICDM, pp. 655–664, 2012
         L. Qu, R. Gemulla, G. Weikum
        A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts [pdf]
        In EMNLP-CoNLL, pp. 149–159, 2012
        2011   R. Gemulla, P. J. Haas, Y. Sismanis, C. Teflioudi, F. Makari
        Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides, source code]
        In NIPS 2011 Biglearn workshop, 2011 (best paper award)
         R. Gemulla, E. Nijkamp, P. J. Haas, Y. Sismanis
        Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides, source code]
        In KDD, pp. 69–77, 2011
         K. Beyer, V. Ercegovac, R. Gemulla, A. Balmin, M. Eltabakh, C.C. Kanne, F. Ozcan, E. Shekita
        Jaql: A Scripting Language for Large Scale Semistructured Data Analysis [pdf]
        In PVLDB (industrial track), 4(11), pp. 1272-1283, 2011
         M. Y. Eltabakh, Y. Tian, F. Özcan, R. Gemulla, A. Krettek, J. McPherson
        CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop [pdf]
        In PVLDB, 4(9), pp. 575–585, 2011
         R. Gemulla, P. J. Haas, E. Nijkamp, Y. Sismanis
        Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf]
        IBM Research Report RJ10481, March 2011, revised February 2013
         B. Schlegel, R. Gemulla, W. Lehner
        Memory-Efficient Frequent-Itemset Mining [pdf]
        In EDBT, pp. 461–472, 2011
        2010   S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, J. McPherson.
        Ricardo: Integrating R and Hadoop [pdf]
        In SIGMOD (industrial track), pp. 987–998, 2010
         B. Schlegel, R. Gemulla, W. Lehner.
        Fast Integer Compression using SIMD Instructions [pdf]
        In DAMON, pp. 34–40, 2010
        2009   K. Beyer, R. Gemulla, P. J. Haas, B. Reinwald, Y. Sismanis.
        Distinct-Value Synopses for Multiset Operations [pdf, technical perspective by Surajit Chaudhuri]
        In Commun. ACM, 52(10), pp. 87–95, 2009
         B. Schlegel, R. Gemulla, W. Lehner.
        k-Ary Search on Modern Processors [pdf, slides]
        In DAMON, pp. 52–60, 2009
        2008   R. Gemulla.
        Sampling Algorithms for Evolving Datasets [pdf, summary, slides]
        Ph.D. thesis, Technische Universität Dresden, 2009
        URL for citations: nbn-resolving.de/urn:nbn:de:bsz:14-ds-1224861856184-11644
         R. Gemulla, P. Rösch and W. Lehner.
        Linked Bernoulli Synopses: Sampling Along Foreign Keys [pdf, slides]
        In SSDBM, pp. 6–23, 2008
         R. Gemulla and W. Lehner.
        Sampling Time-Based Sliding Windows in Bounded Space [pdf, slides]
        As observed by Hu et al., the lower bound of Ω(k log N) stated in Theorem 1 should read Ω(k log(N/k)).
        In SIGMOD, pp. 379–392, 2008
         P. Rösch, R. Gemulla and W. Lehner.
        Designing Random Sample Synopses with Outliers [pdf, poster]
        In ICDE (poster), pp. 1400-1402, 2008
        2007   R. Gemulla, W. Lehner and P.J. Haas.
        Maintaining Bounded-Size Sample Synopses of Evolving Datasets [pdf]
        The resizing algorithm proposed in this article contains a bug; see my Ph.D. thesis or our 2013 VLDB Journal paper for a corrected version.
        In The VLDB Journal, Special Issue: Best Papers of VLDB 2006, pp. 173–201, 2007
         K. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis and R. Gemulla.
        On Synopses for Distinct-Value Estimation Under Multiset Operations [pdf, slides]
        In SIGMOD, pp. 199–210, 2007
         R. Gemulla, W. Lehner and P. J. Haas.
        Maintaining Bernoulli Samples over Evolving Multisets [pdf, slides]
        In PODS, pp. 93–102, 2007
        2006   R. Gemulla, W. Lehner and P. J. Haas.
        A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets [pdf, slides]
        In VLDB, pp. 595–606, 2006
         A. Klein, R. Gemulla, P. Rösch and W. Lehner.
        Derby/S: A DBMS for Sample-Based Query Answering [pdf, poster1, poster2]
        In SIGMOD (demo), pp. 757–759, 2006
         R. Gemulla and W. Lehner.
        Deferred Maintenance of Disk-Based Random Samples [pdf, slides]
        In EDBT, pp. 423–441, 2006

        Contact

        Prof. Dr. Rainer Gemulla

        Chair of Practical Computer Science I: Data Analytics
        University of Mannheim
        School of Business Informatics and Mathematics
        B 6, 26 – Room B 0.16
        68159 Mannheim