Prof. Dr. Rainer Gemulla

Chair of Practical Computer Science I: Data Analytics

Universität Mannheim
B6, 26, Room B 016
D-68159 Mannheim

Tel.: +49 621 181 2480

rgemullamail-uni-mannheim.de
My PGP public key (id 0x81405E0B30302532)

I head the Chair of Practical Computer Science I: Data Analytics at the University of Mannheim. The chair is part of the Data and Web Science Group.



If you consider applying to our lab, please read:

  • Applications must include a CV, transcripts, and a short (!) cover letter; incomplete applications will be ignored.
  • We generally do not offer short internships (3 months or less). Applications for such internships will be ignored.
  • If you are a BSc or MSc student here, have very good transcripts, and are interested in a student job with us, contact me directly.

News

Research Interests

  • Machine learning with semi-structured/structured data
  • Combining unstructured and structured knowledge
  • Representation learning for multi-relational graphs
  • Efficient and scalable methods and systems for data-intensive processing

Curriculum Vitae

Since 2014   W3-Professor for Practical Computer Science I, Universität Mannheim, Germany
2010 – 2014   Senior researcher / group leader, Max-Planck-Institut für Informatik, Saarbrücken, Germany
2008 – 2010   Postdoctoral researcher, IBM Almaden Research Center, San Jose, CA, USA
2004 – 2008   PhD in Computer Science, Technische Universität Dresden, Germany

PhD Students

Former PhD students

Kaustubh Beedkar, Luciano del Corro, Kiril Gashteovski, Stefan Kain, Faraz Makari Manshadi, Alexander Renz-Wieland, Christina Teflioudi, Yanjie Wang

Teaching

If you are interested in writing a seminar, Bachelor or Master thesis with us, please read the following guidelines.

Current semester (FSS 2024)

Previous semester (HWS 2023)

      Awards

      • Distinguished PC Member Award at EDBT, 2023
      • Outstanding Reviewer Award at NeurIPS, 2021
      • Named Distinguished PC Member for SIGMOD, 2017
      • Junior-Fellow of the Gesellschaft für Informatik (GI), 2013
      • AWS in Education Research Grant Award, 2013
      • Busy Beaver teaching award (winter term 2012/2013)
        for Non-Traditional Data Management (NoSQL and more)
      • IBM's 2011 Pat Goldberg Memorial best paper award in CS, EE and Math
        (for “Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent” with P. J. Haas, E. Nijkamp, and Y. Sismanis; KDD 2011)
      • Best paper of NIPS 2011 Biglearn workshop
        (for “Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent” with P. J. Haas, Y. Sismanis, C. Teflioudi, and F. Makari)
      • Google Focused Research Award 2011: Robust and Scalable Fact Discovery from Web Sources
        (with G. Weikum and M. Theobald)
      • Research Highlight in Communications of the ACM
        (for “Distinct-Value Synopses for Multiset Operations” with K. Beyer, P.J. Haas, B. Reinwald, Y. Sismanis)
      • The VLDB Journal, Special Issue: Best Papers of VLDB 2006
        (for “A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets” with W. Lehner and P.J. Haas)

      Professional Activities

      Associate editor / area chair

      • VLDB: 2021, 2019, 2015
      • TKDE: 2015–2018
      • DASP: 2018
      • EDBT: 2017
      • JISA: 2016
      • CIKM: 2014

      PC member / reviewer (since 2011)

      • 2022: EDBT, IJCAI, ICLR, SDM, SIGMOD
      • 2021: DEEM, EDBT, ICDM, IJCAI, KDML, NeurIPS, SIGMOD (demo), Repl4NLP
      • 2020: AKBC, EDBT, IJCAI, LWDA, Repl4NLP, SUM, TPDS
      • 2019: AKBC, BTW, DEEM, IJCAI, INFORMATIK, LWDA, SUM
      • 2018: VLDB, EDBT, DEEM
      • 2017: SIGMOD, BTW, DEEM, TALG, VLDB, ECML-PKDD
      • 2016: SIGMOD, ECML-PKDD
      • 2015: BTW, DAMI, GvDB, IS, JODS, JWS, PODS
      • 2014: BDDC, BigData, Buda, DMC, JMLR, SIGMOD, VLDB
      • 2013: AKBC, BigData, BTW, CIKM, DMC, ICALP, TKDE, TML, VLDB
      • 2012: KDD, JMLR, TODS
      • 2011: BTW, IS, VLDB, VLDBJ

      Organizer

      Data and Software

      • AdaPM: A fully adaptive parameter manager
      • LibKGE: A knowledge graph embedding library
      • Lapse: A parameter server with dynamic parameter allocation
      • OPIEC: An open information extraction corpus
      • MinIE: Open information extractor (spiritual successor to ClausIE)
      • DSGDpp: Various parallel algorithms for matrix factorization (including DSGD++)
      • DESQ: Frequent sequence mining with subsequence constraints
      • Rounding rank: algorithms for computing rounding-rank decompositions
      • CORE: Context-aware open relation extraction with factorization machines
      • FINET: Context-aware fine-grained named entity typing
      • Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning
      • ClausIE: Clause-Based Open Information Extraction
      • LEMP: Fast Retrieval of Large Entries in a Matrix Product
      • LASH: Large-Scale Sequence Mining with Hierarchies
      • MG-FSM: Large-Scale Frequent Sequence Mining

      Publications

      See also Google Scholar and DBLP.

      2023   A. Kochsiek, R. Gemulla
      A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs [pdf, resources]
      In EMNLP Findings, 2023
       A. Kochsiek, A. Saxena, I. Nair, R. Gemulla
      Friendly Neighbors: Contextualized Sequence-to-Sequence Link Prediction [pdf, resources]
      In Repl4NLP workshop, 2023
       A. Renz-Wieland, A. Kieslinger, R. Gericke, R. Gemulla, Z. Kaoudi, V. Markl
      Good Intentions: Adaptive Parameter Management via Intent Signaling [pdf, resources]
      In CIKM, 2023
      2022   A. Kochsiek, F. Niesel, R. Gemulla
      Start Small, Think Big: On Hyperparameter Optimization for Large-Scale Knowledge Graph Embeddings [pdf, resources]
      In ECML-PKDD, 2022
       A. Saxena, A. Kochsiek, R. Gemulla
      Sequence-to-Sequence Knowledge Graph Completion and Question Answering [pdf, video resources]
      In ACL, pp. 2814–2828, 2022
       A. Renz-Wieland, R. Gemulla, Z. Kaoudi, V. Markl
      NuPS: A Parameter Server for Machine Learning with Non-Uniform Parameter Access [pdf, source code]
      In SIGMOD, pp. 481–495, 2022
      2021   A. Kochsiek, R. Gemulla
      Parallel Training of Knowledge Graph Embedding Models: A Comparison of Techniques [pdf, resources]
      In PVLDB, 15(3), 2021
       A. Renz-Wieland, T. Drobisch, R. Gemulla, Z. Kaoudi, V. Markl
      Just Move It! Dynamic Parameter Allocation in Action [pdf, demo]
      In PVLDB (demo), 14(12), 2021
      2020   A. Renz-Wieland, R. Gemulla, S. Zeuch, V. Markl
      Dynamic Parameter Allocation in Parameter Servers [pdf, source code]
      In PVLDB, 13(12), pp. 1877–1890, 2020
       S. Broscheit, K. Gashteovski, Y. Wang, R. Gemulla
      Can We Predict New Facts with Open Knowledge Graph Embeddings? A Benchmark for Open Link Prediction [pdf, resources]
      In ACL, 2020
       D. Ruffinelli, S. Broscheit, R. Gemulla
      You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings [pdf, video, resources, OpenReview]
      In ICLR, 2020
       S. Broscheit, D. Ruffinelli, A. Kochsiek, P. Betz, R. Gemulla
      LibKGE – A knowledge graph embedding library for reproducible research [pdf, source]
      In EMNLP (demo), 2020
       K. Gashteovski, R. Gemulla, B. Kotnis, S. Hertling, C. Meilicke
      On Aligning OpenIE Extractions with Knowledge Bases: A Case Study [pdf, slides, resources]
      In Eval4NLP, 2020
      2019   Y. Wang, D. Ruffinelli, R. Gemulla, S. Broscheit, C. Meilicke
      On Evaluating Embedding Models for Knowledge Base Completion [pdf]
      In RepL4NLP workshop, 2019
       K. Beedkar, R. Gemulla, W. Martens
      A Unified Framework for Frequent Sequence Mining with Subsequence Constraints [pdf (journal version), pdf (author version), resources]
      In TODS, 2019
       K. Gashteovski, S. Wanner, S. Hertling, S. Broscheit, R. Gemulla
      OPIEC: An Open Information Extraction Corpus [pdf, poster, resources, OpenReview]
      In AKBC, 2019
       A. Renz-Wieland, M. Bertsch, R. Gemulla
      Scalable Frequent Sequence Mining With Flexible Subsequence Constraints [pdf, poster]
      In ICDE, 2019
      Preprints (2019)   Y. Wang, S. Broscheit, R. Gemulla
      A Relational Tucker Decomposition for Multi-Relational Link Prediction [arXiv]
      2019
      2018   C. Meilicke, M. Fink, Y. Wang, D. Ruffinelli, R. Gemulla, and H. Stuckenschmidt
      Fine-grained Evaluation of Rule- and Embedding-based Systems for Knowledge Graph Completion [pdf, resources]
      In ISWC, 2018
       J. Pfeiffer, S. Broscheit, R. Gemulla, M. Göschl
      A Neural Autoencoder Approach for Document Ranking and Query Refinement in Pharmacogenomic Information Retrieval [pdf]
      In BioNLP workshop, 2018
       S. Broscheit, R. Gemulla, M. Keuper
      Learning Distributional Token Representations from Visual Features [pdf]
      In RepL4NLP workshop, 2018
       Y. Wang, R. Gemulla, H. Li
      On Multi-Relational Link Prediction with Bilinear Models [pdf, resources]
      In AAAI, 2018
      2017   K. Gashteovski, R. Gemulla, L. del Corro
      MinIE: Minimizing Facts in Open Information Extraction [pdf, poster, resources]
      In EMNLP, pp. 2620–2630, 2017
       C. Teflioudi, R. Gemulla
      Exact and Approximate Maximum Inner Product Search with LEMP [pdf (journal version), pdf (author version), resources]
      In TODS, 42(1) Art. 5, 2017
      2016   S. Neumann, R. Gemulla, P. Miettinen
      What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank [pdf, tech report, resources]
      In ICDM, pp. 380–389, 2016
       K. Beedkar, R. Gemulla
      DESQ: Frequent Sequence Mining with Subsequence Constraints [pdf, tech report, resources]
      In ICDM (short paper), pp. 793–798, 2016
      2015   L. Del Corro, A. Abujabal, R. Gemulla, G. Weikum
      FINET: Context-Aware Fine-Grained Named Entity Typing [pdf, slides, resources]
      In EMNLP, pp. 868–878, 2015
       F. Petroni, L. Del Corro, R. Gemulla
      CORE: Context-Aware Open Relation Extraction with Factorization Machines [pdf, slides, resources]
      In EMNLP, pp. 1763–1773, 2015
       K. Beedkar, K. Berberich, R. Gemulla, I. Miliaraki
      Closing the Gap: Sequence Mining at Scale [pdf (journal version), pdf (author version), resources]
      In TODS, 40(2) Art. 8, 2015
       C. Teflioudi, R. Gemulla, O. Mykytiuk
      LEMP: Fast Retrieval of Large Entries in a Matrix Product [pdf, slides, resources]
      In SIGMOD, pp. 107–122, 2015
       K. Beedkar, R. Gemulla
      LASH: Large-Scale Sequence Mining with Hierarchies [pdf, slides, source code]
      In SIGMOD, pp. 491–503, 2015
       R. Gemulla
      A Self-Portrayal of GI Junior Fellow Rainer Gemulla: Data Analysis at Scale [pdf (journal version), pdf (author version)]
      it – Information Technology 57(2), pp. 130–132, 2015
      2014   L. Del Corro, R. Gemulla, G. Weikum
      Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning [pdf, resources]
      In EMNLP, pp. 374–385, 2014
       P. Roy, J. Teubner, R. Gemulla
      Low-Latency Handshake Join [pdf]
      In PVLDB, 7(9), pp. 709–720, 2014
       L. Qu, Y. Zhang, R. Wang, L. Jiang, R. Gemulla, G. Weikum
      Senti-LSSVM: Sentiment-Oriented Multi-Relation Extraction with Latent Structural SVM [pdf]
      In TACL, 2, pp. 155–168, 2014
       D. Erdös, R. Gemulla, E. Terzi
      Reconstructing Graphs from Neighborhood Data [pdf (author version), pdf (journal version)]
      In TKDD, 8(4), 2014
      2013   F. Makari, C. Teflioudi, R. Gemulla, P. J. Haas, Y. Sismanis
      Shared-Memory and Shared-Nothing Stochastic Gradient Descent Algorithms for Matrix Completion [pdf (author version), pdf (journal version), source code]
      In KAIS (special issue: best papers of ICDM 2012), pp. 1–31, 2013
       F. Makari, R. Gemulla
      A Distributed Approximation Algorithm for Mixed Packing-Covering Linear Programs [pdf]
      In NIPS 2013 Biglearn workshop (poster), 2013
       F. Makari, B. Awerbuch, R. Gemulla, R. Khandekar, J. Mestre, M. Sozio
      A Distributed Algorithm for Large-Scale Generalized Matching [pdf, slides]
      The analysis of the number of binary search steps (Lemma 2) contains a bug; see our Biglearn paper for a corrected version.
      In PVLDB, 6(9), pp. 613–624, 2013
       I. Miliaraki, K. Berberich, R. Gemulla, S. Zoupanos
      Mind the Gap: Large-Scale Frequent Sequence Mining [pdf, slides, resources]
      In SIGMOD, pp. 797–808, 2013
       L. Del Corro, R. Gemulla
      ClausIE: Clause-Based Open Information Extraction [pdf, slides, resources]
      In WWW, pp. 355–366, 2013
       R. Gemulla, P. J. Haas, W. Lehner
      Non-Uniformity Issues and Workarounds in Bounded-Size Sampling [pdf (author version), pdf (journal version), source code]
      In The VLDB Journal, 22(6), pp. 753–772, 2013
       K. Beedkar, L. Del Corro, R. Gemulla
      Fully Parallel Inference in Markov Logic Networks [pdf]
      In BTW, pp. 205–224, 2013
      2012   D. Erdös, R. Gemulla, E. Terzi
      Reconstructing Graphs from Neighborhood Data [pdf, slides]
      In ICDM, pp. 231–240, 2012
       C. Teflioudi, F. Makari, R. Gemulla
      Distributed Matrix Completion [pdf, slides, source code]
      In ICDM, pp. 655–664, 2012
       L. Qu, R. Gemulla, G. Weikum
      A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts [pdf]
      In EMNLP-CoNLL, pp. 149–159, 2012
      2011   R. Gemulla, P. J. Haas, Y. Sismanis, C. Teflioudi, F. Makari
      Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides, source code]
      In NIPS 2011 Biglearn workshop, 2011 (best paper award)
       R. Gemulla, E. Nijkamp, P. J. Haas, Y. Sismanis
      Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides, source code]
      In KDD, pp. 69–77, 2011
       K. Beyer, V. Ercegovac, R. Gemulla, A. Balmin, M. Eltabakh, C.C. Kanne, F. Ozcan, E. Shekita
      Jaql: A Scripting Language for Large Scale Semistructured Data Analysis [pdf]
      In PVLDB (industrial track), 4(11), pp. 1272–1283, 2011
       M. Y. Eltabakh, Y. Tian, F. Özcan, R. Gemulla, A. Krettek, J. McPherson
      CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop [pdf]
      In PVLDB, 4(9), pp. 575–585, 2011
       R. Gemulla, P. J. Haas, E. Nijkamp, Y. Sismanis
      Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf]
      IBM Research Report RJ10481, March 2011, revised February 2013
       B. Schlegel, R. Gemulla, W. Lehner
      Memory-Efficient Frequent-Itemset Mining [pdf]
      In EDBT, pp. 461–472, 2011
      2010   S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, J. McPherson.
      Ricardo: Integrating R and Hadoop [pdf]
      In SIGMOD (industrial track), pp. 987–998, 2010
       B. Schlegel, R. Gemulla, W. Lehner.
      Fast Integer Compression using SIMD Instructions [pdf]
      In DAMON, pp. 34–40, 2010
      2009   K. Beyer, R. Gemulla, P. J. Haas, B. Reinwald, Y. Sismanis.
      Distinct-Value Synopses for Multiset Operations [pdf, technical perspective by Surajit Chaudhuri]
      In Commun. ACM, 52(10), pp. 87–95, 2009
       B. Schlegel, R. Gemulla, W. Lehner.
      k-Ary Search on Modern Processors [pdf, slides]
      In DAMON, pp. 52–60, 2009
      2008   R. Gemulla.
      Sampling Algorithms for Evolving Datasets [pdf, summary, slides]
      Ph.D. thesis, Technische Universität Dresden, 2009
      URL for citations: nbn-resolving.de/urn:nbn:de:bsz:14-ds-1224861856184-11644
       R. Gemulla, P. Rösch and W. Lehner.
      Linked Bernoulli Synopses: Sampling Along Foreign Keys [pdf, slides]
      In SSDBM, pp. 6–23, 2008
       R. Gemulla and W. Lehner.
      Sampling Time-Based Sliding Windows in Bounded Space [pdf, slides]
      As observed by Hu et al., the lower bound of Ω(k log N) stated in Theorem 1 should read Ω(k log(N/k)).
      In SIGMOD, pp. 379–392, 2008
       P. Rösch, R. Gemulla and W. Lehner.
      Designing Random Sample Synopses with Outliers [pdf, poster]
      In ICDE (poster), pp. 1400–1402, 2008
      2007   R. Gemulla, W. Lehner and P.J. Haas.
      Maintaining Bounded-Size Sample Synopses of Evolving Datasets [pdf]
      The resizing algorithm proposed in this article contains a bug; see my Ph.D. thesis or our 2013 VLDB Journal paper for a corrected version.
      In The VLDB Journal, Special Issue: Best Papers of VLDB 2006, pp. 173–201, 2007
       K. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis and R. Gemulla.
      On Synopses for Distinct-Value Estimation Under Multiset Operations [pdf, slides]
      In SIGMOD, pp. 199–210, 2007
       R. Gemulla, W. Lehner and P. J. Haas.
      Maintaining Bernoulli Samples over Evolving Multisets [pdf, slides]
      In PODS, pp. 93–102, 2007
      2006   R. Gemulla, W. Lehner and P. J. Haas.
      A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets [pdf, slides]
      In VLDB, pp. 595–606, 2006
       A. Klein, R. Gemulla, P. Rösch and W. Lehner.
      Derby/S: A DBMS for Sample-Based Query Answering [pdf, poster1, poster2]
      In SIGMOD (demo), pp. 757–759, 2006
       R. Gemulla and W. Lehner.
      Deferred Maintenance of Disk-Based Random Samples [pdf, slides]
      In EDBT, pp. 423–441, 2006

      Contact

      Prof. Dr. Rainer Gemulla

      Chair of Practical Computer Science I: Data Analytics
      University of Mannheim
      School of Business Informatics and Mathematics
      B 6, 26 – Room B 0.16
      68159 Mannheim