Prof. Dr. Rainer Gemulla
Chair of Practical Computer Science I: Data Analytics
Universität Mannheim
B6, 26, Room B 016
D-68159 Mannheim
Tel.: +49 621 181 2480
rgemulla
My PGP public key (id 0x81405E0B30302532)
I head the Chair of Practical Computer Science I: Data Analytics at the University of Mannheim. The chair is part of the Data and Web Science Group.
Go to: Research Interests, CV, PhD Students, Teaching, Awards, Professional Activities, Data and Software, Publications
If you consider applying to our lab, please read:
- Applications must include a CV, transcripts, and a short (!) cover letter; incomplete applications will be ignored.
- We generally do not offer short internships (3 months or less). Applications for such internships will be ignored.
- If you are a BSc or MSc student here, have very good transcripts, and are interested in a student job with us, contact me directly.
Research Interests
My research interests are mainly in machine learning and data processing. In particular:
- Machine learning with structured data (such as relational data)
- Machine learning with semi-structured data (such as multi-relational graphs)
- Combining the above with unstructured knowledge (such as text)
- Efficient and scalable methods and systems for data-intensive processing
Curriculum Vitae
Since 2014 | W3-Professor for Practical Computer Science I, Universität Mannheim, Germany |
2010 – 2014 | Senior researcher / group leader, Max-Planck-Institut für Informatik, Saarbrücken, Germany |
2008 – 2010 | Postdoctoral researcher, IBM Almaden Research Center, San Jose, CA, USA |
2004 – 2008 | PhD in Computer Science, Technische Universität Dresden, Germany |
PhD Students
Former PhD students
Kaustubh Beedkar, Luciano del Corro, Kiril Gashteovski, Stefan Kain, Faraz Makari Manshadi, Alexander Renz-Wieland, Christina Teflioudi, Yanjie Wang
Teaching
If you are interested in writing a seminar, Bachelor or Master thesis with us, please read the following guidelines.
If you are within the intranet of the University of Mannheim, you can access lecture videos / materials here. If not, ask me.
Upcoming semester (FSS 2024)
- CS 303: Praktische Informatik II (Bachelor course)
- IE 678: Deep Learning (Master course)
- CS 707: Data and Web Science Seminar (Master seminar)
- Data Analytics Team Project: Your Project, Your Team
- Colloquium (for PhD candidates)
Current semester (HWS 2024)
- CS 560: Large-Scale Data Management (Master course)
- IE 675b: Machine Learning (Master course)
- SM 445/CS 707: Data and Web Science Seminar (Bachelor/Master seminar)
- Team Project: Research Profiling
- Colloquium (for PhD candidates)
Previous courses (not taught anymore)
- Hot Topics in Machine Learning (UMA, MSc, 2016–2017); precursor to IE 675b Machine Learning
- Database Systems II (UMA, MSc, 2015); precursor to CS 560 Large-Scale Data Management
- Algorithmen und Datenstrukturen (UMA, BSc, 2014)
- Data Mining and Matrices (MPII/UMA, MSc, 2013–2018)
- Scalable Uncertainty Management (MPII, MSc, 2012–2013)
- Datenbankprogrammierung (TUD, Diploma, DB2 certification course, 2005–2007)
Awards
- Distinguished PC Member Award at EDBT, 2023
- Outstanding Reviewer Award at NeurIPS, 2021
- Named Distinguished PC Member for SIGMOD, 2017
- Junior-Fellow of the Gesellschaft für Informatik (GI), 2013
- AWS in Education Research Grant Award, 2013
- Busy Beaver teaching award (winter term 2012/2013) for Non-Traditional Data Management (NoSQL and more)
- IBM's 2011 Pat Goldberg Memorial best paper award in CS, EE and Math (for “Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent” with P. J. Haas, E. Nijkamp, and Y. Sismanis; KDD 2011)
- Best paper of NIPS 2011 Biglearn workshop (for “Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent” with P. J. Haas, Y. Sismanis, C. Teflioudi, and F. Makari)
- Google Focused Research Award 2011: Robust and Scalable Fact Discovery from Web Sources (with G. Weikum and M. Theobald)
- Research Highlight in Communications of the ACM (for “Distinct-Value Synopses for Multiset Operations” with K. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis)
- The VLDB Journal, Special Issue: Best Papers of VLDB 2006 (for “A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets” with W. Lehner and P. J. Haas)
Professional Activities
Administration
- Head of examination board: MSc Business Informatics (since 2017)
- Member of examination board: BSc Business Informatics (since 2017 or earlier), Mannheim Master in Data Science (since 2017)
- Member of selection committee: BSc and MSc Business Informatics (since 2017 or earlier), Mannheim Master in Data Science (since 2024)
- Information officer of the WIM faculty: since 2022
- Ambassador of the German Informatics Society (GI): since 2017
- CIO of the University of Mannheim: 2022–2024
- Study dean of the WIM faculty: 2016–2019
- Member of the selection committee: Max Planck International Research School (2010–2014)
Organizer
- Chair of study program at BTW23 and BTW25 conferences
- LWDA 2018 conference
- GI Dagstuhl seminar: Informatik@Schule 2016 – Das Verhältnis von informatischer Bildung und “Digitaler Bildung” (2016)
- GI Dagstuhl seminar: Informatik@Schule – Agenda für informatische Bildung in der Schule (2014)
- Member of organizing board and head of jury of BTW11's demonstration program
Young academics
- One-day workshop “Machine Learning Systems” at Jugendforum BW 2024
- Mentor of Junior Professors: Roland Leißa (2021–2025), Margret Keuper (2017–2021), Goran Glavaš (2017–2021)
- Mentor in the mentoring program of the German Informatics Society (GI, since 2020)
- Member of task committee and final-round jury of BWINF (since 2010)
- Member of the board of BWINF (2014–2022)
Associate editor / area chair
- VLDB: 2021, 2019, 2015
- TKDE: 2015–2018
- DASP: 2018
- EDBT: 2017
- JISA: 2016
- CIKM: 2014
PC member / reviewer (since 2011)
- 2025: EDBT, ICLR, SIGMOD, STACS, TMLR, VLDB
- 2024: ARR, DAMI, DEEM, ICLR, IJCAI, LLM+KG, NeurIPS, SIGMOD, TMLR, VLDBJ
- 2023: ARR, Artificial Intelligence, BTW, DEEM, EDBT, ICLR, IJCAI, Neural Networks, Repl4NLP, SIGMOD, TMLR, VLDB, VLDBJ
- 2022: Artificial Intelligence, DEEM, EDBT, ICLR, IJCAI, KDML, Machine Learning, Pattern Recognition, SDM, SIGMOD, VLDBJ
- 2021: DEEM, EDBT, ICDM, IJCAI, KDML, NeurIPS, SIGMOD (demo), Repl4NLP
- 2020: AKBC, EDBT, IJCAI, LWDA, Repl4NLP, SUM, TPDS
- 2019: AKBC, BTW, DEEM, IJCAI, INFORMATIK, LWDA, SUM
- 2018: VLDB, EDBT, DEEM
- 2017: SIGMOD, BTW, DEEM, TALG, VLDB, ECML-PKDD
- 2016: SIGMOD, ECML-PKDD
- 2015: BTW, DAMI, GvDB, IS, JODS, JWS, PODS
- 2014: BDDC, BigData, Buda, DMC, JMLR, SIGMOD, VLDB
- 2013: AKBC, BigData, BTW, CIKM, DMC, ICALP, TKDE, TML, VLDB
- 2012: KDD, JMLR, TODS
- 2011: BTW, IS, VLDB, VLDBJ
Data and Software
- GraSH: Multi-fidelity HPO for graph learning
- DistKGE: A knowledge graph embedding library for multi-GPU and multi-machine training
- AdaPM: A fully adaptive parameter manager
- LibKGE: A knowledge graph embedding library
- Lapse: A parameter server with dynamic parameter allocation
- OPIEC: An open information extraction corpus
- MinIE: Open information extractor (spiritual successor to ClausIE)
- DSGDpp: Various parallel algorithms for matrix factorization (including DSGD++)
- DESQ: Frequent sequence mining with subsequence constraints
- Rounding rank: algorithms for computing rounding-rank decompositions
- CORE: Context-aware open relation extraction with factorization machines
- FINET: Context-aware fine-grained named entity typing
- Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning
- ClausIE: Clause-Based Open Information Extraction
- LEMP: Fast Retrieval of Large Entries in a Matrix Product
- LASH: Large-Scale Sequence Mining with Hierarchies
- MG-FSM: Large-Scale Frequent Sequence Mining
Publications
See also Google Scholar and DBLP.
2023 | A. Kochsiek, R. Gemulla A Benchmark for Semi-Inductive Link Prediction in Knowledge Graphs [pdf, resources] In EMNLP Findings, 2023 |
A. Kochsiek, A. Saxena, I. Nair, R. Gemulla Friendly Neighbors: Contextualized Sequence-to-Sequence Link Prediction [pdf, resources] In Repl4NLP workshop, 2023 | |
A. Renz-Wieland, A. Kieslinger, R. Gericke, R. Gemulla, Z. Kaoudi, V. Markl Good Intentions: Adaptive Parameter Management via Intent Signaling [pdf, resources] In CIKM, 2023 | |
2022 | A. Kochsiek, F. Niesel, R. Gemulla Start Small, Think Big: On Hyperparameter Optimization for Large-Scale Knowledge Graph Embeddings [pdf, resources] In ECML-PKDD, 2022 |
A. Saxena, A. Kochsiek, R. Gemulla Sequence-to-Sequence Knowledge Graph Completion and Question Answering [pdf, video, resources] In ACL, pp. 2814-2828, 2022 | |
A. Renz-Wieland, R. Gemulla, Z. Kaoudi, V. Markl NuPS: A Parameter Server for Machine Learning with Non-Uniform Parameter Access [pdf, source code] In SIGMOD, pp. 481–495, 2022 | |
2021 | A. Kochsiek, R. Gemulla Parallel Training of Knowledge Graph Embedding Models: A Comparison of Techniques [pdf, resources] In PVLDB, 15(3), 2021 |
A. Renz-Wieland, T. Drobisch, R. Gemulla, Z. Kaoudi, V. Markl Just Move It! Dynamic Parameter Allocation in Action [pdf, demo] In PVLDB (demo), 14(12), 2021. | |
2020 | A. Renz-Wieland, R. Gemulla, S. Zeuch, V. Markl Dynamic Parameter Allocation in Parameter Servers [pdf, source code] In PVLDB, 13(12), pp. 1877-1890, 2020 |
S. Broscheit, K. Gashteovski, Y. Wang, R. Gemulla Can We Predict New Facts with Open Knowledge Graph Embeddings? A Benchmark for Open Link Prediction [pdf, resources] In ACL, 2020 | |
D. Ruffinelli, S. Broscheit, R. Gemulla You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings [pdf, video, resources, OpenReview] In ICLR, 2020 | |
S. Broscheit, D. Ruffinelli, A. Kochsiek, P. Betz, R. Gemulla LibKGE – A knowledge graph embedding library for reproducible research [pdf, source] In EMNLP (demo), 2020 | |
K. Gashteovski, R. Gemulla, B. Kotnis, S. Hertling, C. Meilicke On Aligning OpenIE Extractions with Knowledge Bases: A Case Study [pdf, slides, resources] In Eval4NLP, 2020 | |
2019 | Y. Wang, D. Ruffinelli, R. Gemulla, S. Broscheit, C. Meilicke On Evaluating Embedding Models for Knowledge Base Completion [pdf] In RepL4NLP workshop, 2019 |
K. Beedkar, R. Gemulla, W. Martens A Unified Framework for Frequent Sequence Mining with Subsequence Constraints [pdf (journal version), pdf (author version), resources] In TODS, 2019 | |
K. Gashteovski, S. Wanner, S. Hertling, S. Broscheit, R. Gemulla OPIEC: An Open Information Extraction Corpus [pdf, poster, resources, OpenReview] In AKBC, 2019 | |
A. Renz-Wieland, M. Bertsch, R. Gemulla Scalable Frequent Sequence Mining With Flexible Subsequence Constraints [pdf, poster] In ICDE, 2019 | |
Preprints (2019) | Y. Wang, S. Broscheit, R. Gemulla A Relational Tucker Decomposition for Multi-Relational Link Prediction [arXiv] 2019 |
2018 | C. Meilicke, M. Fink, Y. Wang, D. Ruffinelli, R. Gemulla, and H. Stuckenschmidt Fine-grained Evaluation of Rule- and Embedding-based Systems for Knowledge Graph Completion [pdf, resources] In ISWC, 2018 |
J. Pfeiffer, S. Broscheit, R. Gemulla, M. Göschl A Neural Autoencoder Approach for Document Ranking and Query Refinement in Pharmacogenomic Information Retrieval [pdf] In BioNLP workshop, 2018 | |
S. Broscheit, R. Gemulla, M. Keuper Learning Distributional Token Representations from Visual Features [pdf] In RepL4NLP workshop, 2018 | |
Y. Wang, R. Gemulla, H. Li On Multi-Relational Link Prediction with Bilinear Models [pdf, resources] In AAAI, 2018 | |
2017 | K. Gashteovski, R. Gemulla, L. del Corro MinIE: Minimizing Facts in Open Information Extraction [pdf, poster, resources] In EMNLP, pp. 2620-2630, 2017 |
C. Teflioudi, R. Gemulla Exact and Approximate Maximum Inner Product Search with LEMP [pdf (journal version), pdf (author version), resources] In TODS, 42(1) Art. 5, 2017 | |
2016 | S. Neumann, R. Gemulla, P. Miettinen What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank [pdf, tech report, resources] In ICDM, pp. 380–389, 2016 |
K. Beedkar, R. Gemulla DESQ: Frequent Sequence Mining with Subsequence Constraints [pdf, tech report, resources] In ICDM (short paper), pp. 793–798, 2016 | |
2015 | L. Del Corro, A. Abujabal, R. Gemulla, G. Weikum FINET: Context-Aware Fine-Grained Named Entity Typing [pdf, slides, resources] In EMNLP, pp. 868–878, 2015 |
F. Petroni, L. Del Corro, R. Gemulla CORE: Context-Aware Open Relation Extraction with Factorization Machines [pdf, slides, resources] In EMNLP, pp. 1763-1773, 2015 | |
K. Beedkar, K. Berberich, R. Gemulla, I. Miliaraki Closing the Gap: Sequence Mining at Scale [pdf (journal version), pdf (author version), resources] In TODS, 40(2) Art. 8, 2015 | |
C. Teflioudi, R. Gemulla, O. Mykytiuk LEMP: Fast Retrieval of Large Entries in a Matrix Product [pdf, slides, resources] In SIGMOD, pp. 107–122, 2015 | |
K. Beedkar, R. Gemulla LASH: Large-Scale Sequence Mining with Hierarchies [pdf, slides, source code] In SIGMOD, pp. 491–503, 2015 | |
R. Gemulla A Self-Portrayal of GI Junior Fellow Rainer Gemulla: Data Analysis at Scale [pdf (journal version), pdf (author version)] it – Information Technology 57(2), pp. 130–132, 2015 | |
2014 | L. Del Corro, R. Gemulla, G. Weikum Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning [pdf, resources] In EMNLP, pp. 374–385, 2014 |
P. Roy, J. Teubner, R. Gemulla Low-Latency Handshake Join [pdf] In PVLDB, 7(9), pp. 709–720, 2014 | |
L. Qu, Y. Zhang, R. Wang, L. Jiang, R. Gemulla, G. Weikum Senti-LSSVM: Sentiment-Oriented Multi-Relation Extraction with Latent Structural SVM [pdf] In TACL, 2, pp. 155–168, 2014 | |
D. Erdös, R. Gemulla, E. Terzi Reconstructing Graphs from Neighborhood Data [pdf (author version), pdf (journal version)] In TKDD, 8(4), 2014 | |
2013 | F. Makari, C. Teflioudi, R. Gemulla, P. J. Haas, Y. Sismanis Shared-Memory and Shared-Nothing Stochastic Gradient Descent Algorithms for Matrix Completion [pdf (author version), pdf (journal version), source code] In KAIS (special issue: best papers of ICDM 2012), pp. 1–31, 2013 |
F. Makari, R. Gemulla A Distributed Approximation Algorithm for Mixed Packing-Covering Linear Programs [pdf] In NIPS 2013 Biglearn workshop (poster), 2013 | |
F. Makari, B. Awerbuch, R. Gemulla, R. Khandekar, J. Mestre, M. Sozio A Distributed Algorithm for Large-Scale Generalized Matching [pdf, slides] The analysis of the number of binary search steps (Lemma 2) contains a bug; see our Biglearn paper for a corrected version. In PVLDB, 6(9), pp. 613–624, 2013 | |
I. Miliaraki, K. Berberich, R. Gemulla, S. Zoupanos Mind the Gap: Large-Scale Frequent Sequence Mining [pdf, slides, resources] In SIGMOD, pp. 797–808, 2013 | |
L. Del Corro, R. Gemulla ClausIE: Clause-Based Open Information Extraction [pdf, slides, resources] In WWW, pp. 355–366, 2013 | |
R. Gemulla, P. J. Haas, W. Lehner Non-Uniformity Issues and Workarounds in Bounded-Size Sampling [pdf (author version), pdf (journal version), source code] In The VLDB Journal, 22(6), pp. 753–772, 2013 | |
K. Beedkar, L. Del Corro, R. Gemulla Fully Parallel Inference in Markov Logic Networks [pdf] In BTW, pp. 205–224, 2013 | |
2012 | D. Erdös, R. Gemulla, E. Terzi Reconstructing Graphs from Neighborhood Data [pdf, slides] In ICDM, pp. 231–240, 2012 |
C. Teflioudi, F. Makari, R. Gemulla Distributed Matrix Completion [pdf, slides, source code] In ICDM, pp. 655–664, 2012 | |
L. Qu, R. Gemulla, G. Weikum A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts [pdf] In EMNLP-CoNLL, pp. 149–159, 2012 | |
2011 | R. Gemulla, P. J. Haas, Y. Sismanis, C. Teflioudi, F. Makari Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides, source code] In NIPS 2011 Biglearn workshop, 2011 (best paper award) |
R. Gemulla, E. Nijkamp, P. J. Haas, Y. Sismanis Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides, source code] In KDD, pp. 69–77, 2011 | |
K. Beyer, V. Ercegovac, R. Gemulla, A. Balmin, M. Eltabakh, C.C. Kanne, F. Ozcan, E. Shekita Jaql: A Scripting Language for Large Scale Semistructured Data Analysis [pdf] In PVLDB (industrial track), 4(11), pp. 1272-1283, 2011 | |
M. Y. Eltabakh, Y. Tian, F. Özcan, R. Gemulla, A. Krettek, J. McPherson CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop [pdf] In PVLDB, 4(9), pp. 575–585, 2011 | |
R. Gemulla, P. J. Haas, E. Nijkamp, Y. Sismanis Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf] IBM Research Report RJ10481, March 2011, revised February 2013 | |
B. Schlegel, R. Gemulla, W. Lehner Memory-Efficient Frequent-Itemset Mining [pdf] In EDBT, pp. 461–472, 2011 | |
2010 | S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, J. McPherson. Ricardo: Integrating R and Hadoop [pdf] In SIGMOD (industrial track), pp. 987–998, 2010 |
B. Schlegel, R. Gemulla, W. Lehner. Fast Integer Compression using SIMD Instructions [pdf] In DAMON, pp. 34–40, 2010 | |
2009 | K. Beyer, R. Gemulla, P. J. Haas, B. Reinwald, Y. Sismanis. Distinct-Value Synopses for Multiset Operations [pdf, technical perspective by Surajit Chaudhuri] In Commun. ACM, 52(10), pp. 87–95, 2009 |
B. Schlegel, R. Gemulla, W. Lehner. k-Ary Search on Modern Processors [pdf, slides] In DAMON, pp. 52–60, 2009 | |
2008 | R. Gemulla. Sampling Algorithms for Evolving Datasets [pdf, summary, slides] Ph.D. thesis, Technische Universität Dresden, 2009. URL for citations: nbn-resolving.de/urn:nbn:de:bsz:14-ds-1224861856184-11644 |
R. Gemulla, P. Rösch and W. Lehner. Linked Bernoulli Synopses: Sampling Along Foreign Keys [pdf, slides] In SSDBM, pp. 6–23, 2008 | |
R. Gemulla and W. Lehner. Sampling Time-Based Sliding Windows in Bounded Space [pdf, slides] As observed by Hu et al., the lower bound of Ω(k log N) stated in Theorem 1 should read Ω(k log(N/k)). In SIGMOD, pp. 379–392, 2008 | |
P. Rösch, R. Gemulla and W. Lehner. Designing Random Sample Synopses with Outliers [pdf, poster] In ICDE (poster), pp. 1400-1402, 2008 | |
2007 | R. Gemulla, W. Lehner and P.J. Haas. Maintaining Bounded-Size Sample Synopses of Evolving Datasets [pdf] The resizing algorithm proposed in this article contains a bug; see my Ph.D. thesis or our 2013 VLDB Journal paper for a corrected version. In The VLDB Journal, Special Issue: Best Papers of VLDB 2006, pp. 173–201, 2007 |
K. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis and R. Gemulla. On Synopses for Distinct-Value Estimation Under Multiset Operations [pdf, slides] In SIGMOD, pp. 199–210, 2007 | |
R. Gemulla, W. Lehner and P. J. Haas. Maintaining Bernoulli Samples over Evolving Multisets [pdf, slides] In PODS, pp. 93–102, 2007 | |
2006 | R. Gemulla, W. Lehner and P. J. Haas. A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets [pdf, slides] In VLDB, pp. 595–606, 2006 |
A. Klein, R. Gemulla, P. Rösch and W. Lehner. Derby/ In SIGMOD (demo), pp. 757–759, 2006 | |
R. Gemulla and W. Lehner. Deferred Maintenance of Disk-Based Random Samples [pdf, slides] In EDBT, pp. 423–441, 2006 |