Prof. Dr. Rainer Gemulla

Chair of Practical Computer Science I: Data Analytics

Universität Mannheim
B6, 26, Room B 016
D-68159 Mannheim

Tel.: +49 621 181 2480

rgemulla(at)uni-mannheim.de

I head the Chair of Practical Computer Science I: Data Analytics at the University of Mannheim. The chair is part of the Data and Web Science Group.


If you consider applying to our lab, please read:

  • Applications that do not include a CV, transcripts, and a short (!) cover letter will likely be ignored.
  • We generally do not offer short internships (3 months or less).
  • If you are a BSc or MSc student here, have very good transcripts, and are interested in a student job with us, please contact me directly.

Research Interests

  • Data analysis and data mining
  • Text mining and information extraction
  • Optimization
  • Approximation techniques
  • Algorithms for modern hardware

Curriculum Vitae

Since 2014    W3-Professor for Practical Computer Science I, Universität Mannheim, Germany
2010 - 2014    Senior researcher / group leader, Max-Planck-Institut für Informatik, Saarbrücken, Germany
2008 - 2010    Postdoctoral researcher, IBM Almaden Research Center, San Jose, CA, USA
2004 - 2008    PhD in Computer Science, Technische Universität Dresden, Germany

Awards

  • Named Distinguished PC Member for SIGMOD, 2017
  • Junior-Fellow of the Gesellschaft für Informatik (GI), 2013
  • AWS in Education Research Grant Award, 2013
  • Busy Beaver teaching award (winter term 2012/2013)
    for Non-Traditional Data Management (NoSQL and more)
  • IBM's 2011 Pat Goldberg Memorial best paper award in CS, EE and Math
    (for "Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent" with P. J. Haas, E. Nijkamp, and Y. Sismanis; KDD 2011)
  • Best paper of NIPS 2011 Biglearn workshop
    (for "Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent" with P. J. Haas, Y. Sismanis, C. Teflioudi, and F. Makari)
  • Google Focused Research Award 2011: Robust and Scalable Fact Discovery from Web Sources
    (with G. Weikum and M. Theobald)
  • Research Highlight in Communications of the ACM
    (for "Distinct-Value Synopses for Multiset Operations" with K. Beyer, P. J. Haas, B. Reinwald, and Y. Sismanis)
  • The VLDB Journal, Special Issue: Best Papers of VLDB 2006
    (for "A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets" with W. Lehner and P. J. Haas)

Ph.D. Students

Former students

Teaching

If you are interested in writing a seminar paper, Bachelor's thesis, or Master's thesis with us, please read the following guidelines.

Upcoming semester (FSS 2019)

Current semester (HWS 2018)

Previous semester (FSS 2018)

Professional Activities

Organizer

Associate editor / area chair

  • 2015-2018: TKDE
  • 2019: VLDB
  • 2018: DASP
  • 2017: EDBT
  • 2016: JISA
  • 2015: VLDB
  • 2014: CIKM

PC member / reviewer (since 2011)

  • 2018: VLDB, EDBT, DEEM
  • 2017: SIGMOD, BTW, DEEM, TALG, VLDB, ECML-PKDD
  • 2016: SIGMOD, ECML-PKDD
  • 2015: BTW, DAMI, GvDB, IS, JODS, JWS, PODS
  • 2014: BDDC, BigData, Buda, DMC, JMLR, SIGMOD, VLDB
  • 2013: AKBC, BigData, BTW, CIKM, DMC, ICALP, TKDE, TML, VLDB
  • 2012: KDD, JMLR, TODS
  • 2011: BTW, IS, VLDB, VLDBJ

Data and Software

  • MinIE: Open information extractor (spiritual successor to ClausIE)
  • DSGDpp: Various parallel algorithms for matrix factorization (including DSGD++)
  • DESQ: Frequent sequence mining with subsequence constraints
  • Rounding rank: algorithms for computing rounding-rank decompositions
  • CORE: Context-aware open relation extraction with factorization machines
  • FINET: Context-aware fine-grained named entity typing
  • Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning
  • ClausIE: Clause-Based Open Information Extraction
  • LEMP: Fast Retrieval of Large Entries in a Matrix Product
  • LASH: Large-Scale Sequence Mining with Hierarchies
  • MG-FSM: Large-Scale Frequent Sequence Mining

Publications

See also Google Scholar and DBLP.

Preprints    Y. Wang, S. Broscheit, R. Gemulla
A Relational Tucker Decomposition for Multi-Relational Link Prediction [arXiv]
2019
  Y. Wang, D. Ruffinelli, R. Gemulla, S. Broscheit, C. Meilicke
On Evaluating Embedding Models for Knowledge Base Completion [arXiv]
2019
2019    A. Renz-Wieland, M. Bertsch, R. Gemulla
Scalable Frequent Sequence Mining With Flexible Subsequence Constraints
To appear in ICDE, 2019
2018    C. Meilicke, M. Fink, Y. Wang, D. Ruffinelli, R. Gemulla, and H. Stuckenschmidt
Fine-grained Evaluation of Rule- and Embedding-based Systems for Knowledge Graph Completion [pdf, resources]
In ISWC, 2018
  J. Pfeiffer, S. Broscheit, R. Gemulla, M. Göschl
A Neural Autoencoder Approach for Document Ranking and Query Refinement in Pharmacogenomic Information Retrieval [pdf]
In BioNLP workshop, 2018
  S. Broscheit, R. Gemulla, M. Keuper
Learning Distributional Token Representations from Visual Features [pdf]
In RepL4NLP workshop, 2018
  Y. Wang, R. Gemulla, H. Li
On Multi-Relational Link Prediction with Bilinear Models [pdf, resources]
In AAAI, 2018
2017    K. Gashteovski, R. Gemulla, L. del Corro
MinIE: Minimizing Facts in Open Information Extraction [pdf, poster, resources]
In EMNLP, pp. 2620-2630, 2017
  C. Teflioudi, R. Gemulla
Exact and Approximate Maximum Inner Product Search with LEMP [pdf (journal version), pdf (author version), resources]
In TODS, 42(1) Art. 5, 2017
2016    S. Neumann, R. Gemulla, P. Miettinen
What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank [pdf, tech report, resources]
In ICDM, pp. 380-389, 2016
  K. Beedkar, R. Gemulla
DESQ: Frequent Sequence Mining with Subsequence Constraints [pdf, tech report, resources]
In ICDM (short paper), pp. 793-798, 2016
2015    L. Del Corro, A. Abujabal, R. Gemulla, G. Weikum
FINET: Context-Aware Fine-Grained Named Entity Typing [pdf, slides, resources]
In EMNLP, pp. 868-878, 2015
  F. Petroni, L. Del Corro, R. Gemulla
CORE: Context-Aware Open Relation Extraction with Factorization Machines [pdf, slides, resources]
In EMNLP, pp. 1763-1773, 2015
  K. Beedkar, K. Berberich, R. Gemulla, I. Miliaraki
Closing the Gap: Sequence Mining at Scale [pdf (journal version), pdf (author version), resources]
In TODS, 40(2) Art. 8, 2015
  C. Teflioudi, R. Gemulla, O. Mykytiuk
LEMP: Fast Retrieval of Large Entries in a Matrix Product [pdf, slides, resources]
In SIGMOD, pp. 107-122, 2015
  K. Beedkar, R. Gemulla
LASH: Large-Scale Sequence Mining with Hierarchies [pdf, slides, source code]
In SIGMOD, pp. 491-503, 2015
  R. Gemulla
A Self-Portrayal of GI Junior Fellow Rainer Gemulla: Data Analysis at Scale [pdf (journal version), pdf (author version)]
it - Information Technology 57(2), pp. 130-132, 2015
2014    L. Del Corro, R. Gemulla, G. Weikum
Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning [pdf, resources]
In EMNLP, pp. 374-385, 2014
  P. Roy, J. Teubner, R. Gemulla
Low-Latency Handshake Join [pdf]
In PVLDB, 7(9), pp. 709-720, 2014
  L. Qu, Y. Zhang, R. Wang, L. Jiang, R. Gemulla, G. Weikum
Senti-LSSVM: Sentiment-Oriented Multi-Relation Extraction with Latent Structural SVM [pdf]
In TACL, 2, pp. 155-168, 2014
  D. Erdös, R. Gemulla, E. Terzi
Reconstructing Graphs from Neighborhood Data [pdf (author version), pdf (journal version)]
In TKDD, 8(4), 2014
2013    F. Makari, C. Teflioudi, R. Gemulla, P. J. Haas, Y. Sismanis
Shared-Memory and Shared-Nothing Stochastic Gradient Descent Algorithms for Matrix Completion [pdf (author version), pdf (journal version), source code]
In KAIS (special issue: best papers of ICDM 2012), pp. 1-31, 2013
  F. Makari, R. Gemulla
A Distributed Approximation Algorithm for Mixed Packing-Covering Linear Programs [pdf]
In NIPS 2013 Biglearn workshop (poster), 2013
  F. Makari, B. Awerbuch, R. Gemulla, R. Khandekar, J. Mestre, M. Sozio
A Distributed Algorithm for Large-Scale Generalized Matching [pdf, slides]
The analysis of the number of binary search steps (Lemma 2) contains a bug; see our Biglearn paper for a corrected version.
In PVLDB, 6(9), pp. 613-624, 2013
  I. Miliaraki, K. Berberich, R. Gemulla, S. Zoupanos
Mind the Gap: Large-Scale Frequent Sequence Mining [pdf, slides, resources]
In SIGMOD, pp. 797-808, 2013
  L. Del Corro, R. Gemulla
ClausIE: Clause-Based Open Information Extraction [pdf, slides, resources]
In WWW, pp. 355-366, 2013
  R. Gemulla, P. J. Haas, W. Lehner
Non-Uniformity Issues and Workarounds in Bounded-Size Sampling [pdf (author version), pdf (journal version), source code]
In The VLDB Journal, 22(6), pp. 753-772, 2013
  K. Beedkar, L. Del Corro, R. Gemulla
Fully Parallel Inference in Markov Logic Networks [pdf]
In BTW, pp. 205-224, 2013
2012    D. Erdös, R. Gemulla, E. Terzi
Reconstructing Graphs from Neighborhood Data [pdf, slides]
In ICDM, pp. 231-240, 2012
  C. Teflioudi, F. Makari, R. Gemulla
Distributed Matrix Completion [pdf, slides, source code]
In ICDM, pp. 655-664, 2012
  L. Qu, R. Gemulla, G. Weikum
A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts [pdf]
In EMNLP-CoNLL, pp. 149-159, 2012
2011    R. Gemulla, P. J. Haas, Y. Sismanis, C. Teflioudi, F. Makari
Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides, source code]
In NIPS 2011 Biglearn workshop, 2011 (best paper award)
  R. Gemulla, E. Nijkamp, P. J. Haas, Y. Sismanis
Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf, slides, source code]
In KDD, pp. 69-77, 2011
  K. Beyer, V. Ercegovac, R. Gemulla, A. Balmin, M. Eltabakh, C.C. Kanne, F. Özcan, E. Shekita
Jaql: A Scripting Language for Large Scale Semistructured Data Analysis [pdf]
In PVLDB (industrial track), 4(11), pp. 1272-1283, 2011
  M. Y. Eltabakh, Y. Tian, F. Özcan, R. Gemulla, A. Krettek, J. McPherson
CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop [pdf]
In PVLDB, 4(9), pp. 575-585, 2011
  R. Gemulla, P. J. Haas, E. Nijkamp, Y. Sismanis
Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent [pdf]
IBM Research Report RJ10481, March 2011; revised February 2013
  B. Schlegel, R. Gemulla, W. Lehner
Memory-Efficient Frequent-Itemset Mining [pdf]
In EDBT, pp. 461-472, 2011
2010    S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, J. McPherson.
Ricardo: Integrating R and Hadoop [pdf]
In SIGMOD (industrial track), pp. 987-998, 2010
  B. Schlegel, R. Gemulla, W. Lehner.
Fast Integer Compression using SIMD Instructions [pdf]
In DAMON, pp. 34-40, 2010
2009    K. Beyer, R. Gemulla, P. J. Haas, B. Reinwald, Y. Sismanis.
Distinct-Value Synopses for Multiset Operations [pdf, technical perspective by Surajit Chaudhuri]
In Commun. ACM, 52(10), pp. 87-95, 2009
  B. Schlegel, R. Gemulla, W. Lehner.
k-Ary Search on Modern Processors [pdf, slides]
In DAMON, pp. 52-60, 2009
2008    R. Gemulla.
Sampling Algorithms for Evolving Datasets [pdf, summary, slides]
Ph.D. thesis, Technische Universität Dresden, 2009
URL for citations: nbn-resolving.de/urn:nbn:de:bsz:14-ds-1224861856184-11644
  R. Gemulla, P. Rösch and W. Lehner.
Linked Bernoulli Synopses: Sampling Along Foreign Keys [pdf, slides]
In SSDBM, pp. 6-23, 2008
  R. Gemulla and W. Lehner.
Sampling Time-Based Sliding Windows in Bounded Space [pdf, slides]
As observed by Hu et al., the lower bound of Ω(k log N) stated in Theorem 1 should read Ω(k log(N/k)).
In SIGMOD, pp. 379-392, 2008
  P. Rösch, R. Gemulla and W. Lehner.
Designing Random Sample Synopses with Outliers [pdf, poster]
In ICDE (poster), pp. 1400-1402, 2008
2007    R. Gemulla, W. Lehner and P.J. Haas.
Maintaining Bounded-Size Sample Synopses of Evolving Datasets [pdf]
The resizing algorithm proposed in this article contains a bug; see my Ph.D. thesis or our 2013 VLDB Journal paper for a corrected version.
In The VLDB Journal, Special Issue: Best Papers of VLDB 2006, pp. 173-201, 2007
  K. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis and R. Gemulla.
On Synopses for Distinct-Value Estimation Under Multiset Operations [pdf, slides]
In SIGMOD, pp. 199-210, 2007
  R. Gemulla, W. Lehner and P. J. Haas.
Maintaining Bernoulli Samples over Evolving Multisets [pdf, slides]
In PODS, pp. 93-102, 2007
2006    R. Gemulla, W. Lehner and P. J. Haas.
A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets [pdf, slides]
In VLDB, pp. 595-606, 2006
  A. Klein, R. Gemulla, P. Rösch and W. Lehner.
Derby/S: A DBMS for Sample-Based Query Answering [pdf, poster1, poster2]
In SIGMOD (demo), pp. 757-759, 2006
  R. Gemulla and W. Lehner.
Deferred Maintenance of Disk-Based Random Samples [pdf, slides]
In EDBT, pp. 423-441, 2006
