The paper “Parallel Training of Knowledge Graph Embedding Models: A Comparison of Techniques” by Adrian Kochsiek and Rainer Gemulla has been accepted at the 2022 Proceedings of the VLDB Endowment (PVLDB).
Knowledge graph embedding (KGE) models represent the entities and relations of a knowledge graph (KG) using dense continuous representations called embeddings. KGE methods have recently gained traction for tasks such as knowledge graph completion and reasoning as well as to provide suitable entity representations for downstream learning tasks. While a large part of the available literature focuses on small KGs, a number of frameworks that are able to train KGE models for large-scale KGs by parallelization across multiple GPUs or machines have recently been proposed. So far, the benefits and drawbacks of the various parallelization techniques have not been studied comprehensively. In this paper, we report on an experimental study in which we presented, re-implemented in a common computational framework, investigated, and improved the available techniques. We found that the evaluation methodologies used in prior work are often not comparable and can be misleading, and that most of currently implemented training methods tend to have a negative impact on embedding quality. We propose a simple but effective variation of the stratification technique used by PyTorch BigGraph for mitigation. Moreover, basic random partitioning can be an effective or even the best-performing choice when combined with suitable sampling techniques. Ultimately, we found that efficient and effective parallel training of large-scale KGE models is indeed achievable but requires a careful choice of techniques.