Knowledge Graph Completion with Mixed Geometry Tensor Factorization
Viacheslav Yusupov Maxim Rakhuba Evgeny Frolov
HSE University HSE University AIRI HSE University
Abstract
In this paper, we propose a new geometric approach for knowledge graph completion via low rank tensor approximation. We augment a pretrained and well-established Euclidean model based on a Tucker tensor decomposition with a novel hyperbolic interaction term. This correction enables more nuanced capturing of distributional properties in data better aligned with real-world knowledge graphs. By combining two geometries together, our approach improves expressivity of the resulting model achieving new state-of-the-art link prediction accuracy with a significantly lower number of parameters compared to the previous Euclidean and hyperbolic models.
1 INTRODUCTION
Most of the information in the world can be expressed in terms of entities and the relationships between them. This information is effectively represented in the form of a knowledge graph (d’Amato, 2021; Peng et al., 2023), which serves as a repository for storing various forms of relational data with their interconnections. Particular examples include storing user profiles on social networking platforms (Xu et al., 2018), organizing Internet resources and the links between them, constructing knowledge bases that capture user preferences to enhance the functionality of recommender systems (Wang et al., 2019a; Guo et al., 2020). With the recent emergence of large language models (LLM), knowledge graphs have become an essential tool for improving the consistency and trustworthiness of linguistic tasks. Among notable examples of their application are fact checking (Pan et al., 2024), hallucinations mitigation (Agrawal et al., 2023), retrieval-augmented generation (Lewis et al., 2020), and generation of corpus for LLM pretraining (Agarwal et al., 2021). This utilization underscores the versatility and utility of knowledge graphs in managing complex datasets and facilitating the manipulation of interconnected information in various domains and downstream tasks.
On the other hand, knowledge graphs may present an incomplete view of the world. Relations can evolve and change over time, be subject to errors, processing limitations, and gaps in available information. Therefore, the tasks of restoring missing links or predicting the new ones are particularly important to maintain the accuracy and relevance of knowledge graphs. Algorithms designed to address this challenge can significantly reduce the manual labor involved in updating information and ensure consistent representation of the most current and up-to-date knowledge. In this work, we aim to design a new efficient graph completion algorithm that will capture inherent structural properties of the underlying data to improve the link prediction quality.
The key challenge of this task lies in the choice of proper architectural components of the solution suitable for handling a non-trivial nature of data. We start by noticing that knowledge graphs exhibit a strictly non-uniform arrangement of interconnections. Some nodes on the graph are present more frequently than others, which renders a hierarchical structure often reproducing the features of a power law-like distribution. It places knowledge graphs into the so called “complex network” category. In the seminal work on the theoretical foundations of complex networks analysis (Krioukov et al., 2010), it was shown that hyperbolic geometries are especially suitable for modelling this type of structures. In these geometries, as one moves further away from the origin, the distances between points and the area grow exponentially in contrast to the Euclidean geometry with its respective linear and quadratic growth. Due to this effect, hyperbolic embeddings have a higher expressive ability in terms of capturing hierarchical relations within data. Consequent studies have proved the practicality of such geometric approach in various domains such as natural language processing (Nickel and Kiela, 2017), computer vision (Ermolov et al., 2022) and recommender systems (Mirvakhabova et al., 2020).
Despite the promising results, in our experiments, we observe that real-world knowledge graphs may not consistently align with the assumption of strict hierarchical internal structure and may only partially follow a power-law distribution (see Figure 1). More specifically, the characteristics of relations in a knowledge graph or even data preprocessing may render connectivity components consisting of vertices with a large number of links almost uniformly distributed between them. We will call these vertices “active”. As we demonstrate in our work, having a large number of active vertices violates hierarchy and disrupts the total compatibility of hyperbolic geometry with real data (see Figure 2). Therefore, in this paper, we aim to find a balanced solution using different geometry types so that the weakness of each geometry is compensated. We introduce a hybrid mixed geometry model that combines the strengths of both Euclidean and hyperbolic geometry within a single architecture. In this model, the Euclidean component serves the goal of properly extracting information from the active vertices, while the hyperbolic component with embeddings from a hyperbolic space captures additional information from vertices in the graph that are distributed according to a power-law.
Overall, our contributions can be summarized as follows:
-
•
We introduce a new mixed-geometry tensor factorisation (MIG-TF) model that combines Tucker decomposition defined in the Euclidean space with a new low-parametric hyperbolic ternary interaction term.
-
•
We highlight intricacies of applying geometric approach to real-world knowledge graphs and demonstrate the associated with it limitations of using single-geometry modelling.
-
•
The proposed combined approach significantly reduces the number of model parameters compared to state-of-the-art methods. It does so without sacrificing expressive power and achieves more accurate results in most of the common benchmarks.

The rest of the paper is organized as follows. In Section 2, we review most relevant Euclidean and hyperbolic models. We introduce our parameter-efficient hyperbolic interaction model tetrahedron pooling tensor factorization (TPTF) along with the basics of hyperbolic geometry in Section 3.2. Section 3.3 is devoted to mixed geometry modelling MIG-TF, where we present the mixture of the Euclidean TuckER and the hyperbolic TPTF models. Finally, Section 4 contains the results of numerical experiments, where we showcase the performance of our model and also investigate the impact of using a hyperbolic correction. The anonymized code for reproducing our results is publicly available111https://github.com/hse-cs/MIGTF.

2 RELATED WORK
Knowledge graphs represent facts in the triplet form , where and denote a pair of subject and objects entities and denotes the corresponding relation between them. This representation can be naturally expressed in the third-order binary tensor form where a non-zero entry at position encodes an existing triplet of the knowledge graph. To identify connections in knowledge graphs that have been incorrectly specified in the interaction tensor, various tensor factorization-based models are utilized.
For instance, the SimplIE (Kazemi and Poole, 2018) and ComplEX(Trouillon et al., 2016) models employ Canonical decomposition () (Hitchcock, 1927) with embeddings in the real or complex Euclidean space. The PITF (Rendle and Schmidt-Thieme, 2010) model utilizes Pairwise Interaction Tensor Factorisation in the Euclidean space, which splits the standard ternary interaction term of into a linear sum of independent interactions between entity-entity and entity-relation pairs. Some other models use distances between embeddings of entities and relations in the vector space. For instance, TransE (Bordes et al., 2013b) uses Euclidean vector representations of relations, which are used as connection vectors between Euclidean embeddings of entities. The further development of this approach named Quate (Zhang et al., 2019) represents embeddings as rotations in the hypercomplex space.
Various artificial neural networks-based approaches were also proposed to solve the link prediction task. For instance, the models like ConvE and ConvKB (Dettmers et al., 2018; Nguyen et al., 2017) utilize convolutional neural networks to work with knowledge graphs. The R-GCN (Schlichtkrull et al., 2018) model proposes a modification of graph-convolutional neural networks GCN (Kipf and Welling, 2016) to adapt graphs convolutions to relational data. Models, such as GAAT (Wang et al., 2019b) or KBGAT (Nathani et al., 2019), use the attention mechanisms on graphs.
Some models rely on the shared-factors paradigm, which utilizes a special structure of the problem based on the fact that objects and subjects belong to the same space. While making the problem inherently non-linear, this approach significantly reduces the number of learned parameters of models and is better suited for predicting symmetric links (Hospedales, 2019) in knowledge graphs. For instance, the MEKER (Chekalina et al., 2022)) model employs CP decomposition with low-rank shared factors, and RESCAL (Nickel et al., 2011) utilizes DEDICOM (Harshman et al., 1982) factorisation decomposing each tensor slice along the relation axis. The TuckER (Hospedales, 2019) and R-TuckER (Peshekhonov et al., 2024) models employ shared-factors TuckER decomposition (Tucker, 1966) to represent knowledge graph entities and relations and supports the prediction of symmetric as well as asymmetric links in knowledge graphs.
Recent advances in understanding of hyperbolic geometry applications has lead to the development of hyperbolic models for knowledge graph completion as well. According to (Krioukov et al., 2010), the structural properties of hyperbolic spaces make them plausible for analyzing a special family of non-uniformly distributed data, which knowledge graphs also belong to. For instance, the RotH and RefH (Re, 2020) propose specific rotation and reflection transformations in a hyperbolic space to improve the expressive power of the learned embeddings. Even though these model show remarkable improvements on some knowledge graphs, there are still some cases where Euclidean models outperform hyperbolic ones. We attribute this observation to the violation of some basic distributional properties required for full compatibility with hyperbolic geometry. It motivates us to develop a hybrid approach similarly to (Gu et al., 2019; Seth and Sharaff, 2022; Kong, 2019) that takes the best from both geometries and therefore compensates for their shortcomings in analyzing real-world data. Similar idea was utilized in models M2GNN (Wang et al., 2021) generalizing graph neural network for multi-curvature and GIE (Cao et al., 2022) introducing message passing and attention algorithms on hyperbolic and hyperspherical spaces. In comparison to previous methods, our MIG-TF approach uses less complex and more stable operations (Section 3.4). Additionally, it implies learning only low-parametric hyperbolic addition to a pretrained model instead of learning the whole model as in RotH, M2GNN and GIE. Further in the text, we provide the detailed description of our approach.
3 MIXED GEOMETRY TENSOR FACTORIZATION
In this section, we introduce our new parameter-efficient hyperbolic interaction term TPTF, and the new mixed geometry model MIG-TF, where the hyperbolic term is used as a correction to the Euclidean model.
3.1 Hyperbolic geometry and Lorentz Distance
We start with basic notation and concepts from hyperbolic geometry. There are several models of hyperbolic geometry, among which Poincare Ball, Klein, and Lorentz model are frequently used in various machine learning approaches. All hyperbolic models are isomorphic – they describe the same geometry and can be interchangeably utilized by means of the corresponding transformations (Ungar, 2005). In our work, we use the Lorentz model of hyperbolic geometry. In the following, we only introduce the basics of Lorentz geometry and refer the reader to (Beem, 2017) for more details.
Let us define the Lorentz inner product for vectors , which is used for measuring distance between hyperbolic embeddings:
where
The induced norm of a vector in Lorentz geometry is defined as . The corresponding -dimensional space is called Hyperboloid and defined as follows:
The origin vector of the hyperboloid equals to . The inner product of 0 and is, hence, .
The associated geodesic distance is defined as
Similarly to (Xu and Wu, 2020), we introduce the square Lorentz distance between is defined as
(1) |
This distance fulfills all of the Euclidean distance axioms, except for the triangle inequality:
(2) |
Nevertheless, it does not hold in general in the hyperbolic setting. The idea to benefit from this fact appears, for instance, in a two-dimensional hyperbolic model LorentzFM (Xu and Wu, 2020). Inequality violation in this model implies that two entities are far away and, hence, become irrelevant to each other. Generalizing the triangle inequality to the three-dimensional case is not a straightforward task, which we address in the following section.
3.2 Tetrahedron Pooling Tensor Factorization
Solving the knowledge graph link prediction problem requires learning ternary interactions in the form of entity-relation-entity triplets. This can be naturally modelled through tensor representation in the Euclidean space. However, unlike the 2D case, generalizing a representation of such third-order interactions to the hyperbolic space is not straightforward. To the best of our knowledge, there is no unified approach.
We propose to modify (2) to capture ternary interactions. In particular, we utilize the so-called tetrahedron inequality (Itoh et al., 2021): for the points in the Euclidean space, it holds
(3) |
In the hyperbolic context, this inequality does not universally apply. However, as we explore further, the regions where this inequality is violated can be advantageous for our objectives. Consequently, we can naturally introduce the following score function:
(4) | ||||
where are entity embeddings and is the relation embedding. Note that this score function is negative when inequality (3) holds and is positive in the opposite case. Unfortunately, the score function (4) involves arccosh, which is not differentiable everywhere and, hence, may lead to difficulties for gradient-based optimization methods.
Similarly to LorentzFM, (4) is replaced with the following “smoothed” score function that mimics the desired behaviour:
(5) | ||||
see Figure 3 for comparison of the two score functions.
Note also that we may rewrite as:

The score function yields a positive value when the points corresponding to linked entities are in close proximity and a negative value when they are distant (see Appendix). Additionally, we note that the terms and , which account for interactions between relations and entities, effectively draw entities and closer to the relation . Consequently, the trained model exhibits a distinct behavior: entities tend to cluster around well-separated relations, as illustrated in Figure 4.
The model’s predictions for entity and relation can be obtained with :
where is the matrix of all entity embeddings and – the number of entities. In TPTF model, we use the binary cross-entropy (BCE) loss function:
where – binary label vector of the ground truth. Similarly to (Law et al., 2019), we optimize , and embeddings in the Euclidean space with the AdamW (Loshchilov and Hutter, 2019) optimizer and then map them to the hyperboloid as follows: maps to .
3.3 Mixed Geometry Tensor Factorisation
To improve the link prediction in knowledge graphs we introduce a shared-factor mixed geometry model combining Euclidean and hyperbolic models (see Figure 5). Due to the complexity of the knowledge graph, some vertices are better described with hyperbolic embeddings whilst others are with the Euclidean ones. Therefore with a combination of hyperbolic and Euclidean embeddings, significantly more information on the structure of the knowledge graph is saved with Euclidean and hyperbolic vector representations together. For the hyperbolic model, we used TPTF model (5), and for Euclidean one we used TuckER (Hospedales, 2019). The score function of the Euclidean model is defined as:
(6) |
where – embeddings of entities, – embedding of relations and – tensor of learnable parameters. Here, and are the ranks of the Tucker decomposition.
The score function of our mixed-geometry model MIG-TF is the sum of score functions of the Lorentzian and Euclidean models:
(7) |
where defined in (6) is an Euclidean interaction term, is a hyperbolic interaction term (5). The vectors and Euclidean entity and relation embeddings, and – hyperbolic embeddings, and are -th rows of matrices and of Euclidean and hyperbolic entity embeddings, respectively, and – 3-D tensor of learnable parameters. In MIG-TF model, we utilize pretrained TuckER model (6) and optimize the hyperbolic term parameters (5) of the score function (7) to minimize the BCE loss:
where is an -th element of interaction vector for entity and relation with embeddings and , respectively. We optimize hyperbolic embeddings as in Section 3.2. To prevent model overfitting, we employ the dropout layer (Salakhutdinov, 2012) in both Euclidean and hyperbolic interaction terms.
3.4 Mixed-geometry models comparison
The next subsection is devoted to comparing our approach with other mixed-geometry methods: M2GNN (Wang et al., 2021) and GIE (Cao et al., 2022).
The translational distance (Zhang et al., 2020) model M2GNN generalizes graph neural networks for multi-relational setting on knowledge graphs and modified it for multi-curvature setting capturing graph structures more effectively, especially hierarchical and cyclic ones.
The model GIE introduces the message passing and attention mechanism for the knowledge graph completion by utilizing mapping on hyperbolic and hyperspherical spaces.
Both of the existing approaches suffers from computational complexity and numerical instability due to utilizing exponential and logarithmic mapping, which are even more numerically unstable than the standard Lorentz model operations (Peng et al., 2021). In contrast, MIG-TF proposes using Square Lorentz distance, a convenient replacement for Geodesic distance. Square Lorentz distance reduces hyperbolic operations, such as exponential and logarithmic mappings and arccosh, to mere inner product computations, decreases complexity of operations and increases numerical stability.
Additionally, only the low-parametric addition TPTF is learned in our MIG-TF method instead of learning the whole models in M2GNN and GIE.
Unlike (Wang et al., 2021) and (Cao et al., 2022), we do not utilize spherical geometry and still our mixed geometry approach achieves the best metrics in the majority of cases. Nevertheless, our framework is sufficiently flexible to accommodate the incorporation of spherical geometry, presenting a promising direction for future research.
4 EXPERIMENTS
In experiments, we shows that mixed geometry model, combining Euclidean and hyperbolic embeddings, with low-dimensional hyperbolic ones outperforms Euclidean, Complex and hyperbolic models with significantly higher dimensions of embeddings. Additionally, we have studied the dependence of curvature of the space of hyperbolic embeddings and expressive ability of hyperbolic and mixed geometry models. Moreover, we have shown that hyperbolic addition to TuckER (Hospedales, 2019) model improves the quality of link prediction between hierarchically distributed entities.
4.1 Datasets
We evaluate our mixed geometry model for link prediction task on three standard knowledge graphs: WN18RR (Bordes et al., 2013b; Dettmers et al., 2018), FB15k-237 (Bordes et al., 2013b; Toutanova et al., 2015) and YAGO3-10 (Mahdisoltani et al., 2013). WN18RR is a subset of WordNet (Miller, 1995) – knowledge base containing lexical relations between words. FB15k-237 is a subset of FreeBase (Bollacker et al., 2008) – collaborative knowledge base containing general world knowledge, such as nationalities of famous people, locations of some buildings or genre of music. YAGO3-10 is a subset of YAGO3 (Suchanek et al., 2007) which mostly contains descriptions o people. The statistics of these knowledge graphs are shown in following table:
For each knowledge graph we follow standard data augmentation by adding inverse relations (Bordes et al., 2013a).
4.2 Baselines
We compare our models with baselines: models with euclidean embeddings, with complex embeddings and with hyperbolic embeddings:
- •
- •
- •
-
•
Hyperbolic models: RotH (Hyperbolic Rotations) and RefH (Hyperbolic Reflections) – attention models using rotations and reflections in hyperbolic space and utilizing hyperbolic embeddings from Poincare Ball (Re, 2020).
-
•
Mixed-geometry models: M2GNN (Mixed-Curvature Multi-Relational Graph Neural Network) – model using multi-curvature embeddings for translational distance and multi-relational graph neural networks. GIE (Geometry Interaction Knowledge Graph Embeddings) – model utilizing attention layers and embeddings from Euclidean, hyperbolic and spherical spaces.
To demonstrate the benefits of our mixed geometry approach to link prediction task we compare baselines with our hyperbolic model TPTF (5) and our mixed geometry models MIG-TF (7) and MIG-TFQR. Here, MIG-TFQR is mixed geometry model with orthogonalization heuristic. For more details see Appendix. Additionaly, to better understand the role of hyperbolic interaction term we compare TPTF and MIG-TF models with different fixed hyperbolicities.
4.3 Evaluation metrics
At the test mode, we rank correct entities against all entities using score fictions (5) for TPTF and (7) for MIG-TF. We compare models using two rank-base metrics: (Mean Reciprocal Rank) and (Hit Rates). measures mean inverse rank of correct entity, whilst measures the share of correct triples among recommendations (Vasilev, 2021). We follow the standard evaluation protocol in the filtered setting: all true triples in the KG are filtered out during evaluation (Lacroix et al., 2018).
5 RESULTS
Our mixed geometry models achieved new state-of-the-art results on FB15k-237, YAGO3-10 and WN18RR in the majority of metrics. Moreover, our models have significantly less parameters then the best models on WN18RR and YAGO3-10. For instance, our models have times less parameters than RotH or RefH. Additionally, as we expected, Lorentz addition to the TuckER model significantly increases and metrics, with low - Lorentzian embeddings. We also show, that MIG-TF model is better that TuckER model in prediction of various relations, especially rare ones Figure 6.
Models | |||
---|---|---|---|
Our models | |||

Models | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.459 | 0.347 | 0.186 | 0.284 | 0.403 | 0.384 | 0.392 | 0.296 | 0.610 | 0.494 | 0.350 | 0.453 | |
0.419 | 0.263 | 0.155 | 0.241 | 0.490 | 0.440 | 0.390 | 0.430 | 0.540 | 0.380 | 0.240 | 0.340 | |
0.544 | 0.394 | 0.266 | 0.358 | 0.526 | 0.482 | 0.443 | 0.470 | 0.681 | 0.579 | 0.466 | 0.544 | |
0.505 | 0.387 | 0.237 | 0.326 | 0.546 | 0.495 | 0.446 | 0.479 | – | – | – | – | |
0.547 | 0.392 | 0.264 | 0.357 | 0.572 | 0.495 | 0.435 | 0.480 | 0.701 | 0.609 | 0.498 | 0.569 | |
0.533 | 0.375 | 0.245 | 0.338 | 0.571 | 0.492 | 0.428 | 0.476 | 0.670 | 0.550 | 0.402 | 0.495 | |
0.535 | 0.380 | 0.246 | 0.344 | 0.586 | 0.514 | 0.449 | 0.496 | 0.706 | 0.612 | 0.495 | 0.573 | |
0.536 | 0.383 | 0.252 | 0.346 | 0.568 | 0.485 | 0.404 | 0.461 | 0.711 | 0.619 | 0.502 | 0.576 | |
0.565 | 0.398 | 0.275 | 0.362 | 0.572 | 0.498 | 0.444 | 0.485 | 0.702 | 0.605 | 0.478 | 0.573 | |
0.552 | 0.401 | 0.271 | 0.362 | 0.575 | 0.505 | 0.452 | 0.491 | 0.709 | 0.618 | 0.505 | 0.579 | |
Our models | ||||||||||||
0.504 | 0.321 | 0.186 | 0.238 | 0.489 | 0.378 | 0.252 | 0.314 | 0.665 | 0.521 | 0.383 | 0.481 | |
0.554 | 0.402 | 0.277 | 0.367 | 0.561 | 0.507 | 0.450 | 0.496 | 0.717 | 0.621 | 0.502 | 0.579 | |
0.553 | 0.402 | 0.277 | 0.366 | 0.574 | 0.514 | 0.452 | 0.499 | 0.720 | 0.622 | 0.502 | 0.580 |
5.1 Hyperbolicity
To study the impact of hyperbolic interaction term in hybrid model we trained TPTF and MIG-TF models with various fixed curvatures. Models with low curvature are similar to euclidean ones, therefore they worse predict links between hierarchically distributed entities of knowledge graph. In contrast, models with curvature show the highest metrics both in mixed geometry and hyperbolic cases Figure 7.
Euclidean component in MIG-TF model embeds features of knowledge graph structure, whilst hyperbolic ones are not fully represented with embeddings. Therefore, the hyperbolic component of mixed geometry model represents only the hyperbolic features of knowledge graph that left after Euclidean component application (Ni et al., 2015; Gu et al., 2019). In contrast, RotH and RefH architectures (Re, 2020), are based on rotations and reflections in hyperbolic and the Euclidean space, respectively. In contrast, our model utilizes the hyperbolic triangle inequality, as well as uses embeddings in both Euclidean and hyperbolic spaces. Additionally, our model has less stringent requirements on the structure of the input data.
5.2 Link prediction quality
As demonstrated in Table 3, our mixed geometry approach with mixed geometries outperforms all other competitors on the two out of three datasets. It slightly underperforms its main competitor (the hyperbolic RotH model) on the dataset on one metric. We attribute this result to the fact that the structure of this dataset is the most plausible for analysis with hyperbolic geometry. Indeed, by inspecting Figure 1, we see that this dataset is the most compatible with the ideal power law distribution, it contains the minimal amount of active nodes and exhibits almost linear dependency over the majority of entities. This result aligns well with our initial design that is targeted towards mixed hierarchical and non-hierarchical structures and therefore is may not take full advantage of more homogeneous data organization. A possible remedy to this effect would be changing Euclidean component of MIG-TF model to more powerful model, such as Euclidean RotE (Re, 2020) or complex ComplEx-N3 (Chen et al., 2021). We leave investigation of this approach for future work.
On the other two datasets our approach dominates its competitors, which aligns well with our assumptions on the structural complexities of these datasets that prevent purely hyperbolic or purely Euclidean model to learn the best representations. Our approach not only outperforms other models but also requires a lesser amount of parameters, which indicates the higher representation capacity of the mixed geometry embeddings.

By consulting to Figure 1, one can see that the most prominent results are obtained on the dataset exhibiting the highest degree of violation of the power law distribution. Remarkably, the otherwise top-performing hyperbolic models underperform the best Euclidean model on this dataset. This signifies the detrimental effect of the structural non-homogeneity on single-geometry models and highlight the advantages of our mixed geometry approach.
6 CONCLUSION
The use of the proposed mixed geometry model MIG-TF combining the Euclidean and hyperbolic embedding techniques can significantly improve the quality of link predictions in knowledge graphs with non-homogeneous structure. It provides a significant advantage over state-of-the-art single-geometry models, such as the Euclidean TuckER and the hyperbolic RotH and RefH. This effect can be attributed to the fact that both Euclidean and Hyperbolic embeddings capture different aspects of relationships between entities in a knowledge graph. By combining these two approaches, mixed geometry models can capture a broader range of relationships, leading to more accurate predictions. We also show that hybrid models are particularly well-suited for addressing the challenge of link prediction in knowledge graphs under limited computing resources. Due to the geometry-driven improved representation capacity, our proposed approach leads to fewer parameters compared to complex or attention-based models such as RotatE and RotH. Additionally, our model requires training only the minority of its parameter in comparison to the other mixed geometry models such as GIE and M2GNN. This facts make MIG-TF approach more efficient and scalable.
Acknowledgments
The work of Viacheslav Yusupov and Maxim Rakhuba was prepared within the framework of the HSE University Basic Research Program. The calculations were performed in part through the computational resources of HPC facilities at HSE University (Kostenetskiy et al., 2021).
References
- Agarwal et al. (2021) Oshin Agarwal, Heming Ge, Siamak Shakeri, and Rami Al-Rfou. Knowledge graph based synthetic corpus generation for knowledge-enhanced language model pre-training. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3554–3565. Association for Computational Linguistics, June 2021.
- Agrawal et al. (2023) Garima Agrawal, Tharindu Kumarage, Zeyad Alghami, and Huan Liu. Can knowledge graphs reduce hallucinations in llms?: A survey. arXiv preprint arXiv:2311.07914, 2023.
- Beem (2017) John K Beem. Global lorentzian geometry. Routledge, 2017.
- Bollacker et al. (2008) Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1247–1250, 2008.
- Bordes et al. (2013a) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. Advances in neural information processing systems, 26, 2013a.
- Bordes et al. (2013b) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. Advances in neural information processing systems, 26, 2013b.
- Cao et al. (2022) Zongsheng Cao, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, and Qingming Huang. Geometry interaction knowledge graph embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 5521–5529, 2022.
- Chekalina et al. (2022) Viktoriia Chekalina, Anton Razzhigaev, Albert Sayapin, Evgeny Frolov, and Alexander Panchenko. MEKER: Memory efficient knowledge embedding representation for link prediction and question answering. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 355–365, Dublin, Ireland, May 2022. Association for Computational Linguistics.
- Chen et al. (2021) Yihong Chen, Pasquale Minervini, Sebastian Riedel, and Pontus Stenetorp. Relation prediction as an auxiliary training objective for improving multi-relational graph representations. In 3rd Conference on Automated Knowledge Base Construction, 2021.
- d’Amato (2021) Aidan Hogan Eva Blomqvist Michael Cochez Claudia d’Amato. Knowledge graphs. ACM, 2021.
- Dettmers et al. (2018) Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. Convolutional 2d knowledge graph embeddings. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
- Ermolov et al. (2022) Aleksandr Ermolov, Leyla Mirvakhabova, Valentin Khrulkov, Nicu Sebe, and Ivan Oseledets. Hyperbolic vision transformers: Combining improvements in metric learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7409–7419, 2022.
- Gu et al. (2019) Albert Gu, Frederic Sala, Beliz Gunel, and Christopher Ré. Learning mixed-curvature representations in products of model spaces. In International conference on learning representations, volume 5, 2019.
- Guo et al. (2020) Qingyu Guo, Fuzhen Zhuang, Chuan Qin, Hengshu Zhu, Xing Xie, Hui Xiong, and Qing He. A survey on knowledge graph-based recommender systems. IEEE Transactions on Knowledge and Data Engineering, 34(8):3549–3568, 2020.
- Harshman et al. (1982) Richard A Harshman, Paul E Green, Yoram Wind, and Margaret E Lundy. A model for the analysis of asymmetric data in marketing research. Marketing Science, 1(2):205–242, 1982.
- Hitchcock (1927) Frank L Hitchcock. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics, 6(1-4):164–189, 1927.
- Hospedales (2019) Ivana Balažević Carl Allen Timothy M. Hospedales. Tucker: Tensor factorization for knowledge graph completion. ACL, 2019.
- Itoh et al. (2021) Jin-ichi Itoh, Joël Rouyer, and Costin Vilcu. Some inequalities for tetrahedra. Beiträge zur Algebra und Geometrie/Contributions to Algebra and Geometry, 62:705–715, 2021.
- Kazemi and Poole (2018) Seyed Mehran Kazemi and David Poole. Simple embedding for link prediction in knowledge graphs. Advances in neural information processing systems, 31, 2018.
- Kipf and Welling (2016) Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
- Kong (2019) Rui Chen Qingyi Hua Yan-Shuo Chang Bo Wang Lei Zhang Xiangjie Kong. A survey of collaborative filtering-based recommender systems: From traditional methods to hybrid methods based on social networks. IEEE, 2019.
- Kostenetskiy et al. (2021) PS Kostenetskiy, RA Chulkevich, and VI Kozyrev. Hpc resources of the higher school of economics. In Journal of Physics: Conference Series, volume 1740, page 012050. IOP Publishing, 2021.
- Krioukov et al. (2010) Dmitri Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak, Amin Vahdat, and Marián Boguná. Hyperbolic geometry of complex networks. Physical Review E, 82(3):036106, 2010.
- Lacroix et al. (2018) Timothée Lacroix, Nicolas Usunier, and Guillaume Obozinski. Canonical tensor decomposition for knowledge base completion. In International Conference on Machine Learning, pages 2863–2872. PMLR, 2018.
- Law et al. (2019) Marc Law, Renjie Liao, Jake Snell, and Richard Zemel. Lorentzian distance learning for hyperbolic representations. In International Conference on Machine Learning, pages 3672–3681. PMLR, 2019.
- Lewis et al. (2020) Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.
- Loshchilov and Hutter (2019) Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019.
- Mahdisoltani et al. (2013) Farzaneh Mahdisoltani, Joanna Biega, and Fabian M Suchanek. Yago3: A knowledge base from multilingual wikipedias. In CIDR, 2013.
- Miller (1995) George A Miller. Wordnet: a lexical database for english. Communications of the ACM, 38(11):39–41, 1995.
- Mirvakhabova et al. (2020) Leyla Mirvakhabova, Evgeny Frolov, Valentin Khrulkov, Ivan Oseledets, and Alexander Tuzhilin. Performance of hyperbolic geometry models on top-n recommendation tasks. In Proceedings of the 14th ACM Conference on Recommender Systems, pages 527–532, 2020.
- Nathani et al. (2019) Deepak Nathani, Jatin Chauhan, Charu Sharma, and Manohar Kaul. Learning attention-based embeddings for relation prediction in knowledge graphs. In Annual Meeting of the Association for Computational Linguistics, 2019.
- Nguyen et al. (2017) Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Phung. A novel embedding model for knowledge base completion based on convolutional neural network. arXiv preprint arXiv:1712.02121, 2017.
- Ni et al. (2015) Chien-Chun Ni, Yu-Yao Lin, Jie Gao, Xianfeng David Gu, and Emil Saucan. Ricci curvature of the internet topology. In 2015 IEEE conference on computer communications (INFOCOM), pages 2758–2766. IEEE, 2015.
- Nickel et al. (2011) Maximilian Nickel, Volker Tresp, Hans-Peter Kriegel, et al. A three-way model for collective learning on multi-relational data. In Icml, volume 11, pages 3104482–3104584, 2011.
- Nickel and Kiela (2017) Maximillian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. Advances in neural information processing systems, 30, 2017.
- Pan et al. (2024) Shirui Pan, Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, and Xindong Wu. Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering, 2024.
- Peng et al. (2023) Ciyuan Peng, Feng Xia, Mehdi Naseriparsa, and Francesco Osborne. Knowledge graphs: Opportunities and challenges. Artificial Intelligence Review, 56(11):13071–13102, 2023.
- Peng et al. (2021) Wei Peng, Tuomas Varanka, Abdelrahman Mostafa, Henglin Shi, and Guoying Zhao. Hyperbolic deep neural networks: A survey. IEEE Transactions on pattern analysis and machine intelligence, 44(12):10023–10044, 2021.
- Peshekhonov et al. (2024) Ivan Peshekhonov, Aleksey Arzhantsev, and Maxim Rakhuba. Training a tucker model with shared factors: a riemannian optimization approach. In International Conference on Artificial Intelligence and Statistics, pages 3304–3312. PMLR, 2024.
- Re (2020) Ines Chami Adva Wolf Da-Cheng Juan Frederic Sala Sujith Ravi Christopher Re. Low-dimensional hyperbolic knowledge graph embeddings. ACL, 2020.
- Rendle and Schmidt-Thieme (2010) Steffen Rendle and Lars Schmidt-Thieme. Pairwise interaction tensor factorization for personalized tag recommendation. In Proceedings of the third ACM international conference on Web search and data mining, pages 81–90, 2010.
- Salakhutdinov (2012) Geoffrey E. Hinton Nitish Srivastava Alex Krizhevsky Ilya Sutskever Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, 2012.
- Schlichtkrull et al. (2018) Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In The semantic web: 15th international conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, proceedings 15, pages 593–607. Springer, 2018.
- Seth and Sharaff (2022) Rakhi Seth and Aakanksha Sharaff. A comparative overview of hybrid recommender systems: Review, challenges, and prospects. Data mining and machine learning applications, pages 57–98, 2022.
- Suchanek et al. (2007) Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In Proceedings of the 16th international conference on World Wide Web, pages 697–706, 2007.
- Tang (2019) Zhiqing Sun Zhi-Hong Deng Jian-Yun Nie Jian Tang. Rotate: Knowledge graph embedding by relational rotation in complex space. ICLR, 2019.
- Toutanova et al. (2015) Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. Representing text for joint embedding of text and knowledge bases. In Proceedings of the 2015 conference on empirical methods in natural language processing, pages 1499–1509, 2015.
- Trouillon et al. (2016) Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. Complex embeddings for simple link prediction. In International conference on machine learning, pages 2071–2080. PMLR, 2016.
- Tucker (1966) Ledyard R Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279–311, 1966.
- Ungar (2005) Abraham A Ungar. Analytic hyperbolic geometry: Mathematical foundations and applications. World Scientific, 2005.
- Vasilev (2021) Yan-Martin Tamm Rinchin Damdinov Alexey Vasilev. Qality metrics in recommender systems: Do we calculate metrics consistently? ACM, 2021.
- Wang et al. (2019a) Hongwei Wang, Miao Zhao, Xing Xie, Wenjie Li, and Minyi Guo. Knowledge graph convolutional networks for recommender systems. In The world wide web conference, pages 3307–3313, 2019a.
- Wang et al. (2019b) Rui Wang, Bicheng Li, Shengwei Hu, Wenqian Du, and Min Zhang. Knowledge graph embedding via graph attenuated attention networks. IEEE access, 8:5212–5224, 2019b.
- Wang et al. (2021) Shen Wang, Xiaokai Wei, Cicero Nogueira Nogueira dos Santos, Zhiguo Wang, Ramesh Nallapati, Andrew Arnold, Bing Xiang, Philip S Yu, and Isabel F Cruz. Mixed-curvature multi-relational graph neural network for knowledge graph completion. In Proceedings of the web conference 2021, pages 1761–1771, 2021.
- Xu and Wu (2020) Canran Xu and Ming Wu. Learning feature interactions with lorentzian factorization machine. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 6470–6477, 2020.
- Xu et al. (2018) Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. Representation learning on graphs with jumping knowledge networks. In International conference on machine learning, pages 5453–5462. PMLR, 2018.
- Yang et al. (2014) Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding entities and relations for learning and inference in knowledge bases. CoRR, 2014.
- Zhang et al. (2019) Shuai Zhang, Yi Tay, Lina Yao, and Qi Liu. Quaternion knowledge graph embeddings. Advances in neural information processing systems, 32, 2019.
- Zhang et al. (2020) Siheng Zhang, Zhengya Sun, and Wensheng Zhang. Improve the translational distance models for knowledge graph embedding. Journal of Intelligent Information Systems, 55:445–467, 2020.
7 SQUARE LORENTZ DISTANCE VS. GEODESIC DISTANCE
In this section, we illustrate that the score function based on the utilized Lorentz distance behaves similarly to the geodesic distance for different . In particular, in Figure 8, we present the two-dimensional landscapes of the TPTF score function for .

8 COMPLEXITY
In this section, we estimate the complexity of our MIG-TF model.
Let us introduce the operation for a tensor and matrix
Let and be respectively the numbers of entities and relations in a knowledge graph, – a batch size, and – embedding sizes of the TuckER model, – the size of hyperbolic embeddings. Let the core of the Tucker decomposition be , the matrices of embeddings – the entities embedding matrix , the relation embedding matrix and another entity embedding matrix . Similarly, the matrices in the hyperbolic model are , and .
Firstly, we find the complexity of the TuckER model. To compute the TuckER score function, we firstly find . It takes operations. Then we compute with operations. Finally, the diagonal slice of is taken and multiplied with with operations. Overall, the total number of operations is:
Secondly, we find the complexity of the TPTF model. First of all, we compute Lorentz zero element of embeddings as
which is operations. Than we compute Lorentz inner products and it takes . Then we compute the sum of all inner products using operations. Finally, we compute the normalization with operations and make elementwise division with the complexity . Overall, the TPTF complexity is:
The complexity of MIG-TF iteration is a sum of TuckER and TPTF complexities and is equal to:
In terms of time and parameter complexity, our MIG-TF model adds only overhead on training time and overhead on inference time in comparison to TuckER.
9 MIXED GEOMETRY TENSOR FACTORIZATION MODIFICATIONS
In this section, we describe in more detail additional techniques that we use for our MIG-TF model.
9.1 Riemannian MIG-TF
To improve the quality of the TuckER (Hospedales, 2019) model we use the so-called RTuckER model (Peshekhonov et al., 2024). This model has the same architecture as TuckER, but was trained with the methods of Riemannian optimisation. This Riemannian model significant outperforms TuckER on some knowledge graphs. For more details we refer to (Peshekhonov et al., 2024). We utilize RTuckER as a Euclidean term of our MIG-TF model only on WN18RR knowledge graph on which the Riemannian model significantly outperforms TuckER.
9.2 MIG-TF with QR decomposition

As was shown in Figure 9, most of the relations in knowledge graphs have no correlations, therefore maintaining the orthogonality of relation embeddings creates better vector representation. For this purpose, on each iteration we update the hyperbolic relation embedding matrix with the help of the QR decomposition as follows:
(8) |
Then we compute the score function using instead of :
In these formulas, is the core of Tucker decomposition, and are the Euclidean and hyperbolic entity embeddings, and are respectively the -th rows of the Euclidean and of the hyperbolic entity embedding matrices. The vector is the Euclidean relation embedding and is a row of from (8). We call this modified MIG-TF as MIG-TFQR.
The MIG-TF model with QR decomposition significantly outperform MIG-TF on knowledge graphs WN18RR and YAGO3-10 where the condition of non correlation of relations is fulfilled almost everywhere. However, MIG-TFQR slightly loses to MIG-TF on FB15k-237 graph, where the condition of non correlation is usually violated (see Figure 9).
10 ROBUSTNESS
To analyze our models’ robustness, we add random links in the training data knowledge graph. If is the number of links in train data, then we add random links to train data. We train the mixed geometry model on poisoned training data and test it on the test data, see Figure 10. As we see, even with high levels of noise (), both MIG-TF and models do not significantly degrade.

11 THE IMPACT OF EUCLIDEAN AND HYPERBOLIC TERMS
In this section, we analyze impact of different terms in the MIG-TF score function. We represent the score function as:
where and are respectively Euclidean and hyperbolic components of the score function and is a parameter. By changing , we measure the effect of the score function components on the HR@10 metric on WN18RR knowledge graph, see Figure 11. It is interesting to note that the best quality is achieved with the equal effect of the Euclidean and hyperbolic terms. This justifies our choice of the score function.

12 OPTIMAL CURVATURE
This section demonstrates dependence of link prediction quality on WN18RR knowledge graph on the curvature in hyperbolic parts of TPTF and MIG-TF models, see Figure 12. Note that the optimal curvature for hyperbolic and mixed geometry models are different from each other.

13 HYPERPARAMETERS
In this section, we present the hyperparameters of both the Euclidean term (Table 4) and the hyperbolic term (Table 6) of our MIG-TF model. In this tables, are the sizes of entity and relation embeddings, is a learning rate, – distributions of initial weights of entities and relations embeddings in the TPTF component of the hybrid model and is a curvature of hyperboloid in the Lorentz term of MIG-TF model. All the experiments we performed were done on one GPU NVIDIA V100.
200 | 200 | 200 | |
30 | 200 | 30 | |
0.01 | 0.001 | 0.003 | |
500 | 500 | 500 | |
0.2 | 0.3 | 0.2 | |
0.2 | 0.4 | 0.2 | |
0.3 | 0.5 | 0.2 |
50 | 50 | 50 | |
0.003 | 0.002 | 0.005 | |
1.3 | 1 | 1.1 | |
250 | 150 | 250 | |
0.005 | 0.001 | 0.001 | |
0.005 | 0.001 | 0.001 | |
0.2 | 0.3 | 0.2 | |
0.2 | 0.3 | 0.2 |
50 | 50 | 50 | |
0.003 | 0.001 | 0.005 | |
1.5 | 1 | 1.1 | |
250 | 100 | 150 | |
0.005 | 0.001 | 0.001 | |
0.005 | 0.001 | 0.001 | |
0.2 | 0.3 | 0.2 | |
0.2 | 0.3 | 0.2 |