Knowledge Graph Completion with Mixed Geometry Tensor Factorization


 


Viacheslav Yusupov                        Maxim Rakhuba                        Evgeny Frolov

HSE University                        HSE University                        AIRI HSE University

Abstract

In this paper, we propose a new geometric approach for knowledge graph completion via low-rank tensor approximation. We augment a pretrained and well-established Euclidean model based on a Tucker tensor decomposition with a novel hyperbolic interaction term. This correction enables a more nuanced capturing of distributional properties in data, better aligning the model with real-world knowledge graphs. By combining the two geometries, our approach improves the expressivity of the resulting model, achieving new state-of-the-art link prediction accuracy with significantly fewer parameters than previous Euclidean and hyperbolic models.

1 INTRODUCTION

Most of the information in the world can be expressed in terms of entities and the relationships between them. This information is effectively represented in the form of a knowledge graph (d’Amato, 2021; Peng et al., 2023), which serves as a repository for storing various forms of relational data with their interconnections. Particular examples include storing user profiles on social networking platforms (Xu et al., 2018), organizing Internet resources and the links between them, and constructing knowledge bases that capture user preferences to enhance the functionality of recommender systems (Wang et al., 2019a; Guo et al., 2020). With the recent emergence of large language models (LLM), knowledge graphs have become an essential tool for improving the consistency and trustworthiness of linguistic tasks. Among notable examples of their application are fact checking (Pan et al., 2024), hallucination mitigation (Agrawal et al., 2023), retrieval-augmented generation (Lewis et al., 2020), and generation of corpora for LLM pretraining (Agarwal et al., 2021). This utilization underscores the versatility and utility of knowledge graphs in managing complex datasets and facilitating the manipulation of interconnected information in various domains and downstream tasks.

On the other hand, knowledge graphs may present an incomplete view of the world. Relations can evolve and change over time, be subject to errors, processing limitations, and gaps in available information. Therefore, the tasks of restoring missing links or predicting the new ones are particularly important to maintain the accuracy and relevance of knowledge graphs. Algorithms designed to address this challenge can significantly reduce the manual labor involved in updating information and ensure consistent representation of the most current and up-to-date knowledge. In this work, we aim to design a new efficient graph completion algorithm that will capture inherent structural properties of the underlying data to improve the link prediction quality.

The key challenge of this task lies in the choice of proper architectural components suitable for handling the non-trivial nature of the data. We start by noticing that knowledge graphs exhibit a strictly non-uniform arrangement of interconnections. Some nodes in the graph are present more frequently than others, which renders a hierarchical structure often reproducing the features of a power law-like distribution. It places knowledge graphs into the so-called “complex network” category. In the seminal work on the theoretical foundations of complex network analysis (Krioukov et al., 2010), it was shown that hyperbolic geometries are especially suitable for modelling this type of structure. In these geometries, as one moves further away from the origin, the distances between points and the area grow exponentially, in contrast to the Euclidean geometry with its respective linear and quadratic growth. Due to this effect, hyperbolic embeddings have a higher expressive ability in terms of capturing hierarchical relations within data. Subsequent studies have proved the practicality of such a geometric approach in various domains such as natural language processing (Nickel and Kiela, 2017), computer vision (Ermolov et al., 2022) and recommender systems (Mirvakhabova et al., 2020).

Despite the promising results, in our experiments, we observe that real-world knowledge graphs may not consistently align with the assumption of a strictly hierarchical internal structure and may only partially follow a power-law distribution (see Figure 1). More specifically, the characteristics of relations in a knowledge graph, or even data preprocessing, may produce connectivity components consisting of vertices with a large number of links distributed almost uniformly between them. We will call these vertices “active”. As we demonstrate in our work, having a large number of active vertices violates hierarchy and disrupts the total compatibility of hyperbolic geometry with real data (see Figure 2). Therefore, in this paper, we aim to find a balanced solution using different geometry types so that the weakness of each geometry is compensated. We introduce a hybrid mixed geometry model that combines the strengths of both Euclidean and hyperbolic geometry within a single architecture. In this model, the Euclidean component serves the goal of properly extracting information from the active vertices, while the hyperbolic component with embeddings from a hyperbolic space captures additional information from vertices in the graph that are distributed according to a power law.

Overall, our contributions can be summarized as follows:

  • We introduce a new mixed-geometry tensor factorisation (MIG-TF) model that combines Tucker decomposition defined in the Euclidean space with a new low-parametric hyperbolic ternary interaction term.

  • We highlight the intricacies of applying a geometric approach to real-world knowledge graphs and demonstrate the associated limitations of single-geometry modelling.

  • The proposed combined approach significantly reduces the number of model parameters compared to state-of-the-art methods. It does so without sacrificing expressive power and achieves more accurate results in most of the common benchmarks.

Refer to caption
Figure 1: Link distribution on the three benchmark knowledge graphs considered in this work. The FB15k-237 dataset violates the power law the most, as indicated by the longest plateau in the leftmost part of the curve. Correspondingly, the top-performing model on this dataset among previous state-of-the-art turns out to be Euclidean. In contrast, our MIG-TF approach outperforms both Euclidean and hyperbolic models, see Table 3.

The rest of the paper is organized as follows. In Section 2, we review the most relevant Euclidean and hyperbolic models. We introduce our parameter-efficient hyperbolic interaction model, tetrahedron pooling tensor factorization (TPTF), along with the basics of hyperbolic geometry in Section 3.2. Section 3.3 is devoted to the mixed geometry model MIG-TF, where we present the mixture of the Euclidean TuckER and the hyperbolic TPTF models. Finally, Section 4 contains the results of numerical experiments, where we showcase the performance of our model and also investigate the impact of using a hyperbolic correction. The anonymized code for reproducing our results is publicly available at https://github.com/hse-cs/MIGTF.

Refer to caption
Figure 2: The graphs at the top row demonstrate the top 20 vertices in terms of their number of connections in the FB15k-237 knowledge graph. The pink marks indicate vertices that were predicted in more than 50% of cases by the Euclidean (left), hyperbolic (center) and mixed geometry (right) models. The bar charts illustrate the hit rate (HR@10) of the predictions for links between vertices from each group. Vertices are arranged in descending order based on the number of incoming links. As can be observed, due to the high number of relations among active vertices, the hyperbolic model (TPTF) significantly underperforms the Euclidean model (TuckER) in predicting relations between active vertices. However, the combination of Euclidean and hyperbolic models outperforms both of them.

2 RELATED WORK

Knowledge graphs represent facts in the triplet form $(e_s, r, e_o)$, where $e_s$ and $e_o$ denote a pair of subject and object entities and $r$ denotes the corresponding relation between them. This representation can be naturally expressed in third-order binary tensor form, where a non-zero entry at position $(i, j, k)$ encodes an existing $(e_i, r_k, e_j)$ triplet of the knowledge graph. To identify connections in knowledge graphs that have been incorrectly specified in the interaction tensor, various tensor factorization-based models are utilized.
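As an illustration of this representation, the following minimal sketch (not taken from the paper's code; the toy triplets and id mappings are assumptions for illustration) builds such a sparse binary interaction tensor with PyTorch.

```python
# A minimal sketch (assumed toy data, not the paper's code): encoding
# (subject, relation, object) triplets as a sparse third-order binary tensor.
import torch

triplets = [("paris", "capital_of", "france"),
            ("france", "located_in", "europe")]
entities = {"paris": 0, "france": 1, "europe": 2}
relations = {"capital_of": 0, "located_in": 1}

# A non-zero entry at position (subject i, object j, relation k) marks an
# existing (e_i, r_k, e_j) triplet, as described above.
idx = torch.tensor([[entities[s], entities[o], relations[r]]
                    for s, r, o in triplets]).T
vals = torch.ones(idx.shape[1])
X = torch.sparse_coo_tensor(
    idx, vals, size=(len(entities), len(entities), len(relations)))
print(X.to_dense())
```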

For instance, the SimplE (Kazemi and Poole, 2018) and ComplEx (Trouillon et al., 2016) models employ the canonical decomposition (CP) (Hitchcock, 1927) with embeddings in the real or complex Euclidean space. The PITF (Rendle and Schmidt-Thieme, 2010) model utilizes Pairwise Interaction Tensor Factorisation in the Euclidean space, which splits the standard ternary interaction term of CP into a linear sum of independent interactions between entity-entity and entity-relation pairs. Some other models use distances between embeddings of entities and relations in the vector space. For instance, TransE (Bordes et al., 2013b) uses Euclidean vector representations of relations, which serve as connection vectors between Euclidean embeddings of entities. A further development of this approach, named QuatE (Zhang et al., 2019), represents embeddings as rotations in the hypercomplex space.

Various artificial neural network-based approaches were also proposed to solve the link prediction task. For instance, models like ConvE and ConvKB (Dettmers et al., 2018; Nguyen et al., 2017) utilize convolutional neural networks to work with knowledge graphs. The R-GCN (Schlichtkrull et al., 2018) model proposes a modification of graph convolutional networks (GCN) (Kipf and Welling, 2016) that adapts graph convolutions to relational data. Models such as GAAT (Wang et al., 2019b) or KBGAT (Nathani et al., 2019) use attention mechanisms on graphs.

Some models rely on the shared-factors paradigm, which utilizes a special structure of the problem based on the fact that objects and subjects belong to the same space. While making the problem inherently non-linear, this approach significantly reduces the number of learned parameters and is better suited for predicting symmetric links (Hospedales, 2019) in knowledge graphs. For instance, the MEKER (Chekalina et al., 2022) model employs CP decomposition with low-rank shared factors, and RESCAL (Nickel et al., 2011) utilizes the DEDICOM (Harshman et al., 1982) factorisation, decomposing each tensor slice along the relation axis. The TuckER (Hospedales, 2019) and R-TuckER (Peshekhonov et al., 2024) models employ a shared-factor Tucker decomposition (Tucker, 1966) to represent knowledge graph entities and relations and support the prediction of symmetric as well as asymmetric links in knowledge graphs.

Recent advances in the understanding of hyperbolic geometry applications have led to the development of hyperbolic models for knowledge graph completion as well. According to (Krioukov et al., 2010), the structural properties of hyperbolic spaces make them well suited for analyzing a special family of non-uniformly distributed data, to which knowledge graphs also belong. For instance, the RotH and RefH models (Re, 2020) propose specific rotation and reflection transformations in a hyperbolic space to improve the expressive power of the learned embeddings. Even though these models show remarkable improvements on some knowledge graphs, there are still cases where Euclidean models outperform hyperbolic ones. We attribute this observation to the violation of some basic distributional properties required for full compatibility with hyperbolic geometry. It motivates us to develop a hybrid approach, similarly to (Gu et al., 2019; Seth and Sharaff, 2022; Kong, 2019), that takes the best from both geometries and thereby compensates for their shortcomings in analyzing real-world data. A similar idea was utilized in the M2GNN model (Wang et al., 2021), which generalizes graph neural networks to the multi-curvature setting, and in GIE (Cao et al., 2022), which introduces message passing and attention algorithms on hyperbolic and hyperspherical spaces. In comparison to previous methods, our MIG-TF approach uses less complex and more stable operations (Section 3.4). Additionally, it implies learning only a low-parametric hyperbolic addition to a pretrained model instead of learning the whole model, as in RotH, M2GNN and GIE. Further in the text, we provide a detailed description of our approach.

3 MIXED GEOMETRY TENSOR FACTORIZATION

In this section, we introduce our new parameter-efficient hyperbolic interaction term TPTF and the new mixed geometry model MIG-TF, in which the hyperbolic term is used as a correction to the Euclidean model.

3.1 Hyperbolic geometry and Lorentz Distance

We start with basic notation and concepts from hyperbolic geometry. There are several models of hyperbolic geometry, among which the Poincaré ball, Klein, and Lorentz models are frequently used in machine learning. All hyperbolic models are isomorphic – they describe the same geometry and can be used interchangeably by means of the corresponding transformations (Ungar, 2005). In our work, we use the Lorentz model of hyperbolic geometry. In the following, we only introduce the basics of Lorentz geometry and refer the reader to (Beem, 2017) for more details.

Let us define the Lorentz inner product for vectors $x, y \in \mathbb{R}^{n+1}$, which is used for measuring the distance between hyperbolic embeddings:

\langle x, y\rangle_{\mathcal{L}} = -x_0 y_0 + \sum_{i=1}^{n} x_i y_i,

where

x_0 = \sqrt{\beta + \sum_{i=1}^{n} x_i^2}.

The induced norm of a vector $x \in \mathbb{R}^{n+1}$ in Lorentz geometry is defined as $\|x\|_{\mathcal{L}}^2 = \langle x, x\rangle_{\mathcal{L}} = -\beta$. The corresponding $n$-dimensional space $\mathcal{H}^{n,\beta} \subset \mathbb{R}^{n+1}$ is called the hyperboloid and is defined as follows:

\mathcal{H}^{n,\beta} = \left\{ x \in \mathbb{R}^{n+1} \;\middle|\; \|x\|_{\mathcal{L}}^2 = -\beta,\ \beta \geq 0 \right\}.

The origin vector of the hyperboloid $\mathcal{H}^{n,\beta}$ equals $\mathbf{0} = (\beta, 0, \dots, 0) \in \mathbb{R}^{n+1}$. The inner product of $\mathbf{0}$ and $x$ is, hence, $\langle \mathbf{0}, x\rangle_{\mathcal{L}} = -\beta x_0$.

The associated geodesic distance is defined as

d_l(x, y) = \operatorname{arccosh}\left(-\langle x, y\rangle_{\mathcal{L}}\right).

Similarly to (Xu and Wu, 2020), we introduce the squared Lorentz distance between $x, y \in \mathcal{H}^{n}$, defined as

d^2_{\mathcal{L}}(x, y) = \|x - y\|_{\mathcal{L}}^2 = -2 - 2\langle x, y\rangle_{\mathcal{L}}. \quad (1)

This distance fulfills all of the Euclidean distance axioms, except for the triangle inequality:

d(x, z) \leq d(x, y) + d(y, z). \quad (2)

The triangle inequality does not hold in general in the hyperbolic setting. The idea of benefiting from this fact appears, for instance, in the two-dimensional hyperbolic model LorentzFM (Xu and Wu, 2020): a violation of the inequality in this model implies that two entities are far away and, hence, irrelevant to each other. Generalizing the triangle inequality to the three-dimensional case is not a straightforward task, which we address in the following section.
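To make the operations above concrete, here is a minimal NumPy sketch (assuming the curvature parameter β = 1, as in (1); illustrative, not the authors' implementation) of the lift onto the hyperboloid, the Lorentz inner product, the geodesic distance, and the squared Lorentz distance.

```python
# A minimal sketch of the Lorentz operations above, assuming beta = 1.
import numpy as np

def lift(x, beta=1.0):
    """Map x in R^n to the hyperboloid point [x0, x] with x0 = sqrt(beta + ||x||^2)."""
    x0 = np.sqrt(beta + np.sum(x ** 2))
    return np.concatenate(([x0], x))

def lorentz_inner(x, y):
    """Lorentz inner product <x, y>_L = -x0*y0 + sum_i x_i*y_i."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def geodesic_dist(x, y):
    """Geodesic distance d_l(x, y) = arccosh(-<x, y>_L), clipped for numerical safety."""
    return np.arccosh(np.clip(-lorentz_inner(x, y), 1.0, None))

def sq_lorentz_dist(x, y):
    """Squared Lorentz distance (1): ||x - y||_L^2 = -2 - 2<x, y>_L (beta = 1)."""
    return -2.0 - 2.0 * lorentz_inner(x, y)

u, v = lift(np.random.randn(8)), lift(np.random.randn(8))
print(geodesic_dist(u, v), sq_lorentz_dist(u, v))
```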

3.2 Tetrahedron Pooling Tensor Factorization

Solving the knowledge graph link prediction problem requires learning ternary interactions in the form of entity-relation-entity triplets. This can be naturally modelled through tensor representation in the Euclidean space. However, unlike the 2D case, generalizing a representation of such third-order interactions to the hyperbolic space is not straightforward. To the best of our knowledge, there is no unified approach.

We propose to modify (2) to capture ternary interactions. In particular, we utilize the so-called tetrahedron inequality (Itoh et al., 2021): for points $u, v, t, o$ in the Euclidean space, it holds that

d(u, v) + d(o, t) \leq d(u, t) + d(v, t) + d(o, u) + d(o, v). \quad (3)

In the hyperbolic context, this inequality does not universally apply. However, as we explore further, the regions where this inequality is violated can be advantageous for our objectives. Consequently, we can naturally introduce the following score function:

S_G = d_l(u, v) + d_l(\mathbf{0}, t) - d_l(u, t) - d_l(v, t) - d_l(\mathbf{0}, u) - d_l(\mathbf{0}, v), \quad (4)

where $u, v \in \mathbb{R}^{d_l}$ are entity embeddings and $t \in \mathbb{R}^{d_l}$ is the relation embedding. Note that this score function is negative when inequality (3) holds and positive in the opposite case. Unfortunately, the score function (4) involves $\operatorname{arccosh}$, which is not differentiable everywhere and, hence, may lead to difficulties for gradient-based optimization methods.

Similarly to LorentzFM, (4) is replaced with the following “smoothed” score function that mimics the desired behaviour:

S_H(u, v, t) = \frac{\frac{1}{2}\left( d_{\mathcal{L}}^2(u, v) + d_{\mathcal{L}}^2(\mathbf{0}, t) - d_{\mathcal{L}}^2(u, t) - d_{\mathcal{L}}^2(t, v) - d_{\mathcal{L}}^2(\mathbf{0}, u) - d_{\mathcal{L}}^2(\mathbf{0}, v) \right)}{\langle\mathbf{0}, u\rangle_{\mathcal{L}}\langle\mathbf{0}, v\rangle_{\mathcal{L}} + \langle\mathbf{0}, u\rangle_{\mathcal{L}}\langle\mathbf{0}, t\rangle_{\mathcal{L}} + \langle\mathbf{0}, t\rangle_{\mathcal{L}}\langle\mathbf{0}, v\rangle_{\mathcal{L}}}, \quad (5)

see Figure 3 for comparison of the two score functions.

Note also that we may rewrite $S_H$ as:

S_H = \frac{(2 + t_0 - u_0 - v_0) - (\langle u, v\rangle_{\mathcal{L}} - \langle u, t\rangle_{\mathcal{L}} - \langle t, v\rangle_{\mathcal{L}})}{u_0 v_0 + u_0 t_0 + t_0 v_0}.
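The rewritten form above translates directly into code. The following hedged sketch (assuming β = 1 and PyTorch; names are illustrative and this is not the authors' implementation) computes $S_H$ from Euclidean parameter vectors that are first lifted to the hyperboloid.

```python
# A sketch of the smoothed score S_H in its rewritten form (beta = 1 assumed).
import torch

def lift(x, beta=1.0):
    # (..., n) -> (..., n+1) with x0 = sqrt(beta + ||x||^2)
    x0 = torch.sqrt(beta + (x ** 2).sum(-1, keepdim=True))
    return torch.cat([x0, x], dim=-1)

def lorentz_inner(x, y):
    return -x[..., 0] * y[..., 0] + (x[..., 1:] * y[..., 1:]).sum(-1)

def score_tptf(u, v, t, beta=1.0):
    """S_H(u, v, t) for Euclidean inputs u, v, t of shape (..., d_l)."""
    uL, vL, tL = lift(u, beta), lift(v, beta), lift(t, beta)
    u0, v0, t0 = uL[..., 0], vL[..., 0], tL[..., 0]
    num = (2.0 + t0 - u0 - v0) - (lorentz_inner(uL, vL)
                                  - lorentz_inner(uL, tL)
                                  - lorentz_inner(tL, vL))
    return num / (u0 * v0 + u0 * t0 + t0 * v0)
```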
Refer to caption
Figure 3: The left score function landscape corresponds to our score function (5), whilst the right one corresponds to (4). As seen from the plots, the landscapes of the two score functions are similar. For more score function landscapes, see the Appendix.

The score function yields a positive value when the points corresponding to linked entities are in close proximity and a negative value when they are distant (see Appendix). Additionally, we note that the terms $d_{\mathcal{L}}^2(u, t)$ and $d_{\mathcal{L}}^2(t, v)$, which account for interactions between relations and entities, effectively draw entities $u$ and $v$ closer to the relation $t$. Consequently, the trained model exhibits a distinct behavior: entities tend to cluster around well-separated relations, as illustrated in Figure 4.

The model's predictions $p = \{p_j\}_{j=1}^{n_e}$ for entity $u$ and relation $t$ can be obtained as

p = \sigma(S_H(u, V, t)),

where $V \in \mathbb{R}^{d_l \times n_e}$ is the matrix of all entity embeddings and $n_e$ is the number of entities. In the TPTF model, we use the binary cross-entropy (BCE) loss function:

l_{BCE}(y_j, p_j) = -y_j \log(p_j) - (1 - y_j)\log(1 - p_j),

where $y = \{y_j\}_{j=1}^{n_e}$ is the binary label vector of the ground truth. Similarly to (Law et al., 2019), we optimize the $u$, $v$ and $t$ embeddings in the Euclidean space with the AdamW optimizer (Loshchilov and Hutter, 2019) and then map them to the hyperboloid as follows: $v \in \mathbb{R}^{n}$ maps to $[\sqrt{\beta + \|v\|_2^2}, v] \in \mathbb{R}^{n+1}$.
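Putting the pieces together, the following sketch (toy sizes and illustrative names; score_tptf is the function from the previous snippet) shows one TPTF training step under this setup: embeddings are stored as Euclidean parameters, scored against all candidate entities, and updated with AdamW on the BCE loss.

```python
# A hedged sketch of one TPTF training step (assumed toy sizes, not the paper's code).
import torch

n_e, n_r, d_l = 1000, 20, 32
E = torch.nn.Parameter(0.01 * torch.randn(n_e, d_l))   # entity embeddings (Euclidean parameters)
R = torch.nn.Parameter(0.01 * torch.randn(n_r, d_l))   # relation embeddings
opt = torch.optim.AdamW([E, R], lr=1e-3)

def train_step(subj_idx, rel_idx, y):
    # y: (batch, n_e) binary labels over all candidate object entities
    u = E[subj_idx].unsqueeze(1)        # (batch, 1, d_l)
    t = R[rel_idx].unsqueeze(1)         # (batch, 1, d_l)
    V = E.unsqueeze(0)                  # (1, n_e, d_l), broadcast over the batch
    p = torch.sigmoid(score_tptf(u, V, t))               # (batch, n_e)
    loss = torch.nn.functional.binary_cross_entropy(p, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```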

Refer to caption
Figure 4: Each embedding of a relation ($t_1, \dots, t_n$) defines a cone that encompasses the embeddings of the entities associated with that relation.

3.3 Mixed Geometry Tensor Factorisation

To improve link prediction in knowledge graphs, we introduce a shared-factor mixed geometry model combining Euclidean and hyperbolic models (see Figure 5). Due to the complexity of a knowledge graph, some vertices are better described with hyperbolic embeddings, whilst others are better described with Euclidean ones. Therefore, a combination of hyperbolic and Euclidean embeddings preserves significantly more information about the structure of the knowledge graph. For the hyperbolic part, we use the TPTF model (5), and for the Euclidean one we use TuckER (Hospedales, 2019). The score function of the Euclidean model is defined as:

S_E(G, u, v, t) = \sum_{\alpha, \beta, \gamma = 1}^{d_1, d_1, d_2} G_{\alpha\beta\gamma}\, u_{\alpha} v_{\beta} t_{\gamma}, \quad (6)

where $u, v \in \mathbb{R}^{d_1}$ are embeddings of entities, $t \in \mathbb{R}^{d_2}$ is the embedding of a relation, and $G \in \mathbb{R}^{d_1 \times d_1 \times d_2}$ is the tensor of learnable parameters. Here, $d_1$ and $d_2$ are the ranks of the Tucker decomposition.
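For reference, the Euclidean TuckER score (6) is a single three-way contraction; a minimal sketch (assuming PyTorch, with illustrative ranks, not the authors' code) is:

```python
# A minimal sketch of the TuckER score (6) as an einsum contraction.
import torch

def score_tucker(G, u, v, t):
    """S_E(G, u, v, t) = sum_{a,b,c} G[a, b, c] * u[..., a] * v[..., b] * t[..., c]."""
    return torch.einsum('abc,...a,...b,...c->...', G, u, v, t)

d1, d2 = 32, 16                      # illustrative Tucker ranks
G = torch.randn(d1, d1, d2)          # learnable core tensor
u, v, t = torch.randn(d1), torch.randn(d1), torch.randn(d2)
print(score_tucker(G, u, v, t))
```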

The score function of our mixed-geometry model MIG-TF is the sum of score functions of the Lorentzian and Euclidean models:

(S_{\textit{MIG-TF}})_i = (S_E)_i + (S_H)_i, \quad (7)

where $(S_E)_i = S_E(G, u, V_i, t)$ defined in (6) is the Euclidean interaction term and $(S_H)_i = S_H(u^{\mathcal{L}}, V_i^{\mathcal{L}}, t^{\mathcal{L}})$ is the hyperbolic interaction term (5). The vectors $u$ and $t$ are the Euclidean entity and relation embeddings, $u^{\mathcal{L}}$ and $t^{\mathcal{L}}$ are the hyperbolic ones, $V_i$ and $V_i^{\mathcal{L}}$ are the $i$-th rows of the matrices $V$ and $V^{\mathcal{L}}$ of the $n_e$ Euclidean and hyperbolic entity embeddings, respectively, and $G \in \mathbb{R}^{d_1 \times d_1 \times d_2}$ is the 3-D tensor of learnable parameters. In the MIG-TF model, we utilize a pretrained TuckER model (6) and optimize only the hyperbolic term parameters (5) of the score function (7) to minimize the BCE loss:

\mathcal{L}_{\textit{MIG-TF}} = \frac{1}{n_e}\sum_{i=1}^{n_e} l_{BCE}\bigl(a_i,\, \sigma((S_{\textit{MIG-TF}})_i)\bigr),

where $a_i$ is the $i$-th element of the interaction vector $a$ for the entity and relation with embeddings $u$ and $t$, respectively. We optimize the hyperbolic embeddings as in Section 3.2. To prevent overfitting, we employ dropout (Salakhutdinov, 2012) in both the Euclidean and hyperbolic interaction terms.
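A compact sketch of this setup (illustrative sizes and names; score_tucker and score_tptf come from the previous snippets) keeps the pretrained TuckER parameters frozen and combines the two terms as in (7):

```python
# A hedged sketch of the MIG-TF score (7): a frozen pretrained Euclidean TuckER
# term plus a trainable hyperbolic TPTF correction (assumed toy sizes and names).
import torch

n_e, n_r, d1, d2, d_l = 1000, 20, 32, 16, 16
G     = torch.randn(d1, d1, d2)            # pretrained core tensor (frozen)
E_euc = torch.randn(n_e, d1)               # pretrained Euclidean entity embeddings (frozen)
R_euc = torch.randn(n_r, d2)               # pretrained Euclidean relation embeddings (frozen)
E_hyp = torch.nn.Parameter(0.01 * torch.randn(n_e, d_l))  # trainable hyperbolic entities
R_hyp = torch.nn.Parameter(0.01 * torch.randn(n_r, d_l))  # trainable hyperbolic relations
opt = torch.optim.AdamW([E_hyp, R_hyp], lr=1e-3)          # only the TPTF part is updated

def mig_tf_scores(s, r):
    """Scores (7) of the (s, r, ?) query against all n_e candidate object entities."""
    u_e = E_euc[s].expand(n_e, -1)                 # broadcast the subject over candidates
    t_e = R_euc[r].expand(n_e, -1)
    s_e = score_tucker(G, u_e, E_euc, t_e)         # (n_e,) Euclidean term (frozen)
    s_h = score_tptf(E_hyp[s], E_hyp, R_hyp[r])    # (n_e,) hyperbolic correction
    return s_e + s_h
```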

Refer to caption
Figure 5: The proposed MIG-TF model architecture.

3.4 Mixed-geometry models comparison

This subsection is devoted to comparing our approach with other mixed-geometry methods: M2GNN (Wang et al., 2021) and GIE (Cao et al., 2022).

M2GNN is a translational distance model (Zhang et al., 2020) that generalizes graph neural networks to the multi-relational setting on knowledge graphs and modifies them for the multi-curvature setting, capturing graph structures, especially hierarchical and cyclic ones, more effectively.

The GIE model introduces message passing and attention mechanisms for knowledge graph completion by utilizing mappings onto hyperbolic and hyperspherical spaces.

Both of the existing approaches suffer from computational complexity and numerical instability due to the use of exponential and logarithmic mappings, which are even more numerically unstable than the standard Lorentz model operations (Peng et al., 2021). In contrast, MIG-TF uses the squared Lorentz distance, a convenient replacement for the geodesic distance. The squared Lorentz distance reduces hyperbolic operations, such as exponential and logarithmic mappings and arccosh, to mere inner product computations, which decreases the complexity of the operations and increases numerical stability.

Additionally, only the low-parametric TPTF addition is learned in our MIG-TF method, instead of learning whole models as in M2GNN and GIE.

Unlike (Wang et al., 2021) and (Cao et al., 2022), we do not utilize spherical geometry, yet our mixed geometry approach still achieves the best metrics in the majority of cases. Nevertheless, our framework is sufficiently flexible to accommodate the incorporation of spherical geometry, presenting a promising direction for future research.

4 EXPERIMENTS

In the experiments, we show that a mixed geometry model combining Euclidean and hyperbolic embeddings, with low-dimensional hyperbolic ones, outperforms Euclidean, complex and hyperbolic models with significantly higher embedding dimensions. Additionally, we study the dependence between the curvature of the space of hyperbolic embeddings and the expressive ability of hyperbolic and mixed geometry models. Moreover, we show that the hyperbolic addition to the TuckER (Hospedales, 2019) model improves the quality of link prediction between hierarchically distributed entities.

4.1 Datasets

We evaluate our mixed geometry model on the link prediction task on three standard knowledge graphs: WN18RR (Bordes et al., 2013b; Dettmers et al., 2018), FB15k-237 (Bordes et al., 2013b; Toutanova et al., 2015) and YAGO3-10 (Mahdisoltani et al., 2013). WN18RR is a subset of WordNet (Miller, 1995) – a knowledge base containing lexical relations between words. FB15k-237 is a subset of FreeBase (Bollacker et al., 2008) – a collaborative knowledge base containing general world knowledge, such as nationalities of famous people, locations of buildings, or genres of music. YAGO3-10 is a subset of YAGO3 (Suchanek et al., 2007), which mostly contains descriptions of people. The statistics of these knowledge graphs are shown in Table 1.

Table 1: Knowledge graphs statistics.
Dataset       Entities    Relations   Triplets
WN18RR        40,943      11          93,003
FB15k-237     14,541      237         310,116
YAGO3-10      123,182     37          1,179,040

For each knowledge graph we follow standard data augmentation by adding inverse relations (Bordes et al., 2013a).
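A small sketch of this augmentation (illustrative integer ids, not the paper's code): each (s, r, o) triplet additionally produces (o, r_inv, s), with the inverse relation id offset by the number of relations.

```python
# A sketch of the standard inverse-relation augmentation (assumed integer ids).
def add_inverse_relations(triplets, n_relations):
    augmented = list(triplets)
    for s, r, o in triplets:
        augmented.append((o, r + n_relations, s))  # inverse relation gets id r + n_relations
    return augmented

print(add_inverse_relations([(0, 0, 1), (1, 1, 2)], n_relations=2))
```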

4.2 Baselines

We compare our models with ten baselines: four models with Euclidean embeddings, two with complex embeddings, two with hyperbolic embeddings, and two with mixed-geometry embeddings:

  • Euclidean models: PITF (Pairwise Interaction Tensor Factorisation) – a model using the PITF tensor decomposition and Euclidean embeddings (Rendle and Schmidt-Thieme, 2010). DistMult (Distance-based Multiplication) – a model utilizing the canonical tensor decomposition (Yang et al., 2014).

  • TuckER-based Euclidean models: TuckER (Tucker decomposition) – a model using the Tucker tensor decomposition and Euclidean embeddings (Hospedales, 2019). RTuckER – the TuckER model with a Riemannian method of optimization (Peshekhonov et al., 2024).

  • Complex models: ComplEx-N3 (Complex Embeddings) – a model using the real part of the canonical tensor decomposition and complex embeddings (Chen et al., 2021). RotatE (Relational Rotation in complex space) – a model using rotations in complex space and complex embeddings (Tang, 2019).

  • Hyperbolic models: RotH (Hyperbolic Rotations) and RefH (Hyperbolic Reflections) – attention models using rotations and reflections in hyperbolic space and utilizing hyperbolic embeddings from the Poincaré ball (Re, 2020).

  • Mixed-geometry models: M2GNN (Mixed-Curvature Multi-Relational Graph Neural Network) – a model using multi-curvature embeddings for translational distance and multi-relational graph neural networks (Wang et al., 2021). GIE (Geometry Interaction Knowledge Graph Embeddings) – a model utilizing attention layers and embeddings from Euclidean, hyperbolic and spherical spaces (Cao et al., 2022).

To demonstrate the benefits of our mixed geometry approach to the link prediction task, we compare the baselines with our hyperbolic model TPTF (5) and our mixed geometry models MIG-TF (7) and MIG-TFQR. Here, MIG-TFQR is a mixed geometry model with an orthogonalization heuristic; for more details, see the Appendix. Additionally, to better understand the role of the hyperbolic interaction term, we compare TPTF and MIG-TF models with different fixed curvatures.

4.3 Evaluation metrics

At test time, we rank the correct entities against all entities using the score functions (5) for TPTF and (7) for MIG-TF. We compare models using two rank-based metrics: MRR (Mean Reciprocal Rank) and HR@k, $k \in \{1, 3, 10\}$ (Hit Rate). MRR measures the mean inverse rank of the correct entity, whilst HR@k measures the share of correct triples among the top-$k$ recommendations (Vasilev, 2021). We follow the standard evaluation protocol in the filtered setting: all true triples in the KG are filtered out during evaluation (Lacroix et al., 2018).
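As a concrete illustration of this protocol, the following sketch (illustrative, not the evaluation code used in the paper) computes filtered ranks and the resulting MRR and HR@k from model scores.

```python
# A sketch of filtered rank-based evaluation: MRR and HR@k from model scores.
import numpy as np

def filtered_rank(scores, target, known_true):
    """1-based rank of `target` after filtering out the other known true objects."""
    s = scores.astype(float).copy()
    other_true = [e for e in known_true if e != target]
    s[other_true] = -np.inf                 # remove competing true triples
    return 1 + int(np.sum(s > s[target]))

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))
    hits = {k: float(np.mean(ranks <= k)) for k in ks}
    return mrr, hits

ranks = [1, 4, 2, 15, 7]                    # illustrative ranks of correct entities
print(mrr_and_hits(ranks))
```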

5 RESULTS

Our mixed geometry models achieved new state-of-the-art results on FB15k-237, YAGO3-10 and WN18RR in the majority of metrics. Moreover, our models have significantly fewer parameters than the best models on WN18RR and YAGO3-10; for instance, they have 8 times fewer parameters than RotH or RefH. Additionally, as expected, the Lorentz addition to the TuckER model significantly increases the MRR and HR@k metrics, even with low-dimensional Lorentz embeddings. We also show that the MIG-TF model is better than the TuckER model at predicting various relations, especially rare ones (Figure 6).

Table 2: Approximate number of model parameters on the knowledge graphs WN18RR, FB15k-237 and YAGO3-10.
Models                  FB15k-237   WN18RR     YAGO3-10
PITF                    4·10^6      8·10^6     25·10^6
DistMult                4·10^6      8·10^6     25·10^6
TuckER (S_E + 0·S_H)    4·10^6      8·10^6     25·10^6
RTuckER (S_E + 0·S_H)   4·10^6      8·10^6     25·10^6
ComplEx-N3              4·10^6      8·10^6     25·10^6
RotatE                  60·10^6     60·10^6    120·10^6
RotH                    40·10^6     80·10^6    120·10^6
RefH                    40·10^6     80·10^6    120·10^6
M2GNN                   12·10^6     24·10^6    75·10^6
GIE                     4·10^6      18·10^6    24·10^6
Our models
TPTF (0·S_E + S_H)      2·10^6      4·10^6     12·10^6
MIG-TF (S_E + S_H)      5·10^6      10·10^6    31·10^6
Refer to caption
Figure 6: The top bar chart demonstrates the cumulative HR@10 metric on the WN18RR knowledge graph with respect to all types of relations, ordered by their occurrence frequency. The mixed geometry model (red) has higher cumulative metrics than TuckER (blue). In addition, it shows a significant increase in the HR@10 metric on rare relations: for instance, +140% on “member of domain region”, +48% on “member of domain usage” and +36% on “member meronym”. The bottom bar chart illustrates the number of links with each relation in the knowledge graph.
Table 3: Metrics on knowledge graphs: HR@k and MRR metrics of the models on the knowledge graphs WN18RR, FB15k-237 and YAGO3-10. Bold marks the best metric on the knowledge graph and underlined the second best. For each knowledge graph we follow standard data augmentation by adding inverse relations (Bordes et al., 2013a).

                      FB15k-237                        WN18RR                           YAGO3-10
Models                HR@10  HR@3   HR@1   MRR         HR@10  HR@3   HR@1   MRR         HR@10  HR@3   HR@1   MRR
PITF                  0.459  0.347  0.186  0.284       0.403  0.384  0.392  0.296       0.610  0.494  0.350  0.453
DistMult              0.419  0.263  0.155  0.241       0.490  0.440  0.390  0.430       0.540  0.380  0.240  0.340
TuckER (S_E + 0·S_H)  0.544  0.394  0.266  0.358       0.526  0.482  0.443  0.470       0.681  0.579  0.466  0.544
RTuckER (S_E + 0·S_H) 0.505  0.387  0.237  0.326       0.546  0.495  0.446  0.479       –      –      –      –
ComplEx-N3            0.547  0.392  0.264  0.357       0.572  0.495  0.435  0.480       0.701  0.609  0.498  0.569
RotatE                0.533  0.375  0.245  0.338       0.571  0.492  0.428  0.476       0.670  0.550  0.402  0.495
RotH                  0.535  0.380  0.246  0.344       0.586  0.514  0.449  0.496       0.706  0.612  0.495  0.573
RefH                  0.536  0.383  0.252  0.346       0.568  0.485  0.404  0.461       0.711  0.619  0.502  0.576
M2GNN                 0.565  0.398  0.275  0.362       0.572  0.498  0.444  0.485       0.702  0.605  0.478  0.573
GIE                   0.552  0.401  0.271  0.362       0.575  0.505  0.452  0.491       0.709  0.618  0.505  0.579
Our models
TPTF (0·S_E + S_H)    0.504  0.321  0.186  0.238       0.489  0.378  0.252  0.314       0.665  0.521  0.383  0.481
MIG-TF (S_E + S_H)    0.554  0.402  0.277  0.367       0.561  0.507  0.450  0.496       0.717  0.621  0.502  0.579
MIG-TF_QR (S_E + S_H) 0.553  0.402  0.277  0.366       0.574  0.514  0.452  0.499       0.720  0.622  0.502  0.580

5.1 Hyperbolicity

To study the impact of the hyperbolic interaction term in the hybrid model, we trained the TPTF and MIG-TF models with various fixed curvatures. Models with low curvature are similar to Euclidean ones; therefore, they predict links between hierarchically distributed entities of the knowledge graph worse. In contrast, models with curvature $c \approx 1.5$ show the highest metrics in both the mixed geometry and hyperbolic cases (Figure 7).

The Euclidean component of the MIG-TF model embeds the features of the knowledge graph structure, whilst its hyperbolic features are not fully represented by Euclidean embeddings. Therefore, the hyperbolic component of the mixed geometry model represents only the hyperbolic features of the knowledge graph that are left after the Euclidean component is applied (Ni et al., 2015; Gu et al., 2019). The RotH and RefH architectures (Re, 2020), by contrast, are based on rotations and reflections in hyperbolic space. Our model instead utilizes the violation of the tetrahedron inequality in hyperbolic space and uses embeddings in both Euclidean and hyperbolic spaces. Additionally, our model has less stringent requirements on the structure of the input data.

5.2 Link prediction quality

As demonstrated in Table 3, our mixed geometry approach outperforms all other competitors on two out of the three datasets. It slightly underperforms its main competitor (the hyperbolic RotH model) on one metric on the WN18RR dataset. We attribute this result to the fact that the structure of this dataset is the most amenable to analysis with hyperbolic geometry. Indeed, inspecting Figure 1, we see that this dataset is the most compatible with the ideal power law distribution: it contains the minimal amount of active nodes and exhibits an almost linear dependency over the majority of entities. This result aligns well with our initial design, which targets mixed hierarchical and non-hierarchical structures and therefore may not take full advantage of a more homogeneous data organization. A possible remedy would be replacing the Euclidean component of the MIG-TF model with a more powerful model, such as the Euclidean RotE (Re, 2020) or the complex ComplEx-N3 (Chen et al., 2021). We leave the investigation of this approach for future work.

On the other two datasets our approach dominates its competitors, which aligns well with our assumptions about the structural complexities of these datasets that prevent a purely hyperbolic or purely Euclidean model from learning the best representations. Our approach not only outperforms other models but also requires fewer parameters, which indicates the higher representation capacity of the mixed geometry embeddings.

Figure 7: The dependence of the HR@10 metric on the curvature $\beta$ of the hyperbolic space for the MIG-TF model on the WN18RR dataset.

By consulting Figure 1, one can see that the most prominent results are obtained on the FB15k-237 dataset, which exhibits the highest degree of violation of the power law distribution. Remarkably, the otherwise top-performing hyperbolic models underperform the best Euclidean model on this dataset. This signifies the detrimental effect of structural non-homogeneity on single-geometry models and highlights the advantages of our mixed geometry approach.

6 CONCLUSION

The proposed mixed geometry model MIG-TF, which combines Euclidean and hyperbolic embedding techniques, can significantly improve the quality of link prediction in knowledge graphs with a non-homogeneous structure. It provides a significant advantage over state-of-the-art single-geometry models, such as the Euclidean TuckER and the hyperbolic RotH and RefH. This effect can be attributed to the fact that Euclidean and hyperbolic embeddings capture different aspects of relationships between entities in a knowledge graph. By combining the two approaches, mixed geometry models can capture a broader range of relationships, leading to more accurate predictions. We also show that hybrid models are particularly well-suited for addressing the challenge of link prediction in knowledge graphs under limited computing resources. Due to its geometry-driven improved representation capacity, our approach requires fewer parameters than complex or attention-based models such as RotatE and RotH. Additionally, our model requires training only a minority of its parameters in comparison to other mixed geometry models such as GIE and M2GNN. These facts make the MIG-TF approach more efficient and scalable.

Acknowledgments

The work of Viacheslav Yusupov and Maxim Rakhuba was prepared within the framework of the HSE University Basic Research Program. The calculations were performed in part through the computational resources of HPC facilities at HSE University (Kostenetskiy et al., 2021).

References

  • Agarwal et al. (2021) Oshin Agarwal, Heming Ge, Siamak Shakeri, and Rami Al-Rfou. Knowledge graph based synthetic corpus generation for knowledge-enhanced language model pre-training. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3554–3565. Association for Computational Linguistics, June 2021.
  • Agrawal et al. (2023) Garima Agrawal, Tharindu Kumarage, Zeyad Alghami, and Huan Liu. Can knowledge graphs reduce hallucinations in llms?: A survey. arXiv preprint arXiv:2311.07914, 2023.
  • Beem (2017) John K Beem. Global lorentzian geometry. Routledge, 2017.
  • Bollacker et al. (2008) Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1247–1250, 2008.
  • Bordes et al. (2013a) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. Advances in neural information processing systems, 26, 2013a.
  • Bordes et al. (2013b) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. Advances in neural information processing systems, 26, 2013b.
  • Cao et al. (2022) Zongsheng Cao, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, and Qingming Huang. Geometry interaction knowledge graph embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 5521–5529, 2022.
  • Chekalina et al. (2022) Viktoriia Chekalina, Anton Razzhigaev, Albert Sayapin, Evgeny Frolov, and Alexander Panchenko. MEKER: Memory efficient knowledge embedding representation for link prediction and question answering. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 355–365, Dublin, Ireland, May 2022. Association for Computational Linguistics.
  • Chen et al. (2021) Yihong Chen, Pasquale Minervini, Sebastian Riedel, and Pontus Stenetorp. Relation prediction as an auxiliary training objective for improving multi-relational graph representations. In 3rd Conference on Automated Knowledge Base Construction, 2021.
  • d’Amato (2021) Aidan Hogan, Eva Blomqvist, Michael Cochez, and Claudia d’Amato. Knowledge graphs. ACM, 2021.
  • Dettmers et al. (2018) Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. Convolutional 2d knowledge graph embeddings. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
  • Ermolov et al. (2022) Aleksandr Ermolov, Leyla Mirvakhabova, Valentin Khrulkov, Nicu Sebe, and Ivan Oseledets. Hyperbolic vision transformers: Combining improvements in metric learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7409–7419, 2022.
  • Gu et al. (2019) Albert Gu, Frederic Sala, Beliz Gunel, and Christopher Ré. Learning mixed-curvature representations in products of model spaces. In International conference on learning representations, volume 5, 2019.
  • Guo et al. (2020) Qingyu Guo, Fuzhen Zhuang, Chuan Qin, Hengshu Zhu, Xing Xie, Hui Xiong, and Qing He. A survey on knowledge graph-based recommender systems. IEEE Transactions on Knowledge and Data Engineering, 34(8):3549–3568, 2020.
  • Harshman et al. (1982) Richard A Harshman, Paul E Green, Yoram Wind, and Margaret E Lundy. A model for the analysis of asymmetric data in marketing research. Marketing Science, 1(2):205–242, 1982.
  • Hitchcock (1927) Frank L Hitchcock. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics, 6(1-4):164–189, 1927.
  • Hospedales (2019) Ivana Balažević, Carl Allen, and Timothy M. Hospedales. TuckER: Tensor factorization for knowledge graph completion. ACL, 2019.
  • Itoh et al. (2021) Jin-ichi Itoh, Joël Rouyer, and Costin Vilcu. Some inequalities for tetrahedra. Beiträge zur Algebra und Geometrie/Contributions to Algebra and Geometry, 62:705–715, 2021.
  • Kazemi and Poole (2018) Seyed Mehran Kazemi and David Poole. Simple embedding for link prediction in knowledge graphs. Advances in neural information processing systems, 31, 2018.
  • Kipf and Welling (2016) Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • Kong (2019) Rui Chen, Qingyi Hua, Yan-Shuo Chang, Bo Wang, Lei Zhang, and Xiangjie Kong. A survey of collaborative filtering-based recommender systems: From traditional methods to hybrid methods based on social networks. IEEE, 2019.
  • Kostenetskiy et al. (2021) PS Kostenetskiy, RA Chulkevich, and VI Kozyrev. Hpc resources of the higher school of economics. In Journal of Physics: Conference Series, volume 1740, page 012050. IOP Publishing, 2021.
  • Krioukov et al. (2010) Dmitri Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak, Amin Vahdat, and Marián Boguná. Hyperbolic geometry of complex networks. Physical Review E, 82(3):036106, 2010.
  • Lacroix et al. (2018) Timothée Lacroix, Nicolas Usunier, and Guillaume Obozinski. Canonical tensor decomposition for knowledge base completion. In International Conference on Machine Learning, pages 2863–2872. PMLR, 2018.
  • Law et al. (2019) Marc Law, Renjie Liao, Jake Snell, and Richard Zemel. Lorentzian distance learning for hyperbolic representations. In International Conference on Machine Learning, pages 3672–3681. PMLR, 2019.
  • Lewis et al. (2020) Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.
  • Loshchilov and Hutter (2019) Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019.
  • Mahdisoltani et al. (2013) Farzaneh Mahdisoltani, Joanna Biega, and Fabian M Suchanek. Yago3: A knowledge base from multilingual wikipedias. In CIDR, 2013.
  • Miller (1995) George A Miller. Wordnet: a lexical database for english. Communications of the ACM, 38(11):39–41, 1995.
  • Mirvakhabova et al. (2020) Leyla Mirvakhabova, Evgeny Frolov, Valentin Khrulkov, Ivan Oseledets, and Alexander Tuzhilin. Performance of hyperbolic geometry models on top-n recommendation tasks. In Proceedings of the 14th ACM Conference on Recommender Systems, pages 527–532, 2020.
  • Nathani et al. (2019) Deepak Nathani, Jatin Chauhan, Charu Sharma, and Manohar Kaul. Learning attention-based embeddings for relation prediction in knowledge graphs. In Annual Meeting of the Association for Computational Linguistics, 2019.
  • Nguyen et al. (2017) Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Phung. A novel embedding model for knowledge base completion based on convolutional neural network. arXiv preprint arXiv:1712.02121, 2017.
  • Ni et al. (2015) Chien-Chun Ni, Yu-Yao Lin, Jie Gao, Xianfeng David Gu, and Emil Saucan. Ricci curvature of the internet topology. In 2015 IEEE conference on computer communications (INFOCOM), pages 2758–2766. IEEE, 2015.
  • Nickel et al. (2011) Maximilian Nickel, Volker Tresp, Hans-Peter Kriegel, et al. A three-way model for collective learning on multi-relational data. In Icml, volume 11, pages 3104482–3104584, 2011.
  • Nickel and Kiela (2017) Maximillian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. Advances in neural information processing systems, 30, 2017.
  • Pan et al. (2024) Shirui Pan, Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, and Xindong Wu. Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering, 2024.
  • Peng et al. (2023) Ciyuan Peng, Feng Xia, Mehdi Naseriparsa, and Francesco Osborne. Knowledge graphs: Opportunities and challenges. Artificial Intelligence Review, 56(11):13071–13102, 2023.
  • Peng et al. (2021) Wei Peng, Tuomas Varanka, Abdelrahman Mostafa, Henglin Shi, and Guoying Zhao. Hyperbolic deep neural networks: A survey. IEEE Transactions on pattern analysis and machine intelligence, 44(12):10023–10044, 2021.
  • Peshekhonov et al. (2024) Ivan Peshekhonov, Aleksey Arzhantsev, and Maxim Rakhuba. Training a tucker model with shared factors: a riemannian optimization approach. In International Conference on Artificial Intelligence and Statistics, pages 3304–3312. PMLR, 2024.
  • Re (2020) Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, and Christopher Re. Low-dimensional hyperbolic knowledge graph embeddings. ACL, 2020.
  • Rendle and Schmidt-Thieme (2010) Steffen Rendle and Lars Schmidt-Thieme. Pairwise interaction tensor factorization for personalized tag recommendation. In Proceedings of the third ACM international conference on Web search and data mining, pages 81–90, 2010.
  • Salakhutdinov (2012) Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, 2012.
  • Schlichtkrull et al. (2018) Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In The semantic web: 15th international conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, proceedings 15, pages 593–607. Springer, 2018.
  • Seth and Sharaff (2022) Rakhi Seth and Aakanksha Sharaff. A comparative overview of hybrid recommender systems: Review, challenges, and prospects. Data mining and machine learning applications, pages 57–98, 2022.
  • Suchanek et al. (2007) Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In Proceedings of the 16th international conference on World Wide Web, pages 697–706, 2007.
  • Tang (2019) Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. RotatE: Knowledge graph embedding by relational rotation in complex space. ICLR, 2019.
  • Toutanova et al. (2015) Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. Representing text for joint embedding of text and knowledge bases. In Proceedings of the 2015 conference on empirical methods in natural language processing, pages 1499–1509, 2015.
  • Trouillon et al. (2016) Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. Complex embeddings for simple link prediction. In International conference on machine learning, pages 2071–2080. PMLR, 2016.
  • Tucker (1966) Ledyard R Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279–311, 1966.
  • Ungar (2005) Abraham A Ungar. Analytic hyperbolic geometry: Mathematical foundations and applications. World Scientific, 2005.
  • Vasilev (2021) Yan-Martin Tamm, Rinchin Damdinov, and Alexey Vasilev. Quality metrics in recommender systems: Do we calculate metrics consistently? ACM, 2021.
  • Wang et al. (2019a) Hongwei Wang, Miao Zhao, Xing Xie, Wenjie Li, and Minyi Guo. Knowledge graph convolutional networks for recommender systems. In The world wide web conference, pages 3307–3313, 2019a.
  • Wang et al. (2019b) Rui Wang, Bicheng Li, Shengwei Hu, Wenqian Du, and Min Zhang. Knowledge graph embedding via graph attenuated attention networks. IEEE access, 8:5212–5224, 2019b.
  • Wang et al. (2021) Shen Wang, Xiaokai Wei, Cicero Nogueira dos Santos, Zhiguo Wang, Ramesh Nallapati, Andrew Arnold, Bing Xiang, Philip S Yu, and Isabel F Cruz. Mixed-curvature multi-relational graph neural network for knowledge graph completion. In Proceedings of the web conference 2021, pages 1761–1771, 2021.
  • Xu and Wu (2020) Canran Xu and Ming Wu. Learning feature interactions with lorentzian factorization machine. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 6470–6477, 2020.
  • Xu et al. (2018) Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. Representation learning on graphs with jumping knowledge networks. In International conference on machine learning, pages 5453–5462. PMLR, 2018.
  • Yang et al. (2014) Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding entities and relations for learning and inference in knowledge bases. CoRR, 2014.
  • Zhang et al. (2019) Shuai Zhang, Yi Tay, Lina Yao, and Qi Liu. Quaternion knowledge graph embeddings. Advances in neural information processing systems, 32, 2019.
  • Zhang et al. (2020) Siheng Zhang, Zhengya Sun, and Wensheng Zhang. Improve the translational distance models for knowledge graph embedding. Journal of Intelligent Information Systems, 55:445–467, 2020.

7 SQUARE LORENTZ DISTANCE VS. GEODESIC DISTANCE

In this section, we illustrate that the score function based on the utilized Lorentz distance behaves similarly to the geodesic distance for different values of $t$. In particular, in Figure 8, we present the two-dimensional landscapes of the TPTF score function for $t \in \{-10, 0, 10\}$.

Figure 8: The landscapes of the TPTF score function for the Lorentz and geodesic distances. As seen in the plots, the normalized score function with the Lorentz distance mimics the behaviour of the one with the geodesic distance.
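To make the comparison concrete, below is a small self-contained sketch (our own illustration, assuming the standard hyperboloid parametrization with curvature parameter $\beta$; the function names are ours and not part of the TPTF implementation) of the squared Lorentz distance and the geodesic distance for two points lifted onto the hyperboloid.

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentz inner product -x_0 y_0 + sum_{i>0} x_i y_i for single vectors."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lift(v, beta):
    """Lift a Euclidean vector v onto the hyperboloid <x, x>_L = -beta."""
    return np.concatenate(([np.sqrt(beta + np.dot(v, v))], v))

def squared_lorentz_distance(x, y, beta):
    # ||x - y||_L^2 = -2*beta - 2*<x, y>_L for two points on the hyperboloid.
    return -2.0 * beta - 2.0 * lorentz_inner(x, y)

def geodesic_distance(x, y, beta):
    # d(x, y) = sqrt(beta) * arccosh(-<x, y>_L / beta); the clip guards against rounding below 1.
    return np.sqrt(beta) * np.arccosh(np.clip(-lorentz_inner(x, y) / beta, 1.0, None))

beta = 1.5
x, y = lift(np.array([0.3, -0.2]), beta), lift(np.array([1.0, 0.7]), beta)
print(squared_lorentz_distance(x, y, beta), geodesic_distance(x, y, beta))
```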

8 COMPLEXITY

In this section, we estimate the complexity of our MIG-TF model.

Let us introduce the operation $G \times_i V$ for a tensor $G \in \mathbb{R}^{d_1 \times d_2 \times \dots \times d_n}$ and a matrix $V \in \mathbb{R}^{d_i \times m}$:

$$(G \times_i V)_{a_1, \dots, a_{i-1}, k, a_{i+1}, \dots, a_n} = \sum_{a_i = 1}^{d_i} G_{a_1, a_2, \dots, a_n} V_{a_i, k}.$$
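For illustration, here is a minimal NumPy sketch of this mode-$i$ product (the function name and the use of `np.tensordot` are our own choices, not part of the MIG-TF implementation; the code uses 0-based mode indexing):

```python
import numpy as np

def mode_product(G, V, i):
    """Contract the i-th mode of tensor G (0-indexed here) with the rows of matrix V.

    G: tensor of shape (d_1, ..., d_n); V: matrix of shape (d_i, m).
    The result replaces the i-th dimension of G with m, as in the definition above.
    """
    out = np.tensordot(G, V, axes=([i], [0]))  # the new axis of size m appears last
    return np.moveaxis(out, -1, i)             # move it back to position i

# Tiny usage example.
G = np.random.default_rng(0).normal(size=(4, 5, 6))
V = np.random.default_rng(1).normal(size=(5, 3))
print(mode_product(G, V, 1).shape)  # (4, 3, 6)
```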

Let $n_e$ and $n_r$ be respectively the numbers of entities and relations in a knowledge graph, $b$ – a batch size, $d_e$ and $d_r$ – the embedding sizes of the TuckER model, and $d_h$ – the size of the hyperbolic embeddings. Let the core of the Tucker decomposition be $G \in \mathbb{R}^{d_e \times d_e \times d_r}$, and the embedding matrices be the entity embedding matrix $U \in \mathbb{R}^{b \times d_e}$, the relation embedding matrix $T \in \mathbb{R}^{b \times d_r}$, and another entity embedding matrix $V \in \mathbb{R}^{n_e \times d_e}$. Similarly, the matrices in the hyperbolic model are $U^{\mathcal{L}} \in \mathbb{R}^{b \times d_h}$, $T^{\mathcal{L}} \in \mathbb{R}^{b \times d_h}$ and $V^{\mathcal{L}} \in \mathbb{R}^{n_e \times d_h}$.

Firstly, we find the complexity of the TuckER model. To compute the TuckER score function, we first compute $G_1 = G \times_1 U \in \mathbb{R}^{b \times d_e \times d_r}$, which takes $O(b d_e (d_e d_r))$ operations. Then we compute $G_2 = G_1 \times_3 T \in \mathbb{R}^{b \times d_e \times b}$ with $O(b d_r (b d_e))$ operations. Finally, the diagonal slice of $G_2$ is taken and multiplied with $V$ with $O(b d_e n_e)$ operations. Overall, the total number of operations is:

$$O_{\textit{TuckER}} = O\bigl(b d_e (d_e d_r) + b d_r (b d_e) + b d_e n_e\bigr) = O\bigl(b d_e (d_e d_r + b d_e + n_e)\bigr).$$
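The three steps above can be sketched as follows (a schematic of our own with assumed variable names; the batch normalization, dropout and sigmoid used in the actual TuckER model are omitted). The diagonal slice of $G_1 \times_3 T$ is computed implicitly by contracting each batch row with its own relation embedding:

```python
import numpy as np

def tucker_scores(G, U, T, V):
    """Raw scores of a TuckER-style model for a batch of (subject, relation) pairs.

    G: (d_e, d_e, d_r) core tensor, U: (b, d_e) subject embeddings,
    T: (b, d_r) relation embeddings, V: (n_e, d_e) all entity embeddings.
    Returns a (b, n_e) matrix of scores.
    """
    # G1 = G x_1 U, shape (b, d_e, d_r): O(b * d_e * (d_e * d_r)) operations.
    G1 = np.einsum('ser,bs->ber', G, U)
    # Diagonal slice of G1 x_3 T, shape (b, d_e): each batch row is contracted
    # with its own relation embedding instead of forming the full (b, d_e, b) tensor.
    G2 = np.einsum('ber,br->be', G1, T)
    # Multiply with all entity embeddings, shape (b, n_e): O(b * d_e * n_e) operations.
    return G2 @ V.T

# Example with toy dimensions.
b, d_e, d_r, n_e = 4, 8, 3, 100
rng = np.random.default_rng(0)
scores = tucker_scores(rng.normal(size=(d_e, d_e, d_r)), rng.normal(size=(b, d_e)),
                       rng.normal(size=(b, d_r)), rng.normal(size=(n_e, d_e)))
print(scores.shape)  # (4, 100)
```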

Secondly, we find the complexity of the TPTF model. First of all, we compute the Lorentz zero component of each embedding $x$ as

$$x_0 = \sqrt{\beta + \sum_{i=1}^{d} x_i^2},$$

which takes $O(d_h (b + n_e))$ operations. Then we compute the Lorentz inner products, which takes $O(b d_h n_e + b d_h n_e + b d_h n_e) = O(b d_h n_e)$ operations. Next, we compute the sum of all inner products $S_H$ using $O(b n_e)$ operations. Finally, we compute the normalization $N_H$ with $O(b n_e)$ operations and perform the elementwise division $S_H / N_H$ with complexity $O(b n_e)$. Overall, the TPTF complexity is:

$$O_{\textit{TPTF}} = O\bigl(d_h (b + n_e) + b d_h n_e + b n_e + b n_e + b n_e\bigr) = O(b n_e d_h).$$
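The sketch below illustrates how the Lorentz lift and the three pairwise Lorentz inner products with the stated $O(b n_e d_h)$ cost can be computed. The exact way TPTF combines them into $S_H$ and the form of the normalization $N_H$ are defined in the main text, so the final combination is deliberately left out here; all names are our own illustration.

```python
import numpy as np

def lorentz_lift(x, beta):
    """Append the time-like component x_0 = sqrt(beta + ||x||^2) to each row of x."""
    x0 = np.sqrt(beta + np.sum(x ** 2, axis=-1, keepdims=True))
    return np.concatenate([x0, x], axis=-1)

def lorentz_inner(a, b):
    """All pairwise Lorentz inner products -a_0 b_0 + <a_1:, b_1:>, shape (len(a), len(b))."""
    return -np.outer(a[:, 0], b[:, 0]) + a[:, 1:] @ b[:, 1:].T

def hyperbolic_inner_products(U_h, Q_h, V_h, beta):
    """Pairwise Lorentz inner products used by the hyperbolic term.

    U_h, Q_h: (b, d_h) batch entity and relation embeddings;
    V_h: (n_e, d_h) candidate tail embeddings; beta: hyperboloid curvature.
    """
    u, q, v = (lorentz_lift(z, beta) for z in (U_h, Q_h, V_h))
    metric = np.ones(u.shape[1]); metric[0] = -1.0
    uv = lorentz_inner(u, v)                          # (b, n_e), O(b * n_e * d_h)
    qv = lorentz_inner(q, v)                          # (b, n_e), O(b * n_e * d_h)
    uq = (u * q * metric).sum(axis=1, keepdims=True)  # (b, 1), per-row inner product
    # The model sums these into S_H and divides elementwise by a normalization N_H
    # (see the main text for its exact form); we return the raw ingredients here.
    return uv, qv, uq
```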

The complexity of a MIG-TF iteration is the sum of the TuckER and TPTF complexities:

$$O_{\textit{MIG-TF}} = O_{\textit{TuckER}} + O_{\textit{TPTF}} + O(b n_e) = O\bigl(b d_e (d_e d_r + b d_e + n_e) + b n_e d_h\bigr).$$

In terms of time and parameter complexity, our MIG-TF model adds only a 30% overhead on training time and a 25% overhead on inference time in comparison to TuckER.

9 MIXED GEOMETRY TENSOR FACTORIZATION MODIFICATIONS

In this section, we describe in more detail additional techniques that we use for our MIG-TF model.

9.1 Riemannian MIG-TF

To improve the quality of the TuckER (Hospedales, 2019) model, we use the so-called RTuckER model (Peshekhonov et al., 2024). This model has the same architecture as TuckER but is trained with Riemannian optimization methods. The Riemannian model significantly outperforms TuckER on some knowledge graphs; for more details, we refer to (Peshekhonov et al., 2024). We utilize RTuckER as the Euclidean term of our MIG-TF model only on the WN18RR knowledge graph, on which the Riemannian model significantly outperforms TuckER.

9.2 MIG-TF with QR decomposition

Figure 9: Correlation maps for the relations of the knowledge graphs FB15k-237, WN18RR and YAGO3-10. As can be seen, the vast majority of relations in the knowledge graphs are uncorrelated. Therefore, enforcing orthogonality of relation embeddings is promising, since embeddings of uncorrelated relations should not correlate.

As shown in Figure 9, most of the relations in knowledge graphs are uncorrelated; therefore, maintaining the orthogonality of relation embeddings yields better vector representations. For this purpose, on each iteration we update the hyperbolic relation embedding matrix $T^{\mathcal{L}}$ with the help of the QR decomposition as follows:

$$T^{\mathcal{L}} = Q^{\mathcal{L}} R. \qquad (8)$$

Then we compute the score function using $Q^{\mathcal{L}}$ instead of $T^{\mathcal{L}}$:

$$(S_{\textit{MIG-TF}})_i = S_E(G, u, V_i, t) + S_H(u^{\mathcal{L}}, V_i^{\mathcal{L}}, q^{\mathcal{L}}).$$

In these formulas, $G$ is the core of the Tucker decomposition, $u$ and $u^{\mathcal{L}}$ are the Euclidean and hyperbolic entity embeddings, and $V_i$ and $V_i^{\mathcal{L}}$ are respectively the $i$-th rows of the Euclidean $V$ and of the hyperbolic $V^{\mathcal{L}}$ entity embedding matrices. The vector $t$ is the Euclidean relation embedding and $q^{\mathcal{L}}$ is a row of $Q^{\mathcal{L}}$ from (8). We call this modified model MIG-TF$_{QR}$.
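A minimal sketch of the re-orthogonalization step (8) is given below. The exact QR convention (whether $T^{\mathcal{L}}$ or its transpose is factorized) follows the main description of MIG-TF$_{QR}$; here we assume a reduced QR of the transpose, so that the resulting relation embeddings have orthonormal rows, and the names are illustrative.

```python
import numpy as np

def orthogonalize_relations(T_h):
    """Return hyperbolic relation embeddings with orthonormal rows via a QR step.

    T_h: (n_r, d_h) relation embedding matrix with n_r <= d_h. We factorize the
    transpose so that the Q factor spans the row space of T_h; the rows of the
    returned matrix are then mutually orthonormal.
    """
    q, _ = np.linalg.qr(T_h.T)   # q: (d_h, n_r) with orthonormal columns
    return q.T

# Example: 11 relations (as in WN18RR) embedded in 50 dimensions.
T_h = np.random.default_rng(0).normal(size=(11, 50))
Q_h = orthogonalize_relations(T_h)
print(np.allclose(Q_h @ Q_h.T, np.eye(11)))  # True
```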

The MIG-TF model with the QR decomposition significantly outperforms MIG-TF on the WN18RR and YAGO3-10 knowledge graphs, where the relations are uncorrelated almost everywhere. However, MIG-TF$_{QR}$ slightly loses to MIG-TF on the FB15k-237 graph, where the non-correlation condition is frequently violated (see Figure 9).

10 ROBUSTNESS

To analyze the robustness of our models, we add random links to the training knowledge graph. If $n_{train}$ is the number of links in the training data, then we add $\alpha n_{train}$ random links to it. We train the mixed geometry models on the poisoned training data and evaluate them on the test data, see Figure 10. As we see, even with high levels of noise ($10\%$), neither the MIG-TF nor the MIG-TF$_{QR}$ model degrades significantly.
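A sketch of this poisoning procedure is shown below (our own illustration with assumed names; whether duplicate or self-loop triples are filtered out is not specified here).

```python
import numpy as np

def poison_training_data(triples, n_entities, n_relations, alpha, seed=0):
    """Append alpha * n_train uniformly random (head, relation, tail) triples.

    triples: (n_train, 3) integer array of (head, relation, tail) indices.
    """
    rng = np.random.default_rng(seed)
    n_noise = int(alpha * len(triples))
    noise = np.stack([rng.integers(0, n_entities, n_noise),
                      rng.integers(0, n_relations, n_noise),
                      rng.integers(0, n_entities, n_noise)], axis=1)
    return np.concatenate([triples, noise], axis=0)

# Example: poison 10% of a toy training set.
train = np.array([[0, 0, 1], [1, 1, 2], [2, 0, 3], [3, 1, 0]] * 10)
print(poison_training_data(train, n_entities=4, n_relations=2, alpha=0.1).shape)
```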

Figure 10: The HR@10 metric on the WN18RR knowledge graph as a function of the fraction of poisoned links $\alpha$ in the training data for the two mixed geometry models.

11 THE IMPACT OF EUCLIDEAN AND HYPERBOLIC TERMS

In this section, we analyze the impact of the different terms in the MIG-TF score function. We represent the score function $S_{\textit{MIG-TF}}$ as:

$$S_{\textit{MIG-TF}} = \mu S_E + (1 - \mu) S_H, \quad \mu \in (0, 1),$$

where $S_E$ and $S_H$ are respectively the Euclidean and hyperbolic components of the score function and $\mu$ is a mixing parameter. By changing $\mu$, we measure the effect of the score function components on the HR@10 metric on the WN18RR knowledge graph, see Figure 11. It is interesting to note that the best quality is achieved when the Euclidean and hyperbolic terms contribute equally. This justifies our choice of the score function.
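The sweep over $\mu$ is straightforward to reproduce once the two score components are available; a schematic with dummy score matrices (names are ours, and in the experiment HR@10 is computed from the combined scores for each value of $\mu$):

```python
import numpy as np

def mixed_score(S_E, S_H, mu):
    """Convex combination of the Euclidean and hyperbolic score matrices, mu in (0, 1)."""
    return mu * S_E + (1.0 - mu) * S_H

# Dummy (batch, n_entities) score matrices used only to illustrate the sweep.
rng = np.random.default_rng(0)
S_E, S_H = rng.normal(size=(4, 10)), rng.normal(size=(4, 10))
for mu in (0.25, 0.5, 0.75):
    print(mu, mixed_score(S_E, S_H, mu).shape)
```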

Figure 11: HR@10 against the $\mu$ parameter for the two mixed geometry models. The maximum of the HR@10 metric on WN18RR is achieved at $\mu = 0.5$, where the contributions of the Euclidean and hyperbolic terms of the score function are equal.

12 OPTIMAL CURVATURE

This section demonstrates the dependence of link prediction quality on the WN18RR knowledge graph on the curvature of the hyperbolic parts of the TPTF and MIG-TF models, see Figure 12. Note that the optimal curvatures for the hyperbolic and mixed geometry models differ from each other.

Figure 12: The top plot shows the dependence of the HR@10 metric on the WN18RR knowledge graph on the curvature $\beta$ of the hyperbolic space for the TPTF model. The bottom plot shows the same dependence for the MIG-TF model.

13 HYPERPARAMETERS

In this section, we present the hyperparameters of the Euclidean term (Table 4) and the hyperbolic term (Table 6) of our MIG-TF model, as well as of the TPTF model (Table 5). In these tables, $d_e, d_r$ are the sizes of the entity and relation embeddings, $lr$ is the learning rate, $\mathcal{N}(0, \rho_e^2 I)$ and $\mathcal{N}(0, \rho_r^2 I)$ are the distributions of the initial weights of the entity and relation embeddings in the TPTF component of the hybrid model, and $\beta$ is the curvature of the hyperboloid in the Lorentz term of the MIG-TF model. All experiments were performed on a single NVIDIA V100 GPU.

Table 4: Hyperparameters for the TuckER component.
Hyperparameter    WN18RR    FB15k-237    YAGO3-10
d_e               200       200          200
d_r               30        200          30
lr                0.01      0.001        0.003
epochs            500       500          500
Dropout_1         0.2       0.3          0.2
Dropout_2         0.2       0.4          0.2
Dropout_3         0.3       0.5          0.2
Table 5: Hyperparameters for the TPTF component.
Hyperparameter    WN18RR    FB15k-237    YAGO3-10
d_L               50        50           50
lr                0.003     0.002        0.005
β                 1.3       1            1.1
epochs            250       150          250
ρ_e               0.005     0.001        0.001
ρ_r               0.005     0.001        0.001
Dropout_e         0.2       0.3          0.2
Dropout_r         0.2       0.3          0.2
Table 6: Hyperparameters for the MIG-TF component.
Hyperparameter    WN18RR    FB15k-237    YAGO3-10
d_L               50        50           50
lr                0.003     0.001        0.005
β                 1.5       1            1.1
epochs            250       100          150
ρ_e               0.005     0.001        0.001
ρ_r               0.005     0.001        0.001
Dropout_e         0.2       0.3          0.2
Dropout_r         0.2       0.3          0.2