

Trainable and Explainable Simplicial Map Neural Networks

Eduardo Paluzo-Hidalgo ([email protected]), Department of Quantitative Methods, Universidad Loyola Andalucía, Campus Sevilla, Dos Hermanas, Seville, Spain; Rocio Gonzalez-Diaz ([email protected]), Department of Applied Mathematics I, School of Engineering, University of Seville, Seville, Spain; Miguel A. Gutiérrez-Naranjo ([email protected]), Department of Computer Science and Artificial Intelligence, School of Engineering, University of Seville, Seville, Spain
Abstract

Simplicial map neural networks (SMNNs) are topology-based neural networks with interesting properties such as universal approximation ability and robustness to adversarial examples under appropriate conditions. However, SMNNs present some bottlenecks for their possible application to high-dimensional datasets. First, SMNNs have precomputed fixed weights and no SMNN training process has been defined so far, so they lack generalization ability. Second, SMNNs require the construction of a convex polytope surrounding the input dataset. In this paper, we overcome these issues by proposing an SMNN training procedure based on a support subset of the given dataset and by replacing the construction of the convex polytope with a method based on projections onto a hypersphere. In addition, the explainability capacity of SMNNs and an effective implementation are also newly introduced in this paper.

keywords:
Training neural network · Simplicial maps · Explainable artificial intelligence

1 Introduction

In recent years, Artificial Intelligence (AI) methods in general, and Machine Learning methods in particular, have achieved success in real-life problems that was unexpected only a few years ago. Many different areas have contributed to this development; among them, we can cite research on new theoretical algorithms, the increasing computational power of the latest generation of hardware, and rapid access to huge amounts of data. This combination of factors has led to the development of increasingly complex self-regulated AI methods.

Many AI models currently in use are based on backpropagation algorithms, which train and regulate themselves to achieve a goal such as classification, recommendation, or prediction. These self-regulating models acquire some kind of knowledge, since they successfully evaluate test data independent of the data used to train them. Nonetheless, such knowledge is usually expressed in a non-human-readable way.

To fill the gap between the recent development of AI models and their social use, many researchers have focused on the development of Explainable Artificial Intelligence (XAI), a set of techniques that provide clear, understandable, transparent, intelligible, trustworthy, and interpretable explanations of the decisions, predictions, and reasoning processes of AI models, rather than just presenting their output. This is especially important in domains where AI decisions can have significant consequences for human life. A global taxonomy of interpretable AI, with the aim of unifying terminology to achieve clarity and efficiency in the definition of regulations for the development of ethical and reliable AI, can be found in [1]. Moreover, a nice introduction and general overview can be found in [2]. Another clarifying paper with definitions, concepts, and applications of XAI is [3].

The so-called Simplicial Map Neural Networks (SMNNs) were introduced in [4] as a constructive approach to the problem of approximating a continuous function on a compact set in a triangulated space. Since the original aim of SMNNs was a constructive approach, the computation of their weights was not based on an optimization process, as is usual in neural networks, but on a deterministic calculation. The architecture of SMNNs and the computation of their weights are based on a combinatorial topology tool called simplicial maps. Moreover, SMNNs can be used for classification and can be constructed to be robust to adversarial examples [5]. Besides, their architecture can be reduced while maintaining accuracy [6], and they are invariant to transformations that preserve the barycentric coordinates (scale, rotation, symmetries, etc.). As defined in [4, 6, 5], SMNNs are built as two-hidden-layer feed-forward networks where the set of weights is precomputed based on the calculation of a triangulated convex polytope surrounding the input data. As in other approximations of continuous functions with arbitrary precision (see, for example, [7]), SMNNs have fixed weights, which means that the weights depend only on the triangulation built on the points of the dataset used as support set, and no training process is applied.

Summing up, the main limitations of SMNNs so far are that they are costly to compute, since the number of neurons is proportional to the number of simplices of the triangulation supported on the input dataset, and that they suffer from overfitting and therefore do not generalize well. These aspects have prevented SMNNs from being used in practice, although the idea of relating simplicial maps to neural networks is disruptive and provides a new bridge that can enrich both areas.

In this paper, we propose a method that makes SMNNs efficient by reducing their size (in terms of the number of neurons, which depends on the number of vertices of the triangulation) and that makes SMNNs trainable with generalization ability. Besides, we also present a study of the selection of the vertices from which the triangulation is obtained. Although SMNNs consider the vertices of a simplex as part of the information needed for the classification task, the approach presented in this paper is far from classic instance-based Machine Learning methods. Such methods rely on deterministic computations based on distances, whereas, in the approach presented here, the computation of the weights is the result of an optimization method in a space of probability distributions. Finally, from an XAI point of view, we will see that SMNNs are explainable models, since all the decision steps used to compute the output of an SMNN are understandable and transparent, and therefore trustworthy.

The paper is organized as follows. First, some concepts of computational topology and the definition of SMNNs are recalled in Section 2. Next, in Section 3, we develop several technical details needed for the SMNN training process, which is introduced in Section 4. Section 5 is devoted to the explainability of the model, and Section 6 to discussion and limitations. Finally, the paper ends with some experiments and conclusions.

2 Background

In this section, we assume that the reader is familiar with the basic concepts of computational topology. For a comprehensive presentation, we refer to [8].

2.1 Simplicial complexes

Consider a finite set of points $V=\{v^1,\dots,v^\beta\}\subset\mathbb{R}^n$ whose elements will be called vertices. A subset

\[ \sigma=\langle v^{i_0},v^{i_1},\dots,v^{i_d}\rangle \]

of $V$ with $d+1$ vertices (in general position) is called a $d$-simplex. The convex hull of the vertices of $\sigma$ will be denoted by $|\sigma|$ and corresponds to the set

\[ \Big\{x\in\mathbb{R}^n:\; x=\sum_{j\in[\![0,d]\!]}b_j(x)\,v^{i_j}\Big\}, \]

where $[\![a,b]\!]=\{a,a+1,\dots,b\}$ for $a<b\in\mathbb{Z}$, and

\[ b(x)=(b_0(x),b_1(x),\dots,b_d(x)) \]

are called the barycentric coordinates of $x$ with respect to $\sigma$, satisfying

\[ \sum_{j\in[\![0,d]\!]}b_j(x)=1 \quad\mbox{and}\quad b_j(x)\geq 0\;\;\forall j\in[\![0,d]\!]. \]

The barycentric coordinates of $x$ can be interpreted as masses placed at the vertices of $\sigma$ so that $x$ is the centre of mass. All these masses are positive if and only if $x$ lies inside $\sigma$. For example, let us consider the 1-simplex $\epsilon=\langle v^{i_0},v^{i_1}\rangle$, composed of two vertices of $V$. Then $|\epsilon|$ is the set of points in $\mathbb{R}^n$ corresponding to the edge with endpoints $v^{i_0}$ and $v^{i_1}$, and if, for example, $b(x)=(0.5,0.5)$, then $x$ is the midpoint of $|\epsilon|$.
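For illustration purposes, the barycentric coordinates can be obtained by solving a small linear system. The following sketch (assuming NumPy; the helper name is ours and not part of any released implementation) solves $\sum_j b_j(x)\,v^{i_j}=x$ together with $\sum_j b_j(x)=1$:

import numpy as np

def barycentric_coordinates(x, simplex_vertices):
    # Rows of simplex_vertices are the vertices v^{i_0}, ..., v^{i_d} (shape (d+1, n)).
    V = np.asarray(simplex_vertices, dtype=float)
    x = np.asarray(x, dtype=float)
    # Solve  sum_j b_j * v^{i_j} = x  together with  sum_j b_j = 1.
    A = np.vstack([V.T, np.ones(len(V))])        # shape (n+1, d+1)
    rhs = np.append(x, 1.0)
    b, *_ = np.linalg.lstsq(A, rhs, rcond=None)  # exact when x lies in the affine hull of the vertices
    return b

# The midpoint of the edge <(0,0), (1,0)> has barycentric coordinates (0.5, 0.5).
print(barycentric_coordinates([0.5, 0.0], [[0.0, 0.0], [1.0, 0.0]]))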

A simplicial complex $K$ with vertex set $V$ consists of a finite collection of simplices satisfying that if $\sigma\in K$ then either $\sigma=\langle v\rangle$ for some $v\in V$ or any face (that is, any nonempty subset) of $\sigma$ is a simplex of $K$. Furthermore, if $\sigma,\mu\in K$ then $|\sigma|\cap|\mu|=\emptyset$ or $|\sigma|\cap|\mu|=|\gamma|$ for some $\gamma\in K$. The set $\bigcup_{\sigma\in K}|\sigma|$ will be denoted by $|K|$. A maximal simplex of $K$ is a simplex that is not a face of any other simplex of $K$. If the maximal simplices of $K$ are all $d$-simplices, then $K$ is called a pure $d$-simplicial complex. These concepts are illustrated in Figure 1.

Figure 1: On the left, two triangles that do not intersect in a common face (an edge or a vertex). On the right, the geometric representation $|K|$ of a pure 2-simplicial complex $K$ composed of three maximal 2-simplices (the triangles $\sigma^1$, $\sigma^2$ and $\sigma^3$). The edge $\mu^2$ is a common face of $\sigma^2$ and $\sigma^3$. The edge $\mu^1$ is a face of $\sigma^1$.

The barycentric coordinates of $x$ with respect to the simplicial complex $K$ are defined as the barycentric coordinates of $x$ with respect to a simplex $\sigma\in K$ such that $x\in|\sigma|$. Let us observe that the barycentric coordinates of $x\in|K|$ are unique.

An example of a simplicial complex is the Delaunay triangulation $\operatorname{Del}(V)$, defined from the Voronoi diagram of a given finite set of vertices $V$. The following result, extracted from [9, page 48], is just an alternative definition of Delaunay triangulations.

The empty ball property [9]: Any subset $\sigma$ of $V$ is a simplex of $\operatorname{Del}(V)$ if and only if $|\sigma|$ has a circumscribing open ball empty of points of $V$.
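In practice, $\operatorname{Del}(V)$ can be computed with standard computational-geometry libraries. The following short sketch (assuming SciPy; the point set is made up for the illustration) builds the Delaunay triangulation of a small vertex set, lists its maximal simplices, and locates the simplex containing a query point together with its barycentric coordinates:

import numpy as np
from scipy.spatial import Delaunay

V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.4, 0.6]])
tri = Delaunay(V)
print(tri.simplices)             # each row: vertex indices of one maximal 2-simplex of Del(V)

x = np.array([0.3, 0.3])
s = tri.find_simplex(x)          # index of the simplex whose convex hull contains x (-1 if outside)
T = tri.transform[s]             # affine transform to barycentric coordinates
b = T[:2].dot(x - T[2])
b = np.append(b, 1.0 - b.sum())  # the coordinates sum to one
print(s, b)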

2.2 Simplicial maps

Let $K$ be a pure $n$-simplicial complex and $L$ a pure $k$-simplicial complex with vertex sets $V$ and $W$, respectively. A map $\varphi^{(0)}:V\to W$ is called a vertex map if, whenever $\langle v^{i_0},\dots,v^{i_d}\rangle$ is a simplex in $K$, the set obtained from $\{\varphi^{(0)}(v^{i_0}),\dots,\varphi^{(0)}(v^{i_d})\}$ after removing duplicated vertices is a simplex in $L$. The vertex map $\varphi^{(0)}$ always induces a continuous function, called a simplicial map $\varphi:|K|\to|L|$, which is defined as follows. Let $b(x)=(b_0(x),\dots,b_n(x))$ be the barycentric coordinates of $x\in|K|$ with respect to $\sigma=\langle v^{i_0},\dots,v^{i_n}\rangle\in K$. Then

\[ \varphi(x)=\sum_{j\in[\![0,n]\!]}b_j(x)\,\varphi^{(0)}(v^{i_j}). \]

Let us observe that $\varphi(x)=\varphi^{(0)}(x)$ if $x\in V$.

A special kind of simplicial map used to solve classification tasks will be introduced in the next subsection.

2.3 Simplicial maps for classification tasks

Next, we will show how a simplicial map can be used to solve a classification problem (see [5] for details). From now on, we will assume that the input dataset is a finite set of points $V\subset\mathbb{R}^n$ together with a set of $k$ labels $\Lambda$ such that each $v\in V$ is tagged with a label $\lambda$ taken from $\Lambda$.

Firstly, the intuition is that the space surrounding the dataset is labelled as unknown. For this, we add a new label to $\Lambda$, called the unknown label, and a one-hot encoding representation $W^{k+1}\subset\mathbb{R}^{k+1}$ of these $k+1$ labels:

\[ W^{k+1}=\big\{\ell^j=(0,\overset{j}{\dots},0,1,0,\overset{k-j}{\dots},0):\; j\in[\![0,k]\!]\big\}, \]

where the one-hot vector $\ell^j$ encodes the $j$-th label of $\Lambda$ for $j\in[\![1,k]\!]$, and $\ell^0$ encodes the unknown label.

We now consider a convex polytope $\mathcal{P}$ with vertex set $P$ surrounding the set $V$. The polytope $\mathcal{P}$ always exists since $V$ is finite. Next, we define a map $\varphi^{(0)}:V\cup P\to W^{k+1}$ sending each vertex $v\in V$ tagged with label $\lambda$ to the one-hot vector in $W^{k+1}$ that encodes the label $\lambda$. The vertices of $P$ are sent to the vertex $\ell^0$. Observe that $\varphi^{(0)}$ is a vertex map.

Let $L$ denote the simplicial complex with vertex set $W^{k+1}$ consisting of only one maximal $k$-simplex, and let $\operatorname{Del}(V\cup P)$ denote the Delaunay triangulation computed for the set of points $V\cup P$. Note that $|\operatorname{Del}(V\cup P)|=\mathcal{P}$. The simplicial map $\varphi:\mathcal{P}\to|L|$ is induced by the vertex map $\varphi^{(0)}$ as explained in Subsection 2.2.

Remark 1

The space $|L|$ can be interpreted as the discrete probability distribution space $\Omega^{k+1}$ with $k+1$ variables.

As an example, in Figure 2, on the left, we can see a dataset with four points $V=\{b,c,k,d\}$, labelled red and blue. The green points $P=\{a,e,g,f\}$ are the vertices of a convex polytope $\mathcal{P}$ containing $V$ and are sent by $\varphi^{(0)}$ to the green vertex $\ell^0$ on the right. The simplicial complex $K=\operatorname{Del}(V\cup P)$ is drawn on the left and consists of ten maximal 2-simplices. On the right, the simplicial complex $L$ consists of one maximal 2-simplex. The dotted arrows illustrate some examples of $\varphi:\mathcal{P}\to|L|$.

Figure 2: Illustration of a simplicial map for a classification task.

2.4 Simplicial map neural networks

Artificial neural networks can be seen as parametrized real-valued mappings between multidimensional spaces. Such mappings are the composition of several maps (usually many of them) that can be structured in layers. In [5], the simplicial map $\varphi$ defined above was represented as a two-hidden-layer feed-forward neural network $\mathcal{N}_\varphi$. This kind of artificial neural network is called a simplicial map neural network (SMNN).

In the original definition [5], the first hidden layer of an SMNN computes the barycentric coordinates with respect to $\operatorname{Del}(V\cup P)$. However, we will see here that if we precompute the barycentric coordinates, we can simplify the architecture of $\mathcal{N}_\varphi$ as follows.

As before, consider an input dataset consisting of a finite set $V\subset\mathbb{R}^n$ endowed with a set of $k$ labels, and a convex polytope $\mathcal{P}$ with vertex set $P$ surrounding $V$. Let $\operatorname{Del}(V\cup P)$ be the Delaunay triangulation with vertex set

\[ V\cup P=\{\omega^1,\dots,\omega^\alpha\}\subseteq\mathbb{R}^n. \]

Then, $\varphi^{(0)}:V\cup P\to W^{k+1}$ is a vertex map. Let us assume that, given $x\in\mathcal{P}$, we precompute the barycentric coordinates $b(x)=(b_0(x),\dots,b_n(x))\in\mathbb{R}^{n+1}$ of $x$ with respect to the $n$-simplex $\sigma=\langle\omega^{i_0},\dots,\omega^{i_n}\rangle\in\operatorname{Del}(V\cup P)$ such that $x\in|\sigma|$, and that we also precompute the vector

\[ \xi(x)=(\xi_1(x),\dots,\xi_\alpha(x))\in\mathbb{R}^\alpha \]

satisfying that, for $t\in[\![1,\alpha]\!]$, $\xi_t(x)=b_j(x)$ if $i_j=t$ for some $j\in[\![0,n]\!]$, and $\xi_t(x)=0$ otherwise. Let us remark that $\xi(x)$ should be a column vector, but we will use row notation for simplicity.

The SMNN $\mathcal{N}_\varphi$ induced by $\varphi$ that predicts the $h$-label of $x$, for $h\in[\![0,k]\!]$, has the following architecture:

  • The number of neurons in the input layer is $\alpha$.

  • The number of neurons in the output layer is $k+1$.

  • The set of weights is represented as a $(k+1)\times\alpha$ matrix $\mathcal{M}$ such that the $t$-th column of $\mathcal{M}$ is $\varphi^{(0)}(\omega^t)$ for $t\in[\![1,\alpha]\!]$.

Then,

\[ \mathcal{N}_\varphi(x)=\mathcal{M}\cdot\xi(x). \]
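In other words, once $\xi(x)$ has been precomputed, evaluating $\mathcal{N}_\varphi$ reduces to a single matrix-vector product. A minimal sketch (assuming NumPy; the function name, the integer label encoding and the toy values are ours) is:

import numpy as np

def smnn_forward(labels, xi):
    # labels[t] is the label index in [[0, k]] of the vertex omega^{t+1} (0 encodes the unknown label);
    # xi is the vector xi(x): the barycentric coordinates of x placed at the positions of the
    # vertices of the simplex containing x, and 0 elsewhere.
    labels = np.asarray(labels)
    M = np.eye(labels.max() + 1)[:, labels]   # column t of M is the one-hot vector phi^(0)(omega^t)
    return M @ np.asarray(xi, dtype=float)

# Toy example with alpha = 4 vertices labelled (unknown, 1, 1, 2): x lies in the simplex spanned
# by the last three vertices with barycentric coordinates (0.2, 0.3, 0.5).
print(smnn_forward([0, 1, 1, 2], [0.0, 0.2, 0.3, 0.5]))   # -> [0.  0.5  0.5]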

Observe that, as defined so far, the SMNN weights are precomputed. Furthermore, the computation of the barycentric coordinates of the points around $V$ implies the calculation of the convex polytope $\mathcal{P}$ surrounding $V$. Finally, the computation of the Delaunay triangulation $\operatorname{Del}(V\cup P)$ is costly if $V\cup P$ has many points, since its time complexity is $O(n\log n+n^{\lceil d/2\rceil})$ (see [9, Chapter 4]).

In the next sections, we will propose some techniques to overcome the drawbacks of the SMNN construction while maintaining its advantages. We will see that one way to avoid the computation of the convex polytope $\mathcal{P}$ is to consider a hypersphere $S^n$ instead. We will also see how to avoid the use of the artificially created unknown label. Furthermore, to reduce the cost of the Delaunay computation and to add trainability to $\mathcal{N}_\varphi$ (avoiding overfitting), a subset $U\subset V$ will be considered. The set $V$ will be used to train and test a map $\varphi_U^{(0)}:U\to\mathbb{R}^k$. Such a map will induce a continuous function $\varphi_U:B^n\to|L|$ which approximates $\varphi$.

3 The unknown boundary and the function $\varphi_U$

In this section, we will see how to compute a function $\varphi_U$ that approximates the simplicial map $\varphi$ and avoids both the computation of the convex polytope $\mathcal{P}$ and the artificial consideration of the unknown label, reducing, at the same time, the cost of the Delaunay triangulation used to construct SMNNs. The general description of the methodology is given in Algorithm 1.

First, let us compute a hypersphere surrounding $V$. One of the simplest ways to do that is to translate $V$ so that its center of mass is placed at the origin $o\in\mathbb{R}^n$. Then, the hypersphere

\[ S^n=\{w\in\mathbb{R}^n:\;||w||=R\} \]

with $R>\max\{||v||:\,v\in V\}$ surrounds $V$. Second, let us assume that we have selected a subset $U=\{u^1,\dots,u^m\}\subseteq V$ (we will compare different strategies to select $U$ in Section 7) and that we have computed $\operatorname{Del}(U)$. Then, we have that

\[ V\subset B^n=\{x\in\mathbb{R}^n:\;||x||\leq R\}\quad\mbox{ and }\quad o\in|\operatorname{Del}(U)|. \]

Let us consider the boundary of $\operatorname{Del}(U)$, denoted $\delta\operatorname{Del}(U)$, which consists of the set of $(n-1)$-simplices that are faces of exactly one maximal simplex of $\operatorname{Del}(U)$.
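These boundary simplices can be extracted directly from the definition by counting, for each $(n-1)$-face, the number of maximal simplices that contain it. A short sketch (assuming SciPy and a made-up point set $U$) is:

import numpy as np
from itertools import combinations
from collections import Counter
from scipy.spatial import Delaunay

U = np.array([[-0.25, -0.25], [-0.25, 0.25], [0.25, -0.25], [0.25, 0.25], [0.0, 0.05]])
tri = Delaunay(U)

# Count, for every (n-1)-face, the number of maximal simplices having it as a face.
faces = Counter(tuple(sorted(f))
                for simplex in tri.simplices
                for f in combinations(simplex, len(simplex) - 1))
boundary = [f for f, count in faces.items() if count == 1]   # delta Del(U)
print(boundary)   # since |Del(U)| is convex, these are exactly the facets of its convex hull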

Now, let us define $\xi_U(x)\in\mathbb{R}^m$ for any $x\in B^n$ as follows. Given $x\in B^n$, to find the $n$-simplex $\sigma=\langle\omega^0,\dots,\omega^n\rangle$ such that $x\in|\sigma|$, we have to consider two cases: $x\in|\operatorname{Del}(U)|$ and $x\not\in|\operatorname{Del}(U)|$.

If $x\in|\operatorname{Del}(U)|$ then $\sigma$ is the $n$-simplex in $\operatorname{Del}(U)$ such that $x\in|\sigma|$. If $x\not\in|\operatorname{Del}(U)|$ then $\sigma$ is a new $n$-simplex defined by the vertices of an $(n-1)$-simplex of $\delta\operatorname{Del}(U)$ and a new vertex consisting of the projection of $x$ onto $S^n$. Specifically, if $x\not\in|\operatorname{Del}(U)|$ then $\sigma$ is computed in the following way:

  1. Consider the set

     \[ \Gamma=\big\{\mu\in\delta\operatorname{Del}(U):\;(N\cdot u^{i_0}+c)(N\cdot x+c)<0\big\} \]

     where $N$ is the vector normal to the hyperplane containing $\mu=\langle u^{i_1},\dots,u^{i_n}\rangle$, $c=-N\cdot u^{i_1}$, and $u^{i_0}\in U$ is such that $\langle u^{i_0},u^{i_1},\dots,u^{i_n}\rangle\in\operatorname{Del}(U)$.

  2. Compute the point $w^x=R\frac{x}{||x||}\in S^n$.

  3. Find $\sigma=\langle w^x,u^{i_1},\dots,u^{i_n}\rangle$ such that $\mu=\langle u^{i_1},\dots,u^{i_n}\rangle\in\Gamma$ and $x\in|\sigma|$.

     Observe that, by construction, $\mu$ always exists since $|\operatorname{Del}(U)|$ is a convex polytope.

Now, let $b(x)=(b_0(x),\dots,b_n(x))\in\mathbb{R}^{n+1}$ be the barycentric coordinates of $x$ with respect to $\sigma$. Then, $\xi_U(x)=(\xi_1(x),\dots,\xi_m(x))$ is the point in $\mathbb{R}^m$ satisfying that, for $t\in[\![1,m]\!]$,

\[ \xi_t(x)=\left\{\begin{array}{cl} b_j(x) & \mbox{ if } u^t=\omega^j \mbox{ for some } j\in[\![0,n]\!],\\ 0 & \mbox{ otherwise.}\end{array}\right. \]

Observe that $\xi_U(x)$ always exists and is unique. An example of the points $x$ and $w^x$ and the simplex $\mu$ is shown in Figure 3 and Example 1.

Let us observe that, thanks to the new definition of $\xi_U(x)$ for $x\in B^n$, any map $\varphi_U^{(0)}:U\to\mathbb{R}^k$ induces a continuous function $\varphi_U:B^n\to|L|$ defined for any $x\in B^n$ as:

\[ \varphi_U(x)=\operatorname{softmax}\Big(\sum_{t\in[\![1,m]\!]}\xi_t(x)\,\varphi_U^{(0)}(u^t)\Big)=\operatorname{softmax}(\mathcal{M}_U\cdot\xi_U(x)) \]

where, for $z=(z_1,\dots,z_k)\in\mathbb{R}^k$,

\[ \operatorname{softmax}(z)=\left(\frac{e^{z_1}}{\sum_{h\in[\![1,k]\!]}e^{z_h}},\dots,\frac{e^{z_k}}{\sum_{h\in[\![1,k]\!]}e^{z_h}}\right). \]

Let us observe that, to obtain a categorical distribution from $\varphi_U(x)\in\mathbb{R}^k$, we could just divide each of its coordinates by the total sum. However, $\operatorname{softmax}$ is introduced here to obtain a simplified formula for the gradient descent algorithm, as shown in Theorem 1.

Figure 3: An example of the point $w^x$ computed from $x$ and the $(n-1)$-simplex $\mu=\langle u^1,u^2\rangle\in\Gamma$ such that $x\in|\sigma|$ for $\sigma=\langle w^x,u^1,u^2\rangle$.
Example 1

Let us consider

\[ V=\big\{v^1=\big(\tfrac{1}{2},\tfrac{1}{2}\big),\; v^2=\big(\tfrac{1}{2},1\big),\; v^3=\big(1,\tfrac{1}{2}\big),\; v^4=(1,1)\big\} \]

with label $\lambda^1=0$ for $v^i$ with $i=1,2$ and $\lambda^2=1$ for $v^i$ with $i=3,4$. Firstly, we translate $V$ so that the center of mass of $V$ is the origin $o\in\mathbb{R}^2$. The translated dataset $\tilde{V}$ is

\[ \big\{\tilde{v}^1=\big(\tfrac{-1}{4},\tfrac{-1}{4}\big),\; \tilde{v}^2=\big(\tfrac{-1}{4},\tfrac{1}{4}\big),\; \tilde{v}^3=\big(\tfrac{1}{4},\tfrac{-1}{4}\big),\; \tilde{v}^4=\big(\tfrac{1}{4},\tfrac{1}{4}\big)\big\}. \]

Let us consider $x^1=\big(\tfrac{3}{4},\tfrac{3}{5}\big)$ and $x^2=\big(\tfrac{3}{4},\tfrac{5}{4}\big)$. Hence, the translated input data are $\tilde{x}^1=\big(0,\tfrac{-3}{20}\big)$ and $\tilde{x}^2=\big(0,\tfrac{1}{2}\big)$.
To simplify the explanation of the method, consider $U=\tilde{V}$, that is, $u^i=\tilde{v}^i$ for $i\in[\![1,4]\!]$. Then, the maximal simplices of $\operatorname{Del}(U)$ are $\tilde{\sigma}^1=\langle\tilde{v}^1,\tilde{v}^2,\tilde{v}^3\rangle$ and $\tilde{\sigma}^2=\langle\tilde{v}^2,\tilde{v}^3,\tilde{v}^4\rangle$.

  • The matrix $\mathcal{M}_U$ is $\begin{pmatrix}1&1&0&0\\ 0&0&1&1\end{pmatrix}$.

  • Since the barycentric coordinates of $\tilde{x}^1$ with respect to $\tilde{\sigma}^1=\langle\tilde{v}^1,\tilde{v}^2,\tilde{v}^3\rangle$ are $(0.3,0.2,0.5)$, the point $\tilde{x}^1$ lies in $|\tilde{\sigma}^1|\subset|\operatorname{Del}(U)|$ and $\xi_U(\tilde{x}^1)=(0.3,0.2,0.5,0)$. Then

    \[ \varphi_U(\tilde{x}^1)=\operatorname{softmax}(\mathcal{M}_U\cdot\xi_U(\tilde{x}^1))=(0.5,0.5). \]
  • On the other hand, the point $\tilde{x}^2$ is outside $|\operatorname{Del}(U)|$. Assuming that, for example, we have fixed $R=1$, we add a new simplex $\sigma^3=\langle w^x,\tilde{v}^2,\tilde{v}^4\rangle$, where $w^x=\tilde{v}^5=(0,1)$ is the projection of $\tilde{x}^2$ onto the hypersphere of radius $R$ centered at the origin (see Figure 4). Then, the barycentric coordinates of $\tilde{x}^2$ with respect to $\sigma^3$ are $\big(\tfrac{1}{3},\tfrac{1}{3},\tfrac{1}{3}\big)$ and $\xi_U(\tilde{x}^2)=\big(0,\tfrac{1}{3},0,\tfrac{1}{3}\big)$, concluding that

    \[ \varphi_U(\tilde{x}^2)=\operatorname{softmax}(\mathcal{M}_U\cdot\xi_U(\tilde{x}^2))=(0.5,0.5). \]
Figure 4: The relative positions of the vertices $\tilde{v}^i$ for $i\in[\![1,5]\!]$ and the points $\tilde{x}^1$ and $\tilde{x}^2$ of Example 1.
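The numbers of Example 1 can be checked with a few lines of code. The following sketch (assuming NumPy, with the vertices of each simplex taken in the order written above) recomputes the barycentric coordinates, $\xi_U$ and $\varphi_U$ for $\tilde{x}^1$ and $\tilde{x}^2$:

import numpy as np

def bary(x, verts):
    # Barycentric coordinates of x with respect to the simplex spanned by the rows of verts.
    A = np.vstack([np.asarray(verts, dtype=float).T, np.ones(len(verts))])
    return np.linalg.solve(A, np.append(x, 1.0))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

v = {1: (-0.25, -0.25), 2: (-0.25, 0.25), 3: (0.25, -0.25), 4: (0.25, 0.25)}
M_U = np.array([[1.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 1.0]])

# x~^1 lies in sigma~^1 = <v~^1, v~^2, v~^3>.
b1 = bary([0.0, -0.15], [v[1], v[2], v[3]])           # -> (0.3, 0.2, 0.5)
xi1 = np.array([b1[0], b1[1], b1[2], 0.0])
print(b1, softmax(M_U @ xi1))                         # -> (0.5, 0.5)

# x~^2 lies outside |Del(U)|; its simplex is <w^x, v~^2, v~^4> with w^x = (0, 1).
b2 = bary([0.0, 0.5], [(0.0, 1.0), v[2], v[4]])       # -> (1/3, 1/3, 1/3)
xi2 = np.array([0.0, b2[1], 0.0, b2[2]])
print(b2, softmax(M_U @ xi2))                         # -> (0.5, 0.5)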

The pseudocode to compute $\varphi_U(x)$ is provided in Algorithm 1.

Algorithm 1 Pseudocode to compute $\varphi_U(x)$ for $x\in B^n$ and a subset $U$ of an input dataset $V$ surrounded by a hypersphere of radius $R$
Input: $U\subset V\subset\mathbb{R}^n$ labelled using a set of labels $\Lambda=\{\lambda^1,\dots,\lambda^k\}$, a radius $R$, and a point $x\in B^n$.
Output: The value of $\varphi_U(x)$.
compute $\operatorname{Del}(U)$
$W^k:=\big\{\ell^j:=(0,\overset{j-1}{\dots},0,1,0,\overset{k-j}{\dots},0)$ for $j\in[\![1,k]\!]\big\}$
init an empty matrix $\mathcal{M}_U$
init an empty vector $\xi_U(x)$
for $u\in U$ do
     if $\lambda^j$ is the label of $u$ then
          add $\ell^j$ as a column of $\mathcal{M}_U$
     end if
end for
for each maximal simplex $\sigma$ of $\operatorname{Del}(U)$ do
     compute $b(x)$ with respect to $\sigma$
     if $b_j(x)\geq 0$ for all $j\in[\![0,n]\!]$ then
          stop: compute $\xi_U(x)$
     end if
end for
if $\xi_U(x)$ is empty then
     compute $\Gamma$, $w^x$, $\mu$ and $\sigma$
     compute $b(x)$ with respect to $\sigma$ and then $\xi_U(x)$
end if
$\varphi_U(x):=\operatorname{softmax}(\mathcal{M}_U\cdot\xi_U(x))$
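As an illustration of Algorithm 1, the following Python sketch evaluates $\varphi_U$ on top of SciPy's Delaunay triangulation. It is a minimal reference implementation under our own naming conventions (labels are assumed to be indices in $[\![1,k]\!]$, the vertex set $U$ is assumed to be already centred at the origin, and the boundary facets and their opposite vertices are read from the neighbour structure of the triangulation); it is not intended as the definitive implementation used in the experiments.

import numpy as np
from scipy.spatial import Delaunay

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bary(x, verts):
    # Barycentric coordinates of x with respect to the simplex spanned by the rows of verts.
    A = np.vstack([np.asarray(verts, dtype=float).T, np.ones(len(verts))])
    return np.linalg.solve(A, np.append(x, 1.0))

def boundary_facets(tri):
    # Pairs (facet, opposite vertex) describing delta Del(U): the j-th neighbour of a maximal
    # simplex is opposite to its j-th vertex, and -1 means that the facet lies on the boundary.
    for simplex, neighbors in zip(tri.simplices, tri.neighbors):
        for j, neighbor in enumerate(neighbors):
            if neighbor == -1:
                yield np.delete(simplex, j), simplex[j]

def phi_U(x, U, labels, R, k):
    # U: (m, n) array with the support vertices.  labels: label index in [[1, k]] of each
    # vertex of U.  R: radius of the surrounding hypersphere.  k: number of classes.
    U = np.asarray(U, dtype=float)
    x = np.asarray(x, dtype=float)
    m, n = U.shape
    M_U = np.eye(k)[:, np.asarray(labels) - 1]    # columns: one-hot encodings phi_U^(0)(u^t)

    tri = Delaunay(U)
    xi = np.zeros(m)
    s = tri.find_simplex(x)
    if s >= 0:
        # Case x in |Del(U)|: barycentric coordinates w.r.t. the maximal simplex containing x.
        T = tri.transform[s]
        b = T[:n].dot(x - T[n])
        xi[tri.simplices[s]] = np.append(b, 1.0 - b.sum())
    else:
        # Case x outside |Del(U)|: project x onto the hypersphere and look for the boundary
        # facet mu in Gamma whose simplex <w^x, mu> contains x.
        w = R * x / np.linalg.norm(x)
        for facet, opposite in boundary_facets(tri):
            edges = U[facet[1:]] - U[facet[0]]
            N = np.linalg.svd(edges)[2][-1]       # normal to the hyperplane containing mu
            c = -N.dot(U[facet[0]])
            if (N.dot(U[opposite]) + c) * (N.dot(x) + c) >= 0:
                continue                          # mu is not in Gamma
            b = bary(x, np.vstack([w, U[facet]]))
            if np.all(b >= -1e-9):                # x lies in |<w^x, mu>|
                xi[facet] = b[1:]                 # the mass placed at w^x has no column in M_U
                break
    return softmax(M_U @ xi)

# Example 1 revisited: phi_U(x~^2) with R = 1 and U = V~.
U = np.array([[-0.25, -0.25], [-0.25, 0.25], [0.25, -0.25], [0.25, 0.25]])
print(phi_U([0.0, 0.5], U, labels=[1, 1, 2, 2], R=1.0, k=2))   # -> [0.5 0.5]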

The following property holds.

Lemma 1 (Continuity)

Let $x\in B^n$. Then,

\[\lim_{y\to x}\xi_U(y)=\xi_U(x).\]

Proof. If $x\in|\operatorname{Del}(U)|$, then the result holds due to the continuity of the barycentric coordinate transformation. If $x\not\in|\operatorname{Del}(U)|$ then, since the origin $o\in|\operatorname{Del}(U)|$, we have $\|x\|\neq 0$. Therefore, for $y$ close to $x$, $\|y\|\neq 0$ and $w^y=R\frac{y}{\|y\|}\in\mathbb{R}^n$. Besides,

\[\lim_{y\to x}w^y=w^x\]

and therefore

\[\lim_{y\to x}\xi_U(y)=\xi_U(x),\]

concluding the proof. $\qed$

Lemma 2 (Consistency)

Let $\varphi^{(0)}$ be the map defined in Subsection 2.3. If $U=V$ and $\varphi_U^{(0)}(u)=\varphi^{(0)}(u)$ for all $u\in U$, then

\[\operatorname{arg\,max}\varphi_U(x)=\operatorname{arg\,max}\varphi(x)\ \mbox{ for all }x\in|\operatorname{Del}(V)|.\]

Proof. Let us observe that if $U=V$ and $\varphi_U^{(0)}(u)=\varphi^{(0)}(u)$ for all $u\in U$ then, for any $x\in|\operatorname{Del}(V)|$, we have that:

\[\varphi(x)=\sum_{t\in[\![1,m]\!]}\xi_t(x)\,\varphi^{(0)}(u^t)=\sum_{t\in[\![1,m]\!]}\xi_t(x)\,\varphi_U^{(0)}(u^t).\]

Then, $\varphi_U(x)=\operatorname{softmax}\big(\varphi(x)\big)$ and $\operatorname{arg\,max}\varphi_U(x)=\operatorname{arg\,max}\varphi(x)$. $\qed$

One of the keys to our study is the identification of the points of $\mathbb{R}^n$ lying inside a given simplex with the set of all probability distributions with $n+1$ support values. In this way, the barycentric coordinates of a point can be seen as a probability distribution. From this point of view, given $x\in B^n$, both $\varphi(x)$ and $\varphi_U(x)$ lie in the set $|L|$ of probability distributions with $k$ support points. This is why the categorical cross-entropy loss function $\mathcal{L}$ will be used to compare the similarity between $\varphi$ and $\varphi_U$. Specifically, for $v\in V$, $\mathcal{L}$ is defined as:

\[\mathcal{L}(\varphi_U,\varphi,v)=-\sum_{h\in[\![1,k]\!]}y_h\log(s_h),\]

where $\varphi^{(0)}(v)=(y_1,\dots,y_k)$ and $\varphi_U(v)=(s_1,\dots,s_k)$.
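As a small worked illustration (with arbitrary numbers, not taken from the experiments), the loss reduces to minus the logarithm of the probability that $\varphi_U(v)$ assigns to the true class of $v$; a minimal sketch in Python:

import numpy as np

def cross_entropy(y, s, eps=1e-12):
    # y = phi^(0)(v): one-hot target; s = phi_U(v): softmax output
    y, s = np.asarray(y), np.asarray(s)
    return -np.sum(y * np.log(s + eps))

y = np.array([0.0, 1.0, 0.0])        # v belongs to the second class (k = 3)
s = np.array([0.2, 0.7, 0.1])        # prediction phi_U(v)
print(cross_entropy(y, s))           # -log(0.7), approximately 0.357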

The following lemma establishes a specific set $U\subset V$ and a function $\hat{\varphi}_U$ such that $\mathcal{L}(\varphi_U,\varphi,v)=0$ for all $v\in V$.

Lemma 3 ($\mathcal{L}$-optimum simplicial map)

Let $U$ be a subset of $V$ satisfying, for all $u\in U$, that:

  1. $\varphi_U^{(0)}(u)=\varphi^{(0)}(u)$, and

  2. if $v\in V$ is such that $\varphi^{(0)}(v)\neq\varphi^{(0)}(u)$ and $\langle v,u\rangle\in\operatorname{Del}(V)$, then $v\in U$.

Then,

\[\operatorname{arg\,max}\varphi_U(x)=\operatorname{arg\,max}\varphi(x)\ \mbox{ for all }x\in|\operatorname{Del}(U)|.\]

Proof. As proved in [6], under the assumptions stated in this lemma, we have that, for all $x\in|\operatorname{Del}(U)|$:

\[\varphi(x)=\sum_{t\in[\![1,m]\!]}\xi_t(x)\,\varphi_U^{(0)}(u^t)\]

and then $\operatorname{arg\,max}\varphi_U(x)=\operatorname{arg\,max}\varphi(x)$. $\qed$

Unfortunately, to compute a subset $U$ satisfying the conditions stated in Lemma 3, we need to compute the entire triangulation $\operatorname{Del}(V)$, which is computationally expensive, as we have already mentioned above.
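For illustration only, the following sketch (an assumption on our part; the paper does not prescribe this exact construction) builds one set $U$ fulfilling condition 2 of Lemma 3 by keeping every vertex of $\operatorname{Del}(V)$ incident to an edge whose two endpoints carry different labels. Note that it needs the whole triangulation $\operatorname{Del}(V)$, which is exactly the expensive step that the training procedure of the next section avoids.

import numpy as np
from itertools import combinations
from scipy.spatial import Delaunay

def lemma3_support_indices(V, labels):
    # keep every vertex of Del(V) that is an endpoint of an edge whose
    # two endpoints have different labels (a "bichromatic" Delaunay edge)
    tri = Delaunay(V)
    keep = set()
    for simplex in tri.simplices:
        for i, j in combinations(simplex, 2):
            if labels[i] != labels[j]:
                keep.update((int(i), int(j)))
    return sorted(keep)

rng = np.random.default_rng(0)
V = rng.random((60, 2))
labels = (V[:, 0] + V[:, 1] > 1).astype(int)
U_idx = lemma3_support_indices(V, labels)
print(len(U_idx), "of", len(V), "points kept")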

4 Training SMNNs

The novel idea of this paper is to learn the function $\varphi_U^{(0)}$ for any given $U\subset V$, using the gradient descent algorithm, in order to minimize the loss function $\mathcal{L}(\varphi_U,\varphi,v)$ for any $v\in V$. The following result provides an expression of the gradient of $\mathcal{L}$ in terms of the functions $\varphi_U$ and $\varphi$, and the set $V$.

Theorem 1

Let $U=\{u^1,\dots,u^m\}$ be a subset with $m$ elements taken from a finite set of points $V\subset\mathbb{R}^n$ tagged with labels taken from a set of $k$ labels. Let $\varphi_U:B^n\to|L|$ and $\varphi^{(0)}:V\to W^k$. Let us consider that

\[\big\{\,\varphi_U^{(0)}(u^t)=(p_1^t,\dots,p_k^t):\ t\in[\![1,m]\!]\,\big\}\]

is a set of variables. Then, for $v\in V$,

\[\frac{\partial\mathcal{L}(\varphi_U,\varphi,v)}{\partial p_j^t}=(s_j-y_j)\,\xi_t(v),\]

where $j\in[\![1,k]\!]$, $t\in[\![1,m]\!]$, $\varphi^{(0)}(v)=(y_1,\dots,y_k)$ and $\varphi_U(v)=(s_1,\dots,s_k)$.

Proof. We have:

\begin{align*}
\frac{\partial\mathcal{L}(\varphi_U,\varphi,v)}{\partial p_j^t}
&= -\frac{\partial\big(\sum_{h\in[\![1,k]\!]}y_h\log(s_h)\big)}{\partial p_j^t}\\
&= -\sum_{h\in[\![1,k]\!]}y_h\,\frac{\partial\log(s_h)}{\partial p_j^t}\\
&= -\sum_{h\in[\![1,k]\!]}y_h\,\frac{\partial\log(s_h)}{\partial z_j}\frac{\partial z_j}{\partial p_j^t}.
\end{align*}

Since $s_h=\frac{e^{z_h}}{\sum_{t\in[\![1,k]\!]}e^{z_t}}$, then

\begin{align*}
\frac{\partial\log(s_h)}{\partial z_j}
&= \frac{\partial\log\big(\frac{e^{z_h}}{\sum_{t\in[\![1,k]\!]}e^{z_t}}\big)}{\partial z_j}\\
&= \frac{\partial\log(e^{z_h})}{\partial z_j}-\frac{\partial\log\big(\sum_{t\in[\![1,k]\!]}e^{z_t}\big)}{\partial z_j}\\
&= \frac{\partial z_h}{\partial z_j}-\frac{1}{\sum_{t\in[\![1,k]\!]}e^{z_t}}\sum_{t\in[\![1,k]\!]}\frac{\partial e^{z_t}}{\partial z_j}\\
&= \delta_{hj}-\frac{e^{z_j}}{\sum_{t\in[\![1,k]\!]}e^{z_t}}=\delta_{hj}-s_j.
\end{align*}

Besides, since $z_j=\sum_{h\in[\![1,m]\!]}\xi_h(v)\,p_j^h$, then

\[\frac{\partial z_j}{\partial p_j^t}=\sum_{h\in[\![1,m]\!]}\xi_h(v)\,\frac{\partial p_j^h}{\partial p_j^t}=\xi_t(v).\]

Finally,

\begin{align*}
\frac{\partial\mathcal{L}(\varphi_U,\varphi,v)}{\partial p_j^t}
&= -\sum_{h\in[\![1,k]\!]}y_h(\delta_{hj}-s_j)\,\xi_t(v)\\
&= -\xi_t(v)\Big(\sum_{h\in[\![1,k]\!]}y_h\delta_{hj}-s_j\sum_{h\in[\![1,k]\!]}y_h\Big)\\
&= (s_j-y_j)\,\xi_t(v).\ \qed
\end{align*}
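The formula of Theorem 1 can also be checked numerically. The following sketch (illustrative code with arbitrary dimensions and values, not part of the paper's experiments) compares the analytic gradient $(s_j-y_j)\,\xi_t(v)$ with central finite differences of $\mathcal{L}$:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(P, xi, y):
    # P is the k x m matrix of variables p_j^t, xi = xi_U(v), y = phi^(0)(v)
    s = softmax(P @ xi)
    return -np.sum(y * np.log(s))

rng = np.random.default_rng(1)
k, m = 3, 5
P = rng.random((k, m))
xi = rng.dirichlet(np.ones(m))          # plays the role of xi_U(v)
y = np.eye(k)[1]                        # one-hot label phi^(0)(v)

analytic = np.outer(softmax(P @ xi) - y, xi)    # (s_j - y_j) * xi_t(v)

numeric = np.zeros_like(P)
h = 1e-6
for j in range(k):
    for t in range(m):
        Pp, Pm = P.copy(), P.copy()
        Pp[j, t] += h
        Pm[j, t] -= h
        numeric[j, t] = (loss(Pp, xi, y) - loss(Pm, xi, y)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))       # of the order of 1e-9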

Let us now see how we add trainability to the SMNN $\mathcal{N}_{\varphi_U}$ induced by $\varphi_U$. Let $V$ be the training set and let $U$ be a support set lying in the same space as $V$. First, assuming that $U=\{u^1,\dots,u^m\}$ has $m$ elements, $\mathcal{N}_{\varphi_U}$ is a multiclass perceptron with an input layer of $m$ neurons that predicts the $h$-th label, for $h\in[\![1,k]\!]$, using the formula:

\[\mathcal{N}_{\varphi_U}(x)=\operatorname{softmax}\big(\tilde{\mathcal{M}}\cdot\xi_U(x)\big),\]

where $\tilde{\mathcal{M}}=(p_j^t)_{j\in[\![1,k]\!],\,t\in[\![1,m]\!]}$ is a matrix of weights and $\xi_U(x)\in\mathbb{R}^m$ is obtained from the barycentric coordinates of $x\in B^n$ as in Section 3. Let us observe that

\[\operatorname{softmax}\big(\tilde{\mathcal{M}}\cdot\xi_U(x)\big)\in|L|.\]

The idea is to modify the initial values of

\[\varphi_U^{(0)}(u^t)=(p_1^t,\dots,p_k^t)\ \mbox{ for }u^t\in U\mbox{ and }t\in[\![1,m]\!],\]

in order to obtain new values for $\mathcal{N}_{\varphi_U}(v)$, for $v\in V$, in a way that the error $\mathcal{L}(\mathcal{N}_{\varphi_U},\varphi,v)$ decreases. We will do so while avoiding recomputing $\operatorname{Del}(U)$ or the barycentric coordinates $(b_0(v),\dots,b_n(v))$ for each $v\in V$ during the training process.

In this way, given $v\in V$, if $v\in|\operatorname{Del}(U)|$, we compute the maximal simplex $\sigma=\langle u^{i_0},\dots,u^{i_n}\rangle\in\operatorname{Del}(U)$ such that $v\in|\sigma|$, with $i_h\in[\![1,m]\!]$ for $h\in[\![0,n]\!]$. If $v\not\in|\operatorname{Del}(U)|$, we compute $w\in S^n$ and the simplex $\sigma=\langle w,u^{i_1},\dots,u^{i_n}\rangle$ such that $v\in|\sigma|$, with $i_h\in[\![1,m]\!]$ for $h\in[\![1,n]\!]$. Then we compute the barycentric coordinates $b(v)$ of $v$ with respect to $\sigma$ and the point $\xi_U(v)=(\xi_1(v),\dots,\xi_m(v))\in\mathbb{R}^m$ as in Section 3.

Using the gradient descent algorithm, we update the variables $p_j^t$, for $j\in[\![1,k]\!]$ and $t\in[\![1,m]\!]$, as follows:

\[p_j^t:=p_j^t-\eta\,\frac{\partial\mathcal{L}(\mathcal{N}_{\varphi_U},\varphi,v)}{\partial p_j^t}=p_j^t-\eta\,(s_j-y_j)\,\xi_t(v).\]

An illustrative picture of the role of each point in a simple two-dimensional binary classification problem is provided in Figure 5. The pseudocode of the method to train SMNNs using Stochastic Gradient Descent is provided in Algorithm 2.

Figure 5: Two-dimensional synthetic dataset with points divided into two classes: blue and yellow. Triangle-shaped points belong to the test set and square-shaped points belong to the support set $U$. The diamond-shaped point is the vertex on the hypersphere (the blue circumference) used to classify the triangle-shaped point $v$ (surrounded by a small red circumference) outside the triangulation.
Algorithm 2 Pseudocode of the proposed method to train SMNNs using SGD
Input: Dataset $V\subset\mathbb{R}^n$ surrounded by a hypersphere of radius $R$ and a model $\mathcal{N}_{\varphi_U}$.
Parameter: $\eta>0$
Output: The trained model $\mathcal{N}_{\varphi_U}$
  init $\tilde{\mathcal{M}}$, the matrix of weights of $\mathcal{N}_{\varphi_U}$
  for $v\in V$ do
      compute $\xi_U(v)$ as in Section 3
      for each column $p$ of $\tilde{\mathcal{M}}$ do
          $p:=p-\eta\,\frac{\partial\mathcal{L}(\mathcal{N}_{\varphi_U},\varphi,v)}{\partial p}$
      end for
  end for
  $\mathcal{N}_{\varphi_U}(x):=\operatorname{softmax}(\tilde{\mathcal{M}}\cdot\xi_U(x))$
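A minimal end-to-end sketch of Algorithm 2 could look as follows (our illustration, not the released code; it assumes every training point lies inside $|\operatorname{Del}(U)|$, so the hypersphere vertex is never needed, and it precomputes $\xi_U(v)$ once, since neither $\operatorname{Del}(U)$ nor the barycentric coordinates change during training):

import numpy as np
from scipy.spatial import Delaunay

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def xi_U(v, tri, m):
    # barycentric coordinates of v w.r.t. the maximal simplex of Del(U) containing it,
    # scattered into an m-dimensional vector
    s = int(tri.find_simplex(np.atleast_2d(v))[0])
    T = tri.transform[s]
    b = T[:-1] @ (np.asarray(v) - T[-1])
    b = np.append(b, 1.0 - b.sum())
    out = np.zeros(m)
    out[tri.simplices[s]] = b
    return out

def train_smnn(U, V, y_onehot, k, epochs=200, eta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    m = len(U)
    tri = Delaunay(U)                                  # Del(U), computed once
    Xi = np.stack([xi_U(v, tri, m) for v in V])        # xi_U(v), computed once
    M = rng.random((k, m))                             # init the weight matrix M~
    for _ in range(epochs):
        for xi, y in zip(Xi, y_onehot):
            s = softmax(M @ xi)
            M -= eta * np.outer(s - y, xi)             # p_j^t := p_j^t - eta (s_j - y_j) xi_t(v)
    return M, tri

# toy usage: a 2D training set V inside the convex hull of a small support set U
rng = np.random.default_rng(0)
V = rng.random((40, 2))
labels = (V[:, 0] > 0.5).astype(int)
U = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
M, tri = train_smnn(U, V, np.eye(2)[labels], k=2)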
Figure 6: Table with the five flowers taken from the training set that influence the classification of a given flower in the test set, i.e., the vertices of $\sigma$. The SMNN was trained for 1000 epochs with $R\approx 6.16$.

5 Explainability

In this section, we provide insight into the explainability capability of SMNNs. In the literature, many different approaches can be found to what constitutes an explanation of the prediction of an AI model. In our case, explainability will be provided based on the similarities and dissimilarities of the point $x$ to be explained with the points corresponding to the vertices of the simplex $\sigma$ containing it. Based on this idea, the barycentric coordinates of $x$ with respect to $\sigma$ can be considered indicators of how much each vertex of $\sigma$ contributes to the prediction of $x$. Specifically, the barycentric coordinates of $x$ multiplied by the evaluation of the trained map $\varphi_U^{(0)}$ at the vertices of $\sigma$ encode the contribution of each vertex of $\sigma$ to the label assigned to $x$ by the SMNN.

As an illustration, consider the Iris dataset (https://archive.ics.uci.edu/ml/datasets/iris) as a toy example and split it into a training set (75%) and a test set (25%). Since in this section we focus on explainability, let us take $U=V$, containing 112 points. Then, initialize $p_j^t$ with a random value in $[0,1]$, for $j\in[\![1,3]\!]$ and $t\in[\![1,112]\!]$.

After the training process, the SMNN reached 92% accuracy and a loss value of 0.5 on the test set. Once the SMNN is trained, we may be interested in determining why a certain output is given for a specific point $x$ in the test set.

As mentioned above, the explanation of why the SMNN assigns a label to $x$ is based on the labels of the vertices of the simplex of $\operatorname{Del}(U)$ containing $x$. Therefore, the first step is to find the maximal simplex $\sigma$ that contains the point $x$ to be explained. As an example, in Figure 6, the point $x=(5.5,2.4,3.8,1.1)\in\mathbb{R}^4$ in the test set is chosen to be explained; the SMNN predicts class 2 for it. The coordinates of the five vertices ($u^{26}$, $u^{55}$, $u^{69}$, $u^{84}$ and $u^{95}$) of the simplex $\sigma$ containing $x$, together with the classes they have been assigned, are shown in the table at the bottom of Figure 6. The contribution of the class assigned to each vertex of $\sigma$ to the class assigned to $x$ by the SMNN is displayed in the bar chart, and is measured in terms of $p_j^t\cdot\xi_t(x)$ for $j\in[\![0,2]\!]$ and $t\in\{26,55,69,84,95\}$. Let us notice that the contributions can be positive or negative. For example, the vertex with index 95, which has the greatest influence on the prediction, affected the classification of $x$ negatively for the first and third classes, but positively for the second class, which is the correct classification. Let us also note that the Euclidean distance between points is not the only factor making the contribution of a vertex of $\sigma$ greater. That is, even if two vertices are equally close to the point to be explained, they do not contribute equally. For example, the vertices 84 and 95 are similarly close to the test point, but their contributions are very different in magnitude.
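The contribution chart of Figure 6 can be reproduced, in spirit, with a few lines of code. In the following sketch the weights and barycentric coordinates are hypothetical placeholders (they are not the values of the Iris example); entry $(j,t)$ of the contribution matrix is $p_j^t\cdot\xi_t(x)$, and the predicted class is the one with the largest total contribution:

import numpy as np

# hypothetical trained weights p_j^t for the five vertices of sigma (columns)
# and three classes (rows); NOT the values of the Iris example
P_sigma = np.array([[ 0.9, 0.1, -0.2, 0.3, -1.1],
                    [-0.3, 0.8,  0.7, 0.2,  1.4],
                    [ 0.2, 0.0, -0.4, 0.1, -0.9]])
xi_sigma = np.array([0.05, 0.10, 0.20, 0.25, 0.40])   # barycentric coordinates of x w.r.t. sigma

contributions = P_sigma * xi_sigma                    # entry (j, t) = p_j^t * xi_t(x)
print(contributions)
print("predicted class index:", np.argmax(contributions.sum(axis=1)))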

6 Discussion and Limitations

Let us remark that SMNNs lie in between an instance-based method and a multilayer perceptron. The previous definition of SMNNs ([4, 5]) shared advantages with instance-based methods such as the k-Nearest Neighbour algorithm: there was no training phase, complex decision boundaries were easily handled via barycentric subdivisions, the method was effective in low-dimensional spaces, it adapted well to local patterns in the data, and the decision-making was transparent and interpretable. However, it was computationally expensive for large high-dimensional datasets due to the computation of the Delaunay triangulation and the required convex polytope, it suffered from overfitting, and it lacked generalization ability. The update proposed in this paper provides a substitute for the convex polytope and reduces the number of points needed to compute the Delaunay triangulation. The Delaunay computation is also less expensive thanks to the use of a support set $U$.

Nevertheless, one of the main limitations of SMNNs is the need for a triangulable input space. Hence, structured data such as images need to be embedded, for example, by applying a dimensionality reduction technique such as UMAP [10], so that the dataset becomes a point cloud.

Regarding the explainability of the model, it is not a feature-based explainability such as [11, 12] and, hence, it does not provide direct insight into the importance of the features of the input data. However, it provides instance-based explainability [13]. When classifying an input point, the vertices of the simplex to which it belongs, together with their contributions to the classification, give a similarity measure inferred by the SMNN that is understandable by an expert.

Table 1: Accuracy score and loss values obtained after training both the SMNN and a feed-forward neural network (FFNN). The experiments were repeated 100 times and the results provided are the mean of the accuracy values of the repetitions. The size $m$ of the subset considered to compute the Delaunay triangulation also varies in each experiment depending on a parameter $\kappa$. The FFNN is composed of two hidden layers of size 32 and 16, respectively, with ReLU activation functions and an output layer with a softmax activation function. The datasets used are synthetic binary-class datasets with $n=2,3,4,5$ features.
                        SMNN            FFNN
n   κ      m      Acc.   Loss     Acc.   Loss
2   1000   3560   0.87   0.64     0.91   0.23
    100    1282   0.90   0.51
    50     626    0.9    0.42
    10     53     0.87   0.33
3   1000   3750   0.76   0.66     0.8    0.61
    100    3664   0.76   0.66
    50     3252   0.77   0.65
    10     413    0.81   0.5
4   50     3728   0.69   0.67     0.72   0.69
    10     1410   0.73   0.64
    5      316    0.73   0.57
    2      26     0.72   0.56
5   50     3743   0.77   0.66     0.8    0.91
    10     1699   0.81   0.63
    5      323    0.8    0.52
    2      17     0.74   0.53

7 Experiments

In this section, we provide experiments that show the performance of SMNNs. In all the experiments, we split the given dataset into a training set and a test set composed of 75% and 25% of the dataset, respectively. The datasets used for experimentation are (1) a two-dimensional binary-classification spiral synthetic dataset, and (2) dimension-varying binary-classification synthetic datasets composed of different noisy clusters for each class (we refer to [14] for a specific description of how the data are generated). All experiments were run on a 3.70 GHz AMD Ryzen 9 5900X 12-core processor.

In the first two experiments, $\varepsilon$-representative subsets of the training set are used as the support set $U$ for different values of $\varepsilon$. The notion of $\varepsilon$-representative sets was introduced in [15]. Specifically, a support set $U$ is $\varepsilon$-representative of a set $V$ if, for any $v\in V$, there exists $u\in U$ such that the distance between $u$ and $v$ is less than $\varepsilon$.
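A simple greedy construction satisfying this covering property is sketched below (an assumption for illustration; [15] may build $\varepsilon$-representative sets differently): keep a point, discard every remaining point at distance less than $\varepsilon$ from it, and repeat.

import numpy as np

def epsilon_representative(V, eps, seed=0):
    # greedily keep a point and discard every remaining point at distance < eps from it,
    # so each discarded point has a kept point at distance less than eps
    rng = np.random.default_rng(seed)
    remaining = list(rng.permutation(len(V)))
    kept = []
    while remaining:
        i = remaining.pop(0)
        kept.append(int(i))
        remaining = [j for j in remaining if np.linalg.norm(V[j] - V[i]) >= eps]
    return np.array(kept)

rng = np.random.default_rng(0)
V = rng.random((400, 2))
for eps in (0.4, 0.2, 0.1):
    print(eps, "->", len(epsilon_representative(V, eps)), "support points")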

Let us now describe the methodology of each experiment.

First experiment: we consider a two-dimensional spiral dataset for binary classification composed of 400 points. We selected three different values of $\varepsilon$, obtaining three $\varepsilon$-representative sets (the support sets) of sizes 5, 9 and 95, respectively. In Figure 7, the spiral dataset and the three different support sets with the associated Delaunay triangulations are shown. In this case, we observed that the accuracy of the SMNNs increases with the size of the support set. We can also appreciate that the topology of the dataset is captured by the support set, which we conjecture is responsible for the successful classification.

(a) Using a support set of 5 points, the SMNN reaches 80% test accuracy.
(b) Using a support set of 9 points, the SMNN reaches 93% test accuracy.
(c) Using a support set of 95 points, the SMNN reaches 99% test accuracy.
Figure 7: Spiral dataset for binary classification. In each figure, the points in the support set are square-shaped, the points in the test set are triangle-shaped, and the points in the training set are circle-shaped.
(a) $\varepsilon$-representative set.
(b) Outlier-robust representative landmark set.
(c) Outlier-robust vital landmark set.
Figure 8: Examples of support sets obtained using the different methods proposed in the third experiment.

Second experiment: we consider four synthetic datasets of size 5000 generated using the adaptation of [16] in the scikit-learn [17] implementation. The four datasets have, respectively, $n=2,3,4$ and $5$ features. Then, the corresponding training set $V$ obtained from each dataset and a fully connected two-hidden-layer (32 × 16) feed-forward neural network (FFNN) with ReLU activation functions were considered.

To train the SMNN, we use four different $\varepsilon$-representative sets of each $V$. In our experiments, the different values of $\varepsilon$ are calculated as the maximum distance from the origin to the farthest point in the dataset, plus $\frac{1}{2}$, divided by a parameter $\kappa$. The values considered for $\kappa$ are 1000, 100, 50 and 10. Using the different values of $\kappa$, we then obtain support sets $U$ of different sizes $m$. For example, for $n=2$ and $\kappa=1000$, we obtain a support set $U$ of size $m=3560$. The sizes $m$ of the support sets $U$ generated for $n=2,3,4,5$ and $\kappa=1000,100,50,10$ are provided in Table 1.

First, the SMNN was trained using the gradient descent algorithm and the cross-entropy loss function for $n=2,3,4,5$ and $\kappa=1000,100,50,10$. Besides, the two-hidden-layer feed-forward neural network (FFNN) was trained using the Adam algorithm [18] for $n=2,3,4,5$. Both training processes were carried out for 500 epochs. The accuracy and loss results provided in Table 1 are the mean of 100 repetitions. We can observe that both the SMNN and the FFNN have similar performance, but the SMNN generally reaches lower loss values. The variance of the results was on the order of $10^{-8}$ to $10^{-5}$ for the SMNN and of $10^{-5}$ to $10^{-2}$ for the FFNN.

Third experiment: we compare the performance of the SMNN depending on the choice of the support set $U$. To do this, we applied two different methods to choose $U$. On the one hand, we use $\varepsilon$-representative sets for different values of $\kappa$, computed as in the previous experiments. On the other hand, we use the two outlier-robust subsampling methods presented in [19] for different values of Topological Density (TD). In Figure 8, examples of different support sets computed using the three different approaches are shown. Let us remark that the outlier-robust subsampling method can be tuned to keep outliers, obtaining what the authors call a vital landmark set, or to discard them, obtaining a representative landmark set. The two methods were tested on synthetic datasets composed of 1000 points. The SMNN was trained for 1000 epochs using 75% of the dataset and tested on the remaining 25%. The number of features $n$ of each dataset considered, the size $m$ of each support set computed, and the mean accuracy and loss results of 5 repetitions when training the SMNN are provided in Table 2. In Table 3, the time for the computation of the barycentric coordinates is provided. As we can see, the execution time is not necessarily directly related to the size of the support set, and it may increase when many of the points to be evaluated lie outside the Delaunay triangulation of $U$. In this experiment, the results suggest that $\varepsilon$-representative sets provide better results than the other support sets. However, a thorough theoretical study should be developed to confirm this statement; such a study is outside the scope of this paper.

Table 2: Accuracy score and loss values obtained after training the SMNN. The size $m$ of the subset considered to compute the Delaunay triangulation varies in each experiment depending on a parameter $\kappa$ (for $\varepsilon$-representative sets) or a parameter TD (for landmark sets). The datasets used are synthetic binary-class datasets with $n=2,3,4$ features.
                                              n=2                  n=3                  n=4
Sampling method                    Parameter  m     Acc.   Loss    m     Acc.   Loss    m     Acc.   Loss
Representative sets                κ=10       48    0.94   0.22    109   0.94   0.29    407   0.9    0.53
                                   κ=50       371   0.94   0.47    730   0.95   0.53    748   0.87   0.6
                                   κ=100      570   0.93   0.54    747   0.96   0.56    750   0.87   0.6
ORS representative landmark sets   TD=0.1     75    0.79   0.49    75    0.9    0.35    75    0.85   0.46
                                   TD=0.4     300   0.85   0.57    300   0.86   0.5     300   0.87   0.62
                                   TD=0.6     450   0.86   0.52    450   0.89   0.52    450   0.86   0.55
                                   TD=0.8     600   0.89   0.56    600   0.92   0.52    600   0.86   0.58
ORS vital landmark sets            TD=0.1     75    0.93   0.27    75    0.84   0.4     75    0.88   0.37
                                   TD=0.4     300   0.93   0.37    300   0.88   0.46    300   0.9    0.45
                                   TD=0.6     450   0.91   0.44    450   0.96   0.44    450   0.89   0.53
                                   TD=0.8     600   0.93   0.5     600   0.95   0.52    600   0.87   0.57
Table 3: Time in seconds for the computation of $\xi_U(x)$ for the experiments in Table 2. The values are the mean of 5 iterations. Let us remark that higher values can be expected when increasing the number of points outside the Delaunay triangulation of the support set.
Sampling method                    Parameter  n=2    n=3    n=4
Representative sets                κ=10       0.08   0.14   0.54
                                   κ=50       0.12   0.22   0.79
                                   κ=100      0.14   0.24   0.79
ORS representative landmark sets   TD=0.1     1.04   2.89   21.75
                                   TD=0.4     0.33   0.29   7
                                   TD=0.6     0.12   0.37   3.03
                                   TD=0.8     0.1    0.23   1.29
ORS vital landmark sets            TD=0.1     0.17   3.22   3.87
                                   TD=0.4     0.15   2.31   1.69
                                   TD=0.6     0.08   0.53   1.17
                                   TD=0.8     0.15   0.27   0.82

8 Conclusions

The balance between efficiency and explainability will be one of the major challenges of AI in the coming years. Although AI models based on network architectures and backpropagation algorithms are currently among the most successful models, they are far from providing a human-readable explanation of their outputs. On the other hand, simpler models not based on gradient descent methods usually do not provide a comparable level of performance. In this sense, a trainable version of SMNNs is a new step towards filling the gap between both approaches.

Simplicial map neural networks provide a combinatorial approach to artificial intelligence. Their simplicial-based definition provides nice properties, such as easy construction and robustness against adversarial examples.

In this work, we have extended their definition to provide a trainable version of this architecture. The training process is based on a local search and links this simplex-based model with the most efficient methods in AI. Moreover, we have shown that such a simplicial-based construction provides a human-understandable explanation of the decisions made.

The ideas presented in this paper can be extended in many different ways. In future work, we intend to study less data-dependent approaches so that the Delaunay triangulation is no longer needed. Besides, this architecture should be extended to Deep Learning models so that it can be applied to more complex classification problems, such as image classification, and provide extra explainability to Deep Learning models.

Code availability

The code is available in the GitHub repository: https://github.com/Cimagroup/TrainableSMNN.

9 Acknowledgments

The work was supported in part by the European Union HORIZON-CL4-2021-HUMAN-01-01 under grant agreement 101070028 (REXASI-PRO) and by TED2021-129438B-I00 / AEI/10.13039/501100011033 / Unión Europea NextGenerationEU/PRTR.

References

  • Graziani et al. [2022] M. Graziani, L. Dutkiewicz, D. Calvaresi, A global taxonomy of interpretable AI: unifying the terminology for the technical and social sciences, Artif Intell Rev (2022). doi:10.1007/s10462-022-10256-8.
  • Molnar [2022] C. Molnar, Interpretable Machine Learning, 2 ed., Independently published, 2022. URL: https://christophm.github.io/interpretable-ml-book.
  • Murdoch et al. [2019] W. J. Murdoch, C. Singh, K. Kumbier, et al., Definitions, methods, and applications in interpretable machine learning, PNAS (2019). URL: https://www.pnas.org/doi/full/10.1073/pnas.1900654116.
  • Paluzo-Hidalgo et al. [2020] E. Paluzo-Hidalgo, R. Gonzalez-Diaz, M. A. Gutiérrez-Naranjo, Two-hidden-layer feed-forward networks are universal approximators: A constructive approach, Neural Netw 131 (2020) 29–36. doi:10.1016/j.neunet.2020.07.021.
  • Paluzo-Hidalgo et al. [2021a] E. Paluzo-Hidalgo, R. Gonzalez-Diaz, M. A. Gutiérrez-Naranjo, J. Heras, Simplicial-map neural networks robust to adversarial examples, Mathematics 9 (2021a) 169. doi:10.3390/math9020169.
  • Paluzo-Hidalgo et al. [2021b] E. Paluzo-Hidalgo, R. Gonzalez-Diaz, M. A. Gutiérrez-Naranjo, J. Heras, Optimizing the simplicial-map neural network architecture, J Imaging 7 (2021b) 173. doi:10.3390/jimaging7090173.
  • Guliyev and Ismailov [2018] N. J. Guliyev, V. Ismailov, On the approximation by single hidden layer feedforward neural networks with fixed weights, Neural Networks 98 (2018) 296–304. doi:10.1016/j.neunet.2017.12.007.
  • Edelsbrunner and Harer [2010] H. Edelsbrunner, J. Harer, Computational Topology - an Introduction., Am Math Soc, 2010.
  • Boissonnat et al. [2018] J. Boissonnat, F. Chazal, M. Yvinec, Geometric and Topological Inference, Cambridge Texts in Applied Maths, Cambridge Univ Press, 2018. doi:10.1017/9781108297806.
  • McInnes et al. [2018] L. McInnes, J. Healy, J. Melville, Umap: Uniform manifold approximation and projection for dimension reduction, 2018. URL: http://arxiv.org/abs/1802.03426.
  • Das and Wiese [2023] P. P. Das, L. Wiese, Explainability based on feature importance for better comprehension of machine learning in healthcare, in: A. Abelló, P. Vassiliadis, O. Romero, R. Wrembel, F. Bugiotti, J. Gamper, G. Vargas Solar, E. Zumpano (Eds.), New Trends in Database and Information Systems, Springer Nature Switzerland, Cham, 2023, pp. 324–335.
  • Nguyen et al. [2019] A. Nguyen, J. Yosinski, J. Clune, Understanding Neural Networks via Feature Visualization: A Survey, Springer International Publishing, Cham, 2019, pp. 55–76. doi:10.1007/978-3-030-28954-6_4.
  • Kong and Chaudhuri [2021] Z. Kong, K. Chaudhuri, Understanding instance-based interpretability of variational auto-encoders, in: A. Beygelzimer, Y. Dauphin, P. Liang, J. W. Vaughan (Eds.), Advances in Neural Information Processing Systems, 2021. URL: https://openreview.net/forum?id=a5-37ER8qTI.
  • Guyon [2003] I. M. Guyon, Design of experiments for the NIPS 2003 variable selection benchmark, 2003. URL: https://archive.ics.uci.edu/ml/machine-learning-databases/dorothea/Dataset.pdf.
  • Paluzo-Hidalgo et al. [2022] E. Paluzo-Hidalgo, R. Gonzalez-Diaz, M. A. Gutiérrez-Naranjo, Topology-based representative datasets to reduce neural network training resources, Neural Comput & Applic 34 (2022) 14397–14413. doi:10.1007/s00521-022-07252-y.
  • Subramanian [2003] I. R. Subramanian, Design of experiments for the NIPS 2003 variable selection benchmark, in: NeurIPS Proceedings, 2003.
  • Pedregosa et al. [2011] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
  • Kingma and Ba [2015] D. Kingma, J. Ba, Adam: A method for stochastic optimization, in: Y. Bengio, Y. LeCun (Eds.), 3rd Int Conf on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. URL: http://arxiv.org/abs/1412.6980.
  • Stolz [2023] B. J. Stolz, Outlier-robust subsampling techniques for persistent homology, Journal of Machine Learning Research 24 (2023) 1–35. URL: http://jmlr.org/papers/v24/21-1526.html.