Trainable and Explainable Simplicial Map Neural Networks
Eduardo Paluzo-Hidalgo
Department of Quantitative Methods, Universidad Loyola Andalucía, Campus Sevilla, Dos Hermanas, Seville, Spain
Rocio Gonzalez-Diaz
Department of Applied Mathematics I, School of Engineering, University of Seville, Seville, Spain
Miguel A. Gutiérrez-Naranjo
Department of Computer Science and Artificial Intelligence, School of Engineering, University of Seville, Seville, Spain
Abstract
Simplicial map neural networks (SMNNs) are topology-based neural networks with interesting properties such as universal approximation ability
and robustness to adversarial examples under appropriate conditions.
However, SMNNs present some bottlenecks for their application to high-dimensional datasets.
First, SMNNs have fixed precomputed weights and no SMNN training process has been defined so far, so they lack generalization ability. Second, SMNNs require the construction of a convex polytope surrounding the input dataset.
In this paper, we overcome these issues by proposing an SMNN training procedure based on a support subset of the given dataset and by replacing the construction of the convex polytope with a method based on projections to a hypersphere.
In addition, the explainability capacity of SMNNs and an effective implementation are also newly introduced in this paper.
keywords:
Training neural networks \sep Simplicial maps \sep Explainable artificial intelligence
1 Introduction
In recent years, Artificial Intelligence (AI) methods in general, and Machine Learning methods in particular, have achieved a level of success in real-life problems that was unexpected only a few years ago. Many different areas have contributed to this development. Among them, we can cite the research on new theoretical algorithms, the increasing computational power of the latest generation of hardware, and the rapid access to huge amounts of data. Such a combination of factors has led to the development of increasingly complex self-regulated AI methods.
Many AI models currently in use are based on backpropagation algorithms, which train and regulate themselves to achieve a goal, such as classification, recommendation, or prediction. These self-regulating models acquire some kind of knowledge, as shown by their successful evaluation on test data independent of the data used to train them. Nonetheless, such knowledge is usually expressed in a non-human-readable way.
To fill the gap between the recent development of AI models and their social use, many researchers have focused on the development of Explainable Artificial Intelligence (XAI), which consists of a set of techniques to provide clear, understandable, transparent, intelligible, trustworthy, and interpretable explanations of the decisions, predictions, and reasoning processes made by the AI models, rather than just presenting their output, especially in domains where AI decisions can have significant consequences on human life.
A global taxonomy of interpretable AI with the aim of unifying terminology to achieve clarity and efficiency in the definition of regulations for the development of ethical and reliable AI can be found in [1].
Moreover, a nice introduction and general vision can be found in [2]. Another clarifying paper with definitions, concepts, and applications of XAI is [3].
The so-called Simplicial Map Neural Networks (SMNNs) were introduced in [4] as a constructive approach to the problem of approximating a continuous function on a compact set in a triangulated space. Since the original aim of the definition of SMNNs was focused on building a constructive approach, the computation of their weights was not based on an optimization process, as usual in neural networks, but on a deterministic calculus. The architecture of SMNNs and the computation of the set of weights are based on a combinatorial topology tool called simplicial maps.
Moreover, SMNNs can be used for classification and can be constructed to be robust to adversarial examples [5].
Besides, their architecture can be reduced while maintaining accuracy [6], and they are invariant to transformations that preserve the barycentric coordinates (scale, rotation, symmetries, etc.).
As defined in [4, 6, 5], SMNNs are built as a two-hidden-layer feed-forward network where the set of weights is precomputed based on the calculation of a triangulated convex polytope surrounding the input data.
As with other approximations of continuous functions with arbitrary precision (see, for example, [7]), SMNNs have fixed weights, which means
that the weights depend only on the triangulation supported on the points of the dataset, and no training process is applied.
Summing up, the limitations of SMNNs so far are that they are costly to compute, since the number of neurons is proportional to the number of simplices of the triangulation supported on the input dataset, and that they suffer from overfitting and therefore do not generalize well. These aspects have prevented SMNNs from being used in practice so far, although the idea of relating simplicial maps to neural networks is disruptive and provides a new bridge that can enrich both areas.
In this paper, we propose a method to make SMNNs efficient by reducing their size (in terms of the number of neurons, which depends on the number of vertices of the triangulation) and by making SMNNs trainable with generalization ability.
Besides, we present a study of the selection of the vertices from which the triangulation is obtained.
Although SMNNs consider the vertices of a simplex as part of the information needed for the classification task, the approach presented in this paper is far from classic instance-based Machine Learning methods. Such methods rely on a deterministic computation based on distances, whereas here the computation of the weights is the result of an optimization process in a probability distribution space.
Finally, from an XAI point of view, we will see in this paper that SMNNs are explainable models since all decision steps to compute the output of SMNNs are understandable and transparent, and therefore trustworthy.
The paper is organized as follows. First, some concepts of computational topology and the definition of SMNNs are recalled in Section 2. Next, in Section 3 we develop several technical details needed for the SMNN training process, which will be introduced in Section 4.
Section 5 is devoted to the explainability of the model.
Section 6 is devoted to discussion and limitations.
Finally, the paper ends with some experiments and conclusions.
2 Background
In this section, we assume that the reader is familiar with the basic concepts of computational topology. For a comprehensive presentation, we refer to [8].
2.1 Simplicial complexes
Consider a finite set of points $V \subset \mathbb{R}^n$ whose elements will be called vertices.
A subset $\sigma = \{v_0, \dots, v_k\}$ of $V$ with $k+1$ vertices (in general position) is called a $k$-simplex.
The convex hull of the vertices of $\sigma$ will be denoted by $|\sigma|$ and corresponds to the set:
$$|\sigma| = \Big\{\, x \in \mathbb{R}^n \ :\ x = \sum_{i=0}^{k} b_i(x)\, v_i \,\Big\},$$
where, for $i = 0, \dots, k$, the coefficients $b_0(x), \dots, b_k(x)$
are called the barycentric coordinates of $x$ with respect to $\sigma$, and satisfy that:
$$\sum_{i=0}^{k} b_i(x) = 1 \quad\text{and}\quad b_i(x) \geq 0 \ \text{for all } i.$$
The barycentric coordinates of $x$ can be interpreted as masses placed at the vertices of $\sigma$ so that $x$ is the center of mass. All these masses are positive if and only if $x$ is inside $|\sigma|$. For example, let us consider the 1-simplex $\sigma = \{v_0, v_1\}$,
which is composed of two vertices of $V$.
Then $|\sigma|$ is the set of points in $\mathbb{R}^n$ corresponding to the edge with endpoints $v_0$ and $v_1$, and if, for example, $b_0(x) = b_1(x) = \frac{1}{2}$, then $x$ is the midpoint of $|\sigma|$.
A simplicial complex $K$ with vertex set $V$ consists of a finite collection of simplices satisfying that if $\sigma \in K$ then any face (that is, any nonempty subset) of $\sigma$ is a simplex of $K$. Furthermore, if $\sigma, \mu \in K$ then either $|\sigma| \cap |\mu| = \emptyset$ or $|\sigma| \cap |\mu| = |\gamma|$ for some $\gamma \in K$.
The set $\bigcup_{\sigma \in K} |\sigma|$ will be denoted by $|K|$.
A maximal simplex of $K$ is a simplex that is not a face of any other simplex of $K$.
If the maximal simplices of $K$ are all $k$-simplices then $K$ is called a pure $k$-simplicial complex.
These concepts are illustrated in Figure 1.
Figure 1: On the left, two triangles that do not intersect in a common face (an edge or a vertex).
On the right, the geometric representation of a pure 2-simplicial complex composed of three maximal 2-simplices (triangles); some of its edges are common faces of two of the triangles.
The barycentric coordinates of a point $x \in |K|$ with respect to the simplicial complex $K$ are defined as the barycentric coordinates of $x$ with respect to the simplex $\sigma \in K$ such that $x \in |\sigma|$.
Let us observe that the barycentric coordinates of $x$ are unique.
An example of a simplicial complex is the Delaunay triangulation $\mathrm{Del}(V)$ defined from the Voronoi diagram of a given finite set of vertices $V$. The following result,
extracted from [9, page 48], is just an alternative definition of Delaunay triangulations.
The empty ball property
[9]:
Any subset $\sigma$ of $V$ is a simplex of $\mathrm{Del}(V)$ if and only if $|\sigma|$ has a circumscribing open ball empty of points of $V$.
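In practice, both the Delaunay triangulation and the barycentric coordinates used throughout the paper can be computed with standard scientific software. The following minimal sketch (our own code, using scipy.spatial.Delaunay; the variable names are not taken from the paper) locates the simplex containing a query point and recovers its barycentric coordinates:
```python
import numpy as np
from scipy.spatial import Delaunay

# A small 2D vertex set and its Delaunay triangulation
V = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
tri = Delaunay(V)

def barycentric_coordinates(tri, x):
    """Return (simplex index, barycentric coordinates of x), or (-1, None) when x
    lies outside the triangulation."""
    s = tri.find_simplex(x)            # -1 if x is outside the triangulation
    if s == -1:
        return -1, None
    T = tri.transform[s]               # affine map stored by scipy for this simplex
    b = T[:-1] @ (np.asarray(x) - T[-1])
    return s, np.append(b, 1.0 - b.sum())

s, b = barycentric_coordinates(tri, [0.25, 0.25])
print(tri.simplices[s], b)             # vertex indices of the simplex and the coordinates
```
The coordinates returned sum to 1 and are all non-negative exactly when the query point lies inside the located simplex.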
2.2 Simplicial maps
Let $K$ be a pure $n$-simplicial complex and $L$ a pure $m$-simplicial complex with vertex sets $V_K$ and $V_L$, respectively.
The map $\varphi: V_K \to V_L$ is called a vertex map if it satisfies that the set obtained from $\{\varphi(v_0), \dots, \varphi(v_k)\}$ after removing duplicated vertices is a simplex in $L$ whenever $\{v_0, \dots, v_k\}$ is a simplex in $K$.
The vertex map $\varphi$ always induces a continuous function, called a simplicial map $\varphi_c: |K| \to |L|$, which is defined as follows. Let $b(x) = (b_0(x), \dots, b_k(x))$ be the barycentric coordinates of $x \in |K|$ with respect to the simplex $\sigma = \{v_0, \dots, v_k\} \in K$ such that $x \in |\sigma|$.
Then
$$\varphi_c(x) = \sum_{i=0}^{k} b_i(x)\, \varphi(v_i).$$
Let us observe that
$\varphi_c(v) = \varphi(v)$ if $v \in V_K$.
A special kind of simplicial map used to solve classification tasks will be introduced in the next subsection.
2.3 Simplicial maps for classification tasks
Next, we will show how a simplicial map can be used to solve a classification problem (see [5] for details). From now on, we will assume that the input dataset $D$ is a finite set of points in $\mathbb{R}^n$ together with a set of labels $\Lambda = \{\lambda_1, \dots, \lambda_c\}$ such that each point of $D$ is tagged with a label
taken from $\Lambda$.
Firstly, the intuition is that the space surrounding the dataset is labelled as unknown.
For this, we add a new label to $\Lambda$, called the unknown label, and
consider a one-hot encoding representation of these labels,
where the one-hot vector $\ell_j \in \mathbb{R}^{c+1}$ encodes the $j$-th label of $\Lambda$ for $j = 1, \dots, c$, and $\ell_{c+1}$ encodes the unknown label.
We now consider a convex polytope $P$ with vertex set $V_P$ surrounding the set $D$. The polytope always exists since $D$ is finite.
Next, we define a map $\varphi$ sending each vertex of $D$ tagged with the $j$-th label to the one-hot vector $\ell_j$ that encodes that label.
The vertices of $P$ are sent to the vertex $\ell_{c+1}$.
Observe that $\varphi$ is a vertex map.
Let $L$ denote the simplicial complex with vertex set $\{\ell_1, \dots, \ell_{c+1}\}$ consisting of only one maximal $c$-simplex, and let $K$ denote the Delaunay triangulation computed for the set of points $D \cup V_P$.
Note that $|K|$ coincides with $P$.
The simplicial map $\varphi_c: |K| \to |L|$ is induced by the vertex map $\varphi$ as explained in Subsection 2.2.
Remark 1
The space $|L|$ can be interpreted as the space of discrete probability distributions on $c+1$ variables.
As an example, in Figure 2, on the left, we can see a dataset $D$ with four points, labelled red and blue.
The green points are the vertices of a convex polytope containing $D$ and are sent by $\varphi$ to the green vertex on the right.
The simplicial complex $K$ is drawn on the left and consists of ten maximal 2-simplices.
On the right, the simplicial complex $L$ consists of one maximal 2-simplex. The dotted arrows illustrate some examples of $\varphi_c$.
Figure 2: Illustration of a simplicial map for a classification task.
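The vertex map of this construction can be sketched as a small dictionary that sends each data point to the one-hot vector of its label and each polytope vertex to the extra one-hot vector of the unknown label. The snippet below is only an illustration under those assumptions; the names and the toy values are ours, not taken from the paper:
```python
import numpy as np

def vertex_map(data_vertices, labels, polytope_vertices, n_classes):
    """Map each vertex to a one-hot vector in R^(n_classes + 1); the extra
    coordinate encodes the artificial 'unknown' label of the polytope vertices."""
    eye = np.eye(n_classes + 1)
    phi = {tuple(v): eye[labels[i]] for i, v in enumerate(data_vertices)}
    phi.update({tuple(v): eye[n_classes] for v in polytope_vertices})  # unknown label
    return phi

# Toy example: four labelled points surrounded by a triangle of polytope vertices
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
y = [0, 0, 1, 1]
P = np.array([[-2., -2.], [4., -2.], [0.5, 4.]])
phi = vertex_map(X, y, P, n_classes=2)
print(phi[(0.0, 0.0)], phi[(-2.0, -2.0)])   # [1. 0. 0.] and [0. 0. 1.]
```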
2.4 Simplicial map neural networks
Artificial neural networks can be seen as parametrized real-valued mappings between multidimensional spaces. Such mappings are the composition of several maps (usually many of them) that can be structured in layers.
In [5], the simplicial map $\varphi_c$ defined above was represented as a two-hidden-layer feed-forward neural network.
This kind of artificial neural network is called a simplicial map neural network (SMNN).
In the original definition [5], the first hidden layer of an SMNN computes the barycentric coordinates of the input point.
However, we will see here that if we precompute the barycentric coordinates, we can simplify the architecture as follows.
As before, consider an input dataset consisting of a finite set $D$ endowed with a set of labels $\Lambda$, and a convex polytope $P$ with vertex set $V_P$ surrounding $D$. Let $K$ be the Delaunay triangulation with vertex set $V = D \cup V_P = \{v_1, \dots, v_m\}$.
Then, $\varphi: V \to \{\ell_1, \dots, \ell_{c+1}\}$ is a vertex map. Let us assume that, given $x \in |K|$, we precompute the barycentric coordinates of $x$ with respect to the maximal simplex $\sigma \in K$ such that $x \in |\sigma|$, and that we also precompute the vector $\xi(x) \in \mathbb{R}^m$
whose $i$-th coordinate is the barycentric coordinate of $x$ associated with $v_i$ if $v_i$ is a vertex of $\sigma$, and $0$ otherwise.
Let us remark that $\xi(x)$ should be a column vector, but we will use row notation for simplicity.
The SMNN induced by $\varphi$, which predicts the label of $x$ for $x \in |K|$, has the following architecture:
•
The number of neurons in the input layer is $m$.
•
The number of neurons in the output layer is $c+1$.
•
The set of weights is represented as a matrix $W$ of size $(c+1) \times m$ such that
the $j$-th column of $W$ is $\varphi(v_j)$,
for $j = 1, \dots, m$.
Then, the output of the SMNN on input $\xi(x)$ is $W\,\xi(x) = \varphi_c(x)$.
Observe that, as defined so far, the SMNN weights are precomputed.
Furthermore, the computation of the barycentric coordinates of the points around $D$ implies the calculation of the convex polytope surrounding $D$.
Finally, the computation of the Delaunay triangulation is costly if $D$ has many points
(see [9, Chapter 4] for its time complexity).
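As a minimal sketch of the architecture just described (our own variable names, not the paper's notation), the forward pass is a single matrix-vector product between the precomputed weight matrix and the vector of barycentric coordinates; for brevity, the vector below is restricted to the three vertices of the simplex containing the input point:
```python
import numpy as np

def smnn_forward(W, xi):
    """Precomputed SMNN: each column of W is the one-hot vector assigned by the
    vertex map to one vertex, and xi holds the barycentric coordinates of the
    input point, so the product reproduces the simplicial map exactly."""
    return W @ xi

# Toy example: a 2-simplex whose vertices are labelled class 0, class 0 and class 1
# (rows of W: class 0, class 1, unknown label; columns: the three vertices).
W = np.array([[1., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])
xi = np.array([0.2, 0.5, 0.3])   # barycentric coordinates of the input point
print(smnn_forward(W, xi))       # [0.7, 0.3, 0.0] -> class 0
```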
In the next sections, we will propose some techniques to overcome the SMNN construction drawbacks while maintaining its advantages.
We will see that one way to overcome the computation of the convex polytope is to consider a hypersphere instead.
We will also see how to avoid the use of the artificially created unknown label.
Furthermore, to reduce the cost of the Delaunay computation, to add trainability, and to avoid overfitting, a subset $S \subseteq D$ will be considered.
The set $D$ will be used to train and test a map defined on the vertices of the Delaunay triangulation of $S$.
Such a map will induce a continuous function which approximates the simplicial map $\varphi_c$.
3 The unknown boundary and the function
In this section, we will see how to compute a function that approximates the simplicial map $\varphi_c$ while avoiding the computation of the convex polytope and the artificial consideration of the unknown label, reducing, at the same time, the cost of the Delaunay triangulation used to construct SMNNs.
The general methodology is described in Algorithm 1.
First, let us compute a hypersphere surrounding $D$.
One of the simplest ways to do that is to translate $D$ so that its center of mass is placed at the origin.
Then, any hypersphere centered at the origin with radius $R > \max_{x \in D} \|x\|$ surrounds $D$.
Second, let us assume that we have selected a subset $S \subseteq D$ (we will compare different strategies to select $S$ in Section 7) and that we have computed its Delaunay triangulation $\mathrm{Del}(S)$.
Let us consider the boundary of $\mathrm{Del}(S)$, denoted $\partial\,\mathrm{Del}(S)$, which consists of the set of $(n-1)$-simplices that are faces of exactly one maximal simplex of $\mathrm{Del}(S)$.
Now, let us define $\xi(x)$ for any $x \in \mathbb{R}^n$ as follows.
Given $x$, to find the $n$-simplex $\sigma$ such that $x \in |\sigma|$, we have to consider two cases: $x \in |\mathrm{Del}(S)|$ and $x \notin |\mathrm{Del}(S)|$.
If $x \in |\mathrm{Del}(S)|$ then $\sigma$ is the $n$-simplex in $\mathrm{Del}(S)$ such that $x \in |\sigma|$.
If $x \notin |\mathrm{Del}(S)|$ then $\sigma$ is a new $n$-simplex defined by the vertices of an $(n-1)$-simplex $\mu$ of $\partial\,\mathrm{Del}(S)$ and a new vertex $w$ consisting of the projection of $x$ to the hypersphere.
Specifically, if $x \notin |\mathrm{Del}(S)|$ then $\sigma$ is computed in the following way:
1.
Consider the set of $(n-1)$-simplices $\mu$ of $\partial\,\mathrm{Del}(S)$ whose supporting hyperplane (determined by its normal vector) leaves $x$ on the outer side of the triangulation.
2.
Compute the point $w$, the projection of $x$ onto the hypersphere.
3.
Find, among the simplices of step 1, the simplex $\mu$ such that $x \in |\sigma|$ for $\sigma = \mu \cup \{w\}$.
Observe that, by construction, such a simplex $\mu$
always exists since $|\mathrm{Del}(S)|$ is a convex polytope.
Now, let $b(x)$ be the barycentric coordinates of $x$ with respect to $\sigma$.
Then, $\xi(x)$ is the vector with one coordinate per vertex of $\mathrm{Del}(S)$ (plus one for the vertex $w$) satisfying that the coordinates associated with the vertices of $\sigma$ are the corresponding barycentric coordinates of $x$, and the remaining coordinates are $0$.
Observe that $\xi(x)$ always exists and is unique. An example of the points $x$ and $w$ and the simplex $\sigma$ is shown in Figure 3 and Example 1.
Let us observe that, thanks to the new definition of $\xi(x)$ for $x \in \mathbb{R}^n$, if we have a map defined on the vertices of $\mathrm{Del}(S)$ (and on the vertex $w$),
then it induces a continuous function defined for any $x \in \mathbb{R}^n$ as the softmax function applied to the $\xi(x)$-weighted combination of the values of the map at the vertices.
Let us observe that, to obtain a categorical distribution from that combination, we could just divide each of its coordinates by the total sum. However, the softmax function is introduced here to obtain a simplified formula for the gradient descent algorithm, as shown in Theorem 1.
Figure 3: An example of the point $w$ computed from a point $x \notin |\mathrm{Del}(S)|$ and the $n$-simplex $\sigma$ such that $x \in |\sigma|$.
Example 1
Let us consider
with label for with and
for with .
Firstly, we translate so that the center of mass of is the origin .
The translated dataset is
Let us consider and
Hence, the translated input data is and .
To simplify the explanation of the method, consider , that is, for .
Then, the maximal simplices of are and
•
The matrix is:
•
Since the barycentric coordinates of with respect to are then is in and .
Then
•
On the other hand, the point is outside
.
Assuming that, for example, we have fixed the radius $R$, we add a new simplex $\sigma = \mu \cup \{w\}$, where $w$ is the projection of the point to the hypersphere of radius $R$ centered at the origin. See Figure 4.
Then, the barycentric coordinates of the point with respect to $\sigma$ are computed, concluding the evaluation of the map at that point.
Figure 4:
The relative positions of the vertices for and the points and of Example 1.
The pseudocode for computing the output of the induced map at a point is provided in Algorithm 1.
Algorithm 1 Pseudocode to compute the output of the induced map at a point $x \in \mathbb{R}^n$, given a subset $S$ of an input dataset $D$ surrounded by a hypersphere of radius $R$
Input:
$S$ labelled using a set of labels $\Lambda$, a radius $R$, and a point $x \in \mathbb{R}^n$.
Output: The value of the induced map at $x$
compute $\mathrm{Del}(S)$
init an empty matrix $W$
init an empty vector $\xi$
for $u \in S$ do
    if $\lambda_j$ is the label of $u$ then
        add the vector associated with $\lambda_j$ as a column of $W$
    end if
end for
for each maximal simplex $\sigma$ of $\mathrm{Del}(S)$ do
    compute the barycentric coordinates $b(x)$ of $x$ with respect to $\sigma$
    if $b_i(x) \geq 0$ for all $i$ then
        stop: compute $\xi$ from $b(x)$ and return the output
    end if
end for
if $\xi$ is empty then
    compute $w$, $\mu$, and $\sigma = \mu \cup \{w\}$
    compute $b(x)$ with respect to $\sigma$, compute $\xi$, and return the output
end if
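A runnable sketch of Algorithm 1 is given below (our own code). It assumes that the new vertex $w$ is the radial projection of $x$ onto the hypersphere and that the boundary simplex $\mu$ can be found by brute force, keeping the facet for which all barycentric coordinates of $x$ with respect to the augmented simplex are non-negative; this is one way to realise the construction described above, not necessarily the implementation used by the authors:
```python
import numpy as np
from itertools import combinations
from scipy.spatial import Delaunay

def barycentric(simplex_points, x):
    """Barycentric coordinates of x with respect to the simplex spanned by the
    rows of simplex_points (solves the affine system, in any dimension)."""
    A = np.vstack([simplex_points.T, np.ones(len(simplex_points))])
    return np.linalg.lstsq(A, np.append(x, 1.0), rcond=None)[0]

def locate(S, tri, R, x, tol=1e-9):
    """Return (vertex indices, barycentric coordinates) describing the simplex used
    to evaluate the induced map at x; index -1 stands for the extra vertex w that
    lies on the hypersphere of radius R centred at the origin."""
    s = tri.find_simplex(x)
    if s != -1:                                   # x lies inside the triangulation
        idx = tri.simplices[s]
        return idx, barycentric(S[idx], x)
    w = R * np.asarray(x, dtype=float) / np.linalg.norm(x)   # projection onto the sphere
    # Boundary facets are the (n-1)-faces contained in exactly one maximal simplex.
    counts = {}
    for simplex in tri.simplices:
        for f in combinations(sorted(simplex), len(simplex) - 1):
            counts[f] = counts.get(f, 0) + 1
    for f, c in counts.items():
        if c != 1:
            continue
        idx = np.array(f)
        b = barycentric(np.vstack([S[idx], w]), x)
        if np.all(b >= -tol):                     # x belongs to the simplex (facet, w)
            return np.append(idx, -1), b
    raise ValueError("no suitable boundary facet found; check that R > max ||x||")

# Usage: S = support points (centred at the origin), tri = Delaunay(S), R > max norm.
```
Given the returned indices and coordinates, the value of the induced map at $x$ is the coordinate-weighted combination of the columns of the weight matrix associated with those vertices.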
The following property holds.
Lemma 1 (Continuity)
Let . Then,
Proof. If then the result holds due to the continuity of the barycentric coordinates transformation. If , since the origin , then . Therefore, for close to , and .
Besides,
and therefore
concluding the proof.
\qed
Lemma 2 (Consistency)
Let be the map defined in Subsection 2.3.
If
and for all then
Proof. Let us observe that if
and for all then, for any , we have that:
.
Then, and
\qed
One of the keys to our study is the identification of the points allocated inside a given simplex with the set of all probability distributions on its vertices.
In this way, the barycentric coordinates of a point can be seen as a probability distribution.
From this point of view, given $x$, both its barycentric coordinates and the output of the induced map are points in a set of probability distributions.
This is why the categorical cross-entropy loss function will be used to compare them.
Specifically, for a predicted distribution $p$ and a target distribution $t$, it is defined as:
$$E(p, t) = -\sum_{j} t_j \log p_j .$$
The following lemma establishes a specific set and a function such that for all .
Lemma 3 (-optimum simplicial map)
Let be a subset of satisfying, for all , that:
1.
, and
2.
if such that and then .
Then,
Proof. As proved in [6],
under the assumptions stated in this lemma, we have that, for all :
and then .
\qed
Unfortunately, to compute a subset satisfying the conditions stated in
Lemma 3, we need to compute the entire triangulation, which is computationally expensive, as we have already mentioned above.
4 Training SMNNs
The novel idea of this paper is to learn the values of the map at the vertices, using the gradient descent algorithm, in order to minimize the loss function
over the given dataset.
The following result provides an expression of the gradient of in terms of the functions and ,
and the set .
Theorem 1
Let be a subset with elements taken from a finite set of points tagged with labels taken from a set of labels. Let and .
Let us consider that
is a set of variables.
Then,
for ,
where ,
,
and .
Proof. We have:
Since
then
Besides, since
then
Finally,
\qed
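Assuming, as discussed in Section 3, that the induced function composes the linear combination $W\,\xi(x)$ with the softmax function and that the loss is the categorical cross-entropy, the gradient takes the familiar form $\partial E / \partial w_{ji} = \xi_i(x)\,(p_j - t_j)$, where $p$ is the predicted distribution and $t$ the one-hot target. The following sketch (our own code, not the paper's) checks this closed form numerically against finite differences:
```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(W, xi, t):
    """Categorical cross-entropy between softmax(W xi) and the one-hot target t."""
    return -np.sum(t * np.log(softmax(W @ xi)))

def grad(W, xi, t):
    """Closed-form gradient: dE/dw_ji = xi_i * (p_j - t_j) with p = softmax(W xi)."""
    p = softmax(W @ xi)
    return np.outer(p - t, xi)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))          # 3 classes, 4 vertices
xi = np.array([0.1, 0.4, 0.3, 0.2])  # barycentric vector of a training point
t = np.array([0., 1., 0.])           # one-hot target label

# Finite-difference check of the closed-form gradient
num = np.zeros_like(W)
eps = 1e-6
for j in range(W.shape[0]):
    for i in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[j, i] += eps
        Wm[j, i] -= eps
        num[j, i] = (loss(Wp, xi, t) - loss(Wm, xi, t)) / (2 * eps)
print(np.max(np.abs(num - grad(W, xi, t))))   # tiny value: the two gradients agree
```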
Let us now see how we add trainability to the SMNN induced by the vertex map. Let $D$ be the training set and let $S$ be a support set lying in the same space as $D$.
First, assuming that $S$ has finitely many elements, the resulting SMNN is a multiclass perceptron
with one input neuron per vertex, and it predicts the label of $x$ using the formula $\mathrm{softmax}(W\,\xi(x))$,
where $W$ is a matrix of weights and $\xi(x)$ is obtained from the barycentric coordinates of $x$ as in Section 3.
The idea is to modify the initial values of the weights $w_{ij}$
in order to obtain new values in such a way that the error decreases.
We will do it while avoiding recomputing $\mathrm{Del}(S)$ or the barycentric coordinates for each $x$ during the training process.
In this way, given $x$, if $x \in |\mathrm{Del}(S)|$, we compute the maximal simplex $\sigma$ such that $x \in |\sigma|$. If $x \notin |\mathrm{Del}(S)|$, we compute $w$ and the simplex $\sigma$ obtained from a boundary simplex $\mu$ and $w$ such that $x \in |\sigma|$.
Then we compute the barycentric coordinates of $x$ with respect to $\sigma$ and the vector $\xi(x)$ as in Section 3.
Using the gradient descent algorithm, we update the variables $w_{ij}$ by subtracting the learning rate times the corresponding partial derivative of the loss given in Theorem 1.
An illustrative picture of the role of each point in a simple two-dimensional binary classification problem is provided in Figure 5.
The pseudocode of the method to train SMNNs using Stochastic Gradient Descent
is provided in Algorithm 2.
Figure 5: Two-dimensional
synthetic dataset with points divided into two classes: blue and yellow. Triangle-shaped points belong to the test set and square-shaped points belong to the support set $S$.
The diamond-shaped point is the vertex on the hypersphere (the blue circumference) used to classify the triangle-shaped
point (surrounded by a small red circumference) outside the triangulation.
Algorithm 2 Pseudocode of the proposed method to train SMNNs using SGD.
Input:
Dataset $D$ surrounded by a hypersphere of radius $R$, and a model.
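A minimal training-loop sketch in the spirit of Algorithm 2 is given below (our own code and naming): xi_of(x) is assumed to return the dense barycentric vector $\xi(x)$ over the support set plus the hypersphere vertex (it can be assembled from the routine sketched after Algorithm 1), and the update is the gradient step checked after Theorem 1. In practice, as described in Section 4, the vectors $\xi(x)$ are precomputed once before the loop rather than recomputed at every step.
```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_smnn(W, X_train, T_train, xi_of, lr=0.1, epochs=100, seed=0):
    """Stochastic gradient descent on the SMNN weight matrix W (classes x vertices).
    xi_of(x) returns the barycentric vector xi(x) of x over the support set
    (plus the hypersphere vertex); T_train holds one-hot target labels."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(X_train)):        # one SGD pass over the data
            xi = xi_of(X_train[i])
            p = softmax(W @ xi)                        # predicted class distribution
            W = W - lr * np.outer(p - T_train[i], xi)  # gradient step (Theorem 1)
    return W
```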
Figure 6: Table with the five flowers taken from the training set
that influence the classification of a given flower in the test set, i.e., the vertices of the simplex containing it.
5 Explainability
In this section, we provide insight into the explainability capability of SMNNs. In the literature, many different approaches can be found to what constitutes an explanation of the prediction of an AI model. In our case, explainability will be provided based on similarities and dissimilarities of the point to be explained with the points corresponding to the vertices of the simplex containing it. Based on this idea, the barycentric coordinates of the point with respect to that simplex can be considered indicators of how much each of its vertices will contribute to the prediction.
Specifically, the barycentric coordinates of the point, multiplied by the evaluation of the trained map at the vertices of the simplex, encode the contribution of each vertex to the label assigned to the point by the SMNN.
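A small sketch of this contribution computation is given below (our own code): the entry (i, j) of the matrix is the barycentric coordinate of the explained point at the i-th vertex of its simplex multiplied by the j-th class score of the trained map at that vertex, so that the column sums recover the (pre-softmax) class scores of the point. The toy values are assumptions for illustration only.
```python
import numpy as np

def contributions(W, vertex_idx, b):
    """Entry (i, j): contribution of the i-th vertex of the simplex (with
    barycentric coordinate b[i]) to the score of class j."""
    return b[:, None] * W[:, vertex_idx].T

# Toy example: a simplex with three vertices and three classes
W = np.array([[ 0.9, 0.1, -0.2],    # class 0 scores of the three vertices
              [ 0.0, 0.8,  0.9],    # class 1 scores
              [ 0.1, 0.1,  0.3]])   # class 2 scores
b = np.array([0.5, 0.3, 0.2])       # barycentric coordinates of the explained point
C = contributions(W, [0, 1, 2], b)
print(C)                            # rows: vertices, columns: classes
print(C.sum(axis=0))                # column sums = class scores of the explained point
```
Note that, as discussed below, individual contributions can be negative even when the overall class score is positive.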
As an illustration, consider the Iris dataset (https://archive.ics.uci.edu/ml/datasets/iris) as a toy example and split it into a training set (75%) and a test set (25%). Since we focus in this section on explainability, let us take the whole training set as support set, containing 112 points.
Then, the weights are initialized with random values and the SMNN is trained as in Section 4.
After the training process,
the accuracy and the loss value of the SMNN are measured on the test set.
Once the SMNN is trained,
we may be interested in determining why a certain output is given for a specific point in the test set.
As mentioned above, the explanation for why the SMNN assigns a label to a point is based on the labels of the vertices of the simplex containing it.
Therefore, the first step is to find the maximal simplex that contains the point to be explained.
As an example, in Figure 6, a point in the test set is chosen to be explained, to which the SMNN predicts class 2.
The coordinates of the five vertices of the simplex containing it, together with the classes they have been assigned, are shown in the table at the bottom of Figure 6.
The contribution of each vertex of the simplex to the class assigned to the point by the SMNN is displayed in the bar chart, and is measured in terms of the product of the barycentric coordinate of the point at that vertex and the value of the trained map at that vertex.
Let us notice that the contributions can be positive or negative.
For example, the vertex with index 95, the one with the greatest influence on the prediction, affected the classification negatively for the first and third classes, but positively for the second class, which is the correct classification. Let us note that the Euclidean distance between points is not the only factor that makes the contribution of a vertex greater. That is, even if two vertices are equally close to the point to be explained, they do not contribute the same. For example, two of the vertices are similarly close to the test point, but their contributions are very different in magnitude.
6 Discussion and Limitations
Let us remark that SMNNs lie in between an instance-based method and a multilayer perceptron. The previous definition of SMNNs ([4, 5]) shared advantages with instance-based methods such as the k-Nearest Neighbour algorithm. Some of the advantages are: there is no training phase, complex decision boundaries are easily handled through barycentric subdivisions, they are effective in low-dimensional spaces, they adapt well to local patterns in the data, and the decision-making is transparent and interpretable. However, they were computationally expensive for large high-dimensional datasets due to the Delaunay triangulation computation and the required convex polytope,
and they suffered from overfitting and lacked generalization ability. The update proposed in this paper provides a substitute for the convex polytope which reduces the number of points needed to compute the Delaunay triangulation.
The Delaunay computation is also less expensive thanks to the use of a support set $S$.
Nevertheless, one of the main limitations of SMNNs is the need for an input triangulable space. Hence, structured data such as images need to be embedded by, for example, applying a dimensionality reduction technique such as UMAP [10] so that the dataset is a point cloud.
Regarding the explainability of the model, it is not a feature-based explainability such as [11, 12] and, hence, it does not provide direct insight into the importance of the features of the input data. However, it provides instance-based explainability [13]. When classifying an input point, the vertices of the simplex it belongs to and their contributions to the classification give a similarity measure inferred by the SMNN which is understandable by an expert.
Table 1: Accuracy score and loss values obtained after training both the SMNN and a feed-forward neural network (FFNN).
The experiments were repeated several times and the results provided are the mean of the accuracy values over the repetitions.
The size of the subset $S$ considered to compute the Delaunay triangulation also varies in each experiment depending on a parameter.
The FFNN is composed of two hidden layers with ReLU activation functions and an output layer with a softmax activation function. The datasets used are synthetic binary-class datasets with a varying number of features.
Features | Parameter | Size of S | SMNN Acc. | SMNN Loss | FFNN Acc. | FFNN Loss
2 | 1000 | 3560 | 0.87 | 0.64 | 0.91 | 0.23
2 | 100 | 1282 | 0.90 | 0.51 | |
2 | 50 | 626 | 0.9 | 0.42 | |
2 | 10 | 53 | 0.87 | 0.33 | |
3 | 1000 | 3750 | 0.76 | 0.66 | 0.8 | 0.61
3 | 100 | 3664 | 0.76 | 0.66 | |
3 | 50 | 3252 | 0.77 | 0.65 | |
3 | 10 | 413 | 0.81 | 0.5 | |
4 | 50 | 3728 | 0.69 | 0.67 | 0.72 | 0.69
4 | 10 | 1410 | 0.73 | 0.64 | |
4 | 5 | 316 | 0.73 | 0.57 | |
4 | 2 | 26 | 0.72 | 0.56 | |
5 | 50 | 3743 | 0.77 | 0.66 | 0.8 | 0.91
5 | 10 | 1699 | 0.81 | 0.63 | |
5 | 5 | 323 | 0.8 | 0.52 | |
5 | 2 | 17 | 0.74 | 0.53 | |
7 Experiments
In this section, we provide experiments that show the performance of SMNNs.
In all the experiments,
we split the given dataset into a training set and a test set composed of 75% and 25% of the dataset, respectively.
The datasets used for experimentation are (1) a two-dimensional binary classification spiral synthetic dataset, and (2) dimension-varying binary classification synthetic datasets composed of different noisy clusters for each class (we refer to [14] for a specific description of how data is generated). All experiments were developed using a 3.70 GHz AMD Ryzen 9 5900X 12-Core Processor.
In the first two experiments,
$\varepsilon$-representative subsets of the training set are used as the support set $S$ for different values of $\varepsilon$.
The notion of $\varepsilon$-representative sets was introduced in [15].
Specifically, a support set $S$ is $\varepsilon$-representative of a set $X$ if, for any $x \in X$, there exists $s \in S$ such that the distance between $x$ and $s$ is less than $\varepsilon$.
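A simple way to obtain such a subset is a greedy pass that adds any point lying farther than $\varepsilon$ from the points already selected; the following sketch is our own illustration and not necessarily the construction used in [15]:
```python
import numpy as np

def epsilon_representative(X, eps, seed=0):
    """Greedy selection of a subset S of X such that every point of X is within
    distance eps of some point of S (a simple epsilon-net construction)."""
    X = np.asarray(X)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    S = [X[order[0]]]
    for i in order[1:]:
        # add the point only if it is not already covered by the selected points
        if np.min(np.linalg.norm(np.array(S) - X[i], axis=1)) >= eps:
            S.append(X[i])
    return np.array(S)

X = np.random.default_rng(1).normal(size=(500, 2))
S = epsilon_representative(X, eps=0.5)
print(len(S))   # the size of the support set shrinks as eps grows
```
In practice, the labels of the selected points are kept alongside them so that the support set remains a labelled dataset.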
Let us now describe the methodology of each experiment.
First experiment: we consider a two-dimensional spiral dataset for binary classification.
We selected three different values of $\varepsilon$,
obtaining three $\varepsilon$-representative sets (the support sets) of size 5, 9 and 95, respectively. In Figure 7, the spiral dataset and the three different support sets with the associated Delaunay triangulation are shown.
In this case, we observed that the accuracy of the SMNNs increases with the size of the support set. We can also appreciate that the topology of the dataset is captured by the support set, which we conjecture is responsible for the successful classification.
(a)-(c) Using support sets of 5, 9, and 95 points, respectively; the test accuracy reached by the SMNN increases with the size of the support set.
Figure 7: Spiral dataset for binary classification. On each figure, the points in the support set are square-shaped, the points in the test set are triangle-shaped and the points in the training set are circle-shaped.
(a) $\varepsilon$-representative set.
(b) Outlier-robust representative landmark set.
(c) Outlier-robust vital landmark set.
Figure 8: Examples of support sets obtained using the different methods proposed in the third experiment.
Second experiment: we consider four synthetic datasets generated
using the adaptation of [16] in
the scikit-learn [17] implementation.
The four datasets generated have, respectively, 2, 3, 4, and 5 features.
Then, the corresponding training set obtained from each dataset
and a fully connected two-hidden-layer feed-forward neural network (FFNN) with ReLU activation functions
were considered.
To train the SMNN, we use
four different $\varepsilon$-representative sets of each training set.
In our experiments, the different values of $\varepsilon$ are calculated as the maximum distance from the origin to the
farthest
point in the dataset (plus a constant) divided by a parameter.
Using the different values of the parameter, we then obtain support sets of different sizes.
The sizes of the support sets generated are provided in Table 1.
First, the SMNN was trained using the gradient descent algorithm and the cross-entropy loss function.
Besides, the two-hidden-layer feed-forward neural network (FFNN) was trained using the Adam training algorithm [18].
Both training processes were carried out for the same number of epochs.
The accuracy and loss results provided in Table 1 are the mean over the repetitions.
We can observe that both the SMNN and the FFNN have similar performance, but the SMNN generally reaches lower loss values. The variance in the results was small for both the SMNN and the FFNN.
Third experiment: we compare the performance of the SMNN depending on the choice of the support set $S$.
To do this, we applied two different methods to choose $S$.
On the one hand, we use the $\varepsilon$-representative sets previously computed in the first experiment for different values of $\varepsilon$. On the other hand, we use the two outlier-robust subsampling methods presented in [19] for different values of Topological Density (TD).
In Figure 8, examples of different support sets computed using the three different approaches are shown. Let us remark that the outlier-robust subsampling method can be tuned to keep outliers, obtaining what the authors call a vital landmark set, or not, obtaining a representative landmark set. The methods were tested on synthetic datasets. The SMNN was trained using 75% of each dataset and tested on the remaining 25%.
The number of features of each dataset considered, the size of each support set computed, and the mean accuracy and loss results over the repetitions when training the SMNN are provided in Table 2.
In Table 3, the time for the computation of the barycentric coordinates is provided. As we can see, the execution time is not necessarily related to the size of the support set, and it may increase if many of the points to be evaluated lie outside the Delaunay triangulation of $S$.
In this experiment, the results suggest that $\varepsilon$-representative sets provide better results than the other support sets.
However, a thorough theoretical study should be developed to confirm this statement; such a study is outside the scope of this paper.
Table 2: Accuracy score and loss values obtained after training the SMNN.
The size of the subset considered to compute the Delaunay triangulation varies in each experiment depending on a parameter $\varepsilon$ (for $\varepsilon$-representative sets) or a parameter TD (for landmark sets).
The datasets used are synthetic binary-class datasets
with a varying number of features.
Sampling method | Parameter | Size of S | Acc. | Loss | Size of S | Acc. | Loss | Size of S | Acc. | Loss
ε-representative sets | | 48 | 0.94 | 0.22 | 109 | 0.94 | 0.29 | 407 | 0.9 | 0.53
ε-representative sets | | 371 | 0.94 | 0.47 | 730 | 0.95 | 0.53 | 748 | 0.87 | 0.6
ε-representative sets | | 570 | 0.93 | 0.54 | 747 | 0.96 | 0.56 | 750 | 0.87 | 0.6
ORS representative landmark sets | TD=0.1 | 75 | 0.79 | 0.49 | 75 | 0.9 | 0.35 | 75 | 0.85 | 0.46
ORS representative landmark sets | TD=0.4 | 300 | 0.85 | 0.57 | 300 | 0.86 | 0.5 | 300 | 0.87 | 0.62
ORS representative landmark sets | TD=0.6 | 450 | 0.86 | 0.52 | 450 | 0.89 | 0.52 | 450 | 0.86 | 0.55
ORS representative landmark sets | TD=0.8 | 600 | 0.89 | 0.56 | 600 | 0.92 | 0.52 | 600 | 0.86 | 0.58
ORS vital landmark sets | TD=0.1 | 75 | 0.93 | 0.27 | 75 | 0.84 | 0.4 | 75 | 0.88 | 0.37
ORS vital landmark sets | TD=0.4 | 300 | 0.93 | 0.37 | 300 | 0.88 | 0.46 | 300 | 0.9 | 0.45
ORS vital landmark sets | TD=0.6 | 450 | 0.91 | 0.44 | 450 | 0.96 | 0.44 | 450 | 0.89 | 0.53
ORS vital landmark sets | TD=0.8 | 600 | 0.93 | 0.5 | 600 | 0.95 | 0.52 | 600 | 0.87 | 0.57
Table 3: Time in seconds for the computation of the barycentric coordinates in the experiments of Table 2. The values are the mean of 5 iterations. Let us remark that higher values can be expected when increasing the number of points outside the Delaunay triangulation of the support set.
Sampling method | Parameter | Time (s) | Time (s) | Time (s)
ε-representative sets | | 0.08 | 0.14 | 0.54
ε-representative sets | | 0.12 | 0.22 | 0.79
ε-representative sets | | 0.14 | 0.24 | 0.79
ORS representative landmark sets | TD=0.1 | 1.04 | 2.89 | 21.75
ORS representative landmark sets | TD=0.4 | 0.33 | 0.29 | 7
ORS representative landmark sets | TD=0.6 | 0.12 | 0.37 | 3.03
ORS representative landmark sets | TD=0.8 | 0.1 | 0.23 | 1.29
ORS vital landmark sets | TD=0.1 | 0.17 | 3.22 | 3.87
ORS vital landmark sets | TD=0.4 | 0.15 | 2.31 | 1.69
ORS vital landmark sets | TD=0.6 | 0.08 | 0.53 | 1.17
ORS vital landmark sets | TD=0.8 | 0.15 | 0.27 | 0.82
8 Conclusions
The balance between efficiency and explainability will be one of the major challenges of AI in the coming years. Although AI models based on network architectures and backpropagation algorithms are currently among the most successful models, they are far from providing a human-readable explanation of their outputs. On the other hand, simpler models not based on gradient descent methods usually do not provide a comparable level of performance. In this sense, a trainable version of SMNNs provides a new step in filling the gap between both approaches.
Simplicial map neural networks provide a combinatorial approach to artificial intelligence. Their simplicial-based definition provides nice properties, such as easy construction and robustness against adversarial examples.
In this work, we have extended their definition to provide a trainable version of this architecture. The training process is based on a local search and links this simplex-based model with the most efficient methods in AI. Moreover, we have demonstrated in this paper that such a simplicial-based construction provides a human-understandable explanation of the decisions.
The ideas presented in this paper can be extended in many different ways. In future work, we intend to study less data-dependent approaches so that the Delaunay triangulation is no longer needed. Besides, this architecture should be extended to Deep Learning models so that it can be applied to more complex classification problems, such as image classification, and provide extra explainability to Deep Learning models.
The work was supported in part by the European Union HORIZON-CL4-2021-HUMAN-01-01 under grant agreement 101070028 (REXASI-PRO) and by grant TED2021-129438B-I00 / AEI/10.13039/501100011033 / Unión Europea NextGenerationEU/PRTR.
References
Graziani et al. [2022]
M. Graziani, L. Dutkiewicz, D. Calvaresi,
A global taxonomy of interpretable AI: unifying the terminology for the technical and social sciences,
Artif Intell Rev (2022). doi:10.1007/s10462-022-10256-8.
Murdoch et al. [2019]
W. J. Murdoch, C. Singh, K. Kumbier, et al.,
Definitions, methods, and applications in interpretable machine learning,
PNAS (2019). URL: https://www.pnas.org/doi/full/10.1073/pnas.1900654116.
Paluzo-Hidalgo et al. [2020]
E. Paluzo-Hidalgo, R. Gonzalez-Diaz, M. A. Gutiérrez-Naranjo,
Two-hidden-layer feed-forward networks are universal approximators: A constructive approach,
Neural Netw 131 (2020) 29–36. doi:10.1016/j.neunet.2020.07.021.
Paluzo-Hidalgo et al. [2021a]
E. Paluzo-Hidalgo, R. Gonzalez-Diaz, M. A. Gutiérrez-Naranjo, J. Heras,
Simplicial-map neural networks robust to adversarial examples,
Mathematics 9 (2021a) 169. doi:10.3390/math9020169.
Paluzo-Hidalgo et al. [2021b]
E. Paluzo-Hidalgo, R. Gonzalez-Diaz, M. A. Gutiérrez-Naranjo, J. Heras,
Optimizing the simplicial-map neural network architecture,
J Imaging 7 (2021b) 173. doi:10.3390/jimaging7090173.
Guliyev and Ismailov [2018]
N. J. Guliyev, V. Ismailov,
On the approximation by single hidden layer feedforward neural networks with fixed weights,
Neural Networks 98 (2018) 296–304. doi:10.1016/j.neunet.2017.12.007.
Edelsbrunner and Harer [2010]
H. Edelsbrunner, J. Harer, Computational Topology - An Introduction, American Mathematical Society, 2010.
Boissonnat et al. [2018]
J. Boissonnat, F. Chazal, M. Yvinec, Geometric and Topological Inference, Cambridge Texts in Applied Maths, Cambridge Univ Press, 2018. doi:10.1017/9781108297806.
McInnes et al. [2018]
L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifold approximation and projection for dimension reduction, 2018. URL: http://arxiv.org/abs/1802.03426.
Das and Wiese [2023]
P. P. Das, L. Wiese,
Explainability based on feature importance for better comprehension of machine learning in healthcare,
in: A. Abelló, P. Vassiliadis, O. Romero, R. Wrembel, F. Bugiotti, J. Gamper, G. Vargas Solar, E. Zumpano (Eds.), New Trends in Database and Information Systems, Springer Nature Switzerland, Cham, 2023, pp. 324–335.
Nguyen et al. [2019]
A. Nguyen, J. Yosinski, J. Clune, Understanding Neural Networks via Feature Visualization: A Survey, Springer International Publishing, Cham, 2019, pp. 55–76. doi:10.1007/978-3-030-28954-6_4.
Kong and Chaudhuri [2021]
Z. Kong, K. Chaudhuri,
Understanding instance-based interpretability of variational auto-encoders,
in: A. Beygelzimer, Y. Dauphin, P. Liang, J. W. Vaughan (Eds.), Advances in Neural Information Processing Systems, 2021. URL: https://openreview.net/forum?id=a5-37ER8qTI.
Paluzo-Hidalgo et al. [2022]
E. Paluzo-Hidalgo, R. Gonzalez-Diaz, M. A. Gutiérrez-Naranjo,
Topology-based representative datasets to reduce neural network training resources,
Neural Comput & Applic 34 (2022) 14397–14413. doi:10.1007/s00521-022-07252-y.
Subramanian [2003]
I. R. Subramanian,
Design of experiments for the NIPS 2003 variable selection benchmark,
in: NeurIPS Proceedings, 2003.
Pedregosa et al. [2011]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay,
Scikit-learn: Machine learning in Python,
Journal of Machine Learning Research 12 (2011) 2825–2830.
Kingma and Ba [2015]
D. Kingma, J. Ba,
Adam: A method for stochastic optimization,
in: Y. Bengio, Y. LeCun (Eds.), 3rd Int Conf on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. URL: http://arxiv.org/abs/1412.6980.
Stolz [2023]
B. J. Stolz,
Outlier-robust subsampling techniques for persistent homology,
Journal of Machine Learning Research 24 (2023) 1–35. URL: http://jmlr.org/papers/v24/21-1526.html.