A Multi-Branched Radial Basis Network Approach to Predicting Complex Chaotic Behaviours

Aarush Sinha
[email protected]
Abstract

In this study, we propose a multi-branched network approach to predict the dynamics of a physics attractor characterized by intricate and chaotic behavior. We introduce a unique neural network architecture composed of Radial Basis Function (RBF) layers combined with an attention mechanism, designed to capture the nonlinear inter-dependencies inherent in the attractor's temporal evolution. Our results demonstrate successful prediction of the attractor's trajectory across 100 predictions made on a real-world dataset of 36,700 time-series observations spanning approximately 28 minutes of activity. To further illustrate the performance of the proposed technique, we provide comprehensive visualizations of the attractor's original and predicted behaviors alongside quantitative measures comparing observed and estimated outcomes. Overall, this work showcases the potential of advanced machine learning algorithms to elucidate hidden structure in complex physical systems while offering practical applications in domains requiring accurate short-term forecasting.

1 Introduction

In traditional mathematics, a radial basis function is a function whose value depends only on the distance between the input and a specified point, such as the origin or a chosen center; any function satisfying this property is called a radial function [1].
A radial function is a function $\varphi:[0,\infty)\to\mathbb{R}$. When paired with a norm $\|\cdot\|:V\to[0,\infty)$ on a vector space $V$, the function $\varphi_{\mathbf{c}}=\varphi(\|\mathbf{x}-\mathbf{c}\|)$ is said to be a radial kernel centered at $\mathbf{c}$. A radial function and its associated radial kernels are said to be radial basis functions if, for any set of pairwise distinct nodes $\{\mathbf{x}_k\}_{k=1}^{n}$:

  • The kernels $\varphi_{\mathbf{x}_1},\varphi_{\mathbf{x}_2},\dots,\varphi_{\mathbf{x}_n}$ are linearly independent (for example, $\varphi(r)=r^{2}$ on $V=\mathbb{R}$ does not yield a radial basis function).

  • The kernels $\varphi_{\mathbf{x}_1},\varphi_{\mathbf{x}_2},\dots,\varphi_{\mathbf{x}_n}$ form a basis for a Haar space, meaning that the interpolation matrix

    \[
    \begin{bmatrix}
    \varphi(\|\mathbf{x}_1-\mathbf{x}_1\|) & \varphi(\|\mathbf{x}_2-\mathbf{x}_1\|) & \dots & \varphi(\|\mathbf{x}_n-\mathbf{x}_1\|) \\
    \varphi(\|\mathbf{x}_1-\mathbf{x}_2\|) & \varphi(\|\mathbf{x}_2-\mathbf{x}_2\|) & \dots & \varphi(\|\mathbf{x}_n-\mathbf{x}_2\|) \\
    \vdots & \vdots & \ddots & \vdots \\
    \varphi(\|\mathbf{x}_1-\mathbf{x}_n\|) & \varphi(\|\mathbf{x}_2-\mathbf{x}_n\|) & \dots & \varphi(\|\mathbf{x}_n-\mathbf{x}_n\|)
    \end{bmatrix}
    \tag{1}
    \]

    is non-singular [2, 3]. A short numerical check of this matrix is sketched below.
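As an illustration (not from the paper), the following sketch builds the interpolation matrix of Eq. (1) for a few scattered nodes with a Gaussian kernel and checks that it is invertible; the node coordinates are hypothetical.

```python
# Illustrative sketch: construct the interpolation matrix of Eq. (1) for a
# handful of 2-D nodes with a Gaussian kernel and verify it is non-singular.
import numpy as np

def gaussian(r, sigma=1.0):
    return np.exp(-r**2 / (2 * sigma**2))

# hypothetical nodes x_1, ..., x_n
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# pairwise Euclidean distances: dists[i, j] = ||x_j - x_i||
dists = np.linalg.norm(nodes[:, None, :] - nodes[None, :, :], axis=-1)
A = gaussian(dists)

print(np.linalg.cond(A))  # a finite, moderate condition number -> invertible
```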

Commonly used radial basis functions include the following (a short code sketch of these functions follows the list):

  • Gaussian RBF:

    $\varphi(r)=\exp\left(-\dfrac{r^{2}}{2\sigma^{2}}\right)$

    where $r$ is the distance between the input point and the center, and $\sigma$ is a parameter controlling the width of the Gaussian.

  • Multiquadric RBF:

    $\varphi(r)=\sqrt{1+\left(\dfrac{r}{\sigma}\right)^{2}}$

    where $r$ is the distance between the input point and the center, and $\sigma$ is a parameter controlling the shape of the function.

  • Inverse Multiquadric RBF:

    $\varphi(r)=\dfrac{1}{\sqrt{1+\left(\dfrac{r}{\sigma}\right)^{2}}}$

    where $r$ is the distance between the input point and the center, and $\sigma$ is a parameter controlling the shape of the function.

  • Thin Plate Spline RBF:

    $\varphi(r)=r^{2}\log(r)$

    where $r$ is the distance between the input point and the center.

Imagine a ball rolling around a landscape with hills and valleys. An attractor acts like the bottom of a valley. Regardless of where you place the ball on the landscape (starting conditions), if it rolls downhill long enough, it will eventually settle at the valley’s bottom (the attractor). This signifies that the system (the ball) tends towards a specific set of values (the valley’s position) over time. Thus, formally defining an attractor involves identifying a group of numeric values that a system naturally gravitates towards, irrespective of its initial parameters.

Mathematical definition of an attractor:
Let $t$ represent time and let $f(t,\cdot)$ be a function specifying the dynamics of the system. If $a$ is a point in an $n$-dimensional phase space, representing the initial state of the system, then $f(0,a)=a$, and for a positive value of $t$, $f(t,a)$ is the result of the evolution of this state after $t$ units of time. For example, if the system describes the evolution of a free particle in one dimension, then the phase space is the plane $\mathbb{R}^{2}$ with coordinates $(x,v)$, where $x$ is the position of the particle, $v$ is its velocity, $a=(x,v)$, and the evolution is given by

\[
f(t,(x,v)) = (x+tv,\, v).
\]

An attractor is a subset $A$ of the phase space characterized by the following three conditions:

  1. $A$ is forward invariant under $f$: if $a$ is an element of $A$, then so is $f(t,a)$ for all $t>0$.

  2. There exists a neighborhood of $A$, called the basin of attraction for $A$ and denoted $B(A)$, which consists of all points $b$ that "enter" $A$ in the limit $t\to\infty$. More formally, $B(A)$ is the set of all points $b$ in the phase space with the following property: for any open neighborhood $N$ of $A$, there is a positive constant $T$ such that $f(t,b)\in N$ for all real $t>T$.

  3. There is no proper (non-empty) subset of $A$ having the first two properties.

Since the basin of attraction contains an open set containing $A$, every point that is sufficiently close to $A$ is attracted to $A$. The definition of an attractor uses a metric on the phase space, but the resulting notion usually depends only on the topology of the phase space [4]. In the case of $\mathbb{R}^{n}$, the Euclidean norm [5] is typically used, which is defined as

\[
\|\mathbf{x}\| = \sqrt{x_1^{2} + x_2^{2} + \dots + x_n^{2}}.
\]

Using these concepts, we propose a multi-branched radial basis neural network to predict the chaotic and random behaviours of an attractor.
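As a small numerical illustration of the definition (not part of the proposed method), the sketch below integrates a damped oscillator in the phase space $(x,v)$; the origin is a point attractor, and trajectories started from different initial conditions all settle there. The dynamics, step size, and initial conditions are chosen purely for illustration.

```python
# Illustrative sketch: for the damped oscillator x'' = -k x - c v, the origin
# of the (x, v) phase space is a point attractor; trajectories from several
# initial conditions all converge to it.
import numpy as np

def step(state, dt=0.01, k=1.0, c=0.5):
    x, v = state
    return np.array([x + dt * v, v + dt * (-k * x - c * v)])  # explicit Euler step

for x0, v0 in [(2.0, 0.0), (-1.0, 3.0), (0.5, -0.5)]:
    state = np.array([x0, v0])
    for _ in range(20_000):
        state = step(state)
    print(state)  # every trajectory ends up (numerically) at the attractor (0, 0)
```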

2 Related Work

Radial Basis networks have been extensively studied and proven effective in various classification tasks [6][7]. They offer a versatile framework for pattern recognition and data analysis, leveraging the flexibility of radial basis functions to model complex relationships within datasets. By capturing the intricate dynamics and nonlinear interactions inherent in real-world phenomena, Radial Basis networks contribute to advancing our understanding of complex systems and facilitating informed decision-making in fields ranging from communication systems [8][9] to computational biology [10][11].
While RBF layers offer valuable capabilities in certain modeling tasks, they alone may not be sufficient for capturing the rich dynamics and predicting chaotic and random behaviors in attractors. To address the complexities inherent in chaotic systems, more sophisticated and adaptable modeling approaches are required, which may involve combining RBF layers with other architectural components and techniques tailored to the specific characteristics of chaotic dynamics.
Attention mechanisms [12] have emerged as powerful tools in the realm of neural networks, offering sophisticated mechanisms for selectively focusing on relevant parts of input data while suppressing irrelevant information. Originally inspired by human cognitive processes, attention mechanisms have found widespread applications in various domains, including natural language processing, computer vision, and sequential data modeling.

3 Dataset

We use a pre-existing Kaggle dataset [13]. This dataset comprises time-series data originating from an unidentified physics attractor, synthesized through undisclosed governing rules. Manifesting intricate and chaotic dynamics, the attractor presents a challenge for analysis.
The dataset encompasses 36,700 data points, each delineating the positions of two points in a two-dimensional space at distinct time intervals. Collected over approximately 28 minutes, the dataset offers insights into the attractor’s behavior over time. Notably, the system undergoes periodic resets, typically occurring upon reentry into a recurring loop. Table 1 shows the different variables in the dataset.

Variable Type Definition
time Float The time in seconds since the start of the simulation
distance Float Distance between both objects
angle1 Angle Angle of the first object
pos1x Float X position of the first object
pos1y Float Y position of the first object
angle2 Angle Angle of the second object
pos2x Float X position of the second object
pos2y Float Y position of the second object
Table 1: Description of variables present in the dataset.
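A minimal loading sketch follows; the local filename attractor.csv is an assumption, while the column names come from Table 1.

```python
# Sketch: load the Kaggle time-series dataset [13] and inspect the variables
# from Table 1. The filename "attractor.csv" is an assumed local copy.
import pandas as pd

df = pd.read_csv("attractor.csv")
cols = ["time", "distance", "angle1", "pos1x", "pos1y",
        "angle2", "pos2x", "pos2y"]
print(len(df))          # ~36,700 rows, covering roughly 28 minutes
print(df[cols].head())
```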

4 Methodology

The proposed network architecture consists of several components.
Branches: Three separate branches are used, each learning the relationship between a specific pair of input columns. Each branch contains:

  • An RBFLayer: Performs the radial basis function transformation on the input data; customizable parameters include the number of kernels $K$, the number of output features $F_{\text{o}}$, the radial function $\varphi$, the norm $\|\cdot\|$, and a normalization option. We use the inverse multiquadric radial function and the Euclidean norm defined in Section 1.

  • Dropout layer: Introduced with a probability of 0.3 to mitigate overfitting.

  • AttentionLayer: Focuses on significant portions of the transformed data within the branch.

  • Linear layers with $\operatorname{ReLU}(x)=\max\{0,x\}$ and $\tanh(x)=\dfrac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$ activation functions for additional feature extraction and transformation.

Merging Layer: Following the processing of each pair of columns within their respective branches, the outputs are concatenated. A linear layer with a ReLU activation function integrates the combined information.
Output Layer: A final linear layer with an output size of 3 projects the merged features onto the desired three-dimensional prediction.

Denote by $x\in\mathbb{R}^{3}$ the input vector with three features, and let $\hat{y}\in\mathbb{R}^{3}$ denote the output of the model. Each branch accepts a pair of input features $(x_i, x_j)$ with $i,j\in\{1,2,3\}$ and $i\neq j$.

The forward function governs the data flow through the network:

  • Input Splitting: Separation of the input data $x$ into three distinct columns, representing the features: $x=(x_1,x_2,x_3)$.

  • Branch Processing: Feeding each pair of columns into the assigned branch (branch1, branch2, or branch3); subsequently processed through their constituent layers, yielding an output per pair.

  • Output Concatenation: The individual branch outputs $(\mathrm{out}_1,\mathrm{out}_2,\mathrm{out}_3)$ are concatenated along the feature dimension.

  • Merging: Transmission of the concatenated outputs through the merging layer produces a unified representation.

  • Prediction: Applying the merged features to the final output layer yields the three-dimensional prediction $(\hat{y}_1,\hat{y}_2,\hat{y}_3)$.

Figure 1: Proposed multi-layer architecture
Figure 2: Single sequential proposed layer

This design enables the model to discern specific relationships among diverse input feature pairs while combining the learned features through the attention mechanism and merging stages to deliver the final prediction. A condensed PyTorch sketch of this architecture is given below.
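The sketch uses a minimal re-implementation of the RBF layer and a simple feature-wise attention layer; all layer sizes (e.g., 16 kernels, 32 hidden units) and the specific feature pairings are illustrative assumptions rather than the paper's exact hyperparameters.

```python
# Condensed PyTorch sketch of the multi-branched RBF network described above.
import torch
import torch.nn as nn


class RBFLayer(nn.Module):
    """Minimal RBF layer: inverse multiquadric kernels over learned centers."""

    def __init__(self, in_features, n_kernels):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_kernels, in_features))
        self.log_sigma = nn.Parameter(torch.zeros(n_kernels))

    def forward(self, x):
        r = torch.cdist(x, self.centers)                 # Euclidean distances, (batch, K)
        sigma = torch.exp(self.log_sigma)
        return 1.0 / torch.sqrt(1.0 + (r / sigma) ** 2)  # inverse multiquadric


class AttentionLayer(nn.Module):
    """Simple feature-wise attention: softmax weights over the branch features."""

    def __init__(self, features):
        super().__init__()
        self.score = nn.Linear(features, features)

    def forward(self, x):
        return x * torch.softmax(self.score(x), dim=-1)


def make_branch(n_kernels=16, hidden=32):
    # RBF transform -> dropout (p = 0.3) -> attention -> linear/ReLU -> linear/tanh
    return nn.Sequential(
        RBFLayer(in_features=2, n_kernels=n_kernels),
        nn.Dropout(p=0.3),
        AttentionLayer(n_kernels),
        nn.Linear(n_kernels, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.Tanh(),
    )


class MultiBranchRBFNet(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.branch1 = make_branch(hidden=hidden)
        self.branch2 = make_branch(hidden=hidden)
        self.branch3 = make_branch(hidden=hidden)
        self.merge = nn.Sequential(nn.Linear(3 * hidden, hidden), nn.ReLU())
        self.out = nn.Linear(hidden, 3)

    def forward(self, x):
        # Input splitting: x = (x1, x2, x3); each branch sees one pair of features
        x1, x2, x3 = x[:, 0:1], x[:, 1:2], x[:, 2:3]
        out1 = self.branch1(torch.cat([x1, x2], dim=1))
        out2 = self.branch2(torch.cat([x2, x3], dim=1))
        out3 = self.branch3(torch.cat([x1, x3], dim=1))
        merged = self.merge(torch.cat([out1, out2, out3], dim=1))  # concatenation + merging
        return self.out(merged)                                    # 3-D prediction


model = MultiBranchRBFNet()
print(model(torch.randn(4, 3)).shape)   # torch.Size([4, 3])
```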

5 Training

We train the model on a single NVIDIA A30 GPU. Training for 2000 epochs with a batch size of 512 takes about 2 hours. We use the Mean Squared Error (MSE) loss function as our criterion:

\[
\mathrm{MSE}(\hat{\boldsymbol{y}},\boldsymbol{y}) = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^{2}
\]

where $\hat{\boldsymbol{y}}$ represents the predicted values, $\boldsymbol{y}$ represents the actual target values, and $N$ is the total number of samples. The MSE computes the average of the squared differences between predicted and actual values, providing a measure of the model's performance in minimizing prediction errors. We use the Adam optimizer [14] for our model:

\[
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}, \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^{t}}, \\
\hat{v}_t &= \frac{v_t}{1-\beta_2^{t}}, \\
\theta_{t+1} &= \theta_t - \frac{\eta}{\sqrt{\hat{v}_t}+\epsilon}\, \hat{m}_t,
\end{aligned}
\]

where $m_t$ and $v_t$ are the first and second moment estimates, $g_t$ is the gradient, $\beta_1$ and $\beta_2$ are the exponential decay rates for the moment estimates, $\hat{m}_t$ and $\hat{v}_t$ are bias-corrected estimates, $\theta_t$ is the parameter at iteration $t$, $\eta$ is the learning rate, and $\epsilon$ is a small constant to prevent division by zero.

Finally, for comparison, we train two models with the same hyperparameters: the single sequential branch (Figure 2) and the proposed multi-branched model (Figure 1). All implementations were done in PyTorch [15]. A minimal training-loop sketch is shown below.
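The sketch combines the MSE criterion and Adam optimizer described above with the batch size and epoch count reported in this section; the learning rate, the placeholder data tensors, and the reference to MultiBranchRBFNet (from the sketch in Section 4) are illustrative assumptions.

```python
# Sketch: training with MSE loss and the Adam optimizer (batch size 512,
# 2000 epochs as in the paper; learning rate and data tensors are placeholders).
import torch
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# placeholder tensors standing in for the features/targets built from the dataset
inputs = torch.randn(36_700, 3)
targets = torch.randn(36_700, 3)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=512, shuffle=True)

model = MultiBranchRBFNet().to(device)      # architecture sketch from Section 4
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(2000):
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```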

6 Results

  • Loss over iterations of the Single Sequential Network (Figure 3): The training loss for Object 1 (blue) starts high, decreases sharply, and then fluctuates around a lower level with some spikes. The training loss for Object 2 (orange) follows a similar pattern but maintains a higher overall loss throughout training. Both objects show large loss spikes early in training, indicating potential instability or difficulty in the initial learning phase. The loss stabilizes and flattens out towards the end of the training iterations shown.

  • Loss over iterations of the Multi-Branched Network (Figure 4): The overall pattern is similar to Figure 3, with Object 2's loss (orange) consistently higher than Object 1's loss (blue). However, the initial large spikes in loss are more prominent and last longer than in Figure 3. The loss curves flatten out and stabilize at a later point in training compared to Figure 3. Once they stabilize, there are fewer small fluctuations and spikes in the loss curves, suggesting potentially smoother convergence.

In summary, while the overall trend of Object 2 having higher training loss is consistent across both images, the single sequential network exhibits more pronounced initial instability and takes longer to stabilize compared to the multi-branched architecture.

We next compare the outputs of the single sequential layer and the multi-layered architecture. Figure 5 shows the object movement for the single sequential layer and Figure 6 shows the object movement for the multi-layered architecture.

The predicted paths (black lines) for the single sequential layer (Figure 5) are relatively centralized and capture some linear segments of the trajectories. The overall pattern shows dense and tangled paths, which is typical of chaotic systems. The black lines follow the chaotic nature to some extent but may be too centralized and not dispersed enough to fully capture the randomness. The predicted paths (black lines) for the multi-layered architecture (Figure 6) are also centralized but show slight shifts compared to the output of the single sequential layer. This output also has dense and tangled paths, consistent with chaotic behavior. The black lines appear to capture more variability and slight shifts, which might better reflect the unpredictability of chaotic systems.

Figure 3: Loss over iterations of the Single Sequential Network
Figure 4: Loss over iterations of the Multi-Branched Network
Figure 5: Output of the single sequential layer
Figure 6: Output of the multi-layered architecture

7 Conclusion

In conclusion, this paper has explored the application of Radial Basis Function Neural Networks (RBFNNs) in predicting chaotic and random behaviors. Through a comprehensive review of related work, we have highlighted the strengths and limitations of RBFNNs in capturing the complex dynamics of chaotic systems. Leveraging insights from chaos theory and neural network architecture, we have proposed novel approaches for enhancing the predictive capabilities of RBFNNs with attention mechanisms.
Our results demonstrate the effectiveness of the proposed methods in predicting chaotic and random behaviors. A comparison of the object-movement predictions in our visual results indicates that the enhanced RBFNN model captures the inherent variability and unpredictability of chaotic systems. Specifically, in Figure 6 the prediction paths exhibit greater variability and subtle shifts, closely aligning with the expected characteristics of chaotic behavior. This suggests that our model can realistically reflect the randomness and sensitivity to initial conditions typical of chaotic systems.
Overall, this paper contributes to advancing our understanding of chaotic systems and lays the groundwork for future research in utilizing RBFNNs for predictive modeling in complex dynamical systems.

8 Limitations

Chaotic systems often require ongoing monitoring and adjustments to plans. Since small changes can have significant impacts, staying updated on the current state of the system is crucial. We acknowledge that chaotic behaviour cannot be truly predicted or fully understood.

9 Reproducibility

Results can be reproduced from the code present in my GitHub repository.

10 Acknowledgement

We acknowledge the work of Alessio Russo, who originally implemented RBF layers in PyTorch. His work is available on his GitHub [16].

References

  • [1] Contributors to Wikimedia projects. Radial basis function - Wikipedia, 2024.
  • [2] Gregory E. Fasshauer. Meshfree Approximation Methods with MATLAB. World Scientific Publishing Co. Pte. Ltd., Singapore, 2007.
  • [3] Holger Wendland. Scattered Data Approximation. Cambridge University Press, Cambridge, 2005.
  • [4] John Milnor. On the concept of attractor. Communications in Mathematical Physics, 99(2):177–195, Jun 1985.
  • [5] M. Emre Celebi, Fatih Celiker, and Hassan A. Kingravi. On euclidean norm approximations, 2010.
  • [6] Yue Wu, Hui Wang, Biaobiao Zhang, and K.-L. Du. Using Radial Basis Function Networks for Function Approximation and Classification. International Scholarly Research Notices, 2012, March 2012.
  • [7] James A Leonard and Mark A Kramer. Radial basis function networks for classifying process faults. IEEE Control Systems Magazine, 11(3):31–38, 1991.
  • [8] Deng Jianping, Narasimhan Sundararajan, and P Saratchandran. Communication channel equalization using complex-valued minimal radial basis function neural networks. IEEE Transactions on neural networks, 13(3):687–696, 2002.
  • [9] Hao Yu, Tiantian Xie, Stanisław Paszczynski, and Bogdan M Wilamowski. Advantages of radial basis function networks for dynamic system design. IEEE Transactions on Industrial Electronics, 58(12):5438–5450, 2011.
  • [10] A Vande Wouwer, Christine Renotte, and Ph Bogaerts. Biological reaction modeling using radial basis function networks. Computers & chemical engineering, 28(11):2157–2164, 2004.
  • [11] Yu-Yen Ou et al. Identifying the molecular functions of electron transport proteins using radial basis function networks and biochemical properties. Journal of Molecular Graphics and Modelling, 73:166–178, 2017.
  • [12] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2023.
  • [13] NIKITRICKY. Physics attractor time series dataset, 2023.
  • [14] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017.
  • [15] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library, 2019.
  • [16] Alessio Russo. Pytorch rbf layer, 2021.