A Physics-Informed Meta-Learning Framework
for the Continuous Solution of Parametric PDEs
on Arbitrary Geometries

Reza Najian Asl
[email protected]

Yusuke Yamazaki
Graduate School of Science and Technology, Keio University, Hiyoshi 3-14-1, Kohoku-ku, Yokohama 223-8522, Japan
[email protected]

Kianoosh Taghikhani
Access e.V., Intzestr. 5, D-52072 Aachen, Germany
[email protected]

Mayu Muramatsu
Department of Mechanical Engineering, Keio University, Hiyoshi 3-14-1, Kohoku-ku, Yokohama 223-8522, Japan
[email protected]

Markus Apel
Access e.V., Intzestr. 5, D-52072 Aachen, Germany
[email protected]

Shahed Rezaei
Access e.V., Intzestr. 5, D-52072 Aachen, Germany
[email protected]
Abstract

In this work, we introduce implicit Finite Operator Learning (iFOL) for the continuous and parametric solution of partial differential equations (PDEs) on arbitrary geometries. We propose a physics-informed encoder-decoder network to establish the mapping between continuous parameter and solution spaces. The decoder constructs the parametric solution field by leveraging an implicit neural field network conditioned on a latent or feature code. Instance-specific codes are derived through a PDE encoding process based on the second-order meta-learning technique. In training and inference, a physics-informed loss function is minimized during the PDE encoding and decoding. iFOL expresses the loss function in an energy or weighted residual form and evaluates it using discrete residuals derived from standard numerical PDE methods. This approach results in the backpropagation of discrete residuals during both training and inference.

iFOL features several key properties: (1) its unique loss formulation eliminates the need for the conventional encode-process-decode pipeline previously used in operator learning with conditional neural fields for PDEs; (2) it not only provides accurate parametric and continuous fields but also delivers solution-to-parameter gradients without requiring additional loss terms or sensitivity analysis; (3) it can effectively capture sharp discontinuities in the solution; and (4) it removes constraints on the geometry and mesh, making it applicable to arbitrary geometries and spatial sampling (zero-shot super-resolution capability). We critically assess these features and analyze the network’s ability to generalize to unseen samples across both stationary and transient PDEs. The overall performance of the proposed method is promising, demonstrating its applicability to a range of challenging problems in computational mechanics.

Keywords: Parameterized PDEs · Physics-informed operator learning · Conditional neural field · PDE encoding · Second-order meta-learning

1 Introduction

Numerical Solvers. Numerical methods for solving nonlinear PDEs have been central to scientific computing, enabling the simulation of complex physical phenomena. Classical approaches such as the Finite Element Method (FEM), Finite Difference Method (FDM), and Spectral Methods are widely used due to their robustness and versatility. FEM excels in handling complex geometries and boundary conditions, while FDM is known for its simplicity and efficiency on structured grids. Spectral methods, on the other hand, offer high accuracy for problems with smooth solutions by leveraging global basis functions. Despite their success, these methods often require significant computational resources, especially for high-dimensional problems, as they need to be repeated for any new set of input parameters.

Physics-Informed Neural Networks. Physics-Informed Neural Networks (PINNs) have emerged as a promising alternative, leveraging neural networks to approximate solutions to PDEs by embedding the governing equations directly into the loss function Raissi et al. (2019). PINNs are particularly attractive for solving problems with sparse or incomplete data. In specific cases where all governing equations and boundary conditions are known, PINNs can be employed as forward solvers to compute the solutions. Here, the term solver refers to finding solutions that satisfy the governing equations, translating the problem into a constrained optimization task. However, this approach faces two significant challenges: (1) the training time for PINNs is still not competitive with traditional numerical solvers, except in very high-dimensional settings where classical methods struggle Rezaei et al. (2022); Grossmann et al. (2024), and (2) even at their best (considering recent advances), standard PINNs perform comparably to methods like FEM, and the solutions lack generalizability to other parametric inputs. Consequently, the core limitations of traditional numerical methods, such as computational inefficiency and lack of flexibility for parametric variations, persist when using PINNs solely for forward problems. The performance of PINNs in inverse problems, model discovery, and calibration differs from their application in forward problems, with several reported advantages in these contexts Faroughi et al. (2024).

It is worth mentioning that, as a current trend, researchers are also combining ideas from FEM and neural networks to develop new deep learning-based solvers; see, for example, Mitusch et al. (2021); Škardová et al. (2024); Xiong et al. (2025); Li et al. (2024). Although these approaches offer appealing features, particularly in terms of implementation simplicity, their computational time remains comparable to that of FEM, and their applicability to highly nonlinear multiphysics problems has yet to be demonstrated.

Neural Operators. Neural Operators (NOs) extend the capabilities of traditional machine learning by learning mappings between infinite-dimensional function spaces, making them powerful tools for solving parametric PDEs and modeling complex physical systems. NOs should ideally generalize across entire families of functions, enabling them to predict solutions for varying input conditions, such as material properties or boundary conditions. Well-known approaches for operator learning so far are the Deep Operator Network (DeepONet) and its extensions Lu et al. (2021); Abueidda et al. (2025); He et al. (2024); Kumar et al. (2025); Yu et al. (2024), physics-informed DeepONet Wang et al. (2021); Mandl et al. (2025); Li et al. (2025), the Fourier Neural Operator (FNO) and its extensions Li et al. (2021); Azizzadenesheli et al. (2024); Li et al. (2023a, b), U-Net Ronneberger et al. (2015); Chen et al. (2019); Mendizabal et al. (2020); Mianroodi et al. (2022); Gupta et al. (2023); Najafi Koopas et al. (2025), and others.

The mentioned methods are also continuously improving in their generalizability to complex geometries and their ability to handle complex solution fields. For example, Yin et al. (2024); Li et al. (2023a); Xiao et al. (2024); Zeng et al. (2025); Chen et al. (2024a) introduce various frameworks designed to efficiently learn geometry-dependent solution operators for different PDEs. Similar to the previous section, combinations of physics-informed NOs with classical numerical solvers such as FEM, FDM, and FFT are also attracting increasing attention; see, for example, Yamazaki et al. (2025a); Rezaei et al. (2025a); Xu et al. (2024); Eshaghi et al. (2025); Lee et al. (2025); Kaewnuratchadasorn et al. (2024); Franco et al. (2023); and Harandi et al. (2025). The latter helps avoid time-consuming data generation or reduces the required data, and delivers higher accuracy for unseen predictions.

Operator learning faces several key challenges that hinder its broader adoption in scientific computing.

  • A major issue is generalization, as models often struggle to extrapolate beyond training data, especially in highly nonlinear systems with complex geometries.

  • Capturing high-frequency features (discontinuous solution fields) remains challenging. In many industrial applications, one expects discontinuities in the solution fields, which produce high gradients and are therefore highly relevant for design purposes (e.g., stress or flux fields obtained by differentiating a primary field such as displacement or temperature).

  • Data efficiency is a concern, as training requires large datasets from expensive simulations or experimental measurements. On that note, many operator-learning frameworks also lack physical constraints, leading to solutions that may violate fundamental laws.

  • Scalability to large 3D problems is computationally expensive, and meta-learning techniques for adaptability are still underexplored. This also includes extending their applicability to irregular geometries or domains with complex topologies.

Implicit Neural Representations. Implicit Neural Representations (INRs), also known as neural fields, represent a novel approach for encoding data as continuous functions parameterized by neural networks. Unlike traditional data storage methods, such as grids, meshes, or point clouds, INRs implicitly map input coordinates (e.g., spatial or temporal points) to corresponding output values (e.g., deformation, temperature, or color). This method enables efficient and flexible modeling of high-dimensional and complex data. Consequently, the application of INRs in scientific computing has been receiving increasing attention.

The literature and research on neural fields for scientific machine learning, particularly in computational mechanics, have gained momentum in recent years. Serrano et al. (2023) introduced a novel method that uses coordinate-based networks to solve PDEs on general geometries, removing input mesh constraints and excelling in diverse problem domains such as spatio-temporal forecasting and geometric design. Naour et al. (2024) proposed a continuous-time modeling approach for time series forecasting, leveraging implicit neural representations and meta-learning. Boudec et al. (2024) introduced a physics-informed iterative solver that learns to condition gradient descent for solving parametric PDEs, improving optimization stability and accelerating convergence. See also the contribution by Yeom et al. (2025) on achieving training speed-ups for neural fields via weight scaling of sinusoidal neural fields. Hagnberger et al. (2024) proposed vectorized conditional neural fields to solve time-dependent PDEs, which offer efficient inference, zero-shot super-resolution, and generalization to unseen PDE parameters through attention-based neural fields. Du et al. (2024a) discussed the conditional neural field latent diffusion model, a generative deep learning framework that enables efficient and probabilistic simulation of complex spatiotemporal turbulence in 3D irregular domains, leveraging conditional neural fields and latent diffusion. Dupont et al. (2022a) proposed a neural compression framework that handles diverse data modalities by converting data into implicit neural representations; their approach reduces encoding time by two orders of magnitude. Catalani et al. (2024) developed a methodology using INRs to learn surrogate models for steady-state fluid dynamics on unstructured domains, handling 3D geometric variations and generalizing to unseen shapes.

Despite these strengths, INRs face several challenges. Training INRs can be computationally expensive, requiring large amounts of data and careful regularization to prevent overfitting. Generalization remains a significant challenge, although meta-learning and transfer-learning approaches are emerging to address this limitation. Additionally, while INRs excel in representing continuous fields, their scalability to large-scale or high-dimensional data remains constrained by the capacity of the neural network and the computational cost of training. Representing disconnected or complex topologies without artifacts can also be challenging. Addressing these challenges will be critical for further advancing the utility and adoption of INRs in real-world applications.

Summary, open questions and our contributions. Traditional numerical solvers are highly optimized for solving challenging PDE problems. However, they are typically designed for one-time use, requiring costly recomputation whenever input parameters change. Deep learning methods, particularly neural operators, offer a promising alternative but are often highly data-demanding. Integrating physical constraints into neural operators has shown potential in mitigating these challenges, yet training such complex networks remains difficult, especially across a wide range of input parameters. Furthermore, the application of neural operators to real-world problems is still underdeveloped, with key challenges including scalability to complex geometries, accuracy in capturing sharp gradients, and efficient training, all of which remain active areas of research.

Conditional neural fields represent a promising yet straightforward network architecture that compactly and continuously maps function spaces across arbitrary domains. They offer significant flexibility for operator learning in parametrized PDEs, as their outputs can be sampled at arbitrary resolutions and modified by adjusting a set of latent variables. In contrast to discrete networks, where memory usage increases inefficiently with spatio-temporal resolution, neural fields require memory that scales primarily with the number of parameters in the network. Additionally, specialized architectures, such as sinusoidal activations or Fourier features, enhance their ability to capture high-frequency details, while integration with numerical methods extends their applicability to complex geometries and topologies. This work aims to address some of the above challenges by integrating concepts from the standard finite element method and conditional neural fields into operator learning. As a result, we introduce a physics-informed encoder-decoder network to establish the mapping between continuous parameter and solution spaces for parametrized PDEs. To the best of our knowledge, this is the first application of a single-network conditional neural field for operator learning, owing this capability to its unique loss formulation, which eliminates the need for the complex encode-process-decode pipeline (Serrano et al., 2023; Catalani et al., 2024). This approach is well-suited for complex geometries, offering data efficiency, requiring no labeled data, and demonstrating scalability to real-world industry applications, especially in the fields of materials engineering and science.

In Section 2, we summarize the main problem formulation and the types of PDEs and geometries chosen for this study. Next, in Section 3, we introduce implicit Finite Operator Learning as a promising approach for near-real-world problems. In Section 4, we present the study results, followed by conclusions and future directions in Section 5.

2 Parameterized PDEs

Let $u:\mathcal{T}\times\Omega\times\mathcal{C}\to\mathbb{R}^{d}$ be the solution field representing physical quantities such as temperature, displacement, velocity, or pressure. The solution field $u$ depends on the time $t\in\mathcal{T}\subset\mathbb{R}$, the spatial coordinates $x\in\Omega\subset\mathbb{R}^{d}$, and the control parameter $c\in\mathcal{C}\subset\mathbb{R}^{d^{\prime}}$. The solution field $u$ is parameterized by $c$ while satisfying the following PDE:

$$\begin{aligned}
\mathcal{R}=\partial_{t}u(t,x;c)-f(x,u;c)&=0, & (t,x)&\in\mathcal{T}\times\Omega,\\
u(0,x;c)-u_{0}(x;c)&=0, & (t,x)&\in\{(0,x)\mid x\in\Omega\}.
\end{aligned} \tag{1}$$

Here, $u_{0}$ denotes the initial condition, and $f$ can be either linear or nonlinear with respect to both the solution and the control parameter; it typically involves partial derivatives of $u$ with respect to $x$.

The control parameter $c$ can influence the solution field in various ways, such as defining the initial and boundary conditions, material properties, or other system characteristics. In more complex scenarios, $c$ may represent a spatially distributed field, accounting for variations in geometry or domain (e.g., material) heterogeneity.

Standard numerical methods, such as the finite element and finite volume methods, are widely used for solving Eq. 1. In practical applications and design processes, they are repeatedly executed for each variation in the control parameter $c$. These methods yield a numerical approximation $u(\cdot,\cdot;c):\mathcal{T}\times\Omega\to\mathbb{R}$ for varying $c\in\mathcal{C}$.

After applying a spatial discretization, such as the finite element method, we approximate the solution field as:

$$u(t,x;c)\approx u^{h}(t,x;c)=\sum_{i=1}^{M}u_{i}(t,c)\,\psi_{i}(x), \tag{2}$$

where $\{\psi_{i}(x)\}_{i=1}^{M}$ are spatial basis functions, $u_{i}(t,c)$ are the time-dependent, parameterized coefficients representing the discrete solution at spatial degrees of freedom, and $M$ is the number of grid points. Substituting this approximation into the governing PDE and applying the Galerkin projection, we obtain the semi-discrete system:

$$\mathbf{r}(\mathbf{u},\mathbf{c})=\mathbf{M}\frac{d\mathbf{u}}{dt}+\mathbf{K}(\mathbf{u},\mathbf{c})\,\mathbf{u}-\mathbf{g}(\mathbf{u},\mathbf{c})=\mathbf{0}, \tag{3}$$

where:

  • $\mathbf{u},\mathbf{r}\in\mathbb{R}^{M}$ are the vectors of unknowns (discrete solutions at spatial nodes) and nodal residuals, respectively,

  • $\mathbf{M}\in\mathbb{R}^{M\times M}$ is the mass matrix, given by

    $$M_{ij}=\int_{\Omega}\psi_{i}(x)\,\psi_{j}(x)\,d\Omega, \tag{4}$$
  • $\mathbf{K}(\mathbf{u},\mathbf{c})\in\mathbb{R}^{M\times M}$ represents the nonlinear stiffness, often resulting from the spatial derivatives of $u$; it may depend on both the solution and the problem parameters,

  • $\mathbf{g}(\mathbf{u},\mathbf{c})$ is the discrete source term,

  • $\mathbf{c}$ is the control vector that parametrizes the discrete system by governing any or a combination of initial and boundary conditions, material properties, domain geometry, and spatial heterogeneity.

This semi-discrete system is an ordinary differential equation (ODE) system in time, which must be further discretized using a time-stepping method. In this work, the implicit Euler method is used to approximate the time-dependent term in Eq. 3 with a chosen time step size $\Delta t$. Table 1 outlines the parametrized boundary value problems explored in this research for operator learning. These problems are selected from a wide range of applications, particularly in computational mechanics. The upper part of the table focuses on stationary problems, where the given PDE has no transient term or time evolution, while the lower part highlights two well-known nonlinear transient equations. For clarity, the FEM-based residual vector for each problem is specified in Table 3.
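For concreteness, applying the implicit Euler scheme to Eq. 3 yields the fully discrete residual that must vanish at every time step $n+1$:

$$\mathbf{r}^{n+1}=\mathbf{M}\,\frac{\mathbf{u}^{n+1}-\mathbf{u}^{n}}{\Delta t}+\mathbf{K}(\mathbf{u}^{n+1},\mathbf{c})\,\mathbf{u}^{n+1}-\mathbf{g}(\mathbf{u}^{n+1},\mathbf{c})=\mathbf{0}.$$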

Table 1: Summary of the parametrized PDEs and their setup for operator learning.

2D/3D steady-state problems

Mechanical problem (hyper-/linear elasticity):
  PDE: $\nabla\cdot\bm{P}+\bm{f}=0$ + BCs, $\bm{P}=\frac{\partial W}{\partial\bm{F}}$, with $W=\frac{\mu(\bm{X})}{2}(\bar{I_{1}}-3)-\frac{\kappa(\bm{X})}{4}(J_{F}^{2}-1-2\ln J_{F})$; linear case: $\nabla\cdot\bm{\sigma}+\bm{f}=0$ + BCs, $\hat{\bm{\sigma}}=\bm{C}\hat{\bm{\varepsilon}}$
  Operator $\mathcal{O}$ (In $\to$ Out): $\mathcal{O}:\bm{C}(\bm{x})\to\bm{U}(\bm{x})$;  $\mathcal{O}:\bm{U}_{b}\to\bm{U}(\bm{x})$
  Input samples: $\bm{C}(\bm{x})$: Fourier-based parametrization; $\bm{U}_{b}$: random Gaussian

Nonlinear diffusion:
  PDE: $-\nabla\cdot\bm{q}=0$, $\bm{q}=-k(\bm{x},T)\,\nabla T$, $k(\bm{x},T)=k_{0}(\bm{x})(1+2T^{4})$
  Operator $\mathcal{O}$ (In $\to$ Out): $\mathcal{O}:k(\bm{x})\to T(\bm{x})$
  Input samples: $k_{0}=0.95\,\text{Sigmoid}\left(20(k_{f}-0.5)\right)+0.05$, with $k_{f}$ a Fourier-based parametrization

2D transient problems

Nonlinear thermal diffusion:
  PDE: $\rho c_{p}\frac{\partial T}{\partial t}=-\nabla\cdot\bm{q}+Q$, $\bm{q}=-K\,\nabla T$, $K=k_{0}(\bm{x})(1+\alpha T)$
  Operator $\mathcal{O}$ (In $\to$ Out): $\mathcal{O}:T_{i}(\bm{x})\to T_{i+1}(\bm{x})$
  Input samples: $T_{0}\sim\mathcal{N}\left(0,\exp\left(-\frac{\|X_{i}-X_{j}\|^{2}}{2\varepsilon^{2}}\right)\right)$

Allen-Cahn equation:
  PDE: $\frac{\partial\phi}{\partial t}=-M\,\frac{\delta F}{\delta\phi}$, $F=\int_{\Omega}\left(\frac{\epsilon}{2}|\nabla\phi|^{2}+f(\phi)\right)d\Omega$
  Operator $\mathcal{O}$ (In $\to$ Out): $\mathcal{O}:\phi_{i}(\bm{x})\to\phi_{i+1}(\bm{x})$
  Input samples: $\phi_{0}\sim\mathcal{N}\left(0,\exp\left(-\frac{\|X_{i}-X_{j}\|^{2}}{2\varepsilon^{2}}\right)\right)$

(Geometries, boundary conditions, and example input fields are shown as figures in the original table.)

3 iFOL: implicit Finite Operator Learning

3.1 Implicit Neural Representations

Implicit Neural Representations (INRs) are coordinate-based multi-layer perceptron (MLP) networks parameterized by $L$ layers of weights $\mathbf{W}_{i}$, biases $\mathbf{b}_{i}$, and nonlinear activation functions $\sigma_{i}$, with $\theta=(\mathbf{W}_{i},\mathbf{b}_{i})_{i=1}^{L}$. These networks model spatial fields as an implicit function that maps spatial coordinates to scalar or vector quantities, $x\in\mathbb{R}^{d}\mapsto f_{\theta}(x)$. We adopt SIREN (Sitzmann et al., 2020) as the core INR architecture in our framework. This network utilizes sine activation functions, combined with a distinct initialization strategy:

$$\mathbf{SIREN}(x)=\mathbf{W}_{L}\left(\sigma_{L-1}\circ\sigma_{L-2}\circ\cdots\circ\sigma_{0}(x)\right)+\mathbf{b}_{L},\quad\text{with}\quad\sigma_{i}(\bm{\eta}_{i})=\sin\left(\omega_{0}(\mathbf{W}_{i}\bm{\eta}_{i}+\mathbf{b}_{i})\right) \tag{5}$$

Here, $\bm{\eta}_{0}=\bm{x}$, and $(\bm{\eta}_{i})_{i\geq 1}$ represent the hidden activations at each layer of the network. The parameter $\omega_{0}\in\mathbb{R}_{+}^{*}$ is a hyperparameter that governs the frequency bandwidth of the network. SIREN requires a specialized initialization of weights to ensure that outputs across layers adhere to a standard normal distribution.
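As an illustration, a minimal PyTorch sketch of the SIREN of Eq. 5 is given below; the layer sizes and $\omega_{0}=30$ are illustrative defaults rather than the exact configuration used in this work.

```python
import torch
import torch.nn as nn


class SIREN(nn.Module):
    """Sinusoidal coordinate network x -> f_theta(x), cf. Eq. (5)."""

    def __init__(self, in_dim=2, hidden=64, out_dim=1, n_layers=4, omega0=30.0):
        super().__init__()
        self.omega0 = omega0
        dims = [in_dim] + [hidden] * n_layers
        self.layers = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(n_layers)
        )
        self.last = nn.Linear(hidden, out_dim)
        with torch.no_grad():
            for i, lin in enumerate(self.layers):
                # SIREN initialization: U(-1/n, 1/n) for the first layer and
                # U(-sqrt(6/n)/omega0, sqrt(6/n)/omega0) for the rest, so that
                # layer outputs stay approximately standard normal.
                n = lin.weight.shape[1]
                bound = 1.0 / n if i == 0 else (6.0 / n) ** 0.5 / omega0
                lin.weight.uniform_(-bound, bound)

    def forward(self, x):
        for lin in self.layers:
            x = torch.sin(self.omega0 * lin(x))  # sigma_i(eta_i)
        return self.last(x)                      # W_L(...) + b_L
```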

3.2 Conditional Neural Fields

Our goal is to obtain a plausible solution to a parameterized partial differential equation using neural fields. This is achieved by conditioning the neural field on a set of latent variables $\bm{l}$, which can encode the solution field across arbitrary parameterizations and discretizations of the underlying PDEs. By varying these latent variables, we can effectively modulate the neural solution field. The code $\bm{l}$ is typically a low-dimensional vector and is also referred to as a feature code. For a comprehensive discussion and review of conditional neural fields, interested readers are referred to Xie et al. (2022).

iFOL utilizes Feature-wise Linear Modulation (FiLM; Perez et al. (2018)), which conditions the neural field in an auto-decoding manner. It employs a simple neural network without hidden layers (i.e., a linear transformation) to predict a shift vector from the latent variables $\bm{l}$ for each layer of the neural field network. This yields the shift-modulated SIREN:

$$\begin{aligned}
u_{\theta,\gamma}(x,\bm{l})&=\text{Decode}(\bm{l})=\mathbf{W}_{L}\left(\sigma_{L-1}\circ\sigma_{L-2}\circ\cdots\circ\sigma_{0}(x)\right)+\mathbf{b}_{L},\\
\sigma_{i}(\bm{\eta}_{i},\bm{\phi}_{i})&=\sin\left(\omega_{0}(\mathbf{W}_{i}\bm{\eta}_{i}+\mathbf{b}_{i}+\bm{\phi}_{i})\right),\\
\bm{\phi}_{i}(\bm{l})&=\mathbf{V}_{i}\bm{l}+\mathbf{c}_{i},
\end{aligned} \tag{6}$$

where $u_{\theta,\gamma}$ is the neural solution field, and $\theta=(\mathbf{W}_{i},\mathbf{b}_{i})_{i=1}^{L}$ and $\gamma=(\mathbf{V}_{i},\mathbf{c}_{i})_{i=1}^{L-1}$ represent the trainable parameters of the SIREN network, referred to as the Synthesizer, and of the FiLM hypernetworks, referred to as the Modulator, respectively.
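Building on the previous sketch, a minimal shift-modulated SIREN following Eq. 6 might look as follows; the latent dimension is an illustrative choice, and the SIREN weight initialization shown above is omitted for brevity.

```python
class ModulatedSIREN(nn.Module):
    """Shift-modulated SIREN u_{theta,gamma}(x, l), cf. Eq. (6)."""

    def __init__(self, in_dim=2, hidden=64, out_dim=1, n_layers=4,
                 latent_dim=32, omega0=30.0):
        super().__init__()
        self.omega0 = omega0
        dims = [in_dim] + [hidden] * n_layers
        # Synthesizer (theta): the SIREN trunk.
        self.layers = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(n_layers)
        )
        self.last = nn.Linear(hidden, out_dim)
        # Modulator (gamma): one linear map per hidden layer, phi_i = V_i l + c_i.
        self.films = nn.ModuleList(
            nn.Linear(latent_dim, hidden) for _ in range(n_layers)
        )

    def forward(self, x, latent):
        # latent: feature code l, shared by all query points of one sample
        for lin, film in zip(self.layers, self.films):
            x = torch.sin(self.omega0 * (lin(x) + film(latent)))
        return self.last(x)
```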

3.3 PDE Loss Function

Building on physics-informed neural networks, we introduce a domain-integrated physical loss function whose variation with respect to the solution field yields the residuals of the governing partial differential equations. Among the functionals commonly used in computational mechanics, the total potential energy functional and the weighted residual functional naturally fulfill this property. Since the energy functional is physics-specific, not always explicitly known, and its derivation can be challenging, we adopt the well-established weighted residual functional as follows:

$$\mathcal{L}_{\mathbf{PDE}}=\int_{\Omega}u_{\theta,\gamma}\,\mathcal{R}\,d\Omega=0. \tag{7}$$

This formulation weights the residual using the neural field as a test function, enforcing its vanishing in an integral sense. Applying the chain rule and variational calculus, the gradient of the loss function with respect to the predicted solution is given by:

$$\begin{aligned}
\delta_{u_{\theta,\gamma}}\mathcal{L}_{\mathbf{PDE}}&=\int_{\Omega}\mathcal{R}\,d\Omega+\int_{\Omega}u_{\theta,\gamma}\,\delta_{u_{\theta,\gamma}}\mathcal{R}\,d\Omega\\
&=\int_{\Omega}\mathcal{R}\,d\Omega,
\end{aligned} \tag{8}$$

where $\delta_{u_{\theta,\gamma}}\mathcal{R}$ is zero due to the stationarity of the residual functional with respect to the solution field. To compute the loss function efficiently, we discretize the domain and employ the corresponding discrete residuals (Eq. 3) as follows:

$$\mathcal{L}_{\mathbf{PDE}}(\mathbf{u}_{\theta,\gamma}(t,\bm{x},\bm{l}),\mathbf{c})=\sum_{e=1}^{n_{el}}(\mathbf{u}_{\theta,\gamma}^{e})^{T}\mathbf{r}^{e},\quad\text{where}\quad\mathbf{r}^{e}=\mathbf{M}^{e}\frac{d\mathbf{u}_{\theta,\gamma}^{e}}{dt}+\mathbf{K}^{e}\mathbf{u}_{\theta,\gamma}^{e}-\mathbf{g}^{e}, \tag{9}$$

where $\mathbf{u}_{\theta,\gamma}$ represents the neural solution field vector evaluated at the mesh points, and the superscript $e$ indicates that the quantity is evaluated at the element level. Note that, although the PDE loss presented here is residual-based, one can also directly utilize the energy functional of a PDE, as listed in Table 3 in Appendix A.4. In fact, the energy-functional-based loss in Table 3 was employed for the transient problems.

This strategy enables the seamless integration of well-established numerical methods into the learning process while minimizing computational overhead. Consequently, it allows for the construction of PINNs without relying on resource-intensive automatic differentiation to evaluate the components of the PDE.
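To make this concrete, the sketch below shows how the element-wise loss of Eq. 9 could be assembled and backpropagated for a stationary problem; `element_residual` is a hypothetical stand-in for a standard FEM element kernel returning $\mathbf{K}^{e}\mathbf{u}^{e}-\mathbf{g}^{e}$ and is not part of any specific library.

```python
import torch


def pde_loss(model, latent, coords, elements, control):
    """L_PDE = sum_e (u^e)^T r^e, cf. Eq. (9), for a stationary problem.

    coords:   (M, d) nodal coordinates of the mesh
    elements: (n_el, nodes_per_element) connectivity array
    control:  per-element control parameters c^e
    """
    u = model(coords, latent).squeeze(-1)   # neural solution at the mesh nodes
    loss = u.new_zeros(())
    for e, conn in enumerate(elements):
        u_e = u[conn]                                          # element DOFs u^e
        r_e = element_residual(u_e, coords[conn], control[e])  # hypothetical kernel
        loss = loss + u_e @ r_e             # weighted-residual contribution
    return loss
```

In practice, the element loop would be vectorized over all elements; the scalar loss above is exactly what both the encoding and training stages differentiate.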

3.4 Training

The training and inference of the conditional neural fields (CNFs) involve the computation of the latent variables $\bm{l}$, a step commonly referred to as encoding. In data-driven operator learning methodologies utilizing CNFs, encoding is performed on both the input and output fields, following the so-called Encode-Process-Decode framework (Dupont et al., 2022b; Yin et al., 2022; Serrano et al., 2023; Catalani et al., 2024). In contrast, this work uniquely encodes the PDE and the underlying physics, rather than the spatial fields. In each training step of iFOL, the latent codes for the sample batch $\mathcal{B}$ are first derived by minimizing the physical loss with respect to the latent codes in just a few steps of gradient descent:

$$\begin{aligned}
\bm{l}^{*}(\mathbf{c}_{i})&=\text{Encode}(\mathbf{PDE})\\
&=\arg\min_{\bm{l}}\mathcal{L}_{\mathbf{PDE}}=\sum_{e=1}^{n_{el}}(\mathbf{u}_{\theta,\gamma}^{e})^{T}\mathbf{r}^{e}(\mathbf{u}^{e}_{\theta,\gamma},\mathbf{c}^{e}_{i}),\quad\forall i\in\mathcal{B}.
\end{aligned} \tag{10}$$

Subsequently, the parameters of the Synthesizer and Modulator networks are optimized using the computed latent codes:

$$\theta^{*},\gamma^{*}=\arg\min_{\theta,\gamma}\mathcal{L}_{\mathbf{PDE}}=\sum_{e=1}^{n_{el}}(\mathbf{u}_{\theta,\gamma}^{e})^{T}\mathbf{r}^{e}(\mathbf{u}^{e}_{\theta,\gamma},\mathbf{c}^{e}_{i}),\quad\forall i\in\mathcal{B}. \tag{11}$$

In essence, we partition the model into context-specific parameters, which dynamically adapt to individual samples, and meta-trained parameters, which are globally optimized to enable knowledge transfer across diverse contexts. To the best of our knowledge, this work presents the first application of second-order meta-learning, inspired by the CAVIA algorithm (Zintgraf et al., 2019), for the parametric learning of PDEs in a physics-informed manner. The details of the approach are provided in Algorithms 1 and 2. Here, $\alpha$ denotes the encoding learning rate, while $\lambda$ is the training learning rate, which adjusts the weights of the Modulator and Synthesizer networks.

Algorithm 1 Encode PDE for sample batch $\mathcal{B}$
1: Initialize: set codes to zero, $\bm{l}_{i}\leftarrow 0,\ \forall i\in\mathcal{B}$
2: for each $i\in\mathcal{B}$ and step $\in\{1,\dots,K_{e}\}$ do
3:     $\bm{l}_{i}\leftarrow\bm{l}_{i}-\alpha\nabla_{\bm{l}_{i}}\mathcal{L}_{\mathbf{PDE}}(\mathbf{u}_{\theta,\gamma}(t,\bm{x},\bm{l}_{i}),\mathbf{c}_{i})$
4: end for

Algorithm 2 Training of iFOL
1: while no convergence do
2:     for each mini-batch $\mathcal{M}\subseteq\mathcal{B}$ do
3:         /* Encode PDE */
4:         for each $i\in\mathcal{M}$ and step $\in\{1,\dots,K_{e}\}$ do
5:             $\bm{l}_{i}\leftarrow\bm{l}_{i}-\alpha\nabla_{\bm{l}_{i}}\mathcal{L}_{\mathbf{PDE}}(\mathbf{u}_{\theta,\gamma}(t,\bm{x},\bm{l}_{i}),\mathbf{c}_{i})$
6:         end for
7:         /* Update Modulator & Synthesizer */
8:         $\theta,\gamma\leftarrow\theta,\gamma-\lambda\frac{1}{|\mathcal{M}|}\sum_{i\in\mathcal{M}}\nabla_{\theta,\gamma}\mathcal{L}_{\mathbf{PDE}}(\mathbf{u}_{\theta,\gamma}(t,\bm{x},\bm{l}^{*}_{i}),\mathbf{c}_{i})$
9:     end for
10: end while
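A condensed PyTorch-style sketch of Algorithms 1 and 2 is given below, reusing `pde_loss` from the previous sketch. Using `create_graph=True` in the inner encoding steps keeps them differentiable, which is what makes the outer update second-order; batching details and the convergence check are simplified.

```python
def encode_pde(model, c_i, coords, elements, alpha=1e-2, K_e=3, latent_dim=32):
    """Algorithm 1: derive the latent code of one sample by gradient descent."""
    latent = torch.zeros(latent_dim, requires_grad=True)
    for _ in range(K_e):
        loss = pde_loss(model, latent, coords, elements, c_i)
        (grad,) = torch.autograd.grad(loss, latent, create_graph=True)
        latent = latent - alpha * grad       # differentiable inner update
    return latent


def train_step(model, optimizer, batch, coords, elements):
    """Algorithm 2: one outer update of the Modulator & Synthesizer weights."""
    optimizer.zero_grad()
    total = 0.0
    for c_i in batch:
        latent_star = encode_pde(model, c_i, coords, elements)
        total = total + pde_loss(model, latent_star, coords, elements, c_i)
    (total / len(batch)).backward()          # gradients also flow through the
    optimizer.step()                         # inner encoding loop (second-order)
```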

To provide further clarity on the architecture presented above, a comparative analysis between the proposed approach and two well-established operator learning algorithms is shown in Fig. 1. Furthermore, a more detailed description of each component of the iFOL framework is provided in Fig. 2. The upper section of the figure outlines the approach for stationary problems, while the lower section demonstrates the application of the same architecture to transient problems. Notably, once trained on transient problems, the network can be repeatedly invoked to generate solutions over time, without the need for additional retraining.

Figure 1: Comparison of different architectures for operator learning. The brownish arrows indicate the flow of gradients in each method, while the input parametric space and output continuous field are highlighted with reddish and bluish circles, respectively. Notably, in the proposed iFOL method, the input space first influences the loss term, after which the latent code is constructed.
Figure 2: Training and inference procedures in the iFOL framework. Top: Details of the iFOL architecture for quasi-static problems, where the goal is to predict the corresponding solution for a given input parameter space in a single step. Bottom: Details of the iFOL architecture for transient problems, where the only difference lies in the inference step—the trained network is repeatedly called to predict the solution field over time.

4 Results

4.1 Stationary problems: hyper-elastic mechanical equilibrium

We begin by evaluating the performance of iFOL in predicting the mechanical deformation of a hyperelastic heterogeneous solid when the property distribution varies, i.e., the operator maps $\mathcal{O}:\bm{C}(\bm{x})\to\bm{u}(\bm{x})$. The strong form and energy formulation of the solid in this context follow a nonlinear form, which is detailed in Tables 1 and 3 and Section A.1, along with the selected material properties.

Figure 3: Results for the 2D stationary mechanical equilibrium PDE considering a hyperelastic material model. The operator learns to map the elasticity distribution to the deformation fields (see also Table 1 and Section A.1).

The 8000 random training samples are generated using a Fourier-based parametrization with $f_{x}=\{2,4,6\}$ and $f_{y}=\{2,4,6\}$ (see also Rezaei et al. (2025b) for details on this parameterization scheme). The chosen network parameters are listed in Table 4.
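The exact parameterization scheme is described in Rezaei et al. (2025b); as a rough, hypothetical illustration, such a field can be sampled from a truncated Fourier basis with random coefficients:

```python
import numpy as np


def fourier_field(nx=51, ny=51, fx=(2, 4, 6), fy=(2, 4, 6), seed=0):
    """Hypothetical sketch of a Fourier-based random property field on [0, 1]^2.

    Random coefficients weight products of cosine modes with the frequency
    sets fx and fy; the result is normalized to [0, 1].
    """
    rng = np.random.default_rng(seed)
    x, y = np.meshgrid(np.linspace(0.0, 1.0, nx), np.linspace(0.0, 1.0, ny))
    field = np.zeros_like(x)
    for kx in fx:
        for ky in fy:
            field += (rng.uniform(-1.0, 1.0)
                      * np.cos(kx * np.pi * x) * np.cos(ky * np.pi * y))
    return (field - field.min()) / (field.max() - field.min())
```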

In Fig. 3, the results of iFOL are shown for four unseen test cases, all evaluated at more than twice the training resolution (i.e., approximately six times more grid points). The first row presents predictions for unseen higher-frequency components as well as low-frequency ones with unsymmetrical property distribution. In the second row, we further challenge the network by testing its performance on polycrystalline microstructures, which go well beyond the training samples used in this study. Notably, in the last case featuring a multiphase polycrystal, we even alter the material property values (e.g., Young’s modulus) to include ranges not encountered during training.

Despite significant changes in the input space and resolution, the network produces reasonable predictions. Across all test cases, the errors remain more than an order of magnitude smaller than the applied displacement and the maximum deformation in the solid.

At this point, a valid question is how to determine the quantity and quality of the training samples required for a given task. Although a definitive answer requires further systematic study, as it is influenced by many parameters, we address it here by varying the number of training samples while keeping the network hyperparameters fixed. As expected and shown in Fig. 4, increasing the number of samples systematically reduces the errors up to a certain level. Interestingly, the entire framework appears to be very sample- or data-efficient, as extensive sample sets are not required to achieve reasonable results. The same pattern repeats in other case studies, which we omit for the sake of brevity.

Figure 4: Studies on the influence of the number of training samples on the obtained errors for unseen test samples. Increasing the number of samples reduces the prediction errors while increasing the training time.

4.2 Stationary problems: elasticity and learning on boundary conditions

In this study, we examine the performance of iFOL in learning the solution for given BCs while keeping material properties and geometry fixed. More details on the chosen boundary values are provided in Section A.1, while details on the formulation of the loss term can be found in Tables 1 and 3. The selected network hyperparameters are listed in Table 4. Here, we define the operator as $\mathcal{O}:\bm{U}_{b}\to\bm{U}(\bm{x})$. We focus on gyroid surfaces, a specific class of metamaterials. The complex topology of the gyroid enables novel functionalities, which highlights its importance in the design of next-generation materials. Moreover, we consider applications in multiscale analysis, where the applied boundary conditions typically originate from the macroscale, and it is crucial to determine the microstructural response to the imposed macroscopic displacement field.

For this purpose, we generate 1000 random samples for the applied displacement vector on the front surface of the chosen metamaterial and train iFOL to learn the full deformation field for a given Dirichlet BC. In Fig. 5, the results of iFOL versus FEM are shown for three different test cases. For visualization purposes, the deformations are magnified by a factor of 10. The maximum pointwise errors for unseen test cases are one order of magnitude lower than the applied displacement field and are mainly concentrated in specific regions where Neumann BCs are applied. It is worth noting that, within the iFOL framework, Dirichlet BCs are enforced in a hard manner, which explains the zero error at the front and back surfaces.
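One common way to realize such hard enforcement (shown here as a sketch, not necessarily the exact construction used in iFOL) is to compose the network output with a mask field $m(\bm{x})$ that vanishes on the Dirichlet boundary:

```python
def hard_bc_solution(model, latent, coords, u_dirichlet, mask):
    """u(x) = u_D(x) + m(x) * NN(x): since the mask m vanishes on the
    Dirichlet boundary, the prescribed values hold exactly for any weights."""
    return u_dirichlet + mask * model(coords, latent)
```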

Figure 5: Results for the 3D stationary mechanical equilibrium PDE considering a linear elastic material model. The operator learns to map the given applied Dirichlet boundary conditions to the deformation fields (see also Table 1 and Section A.1).

4.3 Stationary problems: Nonlinear diffusion

In this study, we not only critically assess the applicability of iFOL in approximating nonlinear operators on complex geometries with irregular meshes but also evaluate its effectiveness for sensitivity analysis. We are specifically interested in the derivatives of domain-integrated functionals that explicitly depend on the solution field (the operator’s output) with respect to the parameter field (the operator’s input), e.g.,

$$\mathcal{J}=\int_{\Omega}T(k)\,d\Omega,\qquad\delta_{k}\mathcal{J}=\int_{\Omega}\delta_{k}T\,d\Omega. \tag{12}$$

Here, $\delta_{k}T$ represents the Jacobian of the operator $\mathcal{O}:k(\bm{x})\to T(\bm{x})$ and can be efficiently computed by applying automatic differentiation (AD) to the trained iFOL network as a postprocessing step.
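A minimal sketch of this postprocessing step follows; `inference` stands for the trained iFOL map $k(\bm{x})\to T(\bm{x})$ (including the PDE encoding), and `quad_weights` are the nodal quadrature weights of the domain integral, both of which are assumed names.

```python
def sensitivity(inference, k, quad_weights):
    """delta_k J for J = int_Omega T(k) dOmega, cf. Eq. (12), via reverse-mode AD."""
    k = k.clone().requires_grad_(True)
    T = inference(k)                        # trained operator output T(x)
    J = (quad_weights * T).sum()            # discrete domain integral
    (dJ_dk,) = torch.autograd.grad(J, k)    # one reverse pass yields the full map
    return dJ_dk
```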

Figure 6: Results for stationary diffusion PDE parameterized by a temperature-dependent and spatially varying conductivity field (see also Table 1 and Section A.2). The training mesh is 7× coarser than the evaluation mesh.

For this example, 8000 samples were used for training, and 2000 samples were reserved for testing as unseen data. The conductivity field is parametrized using the Fourier parametrization as introduced in (Rezaei et al., 2024), with $f_x = \{1, 2, 3\}$, $f_y = \{1, 2, 3\}$, and $f_z = \{0\}$. The so-called sigmoidal projection was applied to generate dual-phase conductivity fields within the range of 0.5 to 1.0. For further details, refer to the corresponding column in Table 1. A discussion and details on the network architecture are provided in Section B and Table 4. See also Section A.2 for a summary of the mathematical formulation.
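A minimal sketch of such a sampling scheme is given below; it follows the spirit of the Fourier parametrization with a sigmoidal projection, not the exact implementation of Rezaei et al. (2024), and `coeffs` and `beta` are assumed quantities:

```python
import jax
import jax.numpy as jnp

# Illustrative Fourier-type sampling with a sigmoidal projection; assumed
# form, not the exact construction of Rezaei et al. (2024).
fx = jnp.array([1.0, 2.0, 3.0])
fy = jnp.array([1.0, 2.0, 3.0])   # f_z = {0}: no variation along z

def conductivity(x, y, coeffs, beta=20.0, k_min=0.5, k_max=1.0):
    # coeffs: (3, 3) random coefficients weighting the Fourier modes
    modes = jnp.cos(jnp.pi * fx[:, None] * x) * jnp.cos(jnp.pi * fy[None, :] * y)
    field = jnp.sum(coeffs * modes)            # random Fourier field
    phase = jax.nn.sigmoid(beta * field)       # project toward two phases
    return k_min + (k_max - k_min) * phase     # dual-phase values in [0.5, 1.0]
```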

Figure 6 showcases how iFOL performs in predicting temperature fields for unseen conductivity distributions compared to the reference finite element method. The complicated 3D geometry of FOL is discretized using a fully unstructured tetrahedral mesh. The input parameter exhibits a complex spatial distribution, while the temperature field is governed by a thermal diffusion PDE with high nonlinearity arising from the temperature-dependent conductivity field. iFOL and reference FE solutions are presented for two samples that were not included in the training set. We observe that the learned operator exhibits generalization both to the parameter field and to the evaluation grid, which is 7× finer than the training mesh.

Figure 7 displays the sensitivity maps of $\delta_k \mathcal{J}$ for two unseen samples, where one map is obtained via automatic differentiation applied to iFOL's inference function, and the other is computed using adjoint-based sensitivity analysis. Qualitative comparisons show a high degree of agreement between the two maps, with both displaying similar spatial sensitivity patterns. To the authors' knowledge, this is the first time that the Jacobian of neural network-based operators for spatially distributed input and output fields has been evaluated and compared against classical sensitivity analysis techniques, such as adjoint methods. Based on the results presented here, we can conclude that iFOL is not only able to accurately approximate PDE-based operators in a mesh- and geometry-agnostic manner, but also to reliably estimate the operator's Jacobian.

Figure 7: Sensitivity maps of $\delta_k \mathcal{J}$ for two unseen samples, comparing AD-iFOL and adjoint-based analysis.

4.4 Transient problems: nonlinear thermal diffusion in heterogeneous domain

As mentioned before, for transient problems, we adopt a slightly different strategy, where we train the network to learn the dynamics of the transient problem from step $i$ to $i+1$. Upon successful training, the trained network can then be called multiple times in a loop to predict the evolution of the desired solution field. In this section, we present the results for nonlinear thermal diffusion in a heterogeneous domain, where we have a map between $T_i(x,y)$ and $T_{i+1}(x,y)$. The heterogeneity is intentionally introduced to create sharp solution jumps, which typically correspond to higher frequencies in the solution and are usually not easily captured by a naive network setup. From a physical perspective, this can be related to different phases in a material microstructural system. The additional nonlinearity arises from the fact that the material conductivity is not only a function of space but also varies with temperature (i.e., the solution field). See also Section A.2 for the summary of the mathematical formulation and further details on the material properties. The results of this study are summarized in Fig. 8.
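The rollout itself is straightforward; in the following sketch, `one_step` stands for a single trained iFOL inference call (PDE encoding of the current field followed by decoding) and is a placeholder name, not the actual FOL API:

```python
# `one_step` is a placeholder for one trained iFOL inference call mapping
# the field at step i to the field at step i+1.
def rollout(one_step, T0, n_steps):
    trajectory = [T0]
    T = T0
    for _ in range(n_steps):
        T = one_step(T)         # the prediction becomes the next input,
        trajectory.append(T)    # so errors may accumulate over the loop
    return trajectory
```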

Figure 8: Results for the transient nonlinear diffusion PDE considering a temperature-dependent and heterogeneous conductivity. The operator learns to map the initial conditions to the next evolved temperature field (see also Table 1 and Section A.2).

Here, the training fields are randomly generated using a Gaussian process with varying length scales of the radial basis function kernel, and the results of the temperature profile evolution for different unseen initial conditions are provided. At this point, the resolution for both training and testing is the same (a grid of $51 \times 51$). Note that any obtained temperature profile as an output serves as a new input for the next step. Therefore, for transient problems, errors can accumulate over time, which has also been reported in previous studies Du et al. (2024b); Dingreville et al. (2024); Yamazaki et al. (2025b). The network's capability to handle unseen temperature fields is also worth mentioning. For example, from the fifth time step onward, the temperature profile is primarily dominated by the BCs and no longer resembles the randomly generated Gaussian-process fields used to train the network. Yet, the predictions remain reasonably accurate. We emphasize that the predictions of up to 50 steps shown in this figure are purely based on the one-time-trained network, with no labeled data or hybrid solver used to further support the network.

As a final remark, we should point out our fundamentally different approach to handling transient problems, which involves discretizing the time axis into increments, as explained above. In the classical PINN approach, the time variable is yet another network input in addition to space. Although this approach initially makes sense and appears natural, training with respect to both time and space can become very challenging for high-dimensional problems, as the network can easily violate so-called temporal causality, requiring additional enhancements that are not needed here Wang et al. (2024).

4.5 Transient problems: Allen-Cahn equation on the complex domain

In this section, we present the results for the nonlinear Allen-Cahn equation in a homogeneous irregular domain, where we establish a mapping between $\phi_i(x,y)$ and $\phi_{i+1}(x,y)$. See also Section A.3 for the summary of the mathematical formulation. The irregular domain is intentionally chosen to demonstrate the approach's capability to handle complex geometries. From a practical perspective, such shapes are commonly observed in active material particles on the cathode side of battery systems, where phase transitions in the material govern ion diffusion Chen et al. (2024b). As before, training samples are generated from a Gaussian process. The results in Fig. 9 illustrate two different initializations, leading to distinct final phase profiles. The same strategy as for the nonlinear thermal problem applies here, so we do not repeat every detail. Predictions remain accurate up to 10 time steps, after which errors gradually accumulate, a point that is analyzed further below. It is important to note that the resolution for both training and testing is kept the same, as it was found to be sufficient based on the length scale parameter in the phase-field equation. Further discussion on super-resolution capabilities is provided in Section 4.6. We did not leverage any precomputed solution fields or hybrid solver coupling. However, such techniques could be incorporated in future developments to further reduce errors and maintain accuracy within expected margins; see investigations by Li et al. (2023c); Dingreville et al. (2024).

4.6 Studies on zero-shot super resolution

Zero-shot super-resolution (ZSSR) in operator learning refers to the ability of a trained model to enhance the resolution of function mappings without requiring additional high-resolution training data. Unlike traditional super-resolution techniques that rely on supervised learning with paired low- and high-resolution samples, zero-shot approaches leverage the inherent structure of the learned operator to generalize across different resolutions Li et al. (2021); Koopas et al. (2024).

ZSSR is particularly useful in scientific computing, where performing calculations on very fine discretizations is computationally expensive. It is worth noting the similarities to model order reduction techniques, where the dimensionality of the problem is reduced by capturing the dominant solution modes Quarteroni and Rozza (2011).

The ZSSR capability of iFOL is also demonstrated in Figures 3 and 6. Here, we further validate this claim by systematically testing the proposed iFOL approach and quantitatively assessing the error accumulation across different mesh resolutions. Our main focus is on stationary nonlinear diffusion in 3D, where we evaluate the solution on both finer and coarser grids, as well as on the 2D nonlinear Allen-Cahn equation, where we analyze error accumulation over time and for varying resolutions. Similar behavior is observed for other investigated problems, and therefore, for brevity, we do not report all details.
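Conceptually, ZSSR comes for free here because the conditioned neural field is a pointwise function of the coordinates; a minimal sketch, with `decode` and `z` as placeholders for the trained decoder and an instance-specific latent code:

```python
import jax

# `decode` and `z` are placeholders for the trained conditional neural field
# and a latent code obtained by PDE encoding on the training-resolution mesh.
# The decoder is queried pointwise, so any node set (coarser or finer) can be
# evaluated without retraining.
u_coarse = jax.vmap(lambda x: decode(z, x))(coords_coarse)  # training resolution
u_fine = jax.vmap(lambda x: decode(z, x))(coords_fine)      # finer evaluation mesh
```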

In Fig. 10, we evaluate the network on two additional mesh resolutions. Since we use unstructured meshes in the 3D domain, we aim to assess whether the network can generate results for both coarser and finer meshes. For error measurement in this particular example, we adopt a strict criterion based on the maximum pointwise error. To ensure the results are not case-dependent, we systematically test multiple samples, with five representative cases shown for each resolution in Fig. 10. The averaged values are highlighted in red.
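For concreteness, the two error measures reported in this subsection can be written as follows; a minimal sketch for nodal field vectors:

```python
import jax.numpy as jnp

def max_pointwise_error(u_pred, u_ref):
    # strict criterion used for the 3D diffusion study (Fig. 10)
    return jnp.max(jnp.abs(u_pred - u_ref))

def relative_l2_error(u_pred, u_ref):
    # measure used for the Allen-Cahn rollouts (Fig. 11)
    return jnp.linalg.norm(u_pred - u_ref) / jnp.linalg.norm(u_ref)
```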

Figure 9: Results for the transient Allen-Cahn PDE considering in a complex-shape domain. The operator learns to map the initial conditions to the next evolved phase-field variable (see also Table 1 and Section A.3).

As expected, the prediction error increases when moving to resolutions different from the training resolution. However, as demonstrated in Fig. 6, the overall results remain acceptable. A potential way to mitigate this issue is to train the network on multiple resolutions across different samples, reducing bias toward a specific mesh resolution.

Figure 10: Performance of iFOL in ZSSR for nonlinear thermal diffusion in a heterogeneous 3D problem with unstructured meshes. The maximum pointwise error is lowest for the trained resolution and increases for other mesh resolutions, yet remains within an acceptable range.
Figure 11: Performance of iFOL in ZSSR for the nonlinear transient Allen-Cahn equation in an arbitrary 2D domain with unstructured meshes. The relative L2 error is lowest for the trained resolution but increases for other mesh resolutions and accumulates over time during further network evaluations.

In Fig. 11, we analyze the transient Allen-Cahn problem and compare results using relative L2 error measurements across three different resolutions. On the far left, we show the trained resolution, followed by two additional cases with two and four times more nodes, respectively. As expected, and similar to other physical problems, the error increases slightly when moving to different resolutions, particularly finer ones. Moreover, this diagram illustrates how errors accumulate over time for a given resolution (observe the spacing between circles, squares, and triangles along the y-axis). A potential remedy is to train the model using multiple resolutions.

Figure 12: Prediction of iFOL for different time steps and resolutions for the nonlinear Allen-Cahn equation.

4.7 Computational cost analysis

Accuracy and computational cost are two key criteria for comparing classical techniques, such as the FEM and adjoint-based sensitivity analysis, with machine learning-based approaches, including PINNs and operator learning methods. In this section, we analyze the computational cost of the iFOL technique using the 3D nonlinear diffusion problem described in Section 4.3. Leveraging Google JAX Bradbury et al. (2018), we developed a unified framework that seamlessly integrates classical solver operations—such as element-wise computations, global Jacobian assembly, and right-hand side (RHS) construction—with machine learning techniques, including loss function evaluation and gradient-based optimization. This framework is implemented within the FOL paradigm Rezaei et al. (2024), utilizing key JAX features such as jax.vmap for efficient vectorized operations and jax.jit for just-in-time (JIT) compilation, enabling the translation of computationally demanding tasks into optimized machine code. The loss function is implemented exactly as formulated in Eq. 9. Furthermore, the physical loss function not only returns the element-wise weighted residual or its energy counterpart as a scalar loss function to be minimized during the learning process but also provides the corresponding elemental Jacobian and RHS. This unified implementation enables the same framework to be used for both FEM solvers and parameterized learning approaches such as iFOL.
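The following sketch illustrates this pattern for a stationary diffusion energy on linear triangles; it is a simplified stand-in for the actual FOL kernels, but shows how one element routine can serve the scalar loss, the elemental RHS, and the elemental Jacobian at once:

```python
import jax
import jax.numpy as jnp

# Simplified element kernel (assumed discretization: linear triangles with
# one-point quadrature), not the actual FOL implementation.
DN = jnp.array([[-1.0, -1.0], [1.0, 0.0], [0.0, 1.0]])   # dN_i/d(xi, eta)

def element_energy(T_e, X_e, k_e):
    # T_e: (3,) nodal temperatures, X_e: (3, 2) coordinates, k_e: (3,) conductivities
    J = X_e.T @ DN                          # d(x, y)/d(xi, eta), shape (2, 2)
    grad_N = DN @ jnp.linalg.inv(J)         # physical shape-function gradients
    grad_T = grad_N.T @ T_e                 # temperature gradient, shape (2,)
    area = 0.5 * jnp.linalg.det(J)          # assumes positive element orientation
    return 0.5 * jnp.mean(k_e) * grad_T @ grad_T * area

@jax.jit
def total_loss(T, coords, k, conn):
    # Gather element DOFs and vectorize the kernel over all elements.
    per_elem = jax.vmap(element_energy)(T[conn], coords[conn], k[conn])
    return jnp.sum(per_elem)

# The same kernel yields the elemental RHS and Jacobian for the FEM path:
elem_rhs = jax.grad(element_energy, argnums=0)      # residual vector (3,)
elem_jac = jax.hessian(element_energy, argnums=0)   # stiffness matrix (3, 3)
```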

Table 2 summarizes the computational times for the primal and sensitivity analyses performed using iFOL and FEM. For the sake of brevity, we focus on the example reported in Section 4.3, where we deal with a nonlinear PDE in a 3D complex geometry. A similar trend of results was also observed for other cases. We note that training iFOL on 8000 samples with 3800 iterations required 14 hours on an NVIDIA A100 GPU with 40 GB of RAM (see Table 4). However, during inference, iFOL achieves speedups of 200×, 1000×, and 8300× compared to FEM on coarse (1069 nodes), normal (5485 nodes), and fine (37362 nodes) meshes, respectively. Beyond the speedup in the primal inference, iFOL also achieved 1.3× and 13× speedups for sensitivity analysis on normal and fine meshes, respectively. It should be noted that the reported times for AD-iFOL and Adjoint-FEM correspond exclusively to sensitivity analysis, assuming the primal solution is already available. Furthermore, the computational cost of iFOL-based sensitivity analysis remains independent of the number of response functions. In contrast, adjoint-based sensitivity analysis scales directly with the number of responses, as a separate adjoint system of equations must be solved for each response.

Table 2: Computational costs of the studies in the 3D nonlinear diffusion problem (Sec. 4.3). Inference times are reported per sample at three mesh resolutions (number of nodes).

Method        Training (h)   Batch size   Inference (ms/sample)         # Iterations/sample
                                          1069     5485     37362
iFOL          14             64           1.26     1.60     3.59        3
FEM           -              64           256      1599     29961       10
AD-iFOL       -              32           169      173      228         0
Adjoint-FEM   -              32           175      233      3152        1

5 Conclusion

We introduced the implicit Finite Operator Learning (iFOL) technique, a novel framework for solving parametric PDEs on arbitrary geometries. iFOL is a physics-informed automatic encoder-decoder that combines three state-of-the-art techniques: conditional neural fields for spatial encoding, second-order meta-learning for PDE encoding, and a physics-informed loss function inspired by the original FOL concept. This operator learning technique is capable of providing both a precise continuous mapping between parameter and solution spaces and a reliable estimation of the operator’s Jacobian. The computed Jacobian can be directly utilized for sensitivity analysis and gradient-based optimization without incurring additional computational costs.

The applicability of the methodology is tested on several classes of PDEs, with special attention to problems in computational mechanics and materials engineering, namely hyperelasticity, linear elasticity, nonlinear thermal diffusion, and the Allen-Cahn equation. We address stationary and transient problems in 2D and 3D, including both regular and complex geometries. The variety of examples is intended to cover most potential applications. In all cases, iFOL is able to parametrically learn the solution and predict unseen test samples, even beyond the training regime, demonstrating its generalization capability to new scenarios without any retraining.

It is worth highlighting that no labeled data is used throughout this work. Instead, the PDEs in the weighted residual form or in their energy counterparts are directly defined as loss functions. This approach improves efficiency, as no automatic differentiation operator is required to construct the loss terms. Moreover, it allows for the direct integration of residuals from other numerical solvers, such as FEM or spectral methods like FFT. Finally, since the network learns the equation parametrically for a class of problems, the inference time of iFOL can be highly competitive with direct numerical solvers like FEM. The speed-up factor is not only problem-dependent but also resolution-dependent and influenced by implementation details, making it difficult to provide a universal comparison, especially considering differences in hardware setups for DL-based solvers and classical numerical methods. In specific cases and under comparable conditions, we observe at least a 100×\times× speed-up for nonlinear problems, which can further increase for higher resolutions.

Outlook and future research directions

Physics-informed neural operators, and more specifically the iFOL framework, offer significant potential for the next generation of numerical solvers by leveraging their ability to learn solutions parametrically. A natural next step is to apply the introduced methodologies to more complex multiphysics problems, where multiple physical processes influence each other through one- or two-way coupling. In such scenarios, classical solvers often struggle and require staggered approaches. Moreover, in this work, we primarily rely on a single input parameter space at a time. A natural extension would be to incorporate multiple input parameter spaces, a direction that has been explored to some extent by other authors, primarily using data-driven operator learning techniques. Finally, since we have access to meaningful and reliable sensitivity information independently of the number of objectives, gradient-based optimization becomes a compelling approach. This capability has significant potential for efficiently solving complex optimization and inverse problems, even in time-dependent settings, while maintaining reasonable computational and implementation costs.

Code Availability

The implementation of iFOL and the examples presented in this study will be made publicly available under the FOL framework repository on GitHub at https://github.com/RezaNajian/FOL, following the publication of this work.

Acknowledgments

The authors would like to thank the Deutsche Forschungsgemeinschaft (DFG) for the funding support provided to develop the present work in the Cluster of Excellence Project 'Internet of Production' (project: 390621612). The authors also acknowledge the financial support of SFB 1120, subproject B07 'Mehrskalige thermomechanische Simulation der fest-flüssig Interaktionen bei der Erstarrung' (project: 260070971). Yusuke Yamazaki acknowledges the financial support from JST SPRING (JPMJSP2123). Mayu Muramatsu acknowledges the financial support from the JSPS KAKENHI Grant (22H01367).

Authors contributions

R.N.A.: Conceptualization, Methodology, Software, Writing - Review & Editing. Y.Y.: Software, Writing - Review & Editing. K.T.: Software. M.M.: Funding, Review & Editing. M.A.: Funding, Review & Editing. S.R.: Supervision, Methodology, Software, Writing - Review & Editing.

Appendix A Review on physical formulations and their discretization using FEM

A.1 Mechanical problem

Here, we summarize the mechanical problem where the position of material points is denoted by $\bm{X}^T = [x, y, z]$. We denote the displacement components by $U$, $V$, and $W$ in the $x$, $y$, and $z$ directions, respectively. The kinematic relation defines the strain tensor $\bm{\varepsilon}$ in terms of the deformation vector $\bm{U}^T = [U, V, W]$ and reads:

$$\bm{\varepsilon} = \text{sym}(\text{grad}(\bm{U})) = \nabla^s \bm{U} = \frac{1}{2}\left(\nabla\bm{U} + \nabla\bm{U}^T\right), \tag{13}$$
$$\bm{F} = \frac{\partial\bm{x}}{\partial\bm{X}}, \quad \bm{C} = \bm{F}^T\bm{F}, \quad J_F = \det(\bm{F}). \tag{14}$$

Here, $\bm{F}$ is the deformation gradient, and $J_F$ is the Jacobian determinant, representing the local volume change. The elastic energy of the solid for the linear and nonlinear (hyperelastic Neo-Hookean material) cases reads

$$\psi_{lin} = \frac{\lambda}{2}(\text{tr}\,\bm{\varepsilon})^2 + \mu\,\text{tr}(\bm{\varepsilon}^2), \tag{15}$$
$$\psi_{nonlin} = \frac{\mu(\bm{X})}{2}\left(\bar{I}_1 - 3\right) + \frac{\kappa(\bm{X})}{4}\left(J_F^2 - 1 - 2\ln J_F\right). \tag{16}$$

The Lamé constants are denoted by $\lambda$ and $\mu$, and $\text{tr}(\bm{\varepsilon})$ is the trace of the strain tensor, representing the volumetric strain. $\bar{I}_1 = \text{tr}(\bar{\mathbf{C}})$ is the first invariant of the isochoric right Cauchy-Green deformation tensor $\bar{\mathbf{C}} = J_F^{-2/3}\,\mathbf{C}$, where $\mathbf{C}$ is the right Cauchy-Green tensor. We then define the Cauchy stress tensor $\bm{\sigma} = \partial\psi_{lin}/\partial\bm{\varepsilon} = \mathbb{C}\,\bm{\varepsilon}$ and the first Piola-Kirchhoff stress $\bm{P} = \partial\psi_{nonlin}/\partial\bm{F}$, both obtained by differentiating the corresponding strain energy function. Here, $\mathbb{C}(\lambda,\mu) = \lambda\,\mathbf{I}\otimes\mathbf{I} + 2\mu\,\mathbb{I}^s$, equivalently written as $\mathbb{C}(E,\nu)$, is the fourth-order elasticity tensor, where $\mathbf{I}$ is the second-order identity tensor and $\mathbb{I}^s$ is the symmetric fourth-order identity tensor. Finally, the mechanical equilibrium in the absence of body forces, as well as the Dirichlet and Neumann boundary conditions, are written as:

$$\text{div}(\bm{\sigma}) = \text{div}(\mathbb{C}\nabla^s\bm{U}) = \bm{0} \quad \text{in } \Omega, \tag{17}$$
$$\bm{U} = \bar{\bm{U}} \quad \text{on } \Gamma_D, \tag{18}$$
$$\bm{\sigma}\cdot\bm{n} = \bm{t} = \bar{\bm{t}} \quad \text{on } \Gamma_N. \tag{19}$$

In the above relations, $\Omega$ and $\Gamma$ denote the material points in the body and on the boundary area, respectively. Moreover, the Dirichlet and Neumann boundary conditions are introduced in Eq. 18 and Eq. 19, respectively. In the case of hyperelasticity, we then have $\text{Div}(\bm{P}) = \bm{0}$ instead of Eq. 17.
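Note that the stress definitions above are plain energy derivatives, so in a JAX-based implementation they can be obtained by AD rather than hand-coded; a minimal sketch for the Neo-Hookean energy of Eq. 16, with illustrative scalar material parameters (the spatially varying fields $\mu(\bm{X})$ and $\kappa(\bm{X})$ are omitted for brevity):

```python
import jax
import jax.numpy as jnp

# Minimal sketch with assumed scalar parameters mu and kappa.
def psi_nonlin(F, mu=1.0, kappa=1.0):
    C = F.T @ F                                    # right Cauchy-Green tensor
    J_F = jnp.linalg.det(F)                        # local volume change
    I1_bar = J_F ** (-2.0 / 3.0) * jnp.trace(C)    # isochoric first invariant
    return mu / 2 * (I1_bar - 3.0) + kappa / 4 * (J_F**2 - 1.0 - 2.0 * jnp.log(J_F))

# First Piola-Kirchhoff stress P = d(psi_nonlin)/dF via AD:
P = jax.grad(psi_nonlin)(1.01 * jnp.eye(3))        # example deformation gradient
```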

A.2 Thermal problem

The heat equation describes how the temperature $T(\bm{x},t)$, with $\bm{x}$ representing position and $t$ the time, evolves within the domain $\bm{x}\in\Omega$ over time. Let the heat source be $Q:\Omega\times(0,\tau)\to\mathbb{R}$, the boundary temperature $T_D(\bm{x}):\Gamma_D\times(0,\tau)\to\mathbb{R}$, and the boundary heat flux be $q_N:\Gamma_N\times(0,\tau)\to\mathbb{R}$, where $\Gamma_D$ and $\Gamma_N$ are the regions where the Dirichlet and Neumann boundary conditions are applied, respectively. Here, $t\in(0,\tau)$ represents the temporal domain, with $\tau$ denoting the end time. The strong form of the heat equation is given as:

$$c\rho\frac{\partial T(\bm{x},t)}{\partial t} = -\text{div}(\bm{q}) + Q \quad \text{in } \Omega\times(0,\tau), \tag{20}$$
$$T(\bm{x},t) = T_D(\bm{x}) \quad \text{on } \Gamma_D\times(0,\tau), \tag{21}$$
$$\nabla T(\bm{x},t)\cdot\bm{n} = q_N(\bm{x}) \quad \text{on } \Gamma_N\times(0,\tau), \tag{22}$$
$$T(\bm{x},0) = T_0(\bm{x}) \quad \text{in } \Omega. \tag{23}$$

In the above equations, $\bm{n}$ is the outward normal vector, $c$ is the specific heat capacity, and $\rho$ is the density. The heat flux is given by $\bm{q} = -K\nabla T$, where $K = k_0(\bm{x})(1 + \alpha T)$ represents the thermal conductivity, which depends on both position and temperature.

The above set of equations forms the basis for the results and studies presented in Section 4.4. For the stationary thermal problem (see Section 4.3), the time derivative in Eq. 20 naturally vanishes.

A.3 Allen-Cahn equation

The Allen-Cahn equation describes the evolution of an order parameter $\phi(\bm{x},t)$, where $\bm{x}$ represents position and $t$ is time, within the domain $\bm{x}\in\Omega$. Let the source term be $S:\Omega\times(0,\tau)\to\mathbb{R}$, the boundary value be $\phi_D(\bm{x}):\Gamma_D\times(0,\tau)\to\mathbb{R}$, and the boundary flux be $\phi_N:\Gamma_N\times(0,\tau)\to\mathbb{R}$, where $\Gamma_D$ and $\Gamma_N$ are the regions where the Dirichlet and Neumann boundary conditions are applied, respectively. Here, $t\in(0,\tau)$ represents the temporal domain, with $\tau$ denoting the end time. The strong form of the Allen-Cahn equation is given as:

$$\frac{\partial\phi(\bm{x},t)}{\partial t} = -M\left(\frac{\delta F}{\delta\phi}\right) \quad \text{in } \Omega\times(0,\tau), \tag{24}$$
$$F = \int_{\Omega}\left(\frac{\epsilon}{2}|\nabla\phi|^2 + f(\phi)\right)d\Omega \quad \text{in } \Omega\times(0,\tau), \tag{25}$$
$$\phi(\bm{x},t) = \phi_D(\bm{x}) \quad \text{on } \Gamma_D\times(0,\tau), \tag{26}$$
$$\nabla\phi(\bm{x},t)\cdot\bm{n} = \phi_N(\bm{x}) \quad \text{on } \Gamma_N\times(0,\tau), \tag{27}$$
$$\phi(\bm{x},0) = \phi_0(\bm{x}) \quad \text{in } \Omega. \tag{28}$$

In the above equations, $\bm{n}$ is the outward normal vector, $M$ is the mobility coefficient, and $\frac{\delta F}{\delta\phi}$ represents the functional derivative of the free energy, typically given by:

$$\frac{\delta F}{\delta\phi} = -\epsilon^2\nabla^2\phi + f'(\phi), \tag{29}$$

where $\epsilon$ is a small parameter controlling the interface thickness and $f'(\phi) = (\phi^2 - 1)\phi$ denotes the derivative of the double-well potential enforcing phase separation. The above set of equations forms the basis for the results and studies presented in Section 4.5. The Allen-Cahn equation is a fundamental phase-field model and plays a crucial role in modeling microstructure evolution in alloys Allen and Cahn (1979); Karma and Rappel (1998). The equation captures the dynamics of interfaces and transitions between phases (including applications in phase-field fracture Bourdin et al. (2000); Rezaei et al. (2021)). Its applications extend to image processing, tumor growth modeling, and topology optimization.
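The stated derivative corresponds to the standard quartic double-well $f(\phi) = \frac{1}{4}(\phi^2 - 1)^2$, an assumption consistent with $f'(\phi)$ above, which can likewise be differentiated automatically:

```python
import jax
import jax.numpy as jnp

# Assumed quartic double-well; AD recovers f'(phi) = (phi^2 - 1) * phi.
f = lambda phi: 0.25 * (phi**2 - 1.0) ** 2
f_prime = jax.grad(f)
assert jnp.isclose(f_prime(0.3), (0.3**2 - 1.0) * 0.3)
```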

A.4 A short review on FEM formulation

Next, we briefly summarize the FEM discretization techniques, particularly for a 2D quadrilateral element, to clarify the loss terms in the iFOL formulation, especially for implementation purposes. The details provided here are fairly standard, and the extension to other types of complex element formulations with different polynomial orders should be straightforward. Readers are also encouraged to see the standard procedure in any finite element subroutine Hughes (2000); Bathe (1996). The corresponding linear shape functions $\bm{N}$ and the deformation matrix $\bm{B}$ used to discretize the mechanical weak form in the current work are given below:

$$\bm{N} = \begin{bmatrix} N_1 & 0 & \dots & N_4 & 0 \\ 0 & N_1 & \dots & 0 & N_4 \end{bmatrix}, \quad \bm{B} = \begin{bmatrix} N_{1,x} & 0 & \dots & N_{4,x} & 0 \\ 0 & N_{1,y} & \dots & 0 & N_{4,y} \\ N_{1,y} & N_{1,x} & \dots & N_{4,y} & N_{4,x} \end{bmatrix}. \tag{30}$$

The notation $N_{i,x}$ and $N_{i,y}$ represents the derivatives of the shape function $N_i$ with respect to the coordinates $x$ and $y$, respectively. To compute these derivatives, we utilize the Jacobian matrix

$$\bm{J} = \partial\bm{X}/\partial\bm{\xi} = \begin{bmatrix} \frac{\partial x}{\partial\xi} & \frac{\partial y}{\partial\xi} \\ \frac{\partial x}{\partial\eta} & \frac{\partial y}{\partial\eta} \end{bmatrix}. \tag{31}$$

Here, $\bm{X} = [x, y]$ and $\bm{\xi} = [\xi, \eta]$ represent the physical and parent coordinate systems, respectively. It is worth mentioning that the determinant of $\bm{J}$ remains constant for parallelogram-shaped elements, eliminating the need to evaluate this term at each integration point. Finally, for the $\bm{B}$ matrix, we have

$$\bm{B} = \bm{J}^{-1}\begin{bmatrix} \frac{\partial N_1}{\partial\xi} & 0 & \frac{\partial N_2}{\partial\xi} & 0 & \frac{\partial N_3}{\partial\xi} & 0 & \frac{\partial N_4}{\partial\xi} & 0 \\ 0 & \frac{\partial N_1}{\partial\eta} & 0 & \frac{\partial N_2}{\partial\eta} & 0 & \frac{\partial N_3}{\partial\eta} & 0 & \frac{\partial N_4}{\partial\eta} \\ \frac{\partial N_1}{\partial\eta} & \frac{\partial N_1}{\partial\xi} & \frac{\partial N_2}{\partial\eta} & \frac{\partial N_2}{\partial\xi} & \frac{\partial N_3}{\partial\eta} & \frac{\partial N_3}{\partial\xi} & \frac{\partial N_4}{\partial\eta} & \frac{\partial N_4}{\partial\xi} \end{bmatrix}. \tag{32}$$
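For implementation purposes, Eqs. 30-32 translate into a few lines of array code; a minimal sketch for one integration point of the 4-node quadrilateral, where `X_e` holds the nodal coordinates:

```python
import jax.numpy as jnp

def shape_grads(xi, eta):
    # Rows: [dN_i/d(xi), dN_i/d(eta)] for the four bilinear shape functions
    return 0.25 * jnp.array([
        [-(1 - eta), -(1 - xi)],
        [ (1 - eta), -(1 + xi)],
        [ (1 + eta),  (1 + xi)],
        [-(1 + eta),  (1 - xi)],
    ])

def b_matrix(X_e, xi, eta):
    # X_e: (4, 2) nodal coordinates; (xi, eta): one integration point
    dN = shape_grads(xi, eta)                  # (4, 2) parent-space gradients
    J = dN.T @ X_e                             # Jacobian of Eq. 31
    Nx, Ny = jnp.linalg.solve(J, dN.T)         # physical gradients N_{i,x}, N_{i,y}
    zero = jnp.zeros(4)
    B = jnp.stack([
        jnp.stack([Nx, zero], axis=1).reshape(-1),   # [N1,x 0 N2,x 0 ...]
        jnp.stack([zero, Ny], axis=1).reshape(-1),   # [0 N1,y 0 N2,y ...]
        jnp.stack([Ny, Nx], axis=1).reshape(-1),     # [N1,y N1,x N2,y N2,x ...]
    ])
    return B, jnp.linalg.det(J)                # det(J) enters the quadrature
```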

For each element, the deformation field and stress tensor $\hat{\bm{\sigma}}$ are approximated as

$$U = \sum N_i u_i = \bm{N}\bm{U}_e, \quad V = \sum N_i v_i = \bm{N}\bm{V}_e, \quad \hat{\bm{\sigma}} = \bm{C}\bm{B}\bm{U}_e, \tag{33}$$

Here, $\bm{U}_e^T = [u_1 \cdots u_4]$ and $\bm{V}_e$ are the nodal values of the deformation field of element $e$ in the $x$ and $y$ directions, respectively. The same procedure applies to both the thermal problem and the Allen-Cahn equation. However, one must account for additional time derivatives. We summarize the implemented loss terms in Table 3.

Table 3: Summary of the FEM-based residuals of the investigated problems.
PDE type FEM residual / iFOL loss function
Stationary nonlinear mechanical 𝑪(x,y)𝑼(x,y)𝑪𝑥𝑦𝑼𝑥𝑦\bm{C}(x,y)\to\bm{U}(x,y)bold_italic_C ( italic_x , italic_y ) → bold_italic_U ( italic_x , italic_y ) 𝐔θ,γ𝑼esubscript𝐔𝜃𝛾subscript𝑼𝑒\mathbf{U}_{\theta,\gamma}\equiv\bm{U}_{e}bold_U start_POSTSUBSCRIPT italic_θ , italic_γ end_POSTSUBSCRIPT ≡ bold_italic_U start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT, (Sec. 4.1) re=Ωe𝑺:δ𝑬dVΩe(Δ𝑺:δ𝑬+𝑺:Δδ𝑬)dV+Ωtδ𝑼.𝑻dAr_{e}=-\int_{\Omega_{e}}\bm{S}\colon\delta\bm{E}dV-\int_{\Omega_{e}}(\Delta\bm% {S}\colon\delta\bm{E}+\bm{S}\colon\Delta\delta\bm{E})dV+\int_{\Omega_{t}}% \delta\bm{U}.~{}\bm{T}dAitalic_r start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT = - ∫ start_POSTSUBSCRIPT roman_Ω start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_italic_S : italic_δ bold_italic_E italic_d italic_V - ∫ start_POSTSUBSCRIPT roman_Ω start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( roman_Δ bold_italic_S : italic_δ bold_italic_E + bold_italic_S : roman_Δ italic_δ bold_italic_E ) italic_d italic_V + ∫ start_POSTSUBSCRIPT roman_Ω start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_δ bold_italic_U . bold_italic_T italic_d italic_A
iFOL=e=1nelk=1nint(μ(𝝃k)2(tr(𝑪¯(𝑼𝒆))3)+κ(𝝃k)4(JF2(𝑼𝒆)2lnJF(𝑼𝒆)1))det(𝑱)wksubscript𝑖𝐹𝑂𝐿superscriptsubscript𝑒1subscript𝑛𝑒𝑙superscriptsubscript𝑘1subscript𝑛𝑖𝑛𝑡𝜇subscript𝝃𝑘2tr¯𝑪subscript𝑼𝒆3𝜅subscript𝝃𝑘4superscriptsubscript𝐽𝐹2subscript𝑼𝒆2subscript𝐽𝐹subscript𝑼𝒆1det𝑱subscript𝑤𝑘\mathcal{L}_{iFOL}=\sum_{e=1}^{n_{el}}\sum_{k=1}^{n_{int}}~{}\left(\frac{\mu(% \bm{\xi}_{k})}{2}~{}\left(\text{tr}(\bar{\bm{C}}(\bm{U_{e}}))-3\right)~{}+~{}% \frac{\kappa(\bm{\xi}_{k})}{4}~{}(J_{F}^{2}(\bm{U_{e}})-2\ln J_{F}(\bm{U_{e}})% -1)\right)~{}\text{det}(\bm{J})~{}w_{k}caligraphic_L start_POSTSUBSCRIPT italic_i italic_F italic_O italic_L end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_e = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT italic_e italic_l end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT italic_i italic_n italic_t end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ( divide start_ARG italic_μ ( bold_italic_ξ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) end_ARG start_ARG 2 end_ARG ( tr ( over¯ start_ARG bold_italic_C end_ARG ( bold_italic_U start_POSTSUBSCRIPT bold_italic_e end_POSTSUBSCRIPT ) ) - 3 ) + divide start_ARG italic_κ ( bold_italic_ξ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) end_ARG start_ARG 4 end_ARG ( italic_J start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( bold_italic_U start_POSTSUBSCRIPT bold_italic_e end_POSTSUBSCRIPT ) - 2 roman_ln italic_J start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ( bold_italic_U start_POSTSUBSCRIPT bold_italic_e end_POSTSUBSCRIPT ) - 1 ) ) det ( bold_italic_J ) italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT
\hdashline Stationary linear mechanical 𝑼b𝑼(𝒙)subscript𝑼𝑏𝑼𝒙\bm{U}_{b}\to\bm{U}(\bm{x})bold_italic_U start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT → bold_italic_U ( bold_italic_x ) 𝐔θ,γ𝑼esubscript𝐔𝜃𝛾subscript𝑼𝑒\mathbf{U}_{\theta,\gamma}\equiv\bm{U}_{e}bold_U start_POSTSUBSCRIPT italic_θ , italic_γ end_POSTSUBSCRIPT ≡ bold_italic_U start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT, (Sec. 4.2) re=Ωe[𝑩]T𝑪[𝑩]𝑼e𝑑V+Ωt[𝑵]T𝑪[𝑩𝑼e]T𝒏𝑑Ssubscript𝑟𝑒subscriptsubscriptΩ𝑒superscriptdelimited-[]𝑩𝑇𝑪delimited-[]𝑩subscript𝑼𝑒differential-d𝑉subscriptsubscriptΩ𝑡superscriptdelimited-[]𝑵𝑇𝑪superscriptdelimited-[]𝑩subscript𝑼𝑒𝑇𝒏differential-d𝑆r_{e}=-\int_{\Omega_{e}}[\bm{B}]^{T}\bm{C}~{}[\bm{B}]\bm{U}_{e}~{}dV+\int_{% \Omega_{t}}[\bm{N}]^{T}\bm{C}~{}[\bm{B}\bm{U}_{e}]^{T}~{}\bm{n}~{}dSitalic_r start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT = - ∫ start_POSTSUBSCRIPT roman_Ω start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_B ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_italic_C [ bold_italic_B ] bold_italic_U start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT italic_d italic_V + ∫ start_POSTSUBSCRIPT roman_Ω start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_N ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_italic_C [ bold_italic_B bold_italic_U start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_italic_n italic_d italic_S
iFOL=e=1nel𝑼eT[𝐦𝐢𝐬𝐬𝐢𝐧𝐠(𝒌=𝟏𝒏𝒊𝒏𝒕𝒘𝒌𝟐det(𝑱)[𝑩(𝝃𝒌)]𝑻𝑪𝒆(𝝃𝒌)𝑩(𝝃𝒌))𝑼e].subscript𝑖𝐹𝑂𝐿superscriptsubscript𝑒1subscript𝑛𝑒𝑙subscriptsuperscript𝑼𝑇𝑒delimited-[]𝐦𝐢𝐬𝐬𝐢𝐧𝐠superscriptsubscript𝒌1subscript𝒏𝒊𝒏𝒕subscript𝒘𝒌2det𝑱superscriptdelimited-[]𝑩subscript𝝃𝒌𝑻subscript𝑪𝒆subscript𝝃𝒌𝑩subscript𝝃𝒌subscript𝑼𝑒\mathcal{L}_{iFOL}=\sum_{e=1}^{n_{el}}\bm{U}^{T}_{e}\left[\bm{\left missing}(% \sum_{k=1}^{n_{int}}\dfrac{w_{k}}{2}~{}\text{det}(\bm{J})~{}[\bm{B}(\bm{\xi}_{% k})]^{T}\bm{C}_{e}(\bm{\xi}_{k})\bm{B}(\bm{\xi}_{k})\right)~{}\bm{U}_{e}\right].caligraphic_L start_POSTSUBSCRIPT italic_i italic_F italic_O italic_L end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_e = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT italic_e italic_l end_POSTSUBSCRIPT end_POSTSUPERSCRIPT bold_italic_U start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT [ bold_missing bold_( bold_∑ start_POSTSUBSCRIPT bold_italic_k bold_= bold_1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT bold_italic_n start_POSTSUBSCRIPT bold_italic_i bold_italic_n bold_italic_t end_POSTSUBSCRIPT end_POSTSUPERSCRIPT divide start_ARG bold_italic_w start_POSTSUBSCRIPT bold_italic_k end_POSTSUBSCRIPT end_ARG start_ARG bold_2 end_ARG det bold_( bold_italic_J bold_) bold_[ bold_italic_B bold_( bold_italic_ξ start_POSTSUBSCRIPT bold_italic_k end_POSTSUBSCRIPT bold_) bold_] start_POSTSUPERSCRIPT bold_italic_T end_POSTSUPERSCRIPT bold_italic_C start_POSTSUBSCRIPT bold_italic_e end_POSTSUBSCRIPT bold_( bold_italic_ξ start_POSTSUBSCRIPT bold_italic_k end_POSTSUBSCRIPT bold_) bold_italic_B bold_( bold_italic_ξ start_POSTSUBSCRIPT bold_italic_k end_POSTSUBSCRIPT bold_) ) bold_italic_U start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT ] .
\hdashline Stationary nonlinear thermal K(x,y)T(x,y)𝐾𝑥𝑦𝑇𝑥𝑦K(x,y)\to T(x,y)italic_K ( italic_x , italic_y ) → italic_T ( italic_x , italic_y ) 𝐔θ,γTesubscript𝐔𝜃𝛾subscript𝑇𝑒\mathbf{U}_{\theta,\gamma}\equiv T_{e}bold_U start_POSTSUBSCRIPT italic_θ , italic_γ end_POSTSUBSCRIPT ≡ italic_T start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT, (Sec. 4.3) re=Ωe[𝑩]T𝑲(𝑻e)[𝑩]𝑻e𝑑V+Ωt[𝑵]T𝒕𝑑S+Ωe[𝑵]TQ𝑑Vsubscript𝑟𝑒subscriptsubscriptΩ𝑒superscriptdelimited-[]𝑩𝑇𝑲subscript𝑻𝑒delimited-[]𝑩subscript𝑻𝑒differential-d𝑉subscriptsubscriptΩ𝑡superscriptdelimited-[]𝑵𝑇𝒕differential-d𝑆subscriptsubscriptΩ𝑒superscriptdelimited-[]𝑵𝑇𝑄differential-d𝑉r_{e}=-\int_{\Omega_{e}}[\bm{B}]^{T}\bm{K}(\bm{T}_{e})~{}[\bm{B}]\bm{T}_{e}~{}% dV+\int_{\Omega_{t}}[\bm{N}]^{T}\bm{t}~{}dS+\int_{\Omega_{e}}[\bm{N}]^{T}Q~{}dVitalic_r start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT = - ∫ start_POSTSUBSCRIPT roman_Ω start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_B ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_italic_K ( bold_italic_T start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT ) [ bold_italic_B ] bold_italic_T start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT italic_d italic_V + ∫ start_POSTSUBSCRIPT roman_Ω start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_N ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_italic_t italic_d italic_S + ∫ start_POSTSUBSCRIPT roman_Ω start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_N ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_Q italic_d italic_V
iFOL=e=1nel[𝑻en+1]T[𝐦𝐢𝐬𝐬𝐢𝐧𝐠(𝒌=𝟏𝒏𝒊𝒏𝒕𝒘𝒌𝟐det(𝑱)[𝑩(𝝃𝒌)]𝑻𝑲(𝑻𝒆𝒏+𝟏,𝝃𝒌)𝑩(𝝃𝒌))𝑻en+1]subscript𝑖𝐹𝑂𝐿superscriptsubscript𝑒1subscript𝑛𝑒𝑙superscriptdelimited-[]superscriptsubscript𝑻𝑒𝑛1𝑇delimited-[]𝐦𝐢𝐬𝐬𝐢𝐧𝐠superscriptsubscript𝒌1subscript𝒏𝒊𝒏𝒕subscript𝒘𝒌2det𝑱superscriptdelimited-[]𝑩subscript𝝃𝒌𝑻𝑲superscriptsubscript𝑻𝒆𝒏1subscript𝝃𝒌𝑩subscript𝝃𝒌superscriptsubscript𝑻𝑒𝑛1\mathcal{L}_{iFOL}=\sum_{e=1}^{n_{el}}\left[\bm{T}_{e}^{n+1}\right]^{T}\left[% \bm{\left missing}(\sum_{k=1}^{n_{int}}\dfrac{w_{k}}{2}~{}\text{det}(\bm{J})~{% }[\bm{B}(\bm{\xi}_{k})]^{T}\bm{K}(\bm{T}_{e}^{n+1},\bm{\xi}_{k})\bm{B}(\bm{\xi% }_{k})\right)~{}\bm{T}_{e}^{n+1}\right]caligraphic_L start_POSTSUBSCRIPT italic_i italic_F italic_O italic_L end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_e = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT italic_e italic_l end_POSTSUBSCRIPT end_POSTSUPERSCRIPT [ bold_italic_T start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n + 1 end_POSTSUPERSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT [ bold_missing bold_( bold_∑ start_POSTSUBSCRIPT bold_italic_k bold_= bold_1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT bold_italic_n start_POSTSUBSCRIPT bold_italic_i bold_italic_n bold_italic_t end_POSTSUBSCRIPT end_POSTSUPERSCRIPT divide start_ARG bold_italic_w start_POSTSUBSCRIPT bold_italic_k end_POSTSUBSCRIPT end_ARG start_ARG bold_2 end_ARG det bold_( bold_italic_J bold_) bold_[ bold_italic_B bold_( bold_italic_ξ start_POSTSUBSCRIPT bold_italic_k end_POSTSUBSCRIPT bold_) bold_] start_POSTSUPERSCRIPT bold_italic_T end_POSTSUPERSCRIPT bold_italic_K bold_( bold_italic_T start_POSTSUBSCRIPT bold_italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT bold_italic_n bold_+ bold_1 end_POSTSUPERSCRIPT bold_, bold_italic_ξ start_POSTSUBSCRIPT bold_italic_k end_POSTSUBSCRIPT bold_) bold_italic_B bold_( bold_italic_ξ start_POSTSUBSCRIPT bold_italic_k end_POSTSUBSCRIPT bold_) ) bold_italic_T start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n + 1 end_POSTSUPERSCRIPT ]
\hdashline Transient nonlinear thermal Tn(x,y)Tn+1(x,y)superscript𝑇𝑛𝑥𝑦superscript𝑇𝑛1𝑥𝑦T^{n}(x,y)\to T^{n+1}(x,y)italic_T start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( italic_x , italic_y ) → italic_T start_POSTSUPERSCRIPT italic_n + 1 end_POSTSUPERSCRIPT ( italic_x , italic_y ) 𝐔θ,γTen+1subscript𝐔𝜃𝛾superscriptsubscript𝑇𝑒𝑛1\mathbf{U}_{\theta,\gamma}\equiv T_{e}^{n+1}bold_U start_POSTSUBSCRIPT italic_θ , italic_γ end_POSTSUBSCRIPT ≡ italic_T start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n + 1 end_POSTSUPERSCRIPT, (Sec. 4.4) re=Ωe[𝑩]T𝑲(𝑻en+1)[𝑩]𝑻en+1𝑑V+Ωt[𝑵]T𝒕𝑑S+Ωe[𝑵]TQ𝑑Vsubscript𝑟𝑒subscriptsubscriptΩ𝑒superscriptdelimited-[]𝑩𝑇𝑲superscriptsubscript𝑻𝑒𝑛1delimited-[]𝑩superscriptsubscript𝑻𝑒𝑛1differential-d𝑉subscriptsubscriptΩ𝑡superscriptdelimited-[]𝑵𝑇𝒕differential-d𝑆subscriptsubscriptΩ𝑒superscriptdelimited-[]𝑵𝑇𝑄differential-d𝑉r_{e}=-\int_{\Omega_{e}}[\bm{B}]^{T}\bm{K}(\bm{T}_{e}^{n+1})~{}[\bm{B}]\bm{T}_% {e}^{n+1}~{}dV+\int_{\Omega_{t}}[\bm{N}]^{T}\bm{t}~{}dS+\int_{\Omega_{e}}[\bm{% N}]^{T}Q~{}dVitalic_r start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT = - ∫ start_POSTSUBSCRIPT roman_Ω start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_B ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_italic_K ( bold_italic_T start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n + 1 end_POSTSUPERSCRIPT ) [ bold_italic_B ] bold_italic_T start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n + 1 end_POSTSUPERSCRIPT italic_d italic_V + ∫ start_POSTSUBSCRIPT roman_Ω start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_N ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_italic_t italic_d italic_S + ∫ start_POSTSUBSCRIPT roman_Ω start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_N ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_Q italic_d italic_V            Ωe[𝑵]Tρcp[𝑵]𝑻en+1𝑻enΔt𝑑VsubscriptsubscriptΩ𝑒superscriptdelimited-[]𝑵𝑇𝜌subscript𝑐𝑝delimited-[]𝑵superscriptsubscript𝑻𝑒𝑛1superscriptsubscript𝑻𝑒𝑛Δ𝑡differential-d𝑉-\int_{\Omega_{e}}[\bm{N}]^{T}\rho c_{p}~{}[\bm{N}]\frac{\bm{T}_{e}^{n+1}-\bm{% T}_{e}^{n}}{\Delta t}~{}dV- ∫ start_POSTSUBSCRIPT roman_Ω start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_N ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_ρ italic_c start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ bold_italic_N ] divide start_ARG bold_italic_T start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n + 1 end_POSTSUPERSCRIPT - bold_italic_T start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_ARG start_ARG roman_Δ italic_t end_ARG italic_d italic_V
$$\mathcal{L}_{iFOL}=\sum_{e=1}^{n_{el}}\left[\bm{T}_{e}^{n+1}\right]^{T}\left[\left(\sum_{k=1}^{n_{int}}\frac{w_{k}}{2}\,\det(\bm{J})\,[\bm{B}(\bm{\xi}_{k})]^{T}\,\bm{K}(\bm{T}_{e}^{n+1},\bm{\xi}_{k})\,\bm{B}(\bm{\xi}_{k})\right)\bm{T}_{e}^{n+1}\right]+\sum_{e=1}^{n_{el}}\left[\sum_{k=1}^{n_{int}}\frac{w_{k}}{2}\,\det(\bm{J})\,\rho c_{p}\,\frac{\left(\bm{N}(\bm{\xi}_{k})\bm{T}_{e}^{n+1}-\bm{N}(\bm{\xi}_{k})\bm{T}_{e}^{n}\right)^{2}}{\Delta t}\right].$$
\hdashline
Transient Allen-Cahn, $\phi^{n}(x,y)\to\phi^{n+1}(x,y)$, $\mathbf{U}_{\theta,\gamma}\equiv\phi_{e}^{n+1}$ (Sec. 4.5):
$$r_{e}=-\int_{\Omega_{e}}[\bm{B}]^{T}[\bm{B}]\,\bm{\phi}_{e}^{n+1}\,dV-\int_{\Omega_{e}}[\bm{N}]^{T}[\bm{N}]\,\frac{\bm{\phi}_{e}^{n+1}-\bm{\phi}_{e}^{n}}{\Delta t}\,dV-\int_{\Omega_{e}}[\bm{N}]^{T}\frac{1}{\varepsilon^{2}}\left[\left(\bm{N}\bm{\phi}_{e}^{n+1}\right)^{2}-1\right]\bm{N}\bm{\phi}_{e}^{n+1}\,dV+\int_{\Omega_{t}}[\bm{N}]^{T}\left(\nabla\bm{\phi}_{e}^{n+1}\cdot\bm{n}\right)dS.$$
$$\mathcal{L}_{iFOL}=\sum_{e=1}^{n_{el}}\left[\bm{\phi}_{e}^{n+1}\right]^{T}\left[\left(\sum_{k=1}^{n_{int}}\frac{w_{k}}{2}\,\det(\bm{J})\,[\bm{B}(\bm{\xi}_{k})]^{T}\bm{B}(\bm{\xi}_{k})\right)\bm{\phi}_{e}^{n+1}\right]+\sum_{e=1}^{n_{el}}\left[\sum_{k=1}^{n_{int}}\frac{w_{k}}{2}\,\det(\bm{J})\,\frac{\left(\bm{N}(\bm{\xi}_{k})\bm{\phi}_{e}^{n+1}-\bm{N}(\bm{\xi}_{k})\bm{\phi}_{e}^{n}\right)^{2}}{\Delta t}\right]+\sum_{e=1}^{n_{el}}\left[\sum_{k=1}^{n_{int}}w_{k}\,\det(\bm{J})\,\frac{1}{\varepsilon^{2}}\,\frac{\left[\left(\bm{N}(\bm{\xi}_{k})\bm{\phi}_{e}^{n+1}\right)^{2}-1\right]^{2}}{4}\right].$$

In all the reported studies, we used simple first-order shape functions. The element type varies with the application, ranging from structured quadrilateral meshes to unstructured tetrahedral meshes, which demonstrates the flexibility of the approach in handling problems with complex geometries.

Moreover, in the above set of equations, $n_{int}$ and $n_{el}$ denote the number of Gaussian integration points and the number of elements, respectively. $\bm{\xi}_{k}$ and $w_{k}$ are the coordinates and weight of the $k$-th integration point, and $\det(\bm{J})$ is the determinant of the Jacobian matrix. Readers are encouraged to refer to Bathe (1996); Hughes (2000), as well as Rezaei et al. (2025b); Yamazaki et al. (2025b) and the references therein, for more details on the FE formulation.
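To make the Gauss-point evaluation concrete, the following minimal JAX sketch assembles the transient thermal iFOL loss above for a single first-order quadrilateral element. It is an illustrative implementation, not the authors' code: the 2x2 Gauss rule, the scalar temperature-dependent conductivity k_fn, and the helper names shape_funcs and element_loss are all assumptions made for this sketch.

```python
import jax
import jax.numpy as jnp

def shape_funcs(xi, eta):
    """First-order (bilinear) shape functions N and derivatives dN/d(xi, eta)."""
    N = 0.25 * jnp.array([(1-xi)*(1-eta), (1+xi)*(1-eta),
                          (1+xi)*(1+eta), (1-xi)*(1+eta)])
    dN = 0.25 * jnp.array([[-(1-eta), -(1-xi)],
                           [ (1-eta), -(1+xi)],
                           [ (1+eta),  (1+xi)],
                           [-(1+eta),  (1-xi)]])
    return N, dN

def element_loss(T_new, T_old, coords, k_fn, rho_cp, dt):
    """Conduction energy plus transient penalty, summed over 2x2 Gauss points."""
    g = 1.0 / jnp.sqrt(3.0)
    gauss_pts = [(-g, -g), (g, -g), (g, g), (-g, g)]   # weights w_k = 1
    loss = 0.0
    for xi, eta in gauss_pts:
        N, dN = shape_funcs(xi, eta)
        J = dN.T @ coords                    # 2x2 Jacobian of the element mapping
        detJ = jnp.linalg.det(J)
        B = jnp.linalg.solve(J, dN.T)        # spatial derivatives dN/dx, shape (2, 4)
        T_gp = N @ T_new                     # temperature at the Gauss point
        gradT = B @ T_new
        loss += 0.5 * detJ * k_fn(T_gp) * (gradT @ gradT)              # conduction term
        loss += 0.5 * detJ * rho_cp * (N @ (T_new - T_old))**2 / dt    # transient term
    return loss
```

Because the loss is built entirely from differentiable array operations, jax.grad(element_loss, argnums=0) returns a discrete element gradient analogous to the residual, which is what is backpropagated during both training and PDE encoding.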

Appendix B Choice of Hyperparameters

A comprehensive summary of the network hyperparameters and configurations for all the models introduced above, as well as for the various analyses conducted, is provided in Tables 4 and 5. NVIDIA Quadro RTX 6000 GPUs with 24 GB of memory and an NVIDIA A100 with 40 GB of memory were used to train the stationary problems. The studies on the transient problems were performed on an NVIDIA GeForce RTX 4090 with 24 GB of memory.

We conducted extensive studies on multiple parameters, including the latent size, learning rate, network architecture, and the frequency of the sinusoidal activations, among others. The values reported in Tables 4 and 5 correspond to the best-performing configurations. In general, we did not observe significant improvements for values of the frequency parameter $\omega_{0}$ other than 30, which interestingly performed well across all problem classes. Increasing the depth and latent size proved consistently beneficial with the sinusoidal activation function used in this study, though at a higher training cost. Depending on problem complexity, reducing the number of epochs and latent iterations can further accelerate both training and inference in the proposed iFOL method. As discussed in the results section, a larger number of training samples also leads to better performance. On the other hand, the training cost may increase depending on the chosen batch size, and the network architecture and latent size must be adapted accordingly.
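For reference, the sketch below shows one plausible JAX realization of the synthesizer/modulator pair described by these hyperparameters: a SIREN synthesizer with $\omega_{0}=30$ whose hidden activations are shifted by a linear (FiLM-style) map of the latent code. The layer widths and latent size mirror Table 5; the single shift vector shared across layers and the initialization scales are simplifying assumptions, not the exact architecture used here.

```python
import jax
import jax.numpy as jnp

OMEGA0 = 30.0  # frequency omega_0 that performed well across problem classes

def init_layer(key, fan_in, fan_out, first=False):
    # SIREN-style uniform initialization (Sitzmann et al., 2020)
    bound = 1.0 / fan_in if first else jnp.sqrt(6.0 / fan_in) / OMEGA0
    return jax.random.uniform(key, (fan_in, fan_out), minval=-bound, maxval=bound)

def init_params(key, in_dim=2, width=256, depth=6, latent=256, out_dim=1):
    keys = jax.random.split(key, depth + 2)
    dims = [in_dim] + [width] * depth
    syn = [init_layer(k, dims[i], dims[i + 1], first=(i == 0))
           for i, k in enumerate(keys[:depth])]
    mod = 0.01 * jax.random.normal(keys[-2], (latent, width))  # linear FiLM modulator
    out = 0.01 * jax.random.normal(keys[-1], (width, out_dim))
    return syn, mod, out

def field(params, x, z):
    """Field value at coordinate x (shape (2,)), conditioned on latent code z."""
    syn, mod, out = params
    shift = z @ mod              # one shift vector, shared across hidden layers
    h = x
    for W in syn:
        h = jnp.sin(OMEGA0 * (h @ W) + shift)
    return h @ out
```

A batched evaluation over all grid points, which is what enables arbitrary spatial sampling at inference, is then simply jax.vmap(field, in_axes=(None, 0, None))(params, coords, z).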

Table 4: Network hyperparameters for stationary problems

Training parameter | Nonlin. elas. (Sec. 4.1) | Lin. elas. (Sec. 4.2) | Station. nonlin. therm. (Sec. 4.3)
Number of samples | 8000 | 1000 | 8000
Grid in training | $41\times 41$ | 5605 | 5485
Grid in evaluation | $101\times 101$ | 5605 | 37362
Synthesizers | [64]*4 | [256]*6 | [128]*6
$\omega_{0}$ | 30 | 30 | 30
Modulators | Linear (FiLM) | Linear (FiLM) | Linear (FiLM)
Latent size | 512 | 1024 | 512
Number of latent iterations | 3 | 3 | 3
Latent/encoding learning rate | $10^{-2}$ | $10^{-2}$ | $10^{-2}$
Training learning rate | $10^{-5}$ | $10^{-4}$ to $10^{-7}$ | $10^{-5}$
Batch size | 320 | 120 | 100
Gradient normalization | Yes | Yes | Yes
Number of epochs | 10000 | 10000 | 3800
Optimizer | Adam | Adam | Adam
Total trainable parameters | 143938 | 330755 | 476417
Table 5: Network hyperparameters for the transient problems

Training parameter | Transient nonlin. thermal (Sec. 4.4) | Transient nonlin. Allen-Cahn (Sec. 4.5)
Number of samples | 6000 | 8000
Grid in training | $51\times 51$ | 2624
Grid in evaluation | $51\times 51$, $101\times 101$ | 2624, 5678, 9873
Synthesizers | [256]*6 | [256]*6
$\omega_{0}$ | 30 | 30
Modulators | Linear (FiLM) | Linear (FiLM)
Latent size | 256 | 256
Number of latent iterations | 3 | 3
Latent/encoding learning rate | $1.0\times 10^{-2}$ | $1.0\times 10^{-2}$
Training learning rate | $1.0\times 10^{-4}$ to $1.0\times 10^{-7}$ | $1.0\times 10^{-4}$ to $1.0\times 10^{-7}$
Batch size | 120 | 100
Gradient normalization | Yes | Yes
Number of epochs | 10000 | 10000
Optimizer | Adam | Adam
Total trainable parameters | 723457 | 723201
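The latent-iteration and encoding-learning-rate entries in both tables correspond to the inner PDE-encoding loop. A minimal sketch of that loop under the tabulated settings (3 iterations, learning rate $10^{-2}$, zero-initialized code) is given below; physics_loss stands for any of the iFOL losses of Appendix A, and its signature is an assumption of this sketch.

```python
import jax
import jax.numpy as jnp

def encode(theta, sample, physics_loss, latent_dim=256, n_iter=3, lr=1e-2):
    """Instance-specific latent code via a few gradient steps on the physics
    loss, with the synthesizer/modulator weights theta held fixed."""
    z = jnp.zeros(latent_dim)                    # zero-initialized latent code
    grad_z = jax.grad(physics_loss, argnums=2)   # d(loss)/dz for fixed theta
    for _ in range(n_iter):
        z = z - lr * grad_z(theta, sample, z)    # inner-loop adaptation step
    return z
```

During training, the outer Adam update on theta differentiates through these inner steps, which is what makes the scheme a second-order, CAVIA-style meta-learning procedure (Zintgraf et al., 2019); at inference, only the inner loop is executed for each new sample.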

References

  • Raissi et al. [2019] M. Raissi, P. Perdikaris, and G.E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019. ISSN 0021-9991. doi:https://doi.org/10.1016/j.jcp.2018.10.045. URL https://www.sciencedirect.com/science/article/pii/S0021999118307125.
  • Rezaei et al. [2022] Shahed Rezaei, Ali Harandi, Ahmad Moeineddin, Bai-Xiang Xu, and Stefanie Reese. A mixed formulation for physics-informed neural networks as a potential solver for engineering problems in heterogeneous domains: Comparison with finite element method. Computer Methods in Applied Mechanics and Engineering, 401:115616, 2022. ISSN 0045-7825. doi:https://doi.org/10.1016/j.cma.2022.115616. URL https://www.sciencedirect.com/science/article/pii/S0045782522005722.
  • Grossmann et al. [2024] Tamara G Grossmann, Urszula Julia Komorowska, Jonas Latz, and Carola-Bibiane Schönlieb. Can physics-informed neural networks beat the finite element method? IMA Journal of Applied Mathematics, 89(1):143–174, 05 2024. ISSN 0272-4960. doi:10.1093/imamat/hxae011. URL https://doi.org/10.1093/imamat/hxae011.
  • Faroughi et al. [2024] Salah A. Faroughi, Nikhil M. Pawar, Célio Fernandes, Maziar Raissi, Subasish Das, Nima K. Kalantari, and Seyed Kourosh Mahjour. Physics-guided, physics-informed, and physics-encoded neural networks and operators in scientific computing: Fluid and solid mechanics. Journal of Computing and Information Science in Engineering, 24(4):1–45, 2024. doi:10.1115/1.4064449. URL https://doi.org/10.1115/1.4064449.
  • Mitusch et al. [2021] Sebastian K. Mitusch, Simon W. Funke, and Miroslav Kuchta. Hybrid fem-nn models: Combining artificial neural networks with the finite element method. Journal of Computational Physics, 446:110651, 2021. ISSN 0021-9991. doi:https://doi.org/10.1016/j.jcp.2021.110651. URL https://www.sciencedirect.com/science/article/pii/S0021999121005465.
  • Škardová et al. [2024] Kateřina Škardová, Alexandre Daby-Seesaram, and Martin Genet. Finite element neural network interpolation. part i: Interpretable and adaptive discretization for solving pdes. 2024. URL https://arxiv.org/abs/2412.05719.
  • Xiong et al. [2025] Wei Xiong, Xiangyun Long, Stéphane P.A. Bordas, and Chao Jiang. The deep finite element method: A deep learning framework integrating the physics-informed neural networks with the finite element method. Computer Methods in Applied Mechanics and Engineering, 436:117681, 2025. ISSN 0045-7825. doi:https://doi.org/10.1016/j.cma.2024.117681. URL https://www.sciencedirect.com/science/article/pii/S0045782524009356.
  • Li et al. [2024] Haolin Li, Yuyang Miao, Zahra Sharif Khodaei, and M. H. Aliabadi. Finite-pinn: A physics-informed neural network architecture for solving solid mechanics problems with general geometries. 2024. URL https://arxiv.org/abs/2412.09453.
  • Lu et al. [2021] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021. ISSN 2522-5839. doi:10.1038/s42256-021-00302-5. URL https://doi.org/10.1038/s42256-021-00302-5.
  • Abueidda et al. [2025] Diab W. Abueidda, Panos Pantidis, and Mostafa E. Mobasher. Deepokan: Deep operator network based on kolmogorov arnold networks for mechanics problems. Computer Methods in Applied Mechanics and Engineering, 436:117699, 2025. ISSN 0045-7825. doi:https://doi.org/10.1016/j.cma.2024.117699. URL https://www.sciencedirect.com/science/article/pii/S0045782524009538.
  • He et al. [2024] Junyan He, Seid Koric, Diab Abueidda, Ali Najafi, and Iwona Jasiuk. Geom-deeponet: A point-cloud-based deep operator network for field predictions on 3d parameterized geometries. Computer Methods in Applied Mechanics and Engineering, 429:117130, 2024. ISSN 0045-7825. doi:https://doi.org/10.1016/j.cma.2024.117130. URL https://www.sciencedirect.com/science/article/pii/S0045782524003864.
  • Kumar et al. [2025] Varun Kumar, Somdatta Goswami, Katiana Kontolati, Michael D. Shields, and George Em Karniadakis. Synergistic learning with multi-task deeponet for efficient pde problem solving. Neural Networks, 184:107113, 2025. ISSN 0893-6080. doi:https://doi.org/10.1016/j.neunet.2024.107113. URL https://www.sciencedirect.com/science/article/pii/S0893608024010426.
  • Yu et al. [2024] Xinling Yu, Sean Hooten, Ziyue Liu, Yequan Zhao, Marco Fiorentino, Thomas Van Vaerenbergh, and Zheng Zhang. Separable operator networks, 2024. URL https://arxiv.org/abs/2407.11253.
  • Wang et al. [2021] Sifan Wang, Hanwen Wang, and Paris Perdikaris. Learning the solution operator of parametric partial differential equations with physics-informed deeponets. Science Advances, 7(40):eabi8605, 2021. doi:10.1126/sciadv.abi8605. URL https://www.science.org/doi/abs/10.1126/sciadv.abi8605.
  • Mandl et al. [2025] Luis Mandl, Somdatta Goswami, Lena Lambers, and Tim Ricken. Separable physics-informed deeponet: Breaking the curse of dimensionality in physics-informed machine learning. Computer Methods in Applied Mechanics and Engineering, 434:117586, 2025. ISSN 0045-7825. doi:https://doi.org/10.1016/j.cma.2024.117586. URL https://www.sciencedirect.com/science/article/pii/S0045782524008405.
  • Li et al. [2025] Haolin Li, Yuyang Miao, Zahra Sharif Khodaei, and M.H. Aliabadi. An architectural analysis of deeponet and a general extension of the physics-informed deeponet model on solving nonlinear parametric partial differential equations. Neurocomputing, 611:128675, 2025. ISSN 0925-2312. doi:https://doi.org/10.1016/j.neucom.2024.128675. URL https://www.sciencedirect.com/science/article/pii/S0925231224014462.
  • Li et al. [2021] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations, 2021. URL https://arxiv.org/abs/2010.08895.
  • Azizzadenesheli et al. [2024] Kamyar Azizzadenesheli, Nikola Kovachki, Zongyi Li, Miguel Liu-Schiaffini, Jean Kossaifi, and Anima Anandkumar. Neural operators for accelerating scientific simulations and design. Nature Reviews Physics, 6(5):320–328, 2024. ISSN 2522-5820. doi:10.1038/s42254-024-00712-5. URL https://doi.org/10.1038/s42254-024-00712-5.
  • Li et al. [2023a] Zongyi Li, Daniel Zhengyu Huang, Burigede Liu, and Anima Anandkumar. Fourier neural operator with learned deformations for pdes on general geometries. Journal of Machine Learning Research, 24(388):1–26, 2023a. URL http://jmlr.org/papers/v24/23-0064.html.
  • Li et al. [2023b] Zongyi Li, Hongkai Zheng, Nikola Kovachki, David Jin, Haoxuan Chen, Burigede Liu, Kamyar Azizzadenesheli, and Anima Anandkumar. Physics-informed neural operator for learning partial differential equations, 2023b. URL https://arxiv.org/abs/2111.03794.
  • Ronneberger et al. [2015] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation, 2015. URL https://arxiv.org/abs/1505.04597.
  • Chen et al. [2019] Junfeng Chen, Jonathan Viquerat, and Elie Hachem. U-net architectures for fast prediction of incompressible laminar flows, 2019. URL https://arxiv.org/abs/1910.13532.
  • Mendizabal et al. [2020] Andrea Mendizabal, Pablo Márquez-Neila, and Stéphane Cotin. Simulation of hyperelastic materials in real-time using deep learning. Medical Image Analysis, 59:101569, 2020. ISSN 1361-8415. doi:https://doi.org/10.1016/j.media.2019.101569. URL https://www.sciencedirect.com/science/article/pii/S1361841519301094.
  • Mianroodi et al. [2022] Jaber Rezaei Mianroodi, Shahed Rezaei, Nima H. Siboni, Bai-Xiang Xu, and Dierk Raabe. Lossless multi-scale constitutive elastic relations with artificial intelligence. npj Computational Materials, 8(1):67, April 2022. ISSN 2057-3960. doi:10.1038/s41524-022-00753-3. URL https://doi.org/10.1038/s41524-022-00753-3.
  • Gupta et al. [2023] Ashwini Gupta, Anindya Bhaduri, and Lori Graham-Brady. Accelerated multiscale mechanics modeling in a deep learning framework. Mechanics of Materials, 184:104709, 2023. ISSN 0167-6636. doi:https://doi.org/10.1016/j.mechmat.2023.104709. URL https://www.sciencedirect.com/science/article/pii/S0167663623001552.
  • Najafi Koopas et al. [2025] Rasoul Najafi Koopas, Shahed Rezaei, Natalie Rauter, Richard Ostwald, and Rolf Lammering. A spatiotemporal deep learning framework for prediction of crack dynamics in heterogeneous solids: Efficient mapping of concrete microstructures to its fracture properties. Engineering Fracture Mechanics, 314:110675, 2025. ISSN 0013-7944. doi:https://doi.org/10.1016/j.engfracmech.2024.110675. URL https://www.sciencedirect.com/science/article/pii/S0013794424008385.
  • Yin et al. [2024] Minglang Yin, Nicolas Charon, Ryan Brody, Lu Lu, Natalia Trayanova, and Mauro Maggioni. A scalable framework for learning the geometry-dependent solution operators of partial differential equations. Nature Computational Science, 4(12):928–940, 2024. ISSN 2662-8457. doi:10.1038/s43588-024-00732-2. URL https://doi.org/10.1038/s43588-024-00732-2.
  • Xiao et al. [2024] Shanshan Xiao, Pengzhan Jin, and Yifa Tang. A deformation-based framework for learning solution mappings of pdes defined on varying domains, 2024. URL https://arxiv.org/abs/2412.01379.
  • Zeng et al. [2025] Chenyu Zeng, Yanshu Zhang, Jiayi Zhou, Yuhan Wang, Zilin Wang, Yuhao Liu, Lei Wu, and Daniel Zhengyu Huang. Point cloud neural operator for parametric pdes on complex and variable geometries, 2025. URL https://arxiv.org/abs/2501.14475.
  • Chen et al. [2024a] Gengxiang Chen, Zhi Li, Chao Li, Zhen Li, and Yike Guo. Learning neural operators on riemannian manifolds. National Science Open, 3(1):20240001, 2024a. doi:10.1360/nso/20240001. URL https://www.sciengine.com/NSO/doi/10.1360/nso/20240001.
  • Yamazaki et al. [2025a] Yusuke Yamazaki, Ali Harandi, Mayu Muramatsu, Alexandre Viardin, Markus Apel, Tim Brepols, Stefanie Reese, and Shahed Rezaei. A finite element-based physics-informed operator learning framework for spatiotemporal partial differential equations on arbitrary domains. Engineering with Computers, 41(1):1–29, 2025a. ISSN 1435-5663. doi:10.1007/s00366-024-02033-8. URL https://doi.org/10.1007/s00366-024-02033-8.
  • Rezaei et al. [2025a] Shahed Rezaei, Reza Najian Asl, Shirko Faroughi, Mahdi Asgharzadeh, Ali Harandi, Rasoul Najafi Koopas, Gottfried Laschet, Stefanie Reese, and Markus Apel. A finite operator learning technique for mapping the elastic properties of microstructures to their mechanical deformations. International Journal for Numerical Methods in Engineering, 126(1):e7637, 2025a. doi:https://doi.org/10.1002/nme.7637. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/nme.7637.
  • Xu et al. [2024] Tengfei Xu, Dachuan Liu, Peng Hao, and Bo Wang. Variational operator learning: A unified paradigm marrying training neural operators and solving partial differential equations. Journal of the Mechanics and Physics of Solids, 190:105714, 2024. ISSN 0022-5096. doi:https://doi.org/10.1016/j.jmps.2024.105714. URL https://www.sciencedirect.com/science/article/pii/S0022509624001807.
  • Eshaghi et al. [2025] Mohammad Sadegh Eshaghi, Cosmin Anitescu, Manish Thombre, Yizheng Wang, Xiaoying Zhuang, and Timon Rabczuk. Variational physics-informed neural operator (vino) for solving partial differential equations. Computer Methods in Applied Mechanics and Engineering, 437:117785, 2025. ISSN 0045-7825. doi:https://doi.org/10.1016/j.cma.2025.117785. URL https://www.sciencedirect.com/science/article/pii/S004578252500057X.
  • Lee et al. [2025] Jae Yong Lee, Seungchan Ko, and Youngjoon Hong. Finite element operator network for solving elliptic-type parametric pdes, 2025. URL https://arxiv.org/abs/2308.04690.
  • Kaewnuratchadasorn et al. [2024] Chawit Kaewnuratchadasorn, Jiaji Wang, and Chul-Woo Kim. Physics-informed neural operator solver and super-resolution for solid mechanics. Computer-Aided Civil and Infrastructure Engineering, 39(22):3435–3451, 2024. doi:https://doi.org/10.1111/mice.13292. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/mice.13292.
  • Franco et al. [2023] Nicola Rares Franco, Andrea Manzoni, and Paolo Zunino. Mesh-informed neural networks for operator learning in finite element spaces. Journal of Scientific Computing, 97(35), 2023. doi:10.1007/s10915-023-02331-1. URL https://doi.org/10.1007/s10915-023-02331-1.
  • Harandi et al. [2025] Ali Harandi, Hooman Danesh, Kevin Linka, Stefanie Reese, and Shahed Rezaei. A spectral-based physics-informed finite operator learning for prediction of mechanical behavior of microstructures, 2025. URL https://arxiv.org/abs/2410.19027.
  • Serrano et al. [2023] Louis Serrano, Lise Le Boudec, Armand Kassaï Koupaï, Thomas X Wang, Yuan Yin, Jean-Noël Vittaut, and Patrick Gallinari. Operator learning with neural fields: Tackling pdes on general geometries. Advances in Neural Information Processing Systems, 36:70581–70611, 2023.
  • Naour et al. [2024] Etienne Le Naour, Louis Serrano, Léon Migus, Yuan Yin, Ghislain Agoua, Nicolas Baskiotis, Patrick Gallinari, and Vincent Guigue. Time series continuous modeling for imputation and forecasting with implicit neural representations. 2024. URL https://arxiv.org/abs/2306.05880.
  • Boudec et al. [2024] Lise Le Boudec, Emmanuel de Bezenac, Louis Serrano, Ramon Daniel Regueiro-Espino, Yuan Yin, and Patrick Gallinari. Learning a neural solver for parametric pde to enhance physics-informed methods. 2024. URL https://arxiv.org/abs/2410.06820.
  • Yeom et al. [2025] Taesun Yeom, Sangyoon Lee, and Jaeho Lee. Fast training of sinusoidal neural fields via scaling initialization, 2025. URL https://arxiv.org/abs/2410.04779.
  • Hagnberger et al. [2024] Jan Hagnberger, Marimuthu Kalimuthu, Daniel Musekamp, and Mathias Niepert. Vectorized conditional neural fields: A framework for solving time-dependent parametric partial differential equations, 2024. URL https://arxiv.org/abs/2406.03919.
  • Du et al. [2024a] Pan Du, Meet Hemant Parikh, Xiantao Fan, Xin-Yang Liu, and Jian-Xun Wang. Conditional neural field latent diffusion model for generating spatiotemporal turbulence. Nature Communications, 15(1):10416, November 2024a. ISSN 2041-1723. doi:10.1038/s41467-024-54712-1. URL https://doi.org/10.1038/s41467-024-54712-1.
  • Dupont et al. [2022a] Emilien Dupont, Hrushikesh Loya, Milad Alizadeh, Adam Goliński, Yee Whye Teh, and Arnaud Doucet. Coin++: Neural compression across modalities. arXiv preprint arXiv:2201.12904, 2022a.
  • Catalani et al. [2024] Giovanni Catalani, Siddhant Agarwal, Xavier Bertrand, Frédéric Tost, Michael Bauerheim, and Joseph Morlier. Neural fields for rapid aircraft aerodynamics simulations. Scientific Reports, 14(1):25496, 2024.
  • Sitzmann et al. [2020] Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. Implicit neural representations with periodic activation functions. Advances in neural information processing systems, 33:7462–7473, 2020.
  • Xie et al. [2022] Yiheng Xie, Towaki Takikawa, Shunsuke Saito, Or Litany, Shiqin Yan, Numair Khan, Federico Tombari, James Tompkin, Vincent Sitzmann, and Srinath Sridhar. Neural fields in visual computing and beyond. In Computer Graphics Forum, volume 41, pages 641–676. Wiley Online Library, 2022.
  • Perez et al. [2018] Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. Film: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
  • Dupont et al. [2022b] Emilien Dupont, Hyunjik Kim, SM Eslami, Danilo Rezende, and Dan Rosenbaum. From data to functa: Your data point is a function and you can treat it like one. arXiv preprint arXiv:2201.12204, 2022b.
  • Yin et al. [2022] Yuan Yin, Matthieu Kirchmeyer, Jean-Yves Franceschi, Alain Rakotomamonjy, and Patrick Gallinari. Continuous pde dynamics forecasting with implicit neural representations. arXiv preprint arXiv:2209.14855, 2022.
  • Zintgraf et al. [2019] Luisa Zintgraf, Kyriacos Shiarli, Vitaly Kurin, Katja Hofmann, and Shimon Whiteson. Fast context adaptation via meta-learning. In International conference on machine learning, pages 7693–7702. PMLR, 2019.
  • Rezaei et al. [2025b] Shahed Rezaei, Reza Najian Asl, Shirko Faroughi, Mahdi Asgharzadeh, Ali Harandi, Rasoul Najafi Koopas, Gottfried Laschet, Stefanie Reese, and Markus Apel. A finite operator learning technique for mapping the elastic properties of microstructures to their mechanical deformations. International Journal for Numerical Methods in Engineering, 126(1):e7637, 2025b.
  • Rezaei et al. [2024] Shahed Rezaei, Reza Najian Asl, Kianoosh Taghikhani, Ahmad Moeineddin, Michael Kaliske, and Markus Apel. Finite operator learning: Bridging neural operators and numerical methods for efficient parametric solution and optimization of pdes. arXiv preprint arXiv:2407.04157, 2024.
  • Du et al. [2024b] Pan Du, Meet Hemant Parikh, Xiantao Fan, Xin-Yang Liu, and Jian-Xun Wang. Conditional neural field latent diffusion model for generating spatiotemporal turbulence. Nature Communications, 15(1):10416, 2024b.
  • Dingreville et al. [2024] Rémi Dingreville, Pablo Seleson, Nathaniel Trask, Mitchell Wood, and Donghyun You. Rethinking materials simulations: Blending direct numerical simulations with neural operators. npj Computational Materials, 10(1):124, 2024. doi:10.1038/s41524-024-01319-1. URL https://www.nature.com/articles/s41524-024-01319-1.
  • Yamazaki et al. [2025b] Yusuke Yamazaki, Ali Harandi, Mayu Muramatsu, Alexandre Viardin, Markus Apel, Tim Brepols, Stefanie Reese, and Shahed Rezaei. A finite element-based physics-informed operator learning framework for spatiotemporal partial differential equations on arbitrary domains. Engineering with Computers, 41(1):1–29, 2025b.
  • Wang et al. [2024] Sifan Wang, Shyam Sankaran, and Paris Perdikaris. Respecting causality for training physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 421:116813, 2024. ISSN 0045-7825. doi:https://doi.org/10.1016/j.cma.2024.116813. URL https://www.sciencedirect.com/science/article/pii/S0045782524000690.
  • Chen et al. [2024b] Wan-Xin Chen, Jeffery M. Allen, Shahed Rezaei, Orkun Furat, Volker Schmidt, Avtar Singh, Peter J. Weddle, Kandler Smith, and Bai-Xiang Xu. Cohesive phase-field chemo-mechanical simulations of inter- and trans- granular fractures in polycrystalline nmc cathodes via image-based 3d reconstruction. Journal of Power Sources, 596:234054, 2024b. ISSN 0378-7753. doi:https://doi.org/10.1016/j.jpowsour.2024.234054. URL https://www.sciencedirect.com/science/article/pii/S0378775324000053.
  • Li et al. [2023c] Wei Li, Martin Z. Bazant, and Juner Zhu. Phase-field deeponet: Physics-informed deep operator neural network for fast simulations of pattern formation governed by gradient flows of free-energy functionals. Computer Methods in Applied Mechanics and Engineering, 416:116299, 2023c. ISSN 0045-7825. doi:https://doi.org/10.1016/j.cma.2023.116299. URL https://www.sciencedirect.com/science/article/pii/S0045782523004231.
  • Koopas et al. [2024] Rasoul Najafi Koopas, Shahed Rezaei, Natalie Rauter, Richard Ostwald, and Rolf Lammering. Introducing a microstructure-embedded autoencoder approach for reconstructing high-resolution solution field data from a reduced parametric space. Computational Mechanics, November 2024. doi:10.1007/s00466-024-02568-z. URL https://link.springer.com/article/10.1007/s00466-024-02568-z.
  • Quarteroni and Rozza [2011] Alfio Quarteroni and Gianluigi Rozza. Reduced order methods for modeling and computational reduction. Archives of Computational Methods in Engineering, 17(3):149–159, 2011. doi:10.1007/s11831-011-9064-7.
  • Bradbury et al. [2018] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. Jax: Composable transformations of python+numpy programs. GitHub repository, 2018. URL https://github.com/google/jax.
  • Allen and Cahn [1979] Samuel M. Allen and John W. Cahn. A microscopic theory for antiphase boundary motion and its application to antiphase domain coarsening. Acta Metallurgica, 27(6):1085–1095, 1979. ISSN 0001-6160. doi:https://doi.org/10.1016/0001-6160(79)90196-2. URL https://www.sciencedirect.com/science/article/pii/0001616079901962.
  • Karma and Rappel [1998] Alain Karma and Wouter-Jan Rappel. Quantitative phase-field modeling of dendritic growth in two and three dimensions. Phys. Rev. E, 57:4323–4349, Apr 1998. doi:10.1103/PhysRevE.57.4323. URL https://link.aps.org/doi/10.1103/PhysRevE.57.4323.
  • Bourdin et al. [2000] B. Bourdin, G.A. Francfort, and J-J. Marigo. Numerical experiments in revisited brittle fracture. Journal of the Mechanics and Physics of Solids, 48(4):797–826, 2000. ISSN 0022-5096. doi:https://doi.org/10.1016/S0022-5096(99)00028-9. URL https://www.sciencedirect.com/science/article/pii/S0022509699000289.
  • Rezaei et al. [2021] Shahed Rezaei, Jaber Rezaei Mianroodi, Tim Brepols, and Stefanie Reese. Direction-dependent fracture in solids: Atomistically calibrated phase-field and cohesive zone model. Journal of the Mechanics and Physics of Solids, 147:104253, 2021. ISSN 0022-5096. doi:https://doi.org/10.1016/j.jmps.2020.104253. URL https://www.sciencedirect.com/science/article/pii/S0022509620304634.
  • Hughes [2000] Thomas J. R. Hughes. The Finite Element Method: Linear Static and Dynamic Finite Element Analysis. Dover Publications, 2000.
  • Bathe [1996] Klaus-Jürgen Bathe. Finite Element Procedures. Prentice Hall, 1996.