Implicit Neural Differential Model for Spatiotemporal Dynamics
Abstract
Hybrid neural–physics modeling frameworks through differentiable programming have emerged as powerful tools in scientific machine learning, enabling the integration of known physics with data-driven learning to improve prediction accuracy and generalizability. However, most existing hybrid frameworks rely on explicit recurrent formulations, which suffer from numerical instability and error accumulation during long-horizon forecasting. In this work, we introduce Im-PiNDiff, a novel implicit physics-integrated neural differentiable solver for stable and accurate modeling of spatiotemporal dynamics. Inspired by deep equilibrium models, Im-PiNDiff advances the state using implicit fixed-point layers, enabling robust long-term simulation while remaining fully end-to-end differentiable. To enable scalable training, we introduce a hybrid gradient propagation strategy that integrates adjoint-state methods with reverse-mode automatic differentiation. This approach eliminates the need to store intermediate solver states and decouples memory complexity from the number of solver iterations, significantly reducing training overhead. We further incorporate checkpointing techniques to manage memory in long-horizon rollouts. Numerical experiments on various spatiotemporal PDE systems, including advection–diffusion processes, Burgers’ dynamics, and multi-physics chemical vapor infiltration processes, demonstrate that Im-PiNDiff achieves superior predictive performance, enhanced numerical stability, and substantial reductions in memory and runtime cost relative to explicit and naive implicit baselines. This work provides a principled, efficient, and scalable framework for hybrid neural–physics modeling.
Keywords: Differentiable programming, Implicit neural networks, Scientific machine learning, Hybrid model, Grey-box model

1 Introduction
Computational science is experiencing a transformative shift, driven by advancements in numerical techniques, artificial intelligence (AI), and the growing availability of extensive datasets. At the core of this transformation lies scientific machine learning (SciML), a rapidly evolving field that synergistically integrates physics-based modeling with modern machine learning (ML). By combining data-driven insights with fundamental physical principles, SciML offers unprecedented opportunities to accurately model complex systems, enhance predictive capabilities, and optimize computational workflows across a wide range of scientific and engineering applications.
Notable methodologies underpinning SciML include physics-informed neural networks (PINNs) [1, 2, 3], neural operators [4, 5, 6], equation discovery techniques [7, 8, 9], and hybrid neural-physics models [10, 11, 12, 13, 14, 15, 16]. Among these, hybrid neural-physics models have attracted significant attention due to their explicit incorporation of domain knowledge into the learning architecture, addressing key limitations of purely data-driven approaches, such as limited extrapolation and generalization capabilities. This integration ensures consistency with known physical laws while enabling the capture of uncharacterized system dynamics, thus balancing flexibility with reliability. Historically, however, most hybrid frameworks have employed weak coupling strategies, wherein ML models are trained offline and subsequently embedded into conventional numerical solvers [17, 18, 19]. Such approaches, while prevalent in turbulence modeling [20, 21, 22] and atmospheric simulations [23, 24], are fundamentally limited. Their reliance on separately trained components prevents end-to-end optimization, restricting their robustness, generalization, and adaptability to complex, unseen scenarios.
Recognizing these challenges, recent trends advocate strongly integrated frameworks that enable end-to-end differentiable hybridization, leveraging indirect and sparse observational data. This integration is made possible by differentiable programming (DP) [25], which facilitates joint optimization of ML models and numerical solvers. Recent advances in differentiable physics and hybrid neural-physics models have demonstrated significant potential across various scientific domains [26, 27, 28, 29, 30, 10, 31, 13, 11, 12, 32]. For example, Kochkov et al. [10] utilized convolutional neural networks (CNNs) to accelerate differentiable computational fluid dynamics (CFD) solvers, while Huang et al. [30] embedded neural networks within a differentiable finite element solver to learn constitutive relations from indirect observations. Further advances by Wang and coworkers introduced a physics-integrated neural differentiable (PiNDiff) modeling framework, unifying neural operators with numerical PDE solvers to achieve enhanced generalization and accuracy [13, 11, 12, 32, 16].
Despite their considerable potential, current PiNDiff frameworks predominantly utilize explicit, auto-regressive recurrent architectures. Such explicit recurrent structures suffer from inherent numerical instability, error accumulation, and deteriorating performance in long-term predictions, particularly for stiff or chaotic systems, thereby limiting their practical applicability. Inspired by classical numerical analysis, where implicit methods offer superior stability properties, this paper proposes an innovative implicit neural differential model (Im-PiNDiff) for robust spatiotemporal dynamics prediction. By employing implicit neural network layers, our framework mitigates error accumulation and significantly enhances numerical stability and accuracy, enabling reliable long-term simulations.
However, adopting implicit neural architectures within differentiable frameworks introduces considerable computational and memory challenges. This is primarily due to the requirement of automatic differentiation (AD), which involves storing intermediate activations, computational graphs, and input data as buffers during forward propagation to compute gradients [33, 34]. These memory requirements grow rapidly in implicit learning architectures, involving bilevel optimization, iterative solvers, and extended simulations, frequently leading to prohibitive training times [35, 32, 25]. To address these computational hurdles, we introduce a hybrid training strategy that combines adjoint-state methods with reverse-mode AD. The adjoint method decouples memory requirements from the number of solver iterations, significantly reducing computational overhead and memory usage. Our approach employs adjoint-based methods to compute and propagate gradients over the implicit layers while utilizing reverse-mode AD for explicit model components. Further, we employ strategic checkpointing techniques to optimize memory usage, ensuring scalability and practical feasibility for large-scale, complex problems. In summary, the key contributions of this work include: (1) a novel implicit PiNDiff framework integrating implicit neural architectures, differentiable numerical PDEs, and conditional neural fields, enabling stable and accurate long-term predictions of complex spatiotemporal dynamics; (2) a hybrid gradient computation strategy that leverages adjoint-state methods and reverse-mode AD, significantly improving computational efficiency and reducing memory overhead; (3) numerical validation demonstrating improved performance, stability, and computational feasibility of the proposed Im-PiNDiff framework. Together, these innovations represent a significant step toward enabling the efficient and scalable application of implicit PiNDiff frameworks.
The rest of this paper is organized as follows: Section 2 details the proposed methodology, including the mathematical formulation, implicit neural differentiable model, and hybrid training strategies. Section 3 presents numerical experiments demonstrating the framework’s performance across a range of applications. Finally, Section 4 summarizes the contributions and outlines potential future research directions.
2 Methodology
2.1 Problem formulation
Most fundamental physical laws governing phenomena across diverse scientific and engineering disciplines, such as fluid dynamics, solid mechanics, heat transfer, electromagnetism, and quantum mechanics, are naturally expressed in the mathematical form of partial differential equations (PDEs). In practice, however, the exact forms of these PDEs often contain unknown or uncertain components due to incomplete understanding of underlying physics or inherent modeling limitations. Such scenarios can be represented by generic PDEs,
$$\frac{\partial u}{\partial t} = \mathcal{K}\big[\mathcal{F}(u; \boldsymbol{\lambda}),\ \mathcal{H}(u; \boldsymbol{\eta})\big], \quad (\mathbf{x}, t) \in \Omega \times [0, T], \tag{1a}$$
$$\mathcal{I}\big(u(\mathbf{x}, 0)\big) = 0, \quad \mathbf{x} \in \Omega, \tag{1b}$$
$$\mathcal{B}\big(u(\mathbf{x}, t)\big) = 0, \quad (\mathbf{x}, t) \in \partial\Omega \times [0, T], \tag{1c}$$
where the nonlinear functions $\mathcal{F}$ and $\mathcal{H}$ represent the known and unknown components of the PDEs for the state variable $u$, coupled via the nonlinear functional $\mathcal{K}$. The initial and boundary conditions are abstractly defined by the PDE operators $\mathcal{I}$ and $\mathcal{B}$, respectively. These functions rely on a set of physical parameters, with the known ones denoted by $\boldsymbol{\lambda}$ and the uncertain ones by $\boldsymbol{\eta}$, respectively. The spatial and temporal coordinates are denoted as $(\mathbf{x}, t)$, with physical domain $\Omega$, boundary $\partial\Omega$, and time domain $[0, T]$, resulting in a spatiotemporal domain $\Omega \times [0, T]$.
Due to incomplete PDE formulations, hybrid neural models based on DP, e.g., PiNDiff, integrate deep neural networks (DNNs) to approximate unknown PDE components/operators or enhance known components through learnable operators. The integrated neural network parameters $\mathbf{w}$ and uncertain physical parameters $\boldsymbol{\eta}$ form a unified set of trainable parameters $\boldsymbol{\theta} = \{\mathbf{w}, \boldsymbol{\eta}\}$, optimized concurrently as part of a unified network, enabled by DP. The training of the PiNDiff model over a dataset $\mathcal{D}$ is formulated as a PDE-constrained optimization problem,
$$\min_{\boldsymbol{\theta}}\ \mathcal{L}\big(u(\mathbf{x}, t; \boldsymbol{\theta}),\ \mathcal{D}\big) \tag{2a}$$
$$\text{s.t.}\quad \frac{\partial u}{\partial t} = \mathcal{K}\big[\mathcal{F}(u; \boldsymbol{\lambda}),\ \mathcal{H}_{\mathrm{nn}}(u; \boldsymbol{\theta})\big], \tag{2b}$$
$$\mathcal{I}\big(u(\mathbf{x}, 0)\big) = 0, \tag{2c}$$
$$\mathcal{B}\big(u(\mathbf{x}, t)\big) = 0, \tag{2d}$$
where $\mathcal{L}$ is the objective function. Typically, existing PiNDiff models rely on explicit recurrent formulations, wherein solutions at each time step explicitly depend on the previous state. Although straightforward, these auto-regressive recurrent approaches commonly suffer from instability and error accumulation, especially during long-horizon predictions.
To address these limitations, we introduce an implicit PiNDiff model, termed Im-PiNDiff, inspired by implicit numerical schemes known for their superior numerical stability and robustness. Specifically, we replace explicit recurrent layers with an implicit layer formulation, analogous to the deep equilibrium model (DEQ) introduced by Bai et al. [36]. DEQ employs implicit neural computations through solving a fixed-point equilibrium, enabling effectively infinite-depth representations without explicit unrolling of iterative layers. Similarly, in the context of hybrid neural-physics modeling, Im-PiNDiff conceptualizes each time step as solving an implicit nonlinear equation rather than explicit forward stepping. Training the Im-PiNDiff model involves explicitly embedding PDE constraints, resulting in a PDE-constrained bilevel optimization problem formulated mathematically as follows:
$$\min_{\boldsymbol{\theta}}\ \mathcal{L}\big(\{\hat{\mathbf{u}}^{n}\}_{n=1}^{N},\ \mathcal{D}\big) \tag{3a}$$
$$\text{s.t.}\quad \mathbf{g}\big(\hat{\mathbf{u}}^{n}, \hat{\mathbf{u}}^{n-1}; \boldsymbol{\theta}\big) = \mathbf{0}, \quad n = 1, \dots, N, \tag{3b}$$
$$\mathcal{I}\big(\hat{\mathbf{u}}^{0}\big) = 0, \tag{3c}$$
$$\mathcal{B}\big(\hat{\mathbf{u}}^{n}\big) = 0, \tag{3d}$$
where the outer-level optimization aims to minimize the discrepancy between model predictions and observed data (the loss function), while the inner-level optimization involves solving a nonlinear equilibrium equation to determine the solution $\hat{\mathbf{u}}^{n}$ at each rollout step $n$. A schematic illustrating Im-PiNDiff and its training strategy is presented in Fig. 1.
[Figure 1: Schematic of the Im-PiNDiff framework and its hybrid adjoint/AD training strategy.]
As shown in Fig. 1, propagating gradients through bilevel optimization poses significant challenges since standard AD frameworks typically require static computational graphs. Naively implementing unrolled differentiation with a fixed-iteration inner solver is computationally expensive and memory-intensive, severely limiting scalability. To circumvent this, we propose a hybrid training strategy integrating the adjoint-state implicit differentiation method with reverse-mode AD. Specifically, the adjoint-state approach propagates gradients through implicit equilibrium constraints, subsequently combined with reverse-mode AD gradients via the chain rule. A key benefit of adjoint-based gradient propagation is that it eliminates the necessity for storing intermediate computational nodes during forward propagation in implicit layers, significantly reducing memory usage. Additionally, we leverage checkpointing strategies to enhance memory efficiency during training. By combining adjoint methods with strategic checkpointing, our hybrid training strategy effectively balances computational performance and memory efficiency, enabling practical and scalable training of Im-PiNDiff models for large-scale scientific modeling problems.
2.2 Adjoint-based hybrid AD for efficient gradient propagation
Adjoint-based gradient computation methods have been increasingly adopted to enhance the efficiency and scalability of neural network training, especially for architectures involving implicitly defined operations [37, 38, 39, 40, 36]. In this study, we present an adjoint-based implicit differentiation approach for efficient Im-PiNDiff training, which requires solving a nonlinear inner optimization during forward propagation.
The fundamental distinction between explicit and implicit layers in PiNDiff models is illustrated in Fig. 2.
[Figure 2: Explicit versus implicit layers in PiNDiff models.]
We formally define an implicit layer by:
$$\mathbf{g}\big(\mathbf{u}^{n}, \mathbf{u}^{n-1}; \boldsymbol{\theta}\big) = \mathbf{0}, \tag{4}$$
where $\mathbf{u}^{n}$ is the discrete state vector at the current step and $\mathbf{u}^{n-1}$ represents the known previous state, with the dependence on the parameters $\boldsymbol{\theta}$ entering implicitly through the nonlinear implicit function $\mathbf{g}$. Given the typically high dimensionality of the problem, iterative numerical root-finding algorithms are usually employed to solve for $\mathbf{u}^{n}$. Thus, the forward pass through the implicit layer can be succinctly expressed as,
$$\mathbf{u}^{n} = \mathrm{RootSolve}\big(\mathbf{g};\ \mathbf{u}^{n-1}, \boldsymbol{\theta}\big). \tag{5}$$
DP with standard AD computes gradients by systematically applying the chain rule to decompose a complex computer program into elementary operations. During the forward propagation, DP evaluates and records intermediate variables and their associated computational dependencies, building a computational graph from the inputs to the outputs. In reverse-mode AD, gradients are computed by traversing this computational graph backwards. Specifically, starting from the output, DP computes vector-Jacobian products (VJPs) at each node, sequentially applying the chain rule in reverse order. These VJPs represent the sensitivity of the output with respect to each intermediate variable. For the total loss defined over the entire rollout trajectory of $N$ steps,
$$\mathcal{L} = \sum_{n=1}^{N} \ell\big(\mathbf{u}^{n}, \mathbf{u}^{n}_{\mathrm{obs}}\big), \tag{6}$$
the gradient of the loss with respect to the trainable parameters $\boldsymbol{\theta}$ can be computed using the chain rule,
$$\frac{\mathrm{d}\mathcal{L}}{\mathrm{d}\boldsymbol{\theta}} = \sum_{n=1}^{N} \frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\mathrm{d}\mathbf{u}^{n}}{\mathrm{d}\boldsymbol{\theta}}, \tag{7}$$
where
$$\frac{\mathrm{d}\mathbf{u}^{n}}{\mathrm{d}\boldsymbol{\theta}} = \frac{\partial \mathbf{u}^{n}}{\partial \boldsymbol{\theta}} + \frac{\partial \mathbf{u}^{n}}{\partial \mathbf{u}^{n-1}}\,\frac{\mathrm{d}\mathbf{u}^{n-1}}{\mathrm{d}\boldsymbol{\theta}}, \tag{8a}$$
$$\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\mathrm{d}\mathbf{u}^{n}}{\mathrm{d}\boldsymbol{\theta}} = \frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\partial \mathbf{u}^{n}}{\partial \boldsymbol{\theta}} + \frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\partial \mathbf{u}^{n}}{\partial \mathbf{u}^{n-1}}\,\frac{\mathrm{d}\mathbf{u}^{n-1}}{\mathrm{d}\boldsymbol{\theta}}, \tag{8b}$$
where $\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\frac{\partial \mathbf{u}^{n}}{\partial \boldsymbol{\theta}}$ and $\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\frac{\partial \mathbf{u}^{n}}{\partial \mathbf{u}^{n-1}}$ are VJPs. Given that the output $\mathbf{u}^{n}$ from the implicit layer is obtained by solving a nonlinear equation through an iterative numerical solver, the process can be abstractly written as a sequence of intermediate iterates:
$$\mathbf{u}^{n}_{(1)} \rightarrow \mathbf{u}^{n}_{(2)} \rightarrow \cdots \rightarrow \mathbf{u}^{n}_{(K)} = \mathbf{u}^{n}, \tag{9}$$
where $K$ denotes the number of iterations needed to converge to the equilibrium solution $\mathbf{u}^{n}$. If one directly applies standard reverse-mode AD to propagate gradients through this implicit layer, the AD engine must record and retain the full computational graph of all intermediate iterates to compute VJPs. This results in substantial memory overhead, especially when the number of iterations $K$ or the state dimension is large.
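For reference, the VJPs above are exactly what an AD engine returns without ever materializing a Jacobian; below is a tiny, self-contained JAX illustration with an arbitrary toy function (not part of the Im-PiNDiff model):

```python
# Minimal illustration of a VJP (vector-Jacobian product) with reverse-mode AD.
import jax
import jax.numpy as jnp

f = lambda x: jnp.array([x[0] * x[1], jnp.sin(x[1])])
x = jnp.array([2.0, 0.5])
y, vjp_fn = jax.vjp(f, x)      # forward pass records the computational graph
v = jnp.array([1.0, 0.0])      # cotangent vector, e.g., dL/dy
(x_bar,) = vjp_fn(v)           # x_bar = v^T (df/dx), Jacobian never formed
print(x_bar)                   # [0.5, 2.0]
```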
To address this limitation, we present a hybrid gradient computation strategy using the discrete adjoint-state method, which decouples gradient propagation from the forward iteration history. Instead of tracing all internal solver steps, the adjoint method efficiently computes the required VJP by solving a single linear system, avoiding the need to store intermediate iterates of the root-finding process. Concretely, for each time step $n$, the Jacobians of $\mathbf{u}^{n}$ with respect to $\boldsymbol{\theta}$ and $\mathbf{u}^{n-1}$ are given by the implicit function theorem [35],
$$\frac{\partial \mathbf{u}^{n}}{\partial \boldsymbol{\theta}} = -\left(\frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n}}\right)^{-1} \frac{\partial \mathbf{g}}{\partial \boldsymbol{\theta}}, \tag{10a}$$
$$\frac{\partial \mathbf{u}^{n}}{\partial \mathbf{u}^{n-1}} = -\left(\frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n}}\right)^{-1} \frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n-1}}. \tag{10b}$$
Now we define the adjoint vector $\boldsymbol{\mu}$ as,
$$\boldsymbol{\mu}^{\top} = -\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}} \left(\frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n}}\right)^{-1}, \tag{11}$$
and then the VJPs are expressed as,
$$\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\partial \mathbf{u}^{n}}{\partial \boldsymbol{\theta}} = \boldsymbol{\mu}^{\top}\,\frac{\partial \mathbf{g}}{\partial \boldsymbol{\theta}}, \tag{12a}$$
$$\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\partial \mathbf{u}^{n}}{\partial \mathbf{u}^{n-1}} = \boldsymbol{\mu}^{\top}\,\frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n-1}}, \tag{12b}$$
where $\boldsymbol{\mu}$ is obtained by solving the linear system,
$$\left(\frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n}}\right)^{\top} \boldsymbol{\mu} = -\left(\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\right)^{\top}. \tag{13}$$
This linear equation is solved efficiently using iterative numerical linear solvers (such as GMRES or conjugate gradient methods). Instead of explicitly forming the Jacobian matrices $\partial \mathbf{g}/\partial \mathbf{u}^{n}$, $\partial \mathbf{g}/\partial \mathbf{u}^{n-1}$, and $\partial \mathbf{g}/\partial \boldsymbol{\theta}$, we utilize VJP functions provided by AD tools to directly obtain the required products. Modern AD libraries (e.g., JAX, PyTorch, TensorFlow) provide VJP computations without forming the entire Jacobian matrix explicitly. Thus, products such as $\boldsymbol{\mu}^{\top}\,\partial \mathbf{g}/\partial \boldsymbol{\theta}$ are efficiently computed on-the-fly via AD-generated VJP functions, greatly enhancing computational and memory efficiency (more details on the derivation of the hybrid adjoint backpropagation can be found in Appendix C).
The complete hybrid adjoint-based AD procedure for gradient back-propagation through an implicit layer can be summarized explicitly as follows: (1) solve the linear system in Eq. (13) iteratively for the adjoint vector $\boldsymbol{\mu}$; (2) compute the VJPs with respect to the parameters $\boldsymbol{\theta}$ and the previous state $\mathbf{u}^{n-1}$ via Eqs. (12a)–(12b) using the obtained adjoint vector, and provide them to the AD engine to continue further backpropagation.
[Figure 3: Forward and backward passes through the implicit layer with adjoint-based gradient propagation.]
The schematic illustration of forward and backward passes through the implicit layer is provided in Fig. 3, and the complete algorithmic implementation steps are clearly outlined below in Algorithm 1.
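To make the forward/backward procedure concrete, the following is a minimal, self-contained JAX sketch of an implicit layer with adjoint-based backpropagation via `jax.custom_vjp`. The toy residual `g`, the Newton inner solver, and all sizes are illustrative assumptions; this is not the authors' implementation or Algorithm 1 itself.

```python
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import gmres

dt = 0.1  # time-step size of the toy implicit update

def f(u, theta):
    # Placeholder "physics" right-hand side.
    return -theta * u ** 2

def g(u_next, u_prev, theta):
    # Implicit (Crank--Nicolson-like) residual: g(u^n, u^{n-1}; theta) = 0.
    return u_next - u_prev - 0.5 * dt * (f(u_next, theta) + f(u_prev, theta))

def newton_solve(u_prev, theta, iters=20):
    # Inner root-finding solver (fixed iteration count for simplicity).
    u = u_prev
    for _ in range(iters):
        res = g(u, u_prev, theta)
        J = jax.jacfwd(g, argnums=0)(u, u_prev, theta)
        u = u - jnp.linalg.solve(J, res)
    return u

@jax.custom_vjp
def implicit_step(u_prev, theta):
    return newton_solve(u_prev, theta)

def implicit_step_fwd(u_prev, theta):
    u_next = newton_solve(u_prev, theta)
    # Only the converged state is saved; the solver iterates are discarded.
    return u_next, (u_next, u_prev, theta)

def implicit_step_bwd(saved, v):
    u_next, u_prev, theta = saved
    # Adjoint solve (Eq. (13)): (dg/du^n)^T mu = -v, done matrix-free with GMRES
    # using the AD-generated VJP of g, so the Jacobian is never formed.
    _, vjp_g = jax.vjp(g, u_next, u_prev, theta)
    matvec = lambda m: vjp_g(m)[0]               # m -> (dg/du^n)^T m
    mu, _ = gmres(matvec, -v)
    # VJPs w.r.t. u^{n-1} and theta (Eqs. (12a)-(12b)) from the same VJP of g.
    _, bar_u_prev, bar_theta = vjp_g(mu)
    return bar_u_prev, bar_theta

implicit_step.defvjp(implicit_step_fwd, implicit_step_bwd)

# Usage: gradients of a rollout loss w.r.t. theta flow through the adjoint.
u0 = jnp.array([1.0, 0.5, 0.2])
loss = lambda th: jnp.sum(implicit_step(implicit_step(u0, th), th) ** 2)
print(jax.grad(loss)(jnp.asarray(0.3)))
```

In this sketch the memory cost of the backward pass is independent of the number of Newton iterations, mirroring the property of the hybrid adjoint strategy described above.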
2.3 Conditional neural fields for latent physics inference
To capture spatially and temporally varying latent physical quantities, such as unresolved PDE terms, parametric fields, or unmodeled operators, we incorporate conditional neural fields (CNFs) into the Im-PiNDiff framework. Neural fields (NF), also known as coordinate-based implicit neural representations, offer a flexible and expressive mechanism for modeling continuous functions and operator-valued mappings over space and time. These representations have gained significant traction in computer vision, graphics, and scientific machine learning due to their ability to encode high-frequency and nonstationary features in data with minimal inductive bias [41, 42, 43].
Within the context of PDE-constrained modeling, neural fields provide an elegant tool for parameterizing unknown coefficients or operators that vary over the spatial or spatiotemporal domain. Their coordinate-continuous nature makes them particularly well suited for this task, as they can be queried at arbitrary spatial or temporal resolutions, ensuring mesh-invariant predictions and generalization to unseen domains. In this work, we employ a conditional formulation of neural fields, wherein the predicted physical field, e.g., an unknown advection velocity or diffusivity field, is conditioned on a latent embedding derived from auxiliary inputs. This conditioning allows the model to encode global contextual information, such as initial or boundary conditions, simulation settings, or observed response trajectories. As illustrated in Fig. 4, our architecture comprises three modules: a hypernetwork that maps condition vectors to latent codes, a linear projector that translates latent codes to NF parameters, and a base NF network that evaluates the inferred field at queried coordinates.
[Figure 4: Conditional neural field architecture: hypernetwork, linear projector, and SIREN base network.]
Specifically, given a conditioning vector $\mathbf{c}$ representing contextual input (e.g., initial/boundary condition encodings or low-dimensional representations of observed dynamics), a hypernetwork $h_{\boldsymbol{\phi}}$ produces a latent code $\mathbf{z}$,
$$\mathbf{z} = h_{\boldsymbol{\phi}}(\mathbf{c}), \tag{14}$$
which is subsequently projected via a learnable linear operator $\mathbf{W}$ into the NF parameters $\boldsymbol{\psi}$. The base NF network, implemented as a sinusoidal representation network (SIREN) [42], then evaluates the spatial or spatiotemporal field as
$$\boldsymbol{\xi}(\mathbf{x}, t) = \mathrm{NF}_{\boldsymbol{\psi}}(\mathbf{x}, t), \qquad \boldsymbol{\psi} = \mathbf{W}\,\mathbf{z}, \tag{15}$$
where $\boldsymbol{\xi}$ represents the inferred latent field, such as a velocity or forcing term. The use of SIREN allows the model to resolve fine-scale variations in the latent field and capture complex physical patterns across domains with smooth and expressive function approximations.
Crucially, the conditional neural field is combined directly with the known PDE operators to form the hybrid neural PDE operator in the Im-PiNDiff solver, allowing it to affect the system dynamics during training. Because the CNF module is fully differentiable and compatible with our adjoint-based gradient propagation strategy, end-to-end learning remains tractable and memory-efficient. This integration enables the identification of latent physics from sparse, indirect observations; that is, only a small portion of the state trajectory is observable, and the physical field itself is never directly observed.
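To make the conditioning pipeline concrete, a minimal JAX sketch of the hypernetwork, linear projector, and SIREN base network is given below. The layer sizes, initialization scales, and function names are illustrative assumptions, not the paper's exact configuration.

```python
import jax
import jax.numpy as jnp

LAYERS = [(3, 64), (64, 64), (64, 2)]   # (x, y, t) -> (c_x, c_y), assumed sizes
Z_DIM, C_DIM = 16, 8                    # latent-code and condition dimensions

def init(key):
    keys = jax.random.split(key, 1 + 2 * len(LAYERS))
    Wh = jax.random.normal(keys[0], (C_DIM, Z_DIM)) * 0.1   # hypernetwork weights
    bh = jnp.zeros(Z_DIM)
    proj = []                                               # linear projector
    for i, (d_in, d_out) in enumerate(LAYERS):
        P_W = jax.random.normal(keys[1 + 2 * i], (Z_DIM, d_in * d_out)) / (Z_DIM * d_in)
        P_b = jax.random.normal(keys[2 + 2 * i], (Z_DIM, d_out)) * 0.01
        proj.append((P_W, P_b))
    return (Wh, bh, proj)

def cnf(params, c, coords, w0=30.0):
    Wh, bh, proj = params
    z = jnp.tanh(c @ Wh + bh)                               # latent code, Eq. (14)
    h = coords
    for i, ((P_W, P_b), (d_in, d_out)) in enumerate(zip(proj, LAYERS)):
        W = (z @ P_W).reshape(d_in, d_out)                  # projected SIREN weights
        b = z @ P_b
        pre = h @ W + b
        h = jnp.sin(w0 * pre) if i < len(LAYERS) - 1 else pre   # SIREN, Eq. (15)
    return h

params = init(jax.random.PRNGKey(0))
c = jnp.ones(C_DIM)                                  # conditioning vector (e.g., IC encoding)
xyt = jnp.array([[0.1, 0.2, 0.0], [0.5, 0.5, 1.0]])  # query coordinates (x, y, t)
field = cnf(params, c, xyt)                          # latent field at the queried points
```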
3 Results and Discussion
3.1 Forward and inverse modeling of spatiotemporal physics
We assessed the proposed Im-PiNDiff framework on two canonical spatiotemporal PDE systems: the advection-diffusion equation and the scalar Burgers’ equation. These case studies demonstrate the model’s capability to perform both forward prediction and inverse inference of latent physical fields under linear and nonlinear dynamics, respectively. In particular, the advection–diffusion system serves as a testbed for investigating the accuracy and stability of Im-PiNDiff in handling smooth transport-diffusion processes, while the Burgers’ equation probes its robustness in capturing nonlinear wave propagation and shock formation.
3.1.1 Advection–diffusion processes with steady advection fields
We begin with a 2D advection–diffusion system, which describes the spatiotemporal evolution of a scalar field under combined effects of directional transport and diffusion,
$$\frac{\partial u}{\partial t} + c_x \frac{\partial u}{\partial x} + c_y \frac{\partial u}{\partial y} = \nu \left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right), \tag{16}$$
where $\nu$ is the diffusion coefficient and $c_x$, $c_y$ are the advection coefficients in the $x$ and $y$ directions, respectively. Here, $c_x$ and $c_y$ are treated as unknown spatial fields to be inferred during Im-PiNDiff training, where partial observations of the state variable $u$ are used as labels. These unknown fields are parameterized by CNFs.
To generate training and testing data, we employed a high-resolution finite-volume (FV) solver with fourth-order Runge–Kutta (RK4) time integration and a time step of s. The physical domain of size was discretized using a Cartesian mesh. A set of ten initial conditions was randomly generated from Gaussian processes (GPs) on a coarse grid and projected onto the simulation mesh. These GP fields were constructed using radial basis kernels with varying length scales to promote diversity in initial configurations. Ground truth advection velocity fields were generated as smooth superpositions of cosine functions (details in Appendix D) and similarly projected to the simulation grid. For training, only a very limited number of state observations, specifically two snapshots of $u$, were provided, simulating a data-sparse regime. The model was then evaluated on unseen initial conditions to assess its ability to reconstruct the full spatiotemporal state trajectories over extended prediction horizons, while simultaneously inferring the hidden advection velocity fields from these sparse, indirect observations.
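For illustration, a minimal sketch of how such GP initial conditions could be drawn with an RBF kernel is given below; the grid size, length scale, and jitter are assumed values, not the paper's exact settings.

```python
import jax
import jax.numpy as jnp

def rbf_kernel(X, length_scale):
    # Squared-exponential kernel on 2D points X with shape (n_points, 2).
    d2 = jnp.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * d2 / length_scale ** 2)

def sample_ic(key, n=16, length_scale=0.2):
    # Draw one GP realization on an n x n coarse grid over the unit square.
    xs = jnp.linspace(0.0, 1.0, n)
    X = jnp.stack(jnp.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
    K = rbf_kernel(X, length_scale) + 1e-6 * jnp.eye(n * n)   # jitter for stability
    L = jnp.linalg.cholesky(K)
    z = jax.random.normal(key, (n * n,))
    return (L @ z).reshape(n, n)   # coarse field, later projected onto the mesh

ic = sample_ic(jax.random.PRNGKey(1), length_scale=0.3)
```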
The Im-PiNDiff model was constructed using the FV discretization of the governing PDEs, where the unknown spatial advection velocity fields are modeled as CNFs parameterized continuously over space and conditioned on time. To achieve stable and accurate long-term state propagation, temporal autoregression is performed through the implicit layer using a second-order Crank–Nicolson scheme,
$$\frac{u^{n+1} - u^{n}}{\Delta t} = \frac{1}{2}\left[\mathcal{R}\big(u^{n+1}\big) + \mathcal{R}\big(u^{n}\big)\right], \tag{17}$$
where $\mathcal{R}(u) = -c_x\,\partial u/\partial x - c_y\,\partial u/\partial y + \nu\,\nabla^{2}u$ denotes the spatially discretized right-hand-side operator of Eq. (16), propagating the state from $t^{n}$ to $t^{n+1}$. The neural model uses a relatively large time step of s, an order of magnitude larger than the step size used in generating the training data. At each time step, Eq. (17) requires solving a nonlinear system due to its implicit dependence on $u^{n+1}$, which is accomplished via the Biconjugate Gradient Stabilized (BiCGStab) method. To enable efficient end-to-end training, we applied the hybrid adjoint-based AD backpropagation algorithm introduced in Section 2.
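As an illustration of the implicit-layer residual implied by Eq. (17), the sketch below assembles a Crank–Nicolson residual for Eq. (16) using central differences and periodic boundaries; this is an assumed discretization for exposition, not the paper's exact FV implementation.

```python
import jax.numpy as jnp

def rhs(u, c_x, c_y, nu, dx, dy):
    # R(u) = -c_x du/dx - c_y du/dy + nu * Laplacian(u), central differences.
    dudx = (jnp.roll(u, -1, axis=0) - jnp.roll(u, 1, axis=0)) / (2 * dx)
    dudy = (jnp.roll(u, -1, axis=1) - jnp.roll(u, 1, axis=1)) / (2 * dy)
    lap = ((jnp.roll(u, -1, axis=0) - 2 * u + jnp.roll(u, 1, axis=0)) / dx ** 2
           + (jnp.roll(u, -1, axis=1) - 2 * u + jnp.roll(u, 1, axis=1)) / dy ** 2)
    return -c_x * dudx - c_y * dudy + nu * lap

def cn_residual(u_next, u_prev, c_x, c_y, nu, dx, dy, dt):
    # Implicit-layer residual g(u^{n+1}, u^n) = 0 for the Crank--Nicolson update,
    # handed to the inner solver (e.g., BiCGStab) and to the adjoint backward pass.
    return ((u_next - u_prev) / dt
            - 0.5 * (rhs(u_next, c_x, c_y, nu, dx, dy)
                     + rhs(u_prev, c_x, c_y, nu, dx, dy)))
```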
We first evaluated the Im-PiNDiff framework for the scenario where the advection velocity fields were assumed to be spatially varying but temporally invariant, i.e., $c_x = c_x(x, y)$ and $c_y = c_y(x, y)$. The model was trained using only two snapshots of the state field: the initial condition and an observation at s, as indicated in the top-right panel of Fig. 5.
[Figure 5: Predicted scalar field trajectories and inferred steady advection fields $c_x$ and $c_y$ compared with the ground truth; the training observations are indicated in the top-right panel.]
After 2000 epochs of training, the model produced accurate forward predictions of the scalar field over the extended horizon s and simultaneously inferred the underlying steady advection fields. As shown in Fig. 5, the predicted scalar field trajectories closely match the ground truth with a relative error of approximately 2%, while the advection fields $c_x$ and $c_y$ are accurately inferred as well, with 3% error compared to the true fields.
[Figure 6: Im-PiNDiff predictions of the scalar field for out-of-distribution initial conditions in the steady-advection case.]
The trained model was further tested on out-of-distribution (OOD) initial conditions generated using radial basis kernels with characteristic length scales different from those used during training. As illustrated in Fig. 6, the model accurately predicted the spatiotemporal evolution of the scalar field over the full time horizon s, despite having never seen such initial states during training. The predictions remain stable and accurate across time, with relative errors generally below 2%, indicating strong robustness to randomly generated unseen initial conditions.
These results highlight the model’s capacity to perform robust forward/inverse modeling from extremely limited data, accurately reconstructing both state dynamics and hidden physics. Notably, no direct observations of the advection fields were used during training, underscoring the effectiveness of the CNF and hybrid adjoint-AD gradient propagation over the entire program in inferring latent physics from sparse indirect measurements.
3.1.2 Advection–diffusion processes with dynamic advection fields
To further evaluate the robustness of Im-PiNDiff in modeling non-stationary physics, we consider a more challenging setting where the advection fields vary in both space and time, i.e., $c_x = c_x(x, y, t)$ and $c_y = c_y(x, y, t)$. This setting poses a significant challenge for spatiotemporal inversion, as the time-varying latent fields are inferred from sparse and irregular data.
[Figure 7: Inferred time-varying advection fields $c_x$ and $c_y$ with relative error maps (left) and predicted scalar field $u$ (right) compared with the ground truth.]
[Figure 8: Predictions for out-of-distribution initial conditions in the dynamic-advection case.]
The training set consists of eight snapshots sampled at nonuniform time intervals s over a total simulation horizon of s. This choice reflects realistic scenarios where sensor data may be irregularly sampled in time. The model is trained for 10,000 epochs to jointly learn both the forward state evolution and the hidden time-varying advection fields $c_x$ and $c_y$. The learned model is then used to reconstruct the full spatiotemporal fields at unobserved time points, which are also sampled irregularly.
As shown in Fig. 7, the Im-PiNDiff model accurately recovers the ground truth dynamics and latent velocity fields. The left panels show the inferred $c_x$ and $c_y$ fields in comparison with their ground truth counterparts, along with relative error maps across time. Despite the inherent ill-posedness of recovering time-dependent vector fields from limited state observations, the inferred advection fields capture the spatio-temporal structures reasonably well, with a relative error around 10%. The right panels show the predicted scalar field $u$, achieving high accuracy with a relative error consistently below 2% across all prediction times. Generalization performance is further evaluated using OOD initial conditions, generated from unseen Gaussian process realizations. The predictions on these OOD cases, shown in Fig. 8, demonstrate excellent agreement with ground truth, confirming that the Im-PiNDiff model generalizes robustly to new initializations with high accuracy.
3.1.3 Scalar Burgers’ dynamics with spatially varying viscosity fields
We further evaluate the Im-PiNDiff framework on the 2D scalar Burgers’ equation to test its capability in recovering spatially varying latent physical parameters under nonlinear dynamics,
$$\frac{\partial u}{\partial t} + u\left(\frac{\partial u}{\partial x} + \frac{\partial u}{\partial y}\right) = \nu(x, y)\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right), \tag{18}$$
where $u$ is the scalar velocity field and $\nu(x, y)$ denotes the unknown spatially varying viscosity field. In this setting, the convective nonlinearity leads to steep gradients and localized structures, making the recovery of the latent viscosity field from sparse observations particularly challenging. Note that the viscosity is set to be of order ; higher values would suppress the convective dynamics and render the solution diffusion-dominated.
The Im-PiNDiff model is trained using only four snapshots of the velocity field, without any direct supervision on the viscosity. The CNF module is tasked with recovering $\nu(x, y)$ from the observed dynamics.
[Figure 9: Inferred spatially varying viscosity field (left) and predicted velocity field (right) for the scalar Burgers' dynamics, with relative errors.]
Figure 9 shows the results of Im-PiNDiff for the scalar Burgers' dynamics with spatially varying viscosity. The left panel presents the inferred steady viscosity field in comparison with the ground truth, alongside its corresponding relative error. Although recovering viscosity from indirect state observations is highly ill-posed, particularly in this regime where the viscosity is small and exerts only a weak influence on the dynamics, the inferred $\nu(x, y)$ captures the dominant spatial patterns and exhibits reasonable structural agreement with the true field. Some deviation is observed in regions of low sensitivity, resulting in higher relative errors (up to 28%). The low sensitivity is evident from the velocity prediction results. As shown in the right panel, the model achieves highly accurate predictions of the velocity field, with a relative error around 4%. These results demonstrate that Im-PiNDiff remains robust in recovering hidden spatial parameters while preserving predictive fidelity in nonlinear PDE systems.
Further, the trained model was tested on unseen initial conditions.
[Figure 10: Burgers' velocity-field predictions for unseen initial conditions.]
As shown in Fig. 10, the model accurately predicts the spatiotemporal evolution of the velocity field across both test cases, achieving relative errors consistently around 4%. These results confirm that the proposed framework generalizes robustly to new initializations, despite being trained under limited data and an ill-posed inference setting.
3.2 Temporal error accumulation and stability analysis
An important motivation for the proposed Im-PiNDiff framework lies in its ability to perform robust learning and stable long-horizon forecasting, while mitigating error accumulation. This advantage stems from the use of implicit autoregressive forward passes, which have long been favored in classical numerical analysis for their superior stability properties, especially in stiff or convection-dominated systems. To systematically quantify and compare the temporal error accumulation behavior, we conduct a controlled study using the advection–diffusion problem with steady advection fields. We benchmark the performance of Im-PiNDiff against its explicit counterpart, referred to as Ex-PiNDiff, which follows the original PiNDiff formulation with a recurrent explicit forward pass [13]. Both models are trained on the same dataset using the same model architecture, differing only in the time-stepping mechanism. We investigate two key aspects: (1) error accumulation over time for a fixed time-step size and (2) sensitivity of the final prediction error to varying temporal resolutions. The results are shown in Fig. 11.
[Figure 11(a): Accumulation of L1 relative error in the scalar field over the simulation horizon for Ex-PiNDiff and Im-PiNDiff.]
[Figure 11(b): Final L1 relative error versus time-step size for Ex-PiNDiff and Im-PiNDiff.]
Figure 11(a) reports the accumulation of L1-norm relative error in the scalar field over the simulation horizon of s, comparing three models: Ex-PiNDiff with s (black), Ex-PiNDiff with s (blue), and Im-PiNDiff with s (red). As the simulation progresses, the explicit model with coarse time steps ( s) exhibits significant error growth, reflecting compounding integration and learning errors. In contrast, the implicit Im-PiNDiff maintains a nearly constant error profile throughout the rollout, highlighting its temporal stability. Notably, the implicit model achieves comparable or better accuracy than the explicit model even when trained at a tenfold coarser temporal resolution. Figure 11(b) further quantifies the effect of time-step size on final prediction accuracy by plotting the L1 relative error in the scalar field at the final time across various $\Delta t$ values. The explicit model shows a steep increase in error as $\Delta t$ increases, underscoring its sensitivity to temporal resolution. In contrast, Im-PiNDiff remains remarkably robust, showing negligible degradation in accuracy even as $\Delta t$ grows. This result confirms that the implicit layer stabilizes long-horizon predictions, suppresses numerical drift, and allows for larger autoregressive steps without compromising predictive quality.
Taken together, these findings illustrate the key strength of the Im-PiNDiff framework in preserving predictive accuracy and stability over extended time horizons. By decoupling learning stability from time-step resolution, Im-PiNDiff not only ensures physically consistent forecasting but also enables significant computational savings, a benefit explored further in the subsequent section on computational cost analysis.
3.3 Computational cost analysis
To understand the efficiency and scalability of Im-PiNDiff models, we analyze the computational and memory complexity of Im-PiNDiff relative to Ex-PiNDiff under various settings. In the Ex-PiNDiff model, the system evolves through explicit autoregressive updates, forming a forward trajectory of the form,
$$\mathbf{u}^{0} \xrightarrow{\ f\ } \mathbf{u}^{1} \xrightarrow{\ f\ } \mathbf{u}^{2} \xrightarrow{\ f\ } \cdots \xrightarrow{\ f\ } \mathbf{u}^{N}, \tag{19}$$
where each transition step requires storing the full state and its associated computational graph for backpropagation through time. In contrast, the Im-PiNDiff model propagates the state implicitly by solving a nonlinear fixed-point equation at each time step. A naive AD-based implementation unrolls the solver iterations within each step as,
$$\mathbf{u}^{n-1} \rightarrow \mathbf{u}^{n}_{(1)} \rightarrow \mathbf{u}^{n}_{(2)} \rightarrow \cdots \rightarrow \mathbf{u}^{n}_{(K)} = \mathbf{u}^{n}, \quad n = 1, \dots, N, \tag{20}$$
where each $\mathbf{u}^{n}_{(k)}$ denotes the $k$-th iteration of the nonlinear solver at time step $n$. This naive Im-PiNDiff approach incurs memory and runtime costs that scale linearly with both the number of time steps $N$ and the number of solver iterations $K$. To overcome this limitation, the proposed hybrid training strategy combines adjoint-state methods with reverse-mode AD, avoiding the need to store intermediate iterations by solving a single linear system via the implicit function theorem. The number of solver steps $K$ can be determined dynamically via convergence criteria. This approach decouples memory usage from $K$ while still retaining the flexibility of accurate, adaptive inner solvers, yielding significant gains in both scalability and efficiency. Table 1 summarizes the computational complexity of different strategies in terms of memory usage and training time.
Table 1: Computational complexity of gradient propagation strategies ($N$: rollout steps; $K$: inner solver iterations).

| | Memory footprint | Training time |
|---|---|---|
| Ex-PiNDiff | $\mathcal{O}(N)$ | $\mathcal{O}(N)$ |
| Naive Im-PiNDiff | $\mathcal{O}(N \cdot K)$ | $\mathcal{O}(N \cdot K)$ |
| Im-PiNDiff | $\mathcal{O}(N)$ | $\mathcal{O}(N \cdot K)$ |
The key observation is that the memory footprint of the hybrid Im-PiNDiff model is independent of $K$, unlike the naive version. This enables substantial memory savings without compromising expressivity or stability. Furthermore, since $K$ is adaptively determined, the total training time is often lower than that of the fixed-$K$ naive strategy.

[Figure 12: Peak memory usage and wall-clock training time per epoch versus time-step size for the advection–diffusion case.]
Figure 12 presents the empirical computational cost for the advection–diffusion case, showing peak memory and wall-clock training time per epoch over a range of time-step sizes. For small $\Delta t$ values, Im-PiNDiff and Ex-PiNDiff exhibit similar memory usage, but the implicit model typically incurs longer training times due to iterative solver overhead. However, Im-PiNDiff supports significantly larger $\Delta t$ while maintaining predictive accuracy. For instance, to achieve a relative error below , Ex-PiNDiff requires s, whereas Im-PiNDiff retains this accuracy even at s. At this larger step size, the number of rollout steps is reduced by a factor of 20, yielding a 14× reduction in memory usage and a 3× speedup in training time. For most practical problems, the effective inequality always holds, yielding superior performance. In the case of the naive Im-PiNDiff model, the memory usage increases linearly with the specified inner iteration count $K$, and for , it consumes around 34 GB. Since training time scales directly with simulation execution time, the findings on training time are equally applicable to inference time. To further reduce memory requirements, we apply checkpointing to the implicit model (Im-Cp). This technique stores selected intermediate states and recomputes others during the backward pass, significantly reducing memory usage while incurring a modest increase in computational time. As shown in Fig. 12, checkpointing flattens the memory growth curve across increasing simulation lengths, making long-horizon training feasible on resource-constrained hardware.
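A minimal sketch of segment-wise checkpointing over the rollout is shown below, reusing the `implicit_step` layer sketched in Section 2.2; the segment sizes and names are illustrative assumptions rather than the paper's configuration.

```python
import jax
import jax.numpy as jnp

N_INNER = 10     # implicit steps recomputed (not stored) within each segment
N_SEGMENTS = 5   # only one state per segment is retained for the backward pass

@jax.checkpoint
def rollout_segment(u, theta):
    # Under jax.checkpoint (remat), intermediate states in this loop are
    # recomputed during the backward pass instead of being stored.
    for _ in range(N_INNER):
        u = implicit_step(u, theta)   # adjoint-backed implicit layer (Section 2.2 sketch)
    return u

def rollout(u0, theta):
    u = u0
    for _ in range(N_SEGMENTS):
        u = rollout_segment(u, theta)
    return u

# e.g., loss = lambda th: jnp.sum(rollout(u0, th) ** 2); jax.grad(loss)(theta)
```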
To further demonstrate the scalability and versatility of the hybrid adjoint-based gradient propagation strategy, we investigated a multi-physics reaction–diffusion problem arising in chemical vapor infiltration (CVI) modeling. Specifically, we applied our approach to the PiNDiff-CVI framework developed in [14], which simulates porous infiltration by solving two tightly coupled PDEs: an elliptic Poisson equation governing the steady-state molarity distribution and a hyperbolic transport equation modeling time-evolving deposition dynamics. The elliptic component introduces an additional layer of complexity, as it necessitates an inner numerical solve at each time step, leading to a nested bilevel optimization structure (more details can be found in Appendix E). In the original implementation of PiNDiff-CVI [14], the Poisson solver was handled using a fixed number of unrolled iterations, incurring considerable memory overhead during training due to the explicit differentiation of each inner step. In contrast, we replaced this naive backpropagation scheme with our proposed adjoint-based differentiation strategy, which computes gradients through the elliptic solver using the implicit function theorem, thus avoiding the need to store intermediate iterates. For the hyperbolic time integration component, we employed checkpointing to reduce memory requirements by recomputing selected intermediate states during the backward pass. This combination of adjoint differentiation and checkpointing significantly improves both efficiency and scalability. The comparative performance of the original Ex-PiNDiff, Im-PiNDiff, and Im-PiNDiff with checkpointing (Im-Cp) is summarized in Fig. 13.
[Figure 13: Memory usage and training time of Ex-PiNDiff, Im-PiNDiff, and Im-PiNDiff with checkpointing (Im-Cp) for the PiNDiff-CVI problem.]
The hybrid Im-PiNDiff model achieved nearly a 2× reduction in both memory consumption and wall-clock training time relative to the explicit baseline, without sacrificing predictive accuracy. Moreover, the integration of checkpointing further reduced memory usage to approximately 696 MB, an order of magnitude lower than the Ex-PiNDiff baseline, while incurring only a modest increase in computational time. These results highlight the practicality of hybrid gradient propagation strategies for large-scale, stiff, or tightly coupled PDE systems, where traditional AD approaches are often prohibitive.
4 Conclusion
This work presents a hybrid neural-physics modeling framework, termed Im-PiNDiff, for complex spatiotemporal dynamics. By introducing implicit neural differential layers into the PiNDiff architecture, we address the challenge of numerical instability and error accumulation inherent in explicit recurrent formulations. The use of implicit time stepping significantly improves the temporal stability and long-horizon predictive accuracy.
A key innovation of the proposed framework lies in its hybrid gradient propagation strategy, which integrates adjoint-based implicit differentiation with reverse-mode AD. This approach decouples gradient computation from the number of solver iterations, enabling memory-efficient training without compromising accuracy. Moreover, we incorporate checkpointing schemes to further reduce the peak memory footprint, making the framework viable for large-scale, long-time simulations on memory-constrained hardware. Together, these algorithmic advances allow Im-PiNDiff to scale to previously intractable problem regimes, achieving superior efficiency and stability over traditional AD-based implementations.
For latent physics inference, we leverage CNFs to parameterize spatially and temporally varying physical quantities, which enables the model to recover unobserved fields or operators from sparse, indirect measurements, expanding the applicability of PiNDiff models to scenarios where direct supervision is unavailable or limited. Extensive numerical experiments, including linear advection-diffusion, nonlinear Burgers’ dynamics, and multiphysics CVI processes, demonstrate the proposed framework’s effectiveness in both forward and inverse modeling tasks under challenging data and conditions.
Overall, Im-PiNDiff represents a significant step toward enabling stable, efficient, and generalizable hybrid modeling for real-world scientific systems. The combination of implicit architectures, adjoint-based training, and neural field parameterizations offers a flexible and robust paradigm for next-generation scientific machine learning. Future extensions will explore adaptive solvers, stochastic PDEs, and coupling with experimental data streams to further broaden the utility of this framework in data-constrained and multiscale physical modeling settings.
Declaration of competing interests
The authors declare no competing interests.
Data availability
All data needed to evaluate the conclusions in the paper are either present in the paper or can be regenerated by the code provided.
Acknowledgment
The authors would like to acknowledge the funds from the Air Force Office of Scientific Research (AFOSR), United States of America under award number FA9550-22-1-0065. JXW would also like to acknowledge the funding support from the Office of Naval Research under award number N00014-23-1-2071 and the National Science Foundation under award number OAC-2047127 in supporting this study.
References
- [1] M. Raissi, P. Perdikaris, G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686–707.
- [2] L. Sun, H. Gao, S. Pan, J.-X. Wang, Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data, Computer Methods in Applied Mechanics and Engineering 361 (2020) 112732.
- [3] L. Sun, J.-X. Wang, Physics-constrained Bayesian neural network for fluid flow reconstruction with sparse and noisy data, Theoretical and Applied Mechanics Letters 10 (3) (2020) 161–169.
- [4] Z. Li, N. B. Kovachki, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, A. Anandkumar, et al., Fourier neural operator for parametric partial differential equations, in: International Conference on Learning Representations, 2021.
- [5] L. Lu, P. Jin, G. Pang, Z. Zhang, G. E. Karniadakis, Learning nonlinear operators via deeponet based on the universal approximation theorem of operators, Nature Machine Intelligence 3 (3) (2021) 218–229.
- [6] S. Wang, H. Wang, P. Perdikaris, Learning the solution operator of parametric partial differential equations with physics-informed deeponets, Science advances 7 (40) (2021) eabi8605.
- [7] S. L. Brunton, J. L. Proctor, J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences 113 (15) (2016) 3932–3937.
- [8] Z. Chen, Y. Liu, H. Sun, Physics-informed learning of governing equations from scarce data, Nature Communications 12 (1) (2021) 1–13.
- [9] L. Sun, D. Z. Huang, H. Sun, J.-X. Wang, Bayesian spline learning for equation discovery of nonlinear dynamics with quantified uncertainty, in: NeurIPS, PMLR, 2022.
- [10] D. Kochkov, J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, S. Hoyer, Machine learning–accelerated computational fluid dynamics, Proceedings of the National Academy of Sciences 118 (21) (2021) e2101784118.
- [11] X.-Y. Liu, M. Zhu, L. Lu, H. Sun, J.-X. Wang, Multi-resolution partial differential equations preserved learning framework for spatiotemporal dynamics, Communications Physics 7 (1) (2024) 31.
- [12] X. Fan, J.-X. Wang, Differentiable hybrid neural modeling for fluid-structure interaction, Journal of Computational Physics 496 (2024) 112584.
- [13] D. Akhare, T. Luo, J.-X. Wang, Physics-integrated neural differentiable (PiNDiff) model for composites manufacturing, Computer Methods in Applied Mechanics and Engineering 406 (2023) 115902.
- [14] D. Akhare, Z. Chen, R. Gulotty, T. Luo, J.-X. Wang, Probabilistic physics-integrated neural differentiable modeling for isothermal chemical vapor infiltration process, npj Computational Materials 10 (1) (2024) 120.
- [15] D. Akhare, T. Luo, J.-X. Wang, Diffhybrid-uq: uncertainty quantification for differentiable hybrid neural modeling, arXiv preprint arXiv:2401.00161 (2023).
- [16] X. Fan, D. Akhare, J.-X. Wang, Neural differentiable modeling with diffusion-based super-resolution for two-dimensional spatiotemporal turbulence, arXiv preprint arXiv:2406.20047 (2024).
- [17] J. Tompson, K. Schlachter, P. Sprechmann, K. Perlin, Accelerating eulerian fluid simulation with convolutional networks, in: International Conference on Machine Learning, PMLR, 2017, pp. 3424–3433.
- [18] R. Vinuesa, S. L. Brunton, Enhancing computational fluid dynamics with machine learning, Nature Computational Science 2 (6) (2022) 358–366.
- [19] N. Margenberg, R. Jendersie, C. Lessig, T. Richter, Dnn-mg: A hybrid neural network/finite element method with applications to 3d simulations of the navier–stokes equations, Computer Methods in Applied Mechanics and Engineering 420 (2024) 116692.
- [20] J. Wang, J. Wu, H. Xiao, A physics-informed machine learning approach of improving RANS predicted Reynolds stresses, in: 55th AIAA Aerospace Sciences Meeting, 2017, p. 1712.
- [21] K. Duraisamy, G. Iaccarino, H. Xiao, Turbulence modeling in the age of data, Annual Review of Fluid Mechanics 51 (2019) 357–377.
- [22] J.-X. Wang, J. Huang, L. Duan, H. Xiao, Prediction of Reynolds stresses in high-Mach-number turbulent boundary layers using physics-informed machine learning, Theoretical and Computational Fluid Dynamics 33 (1) (2019) 1–19.
- [23] L. Zanna, T. Bolton, Data-driven equation discovery of ocean mesoscale closures, Geophysical Research Letters 47 (17) (2020) e2020GL088376.
- [24] M. A. Mendez, A. Ianiro, B. R. Noack, S. L. Brunton, Data-Driven Fluid Mechanics: Combining First Principles and Machine Learning, Cambridge University Press, 2023.
- [25] R. Newbury, J. Collins, K. He, J. Pan, I. Posner, D. Howard, A. Cosgun, A review of differentiable simulators, IEEE Access (2024).
- [26] B. List, L.-W. Chen, K. Bali, N. Thuerey, Differentiability in unrolled training of neural physics simulators on transient dynamics, Computer Methods in Applied Mechanics and Engineering 433 (2025) 117441.
- [27] A. M. Schweidtmann, D. Zhang, M. von Stosch, A review and perspective on hybrid modelling methodologies, Digital Chemical Engineering (2023) 100136.
- [28] M. Innes, A. Edelman, K. Fischer, C. Rackauckas, E. Saba, V. B. Shah, W. Tebbutt, A differentiable programming system to bridge machine learning and scientific computing, arXiv preprint arXiv:1907.07587 (2019).
- [29] F. D. A. Belbute-Peres, T. Economon, Z. Kolter, Combining differentiable pde solvers and graph neural networks for fluid flow prediction, in: international conference on machine learning, PMLR, 2020, pp. 2402–2411.
- [30] D. Z. Huang, K. Xu, C. Farhat, E. Darve, Learning constitutive relations from indirect observations using deep neural networks, Journal of Computational Physics 416 (2020) 109491.
- [31] B. List, L.-W. Chen, N. Thuerey, Learned turbulence modelling with differentiable fluid solvers: physics-based loss functions and optimisation horizons, Journal of Fluid Mechanics 949 (2022) A25.
- [32] X. Fan, J.-X. Wang, Differentiable hybrid neural modeling for fluid-structure interaction, Journal of Computational Physics 496 (2024) 112584.
- [33] C. C. Margossian, A review of automatic differentiation and its efficient implementation, Wiley interdisciplinary reviews: data mining and knowledge discovery 9 (4) (2019) e1305.
- [34] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic differentiation in machine learning: a survey, Journal of Machine Learning Research 18 (153) (2018) 1–43.
- [35] M. Blondel, V. Roulet, The elements of differentiable programming, arXiv preprint arXiv:2403.14606 (2024).
- [36] S. Bai, J. Z. Kolter, V. Koltun, Deep equilibrium models, Advances in neural information processing systems 32 (2019).
- [37] J. Pan, J. H. Liew, V. Y. Tan, J. Feng, H. Yan, Adjointdpm: Adjoint sensitivity method for gradient backpropagation of diffusion probabilistic models, arXiv preprint arXiv:2307.10711 (2023).
- [38] T. Matsubara, Y. Miyatake, T. Yaguchi, The symplectic adjoint method: Memory-efficient backpropagation of neural-network-based differential equations, IEEE Transactions on Neural Networks and Learning Systems (2023).
- [39] K. Fidkowski, Adjoint-based adaptive training of deep neural networks, in: AIAA AVIATION 2021 FORUM, 2021, p. 2904.
- [40] R. T. Chen, Y. Rubanova, J. Bettencourt, D. K. Duvenaud, Neural ordinary differential equations, Advances in neural information processing systems 31 (2018).
- [41] Y. Xie, T. Takikawa, S. Saito, O. Litany, S. Yan, N. Khan, F. Tombari, J. Tompkin, V. Sitzmann, S. Sridhar, Neural fields in visual computing and beyond, in: Computer Graphics Forum, Vol. 41, Wiley Online Library, 2022, pp. 641–676.
- [42] V. Sitzmann, J. Martel, A. Bergman, D. Lindell, G. Wetzstein, Implicit neural representations with periodic activation functions, Advances in neural information processing systems 33 (2020) 7462–7473.
- [43] P. Du, M. H. Parikh, X. Fan, X.-Y. Liu, J.-X. Wang, Conditional neural field latent diffusion model for generating spatiotemporal turbulence, Nature Communications 15 (1) (2024) 10416.
Appendix A Backpropagation for Autoregressive model
[Figure 14: Schematic of gradient backpropagation through the explicit autoregressive (Ex-PiNDiff) rollout.]
Consider a one-step function $f(\,\cdot\,; \boldsymbol{\theta})$ called at every time step to generate the temporal dynamics. The Ex-PiNDiff model's forward computation, generating the temporal dynamics up to step $N$, can be expressed as:
$$\mathbf{u}^{n} = f\big(\mathbf{u}^{n-1}; \boldsymbol{\theta}\big), \quad n = 1, \dots, N, \tag{21}$$
or
$$\mathbf{u}^{N} = \underbrace{f \circ f \circ \cdots \circ f}_{N \text{ times}}\big(\mathbf{u}^{0}; \boldsymbol{\theta}\big). \tag{22}$$
Let us represent the forward trajectory of states as
$$\mathbf{u}^{0} \rightarrow \mathbf{u}^{1} \rightarrow \mathbf{u}^{2} \rightarrow \cdots \rightarrow \mathbf{u}^{N}. \tag{23}$$
The total loss is defined over the entire rollout trajectory of $N$ steps as
$$\mathcal{L} = \sum_{n=1}^{N} \ell\big(\mathbf{u}^{n}, \mathbf{u}^{n}_{\mathrm{obs}}\big), \tag{24}$$
and the gradient of the loss with respect to $\boldsymbol{\theta}$ can be computed using the chain rule,
$$\frac{\mathrm{d}\mathcal{L}}{\mathrm{d}\boldsymbol{\theta}} = \sum_{n=1}^{N} \frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\mathrm{d}\mathbf{u}^{n}}{\mathrm{d}\boldsymbol{\theta}}, \tag{25}$$
where
$$\frac{\mathrm{d}\mathbf{u}^{n}}{\mathrm{d}\boldsymbol{\theta}} = \frac{\partial f\big(\mathbf{u}^{n-1}; \boldsymbol{\theta}\big)}{\partial \boldsymbol{\theta}} + \frac{\partial f\big(\mathbf{u}^{n-1}; \boldsymbol{\theta}\big)}{\partial \mathbf{u}^{n-1}}\,\frac{\mathrm{d}\mathbf{u}^{n-1}}{\mathrm{d}\boldsymbol{\theta}}, \tag{26a}$$
$$\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\mathrm{d}\mathbf{u}^{n}}{\mathrm{d}\boldsymbol{\theta}} = \frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\partial f}{\partial \boldsymbol{\theta}} + \frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\partial f}{\partial \mathbf{u}^{n-1}}\,\frac{\mathrm{d}\mathbf{u}^{n-1}}{\mathrm{d}\boldsymbol{\theta}}. \tag{26b}$$
Here $\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\frac{\partial f}{\partial \boldsymbol{\theta}}$ and $\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\frac{\partial f}{\partial \mathbf{u}^{n-1}}$ are VJPs, and $\frac{\partial f}{\partial \mathbf{u}^{n-1}}$ represents a Jacobian matrix.
Appendix B Gradient backpropagation for implicit-PiNDiff with naive AD
In the case of Im-PiNDiff, the state $\mathbf{u}^{n}$ is found by solving for the root of the implicit residual at each time step, which can be expressed as
$$\mathbf{g}\big(\mathbf{u}^{n}, \mathbf{u}^{n-1}; \boldsymbol{\theta}\big) = \mathbf{0}, \tag{27}$$
or
$$\mathbf{u}^{n} = \mathrm{RootSolve}\big(\mathbf{g};\ \mathbf{u}^{n-1}, \boldsymbol{\theta}\big). \tag{28}$$
The iterative solver creates a sequence of guesses
$$\mathbf{u}^{n}_{(1)} \rightarrow \mathbf{u}^{n}_{(2)} \rightarrow \cdots \rightarrow \mathbf{u}^{n}_{(K)}, \tag{29}$$
that converges to the final solution $\mathbf{u}^{n}_{(K)} = \mathbf{u}^{n}$ satisfying $\mathbf{g}\big(\mathbf{u}^{n}, \mathbf{u}^{n-1}; \boldsymbol{\theta}\big) = \mathbf{0}$. For naive AD, we can consider a fixed number of iterations $K$ for the algorithm. Here, the forward propagation of the state looks like
$$\mathbf{u}^{0} \rightarrow \big(\mathbf{u}^{1}_{(1)} \rightarrow \cdots \rightarrow \mathbf{u}^{1}_{(K)}\big) = \mathbf{u}^{1} \rightarrow \big(\mathbf{u}^{2}_{(1)} \rightarrow \cdots \rightarrow \mathbf{u}^{2}_{(K)}\big) = \mathbf{u}^{2} \rightarrow \cdots \rightarrow \mathbf{u}^{N}, \tag{30}$$
and the total loss defined over the entire rollout trajectory of $N$ steps will be,
$$\mathcal{L} = \sum_{n=1}^{N} \ell\big(\mathbf{u}^{n}, \mathbf{u}^{n}_{\mathrm{obs}}\big). \tag{31}$$
The gradient of the loss with respect to $\boldsymbol{\theta}$ for the naive Im-PiNDiff is
$$\frac{\mathrm{d}\mathcal{L}}{\mathrm{d}\boldsymbol{\theta}} = \sum_{n=1}^{N} \frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\mathrm{d}\mathbf{u}^{n}_{(K)}}{\mathrm{d}\boldsymbol{\theta}}, \qquad \frac{\mathrm{d}\mathbf{u}^{n}_{(k)}}{\mathrm{d}\boldsymbol{\theta}} = \frac{\partial \mathbf{u}^{n}_{(k)}}{\partial \boldsymbol{\theta}} + \frac{\partial \mathbf{u}^{n}_{(k)}}{\partial \mathbf{u}^{n}_{(k-1)}}\,\frac{\mathrm{d}\mathbf{u}^{n}_{(k-1)}}{\mathrm{d}\boldsymbol{\theta}}, \tag{32}$$
where the AD engine stores every intermediate iterate $\mathbf{u}^{n}_{(k)}$ at every time step, in addition to storing $\mathbf{u}^{n}$ at every time step, resulting in substantial memory overhead.
Appendix C Derivation of adjoint-based VJP for implicit layers
To save memory, we need to compute the gradients without needing to save the intermediate iterates $\mathbf{u}^{n}_{(k)}$. Recall that $\mathbf{u}^{n}$ satisfies $\mathbf{g}\big(\mathbf{u}^{n}, \mathbf{u}^{n-1}; \boldsymbol{\theta}\big) = \mathbf{0}$. So we need to compute the VJPs $\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\frac{\partial \mathbf{u}^{n}}{\partial \boldsymbol{\theta}}$ and $\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\frac{\partial \mathbf{u}^{n}}{\partial \mathbf{u}^{n-1}}$ for the implicit layer. The adjoint-state method provides a way to compute these VJPs as the solution of a linear system. The derivation of the linear equation for obtaining the VJPs follows from the implicit function theorem.
Implicit function theorem:[35]
For a continuously differentiable function $F(\mathbf{a}, \mathbf{b})$ in a neighborhood of $(\mathbf{a}_0, \mathbf{b}_0)$ with $F(\mathbf{a}_0, \mathbf{b}_0) = \mathbf{0}$ and $\frac{\partial F}{\partial \mathbf{b}}$ invertible at $(\mathbf{a}_0, \mathbf{b}_0)$, there exists a neighborhood of $\mathbf{a}_0$ in which there is a function $\mathbf{b}^{*}(\mathbf{a})$ such that
- $\mathbf{b}^{*}(\mathbf{a}_0) = \mathbf{b}_0$;
- $F\big(\mathbf{a}, \mathbf{b}^{*}(\mathbf{a})\big) = \mathbf{0}$ for all $\mathbf{a}$ in the neighborhood;
- $\dfrac{\partial \mathbf{b}^{*}}{\partial \mathbf{a}} = -\left(\dfrac{\partial F}{\partial \mathbf{b}}\right)^{-1} \dfrac{\partial F}{\partial \mathbf{a}}$.
Considering $F = \mathbf{g}$ with $\mathbf{a} = (\mathbf{u}^{n-1}, \boldsymbol{\theta})$ and $\mathbf{b} = \mathbf{u}^{n}$, the implicit function theorem provides
$$\frac{\partial \mathbf{u}^{n}}{\partial \boldsymbol{\theta}} = -\left(\frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n}}\right)^{-1} \frac{\partial \mathbf{g}}{\partial \boldsymbol{\theta}}, \tag{33a}$$
$$\frac{\partial \mathbf{u}^{n}}{\partial \mathbf{u}^{n-1}} = -\left(\frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n}}\right)^{-1} \frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n-1}}. \tag{33b}$$
Multiplying the row vector $\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}$ on the left side, we get
$$\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\partial \mathbf{u}^{n}}{\partial \boldsymbol{\theta}} = -\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\left(\frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n}}\right)^{-1} \frac{\partial \mathbf{g}}{\partial \boldsymbol{\theta}}, \tag{34a}$$
$$\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\partial \mathbf{u}^{n}}{\partial \mathbf{u}^{n-1}} = -\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\left(\frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n}}\right)^{-1} \frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n-1}}. \tag{34b}$$
Now we define the adjoint vector $\boldsymbol{\mu}$ as,
$$\boldsymbol{\mu}^{\top} = -\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}} \left(\frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n}}\right)^{-1}, \tag{35}$$
which is obtained by solving the linear system,
$$\left(\frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n}}\right)^{\top} \boldsymbol{\mu} = -\left(\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\right)^{\top}. \tag{36}$$
This linear equation is solved efficiently using iterative numerical linear solvers (such as GMRES or conjugate gradient methods). Finally, the VJPs are expressed as,
$$\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\partial \mathbf{u}^{n}}{\partial \boldsymbol{\theta}} = \boldsymbol{\mu}^{\top}\,\frac{\partial \mathbf{g}}{\partial \boldsymbol{\theta}}, \tag{37a}$$
$$\frac{\partial \mathcal{L}}{\partial \mathbf{u}^{n}}\,\frac{\partial \mathbf{u}^{n}}{\partial \mathbf{u}^{n-1}} = \boldsymbol{\mu}^{\top}\,\frac{\partial \mathbf{g}}{\partial \mathbf{u}^{n-1}}. \tag{37b}$$
We can thus compute the required VJPs for every implicit layer without needing to save the intermediate iterates, thereby reducing the memory requirement for the AD backpropagation.
Appendix D Spatiotemporal fields using a combination of cosine functions
In order to make inference challenging, a combination of cosine functions is superimposed to generate a more complex spatio-temporal field for advection and viscosity.
$$\phi(x, y, t) = \sum_{i=1}^{M} a_i \cos\big(k_{x,i}\,x + k_{y,i}\,y + \omega_i\,t + \varphi_i\big), \tag{38}$$
where the amplitudes $a_i$, wavenumbers $(k_{x,i}, k_{y,i})$, temporal frequencies $\omega_i$, and phases $\varphi_i$ are randomly sampled.
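A minimal sketch of drawing such a random cosine-superposition field is given below, under the assumed form of Eq. (38); the number of modes and sampling ranges are arbitrary illustrative choices.

```python
import jax
import jax.numpy as jnp

def cosine_field(key, X, Y, t=0.0, n_modes=5):
    ka, kb, kc, kd, ke = jax.random.split(key, 5)
    a  = jax.random.uniform(ka, (n_modes,), minval=0.1, maxval=1.0)        # amplitudes
    kx = jax.random.uniform(kb, (n_modes,), minval=0.5, maxval=3.0)        # wavenumbers
    ky = jax.random.uniform(kc, (n_modes,), minval=0.5, maxval=3.0)
    om = jax.random.uniform(kd, (n_modes,), minval=0.0, maxval=1.0)        # temporal frequencies
    ph = jax.random.uniform(ke, (n_modes,), minval=0.0, maxval=2 * jnp.pi) # phases
    modes = a[:, None, None] * jnp.cos(kx[:, None, None] * X + ky[:, None, None] * Y
                                       + om[:, None, None] * t + ph[:, None, None])
    return modes.sum(axis=0)

xs = jnp.linspace(0.0, 1.0, 64)
X, Y = jnp.meshgrid(xs, xs)
c_x = cosine_field(jax.random.PRNGKey(0), X, Y)   # e.g., one random advection component
```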
Appendix E PiNDiff modeling for chemical vapor infiltration
Here we describe in more detail the reaction–diffusion system, chemical vapor infiltration (CVI), used in this study. CVI is a materials processing technique used to fabricate composite materials. In this method, a porous preform, usually made of fibers such as carbon, is exposed to a reactive gas mixture at elevated temperatures inside a furnace. The gaseous precursors diffuse into the pores of the preform and undergo chemical reactions, typically decomposition or reduction, to deposit a solid material onto the internal surfaces of the preform. Over time, this gradual deposition fills the pores and forms a dense matrix around the fibers without significantly disturbing the structure. CVI allows for precise control over material composition and microstructure, making it suitable for producing high-temperature, corrosion-resistant components used in aerospace, energy, and defense applications. A PiNDiff-CVI model based on foundational physics was developed to simulate the CVI process [14], whose governing equations are given as
$$\nabla \cdot \big(D\,\nabla C\big) - k\,S_{v}\,C = 0, \tag{39a}$$
$$\frac{\partial \epsilon}{\partial t} = -\gamma\,\frac{M}{\rho}\,k\,S_{v}\,C. \tag{39b}$$
In the above equations, $C$ denotes the effective molarity field (mol m$^{-3}$) of all reactive gases, $\epsilon$ is the porosity of the preform, $\gamma$ represents a constant stoichiometric coefficient, $M$ is the molar mass (kg mol$^{-1}$), and $\rho$ is the density (kg m$^{-3}$) of the deposited solid (carbon or SiC). $D$ represents the effective diffusion coefficient field, $k$ is the deposition reaction rate, and $S_{v}$ corresponds to the surface-to-volume ratio.
[Figure 15: PiNDiff-CVI model with DNN-parameterized transport and reaction operators.]
The details regarding this solver are described by Akhare et al. [14]. In the PiNDiff-CVI model, the underlying transport and reaction functions were unknown and modeled as learnable operators using DNNs, as shown in Fig. 15. Equation (39a) is an elliptic Poisson equation governing the steady-state molarity distribution, solved iteratively at each time step using the point-Jacobi method (inner optimization), while Equation (39b) is a hyperbolic transport equation modeling the time-evolving deposition dynamics, solved explicitly using the Euler time integration scheme. Previously, a naive approach using a fixed number of iterations for solving Equation (39a) was employed, which was necessary for constructing the static computational graph required for gradient calculations, resulting in large memory usage. By leveraging adjoint-based backpropagation, we eliminate the need to save the intermediate iterates, thereby reducing memory usage.
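To illustrate replacing unrolled point-Jacobi iterations with adjoint-based differentiation, the sketch below wraps a toy Jacobi solve of a linear reaction–diffusion (Helmholtz-type) equation in the same custom-VJP pattern as Section 2.2. The grid, periodic boundary treatment, unit diffusivity, and source term are simplifying assumptions, not the PiNDiff-CVI discretization.

```python
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import gmres

dx = 1.0 / 32   # assumed uniform grid spacing

def residual(C, kSv, b):
    # r(C) = Laplacian(C) - kSv * C + b  (periodic grid, unit diffusivity)
    lap = (jnp.roll(C, -1, 0) + jnp.roll(C, 1, 0)
           + jnp.roll(C, -1, 1) + jnp.roll(C, 1, 1) - 4 * C) / dx ** 2
    return lap - kSv * C + b

def jacobi_solve(kSv, b, iters=2000):
    # Point-Jacobi iterations for the linear elliptic solve (inner optimization).
    def body(_, C):
        nbr = (jnp.roll(C, -1, 0) + jnp.roll(C, 1, 0)
               + jnp.roll(C, -1, 1) + jnp.roll(C, 1, 1)) / dx ** 2
        return (nbr + b) / (4.0 / dx ** 2 + kSv)
    return jax.lax.fori_loop(0, iters, body, jnp.zeros_like(b))

@jax.custom_vjp
def poisson_solve(kSv, b):
    return jacobi_solve(kSv, b)

def _fwd(kSv, b):
    C = jacobi_solve(kSv, b)
    return C, (C, kSv, b)   # Jacobi iterates are not retained

def _bwd(saved, v):
    C, kSv, b = saved
    _, vjp_r = jax.vjp(residual, C, kSv, b)
    mu, _ = gmres(lambda m: vjp_r(m)[0], -v)   # adjoint solve: (dr/dC)^T mu = -v
    _, bar_kSv, bar_b = vjp_r(mu)              # VJPs w.r.t. the reaction field and source
    return bar_kSv, bar_b

poisson_solve.defvjp(_fwd, _bwd)
```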