Temporal Gaussian Copula For Clinical Multivariate Time Series Data Imputation
thanks: *Lin Chen and Yuwen Chen are the corresponding authors. This research is supported in part by the National Natural Science Foundation of China (82030066 and 62371438) and by the Chongqing Municipal Education Commission (HZ2021008, 24XJC630014, HZ2021017).

Ye Su, Chongqing Institute of Green and Intelligent Technology, CAS; University of Chinese Academy of Sciences, Chongqing, China ([email protected])
Hezhe Qiao, Singapore Management University, Singapore ([email protected])
Di Wu, Southwest University, Chongqing, China ([email protected])
Yuwen Chen, Chongqing Institute of Green and Intelligent Technology, CAS, Chongqing, China ([email protected])
Lin Chen, Chongqing Institute of Green and Intelligent Technology, CAS, Chongqing, China ([email protected])
Abstract

Imputation of multivariate time series (MTS) is particularly challenging since MTS typically contain irregular patterns of missing values caused by factors such as instrument failures, interference from irrelevant data, and privacy regulations. Existing statistical and deep learning methods have shown promising results in time series imputation. In this paper, we propose a Temporal Gaussian Copula model (TGC) for three-order MTS imputation. The key idea is to leverage the Gaussian copula to explore cross-variable and temporal relationships based on a latent Gaussian representation. We then employ an Expectation-Maximization (EM) algorithm to improve robustness to data with varying missing rates. Comprehensive experiments on three real-world MTS datasets demonstrate that TGC substantially outperforms state-of-the-art imputation methods and exhibits stronger robustness to varying missing ratios in the test data. Our code is available at https://github.com/MVL-Lab/TGC-MTS.

Index Terms:
Multivariate Time Series Data, Gaussian Copula, Electronic Health Record, Temporal Missing Data Imputation.

I Introduction


Figure 1: The framework of TGC.

As a prevalent data type in Electronic Health Records (EHR), clinical multivariate time series (MTS) data are widely used in biomedical analysis tasks such as evaluating treatment effects [1], morbidity forecasting [2], and mortality prediction [3, 4]. However, real-world clinical MTS often contain many missing values due to factors such as instrument failures, interference from irrelevant data, and privacy regulations [5], which presents a significant challenge to existing machine learning methods and degrades the accuracy of downstream analyses and decision-making. Developing an accurate and effective strategy to handle missing data in MTS is therefore essential for reliable clinical diagnosis.

While existing statistical and deep learning approaches [6, 7, 8] have shown promise in clinical MTS imputation, several challenges remain: (1) The complex relationships among clinical values at different time points are not fully captured. (2) Irregular and non-continuous recording intervals among patients can bias EHR data distribution, and current methods often struggle with varying sampling densities, resulting in suboptimal performance.

To tackle these limitations, we develop a cross-variable and longitudinal imputation approach, named Temporal Gaussian Copula (TGC), which builds on the Gaussian copula to fully explore the complex relations between different clinical variables and time points. We use a conditional expectation formula with EM optimization so that all observed values are fully leveraged when imputing missing values, thereby accommodating any sampling density. Specifically, we first unfold the 3D clinical data into a single matrix by concatenating the patients' longitudinal clinical information. We then employ the Gaussian copula to estimate the complex multivariate distribution of variable-by-time correlations through feature transformations of latent Gaussian vectors. Finally, the model is solved with an efficient approximate Expectation-Maximization (EM) algorithm, and missing values are imputed using the defined inverse latent Gaussian vector transform. We conducted extensive experiments on three real-world medical MTS datasets at varying missing rates to evaluate TGC; the results demonstrate that TGC achieves the best average performance across missing rates.

II Methodology

The overview of TGC is illustrated in Fig. 1. TGC consists of four modules: (1) Three-order Tensor Unfolding; (2) Latent Gaussian Representation; (3) Covariance Matrix Construction; (4) Missing Value Estimation and Imputation. A detailed exposition of each module follows.

II-A Three-order Tensor Unfolding

We assume the MTS for a specific patient, $\mathbf{X}_i \in \mathbb{R}^{T \times F}$, follows the matrix Gaussian distribution with parameters comprising the mean matrix $\mathbf{M} \in \mathbb{R}^{T \times F}$, the temporal covariance $\mathbf{U} \in \mathbb{R}^{T \times T}$, and the feature covariance matrix $\mathbf{C} \in \mathbb{R}^{F \times F}$, which is formulated as follows:

$$\mathcal{P}(\mathbf{X}_i \mid \mathbf{M}, \mathbf{U}, \mathbf{C}) = (2\pi)^{-\frac{TF}{2}} \, |\mathbf{C}|^{-\frac{T}{2}} \, |\mathbf{U}|^{-\frac{F}{2}} \, e^{-\frac{1}{2}\operatorname{Tr}\left[\mathbf{C}^{-1}(\mathbf{X}_i - \mathbf{M})^{\top} \mathbf{U}^{-1}(\mathbf{X}_i - \mathbf{M})\right]}. \quad (1)$$

Thus, we can establish the temporal cross-variable correlations for missing-data estimation by computing the covariance matrix $\mathbf{\Sigma}$ from the matrices $\mathbf{C}$ and $\mathbf{U}$.

However, estimating the high-dimensional parameters $\mathbf{M}$, $\mathbf{C}$, and $\mathbf{U}$ is challenging due to the complexities inherent in the matrix Gaussian. Therefore, to apply a simple yet effective missing-data estimation approach, we unfold the 3D tensor $\mathcal{X}$ into a single 2D matrix $\mathbf{V} \in \mathbb{R}^{N \times M}$, which is formulated as:

$$\mathbf{V} = \{vec(\mathbf{X}_1), vec(\mathbf{X}_2), \ldots, vec(\mathbf{X}_N)\}, \quad (2)$$

where $M = T \times F$, and $vec(\mathbf{X}_i)$ is the vectorization operator that transforms the matrix $\mathbf{X}_i$ into a column vector. Each $\mathbf{v}_i \in \mathbb{R}^{M}$ follows the multivariate normal distribution, which is formulated as follows:

$$\mathcal{P}(\mathbf{v}_i \mid \mu, \mathbf{\Sigma}) = (2\pi)^{-\frac{M}{2}} \, |\mathbf{\Sigma}|^{-\frac{1}{2}} \, e^{-\frac{1}{2}(\mathbf{v}_i - \mu)^{\top} \mathbf{\Sigma}^{-1} (\mathbf{v}_i - \mu)}, \quad (3)$$

where $\mu$ is the mean vector and $\mathbf{\Sigma}$ is the corresponding covariance matrix. In this way, we only need to estimate $\mu$ and $\mathbf{\Sigma}$ of the multivariate normal distribution, which avoids the computational burden of estimating irrelevant parameters. After this transformation, missing-data imputation for MTS becomes filling in the missing entries $\mathbf{V}^m$ given the observed entries $\mathbf{V}^o$.
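As a minimal sketch of the unfolding step (with hypothetical toy shapes, and NumPy's row-major flattening standing in for the paper's $vec(\cdot)$ operator, which stacks columns; any consistent ordering works), the 3D tensor can be reshaped into $\mathbf{V}$ as follows:

```python
import numpy as np

# Hypothetical shapes: N patients, T time steps, F features.
N, T, F = 4, 3, 2
rng = np.random.default_rng(0)
X = rng.normal(size=(N, T, F))        # 3D clinical tensor
X[0, 1, 0] = np.nan                   # simulate one missing entry

# Unfold: each patient's T x F matrix becomes one row of length M = T * F,
# so temporal and cross-variable positions become columns of a 2D matrix V.
V = X.reshape(N, T * F)
```

After unfolding, the missing entry travels with its row, so imputation reduces to filling NaN cells of the 2D matrix $\mathbf{V}$.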

II-B Latent Gaussian Representation

We then employ a mapping function $\mathcal{F}(\cdot)$ to transform the variables of $\mathbf{v}_i$ into a latent Gaussian representation via the Gaussian copula, which is formulated as:

$$z_{ij} = \mathcal{F}_j(v_{ij}), \quad (4)$$

where $z_{ij}$ is the latent variable corresponding to $v_{ij}$, and $\mathcal{F}_j(\cdot)$ denotes the mapping function defined on the $j$-th variable in $\mathbf{V}$. Each $\mathbf{v}_i \in \mathbf{V}$ follows a multivariate Gaussian distribution $\mathcal{N}(\mu, \mathbf{\Sigma})$.

Based on the latent Gaussian representation $\mathbf{z}_i$, the prior probability distribution $\mathcal{P}(\mathbf{v}_i \mid \mu, \mathbf{\Sigma})$ can be flexibly represented without $\mu$, which is formulated as:

$$\mathcal{P}(\mathbf{z}_i \mid \mathbf{\Sigma}) = (2\pi)^{-\frac{M}{2}} \, |\mathbf{\Sigma}|^{-\frac{1}{2}} \, e^{-\frac{1}{2}\mathbf{z}_i^{\top} \mathbf{\Sigma}^{-1} \mathbf{z}_i}, \quad (5)$$

where $\mathbf{z}_i = [z_{i1}, z_{i2}, \ldots, z_{iM}] \in \mathbb{R}^{M}$ is obtained by mapping $\mathbf{v}_i$. The mapping function $\mathcal{F}_j(\cdot)$ is implemented as:

$$\mathcal{F}_j = \Phi^{-1} \circ P_j, \quad (6)$$

where $\Phi$ is the standard normal cumulative distribution function (CDF), and $P_j$ is the CDF of the $j$-th variable.

Since $\Phi$ is strictly monotonically increasing, its inverse $\Phi^{-1}(x)$ exists. $\Phi$ is defined as follows:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2}} \, dt. \quad (7)$$

The form of $P_j$ varies with the distribution of $v_{ij}$. For example, if $v_{ij}$ follows a normal distribution $N(\mu, \sigma)$, then $u_{ij} = P_j(v_{ij})$ follows a uniform distribution on $[0, 1]$, where:

$$P_j(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}} \, dt. \quad (8)$$

Similarly, through $z_{ij} = \Phi^{-1}(u_{ij})$, the uniform distribution is transformed into a standard normal distribution $N(0, 1)$.
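As an illustrative sketch of the forward map $\mathcal{F}_j = \Phi^{-1} \circ P_j$, one feature column can be transformed to latent normal scores as follows. Since $P_j$ is unknown in practice, a rescaled empirical CDF is used here as a stand-in estimator, which the paper does not prescribe:

```python
from statistics import NormalDist

def to_latent_gaussian(column):
    """Map one observed feature column to latent standard-normal scores."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    u = [r / (n + 1) for r in ranks]             # empirical CDF values in (0, 1)
    return [NormalDist().inv_cdf(p) for p in u]  # z = Phi^{-1}(u)

z = to_latent_gaussian([2.1, 0.3, 5.7, 1.0])
```

The transform is rank-preserving: the largest observation maps to the largest latent score, which is all the copula needs from the marginals.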

In order to map $z_{ij}$ back to $v_{ij}$ after imputation, we need to define $\mathcal{F}_j^{-1}$ for all features. To accommodate the presence of outliers, $v_{ij}$ can take any value in $(-\infty, +\infty)$. For continuous variables, the probability density function is always greater than 0, so the CDF is strictly monotonically increasing; by definition, the inverse of a strictly monotonically increasing function exists, hence $P_j^{-1}$ exists.

However, if the variable $v_{ij} \in \{1, 2, \ldots, K\}$, then the CDF satisfies $P(v_{ij} \leq k + \gamma) = P(v_{ij} \leq k)$ for $k \in \{1, 2, \ldots, K\}$ and $\gamma \in (0, 1)$; that is, it is only non-decreasing. Therefore, we need to define a specific $P_j^{-1}$ to ensure the existence of the inverse mapping.

For a variable $v_{ij}$ in the set $\{1, 2, \ldots, K\}$ with CDF $P_j$, when $\Phi(z_{ij})$ falls within the interval $[P_j(k-1), P_j(k)]$ for some $k \in \{1, 2, \ldots, K\}$, we define $\mathcal{F}_j^{-1}(z_{ij}) = P_j^{-1}(\Phi(z_{ij})) = k$.
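The ordinal inverse map can be sketched as below; the cutoffs in `cdf_at` are hypothetical values of $P_j(1), \ldots, P_j(K)$, not estimates from any real dataset:

```python
from statistics import NormalDist
import bisect

def inverse_map_ordinal(z, cdf_at):
    """F_j^{-1}(z) = P_j^{-1}(Phi(z)) for an ordinal variable in {1, ..., K}.

    cdf_at[k-1] holds P_j(k), so we return the k whose interval
    [P_j(k-1), P_j(k)] contains Phi(z).
    """
    u = NormalDist().cdf(z)                 # Phi(z) in (0, 1)
    k = bisect.bisect_left(cdf_at, u)       # smallest index with u <= P_j(k)
    return min(k + 1, len(cdf_at))          # categories are 1-indexed

# Hypothetical CDF for K = 3 categories: P_j(1)=0.2, P_j(2)=0.7, P_j(3)=1.0.
k = inverse_map_ordinal(0.0, [0.2, 0.7, 1.0])   # Phi(0) = 0.5 lies in (0.2, 0.7]
```

Because every latent score lands in exactly one interval, the map is well defined even though $P_j$ itself is not invertible.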

So far, we have defined the latent Gaussian vector mapping $\mathcal{F}_j(\cdot)$ and its inverse mapping $\mathcal{F}_j^{-1}(\cdot)$ for MTS data.

II-C Covariance Matrix Construction

To fully explore the relations among the latent variables, we leverage maximum likelihood estimation (MLE) to estimate the correlation matrix $\mathbf{\Sigma}$, which is formulated as follows:

$$\ell(\mathbf{\Sigma}; \mathbf{Z}) = c - \frac{1}{2}\log(|\mathbf{\Sigma}|) - \frac{1}{2}\operatorname{Tr}\left[\mathbf{\Sigma}^{-1} \frac{1}{N}\sum_{i=1}^{N} \mathbf{z}_i \mathbf{z}_i^{\top}\right], \quad (9)$$

where $c$ is a constant and $N$ denotes the number of patients.

When $\mathbf{Z}$ is complete, maximizing Equation 9 yields the following expression for $\mathbf{\Sigma}$:

$$\hat{\mathbf{\Sigma}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{z}_i \mathbf{z}_i^{\top}. \quad (10)$$
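For complete latent data, Equation 10 is a one-line computation; a toy NumPy check with a hypothetical latent matrix might look like:

```python
import numpy as np

# Toy check of Equation 10: with complete latent data Z (one row per z_i),
# the MLE is the average outer product of the latent vectors.
rng = np.random.default_rng(1)
Z = rng.normal(size=(5, 3))             # hypothetical complete latent matrix
Sigma_hat = (Z.T @ Z) / Z.shape[0]      # (1/N) * sum_i z_i z_i^T
```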

In practice, however, $\mathbf{Z}$ contains missing values. We therefore maximize the likelihood of $\mathbf{\Sigma}$ with respect to only the observed values $\mathbf{Z}^o$ of $\mathbf{Z}$:

$$\ell_o(\mathbf{\Sigma}; \mathbf{Z}^o) = \frac{1}{N}\sum_{i=1}^{N} \int_{z \in \mathbf{z}_i^{o}} \mathcal{P}(z; \mathbf{\Sigma}_{o,o}) \, dz, \quad (11)$$

where $\mathbf{\Sigma}_{o,o}$ is the submatrix of $\mathbf{\Sigma}$ whose rows and columns correspond to $(\mathbf{z}_i^o, \mathbf{z}_i^o)$. The optimal $\hat{\mathbf{\Sigma}}$ is obtained by maximizing this likelihood:

$$\hat{\mathbf{\Sigma}} = \arg\max_{\mathbf{\Sigma}} \ell_o(\mathbf{\Sigma}; \mathbf{Z}^o). \quad (12)$$

II-D Missing Value Estimation

The missing values $\mathbf{z}_i^m$ of the $i$-th patient can be estimated from the observed values $\mathbf{z}_i^o$ and the covariance matrix $\mathbf{\Sigma}$:

$$\mathbb{E}[\mathbf{z}_i^m \mid \mathbf{z}_i^o, \mathbf{\Sigma}] = \mathbf{\Sigma}_{m,o} \mathbf{\Sigma}_{o,o}^{-1} \mathbf{z}_i^o, \quad (13)$$

where $\mathbf{\Sigma}_{m,o}$ and $\mathbf{\Sigma}_{o,o}$ are submatrices of $\mathbf{\Sigma}$ whose rows and columns correspond to $(\mathbf{z}_i^m, \mathbf{z}_i^o)$ and $(\mathbf{z}_i^o, \mathbf{z}_i^o)$, respectively. We then apply the inverse mapping $\mathcal{F}_j^{-1}(\cdot)$ to transform $\mathbf{Z}$ back into $\mathbf{V}$.
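A minimal NumPy sketch of Equation 13, using a hypothetical 3-variable correlation matrix and one missing latent entry:

```python
import numpy as np

def conditional_impute(z, observed_mask, Sigma):
    """Fill latent missing entries with the Gaussian conditional mean (Eq. 13)."""
    o = np.where(observed_mask)[0]
    m = np.where(~observed_mask)[0]
    z_filled = z.copy()
    # Sigma_mo @ Sigma_oo^{-1} @ z_o, via a linear solve for stability.
    z_filled[m] = Sigma[np.ix_(m, o)] @ np.linalg.solve(Sigma[np.ix_(o, o)], z[o])
    return z_filled

# Hypothetical correlation matrix and latent vector with one NaN.
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
z = np.array([0.8, np.nan, -0.4])
z_hat = conditional_impute(z, ~np.isnan(z), Sigma)
```

Observed entries pass through unchanged; only the missing coordinate is replaced by its conditional expectation.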

II-E EM Optimization

The EM algorithm has shown a remarkable ability to handle data with substantial missing values [9]. To make a more accurate estimation of 𝚺𝚺\mathbf{\Sigma}bold_Σ, we employ the EM algorithm to iteratively update the covariance matrix. Firstly, we initialize the missing part of the latent variable matrix 𝐙msuperscript𝐙𝑚\mathbf{Z}^{m}bold_Z start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT to zero and compute the corresponding covariance matrix 𝚺(0)superscript𝚺0\mathbf{\Sigma}^{(0)}bold_Σ start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT. The general procedure of the EM algorithm is as follows:

II-E1 E-Step

Utilizing $\mathbf{\Sigma}^{(t)}$ and $\mathbf{z}_i^o$, we recalculate $\mathbf{z}_i^m$ through Equation 13 for all patients. We denote the filled values by $\mathbf{z}_i^I$:

$$\mathbf{z}_i^I = \mathbb{E}[\mathbf{z}_i^m \mid \mathbf{z}_i^o, \mathbf{\Sigma}^{(t)}]. \quad (14)$$

II-E2 M-Step

We compute the updated covariance matrix $\hat{\mathbf{\Sigma}}^{(t+1)}$ using $G(\mathbf{\Sigma}^{(t)}, \mathbf{z}_i^{o_i})$, which is related to Equation 10 and expressed as follows:

$$G(\mathbf{z}_i^I, \mathbf{z}_i^o) = \frac{1}{N}\sum_{i=1}^{N} \mathbb{E}[\mathbf{z}_i \mathbf{z}_i^{\top} \mid \mathbf{z}_i^o, \mathbf{z}_i^I]. \quad (15)$$
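Putting the E-step and M-step together, a rough sketch of the iteration might look as follows. This is a simplification under stated assumptions: the M-step expectation of Equation 15 is reduced to a plug-in outer product of the filled-in vectors (ignoring the conditional covariance term), the scale step is omitted, and the data are toy values:

```python
import numpy as np

def em_covariance(Z, n_iters=10):
    """EM-style estimate of Sigma from a latent matrix Z (NaN = missing)."""
    Z = Z.copy()
    miss = np.isnan(Z)
    Z[miss] = 0.0                                  # initialise Z^m to zero
    Sigma = (Z.T @ Z) / Z.shape[0]                 # Sigma^(0)
    for _ in range(n_iters):
        for i in range(Z.shape[0]):                # E-step (Equation 14)
            m = np.where(miss[i])[0]
            o = np.where(~miss[i])[0]
            if len(m) and len(o):
                Z[i, m] = Sigma[np.ix_(m, o)] @ np.linalg.solve(
                    Sigma[np.ix_(o, o)], Z[i, o])
        Sigma = (Z.T @ Z) / Z.shape[0]             # M-step (plug-in form)
    return Sigma, Z

rng = np.random.default_rng(2)
Z = rng.normal(size=(20, 3))                       # toy latent matrix
Z[0, 1] = np.nan
Sigma_hat, Z_filled = em_covariance(Z)
```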

II-E3 Scale

Since the Gaussian copula is parameterized by a correlation matrix, we constrain the iterates of $\mathbf{\Sigma}$ through $P_{\varepsilon} = \mathbf{D}^{-\frac{1}{2}} \mathbf{\Sigma} \mathbf{D}^{-\frac{1}{2}}$, where $\mathbf{D}$ is the diagonal matrix derived from $\mathbf{\Sigma}$. This transformation ensures that the constraint is effectively incorporated into the iterative estimation of $\mathbf{\Sigma}$.
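The scaling step is a diagonal rescaling; a sketch with a hypothetical 2x2 covariance:

```python
import numpy as np

# Project an estimated covariance onto a correlation matrix:
# P = D^{-1/2} Sigma D^{-1/2}, with D the diagonal of Sigma.
Sigma = np.array([[4.0, 1.0],
                  [1.0, 9.0]])
d = np.sqrt(np.diag(Sigma))       # D^{1/2} as a vector
P = Sigma / np.outer(d, d)        # unit diagonal; off-diagonals are correlations
```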

III Experiments

III-A Datasets Acquisition

The experiments are conducted on three public real-world healthcare datasets: the Medical Information Mart for Intensive Care (MIMIC) [10], the PhysioNet 2012 Mortality Prediction Challenge (PhysioNet2012) [11], and the PhysioNet/Computing in Cardiology Challenge 2019 (PhysioNet2019) [12]. Summary statistics for the three datasets are shown in Table I.

TABLE I: General information of the three datasets used in this work
MIMIC PhysioNet-2012 PhysioNet-2019
#Total Samples 20,940 11,988 36,409
#Features 21 36 34
#Sequence Length 31 48 18
Original Missing Rate 68% 80% 79%
TABLE II: Comparison of TGC with competing methods. The best performance per dataset is highlighted in bold. Values in brackets are the standard deviation over three runs.
Model MIMIC PhysioNet2012 PhysioNet2019
MAE MRE RMSE MAE MRE RMSE MAE MRE RMSE
LOCF [13] 0.547 (0.003) 0.722 (0.004) 0.842 (0.003) 0.447 (0.001) 0.632 (0.003) 0.864 (0.043) 0.520 (0.001) 0.698 (0.002) 0.800 (0.002)
ISVD (2001) [14] 0.525 (0.002) 0.693 (0.002) 0.748 (0.003) 0.520 (0.002) 0.735 (0.002) 0.814 (0.031) 0.519 (0.002) 0.696 (0.001) 0.753 (0.007)
SI (2010) [15] 0.592 (0.001) 0.781 (0.001) 0.808 (0.002) 0.508 (0.001) 0.719 (0.001) 0.799 (0.034) 0.603 (0.002) 0.808 (0.000) 0.827 (0.006)
BRITS (2018) [16] 0.436 (0.002) 0.575 (0.002) 0.691 (0.003) 0.356 (0.001) 0.504 (0.001) 0.689 (0.038) 0.435 (0.002) 0.583 (0.002) 0.679 (0.006)
KNN (2019) [17] 0.683 (0.002) 0.901 (0.002) 0.951 (0.003) 0.639 (0.001) 0.903 (0.003) 1.008 (0.026) 0.743 (0.003) 0.996 (0.003) 1.037 (0.006)
GPVAE (2020) [18] 0.568 (0.004) 0.749 (0.004) 0.791 (0.004) 0.544 (0.015) 0.769 (0.020) 0.834 (0.032) 0.569 (0.004) 0.762 (0.005) 0.798 (0.008)
FiLM (2022) [19] 0.596 (0.002) 0.786 (0.002) 0.835 (0.004) 0.533 (0.002) 0.754 (0.001) 0.848 (0.031) 0.580 (0.002) 0.777 (0.001) 0.822 (0.006)
SCINet (2022) [6] 0.530 (0.002) 0.699 (0.003) 0.756 (0.004) 0.465 (0.002) 0.657 (0.003) 0.770 (0.034) 0.510 (0.004) 0.683 (0.004) 0.747 (0.008)
SAITS (2023) [7] 0.401 (0.004) 0.528 (0.005) 0.631 (0.005) 0.352 (0.002) 0.498 (0.003) 0.677 (0.042) 0.418 (0.002) 0.560 (0.002) 0.660 (0.006)
Koopa (2024) [8] 0.527 (0.005) 0.695 (0.006) 0.745 (0.005) 0.476 (0.007) 0.674 (0.011) 0.768 (0.035) 0.519 (0.007) 0.696 (0.009) 0.757 (0.009)
FreTS (2024) [20] 0.521 (0.008) 0.688 (0.011) 0.756 (0.009) 0.488 (0.007) 0.690 (0.009) 0.814 (0.043) 0.477 (0.008) 0.640 (0.011) 0.718 (0.010)
TGC(ours) 0.377 (0.001) 0.497 (0.002) 0.610 (0.002) 0.309 (0.001) 0.437 (0.002) 0.639 (0.042) 0.389 (0.001) 0.521 (0.001) 0.638 (0.006)

Refer to caption

Figure 2: The performance of representative models at various levels of data missingness. The missing ratio on the horizontal axis is the proportion of observations randomly removed from the test set, not the test set's total missing rate.

III-B Implementation Detail

The dataset is split into a training set and a test set at an 80%/20% ratio. We randomly remove $E\%$ of the observed values in the test set and use them as ground truth to evaluate the imputation performance of the models (see III-D). To leverage statistical models designed for two-dimensional data, we reshape the input into a matrix of size $N \times (TF)$. For the deep learning-based models, we train on the training set and then impute the masked test set to compute the corresponding metrics. All models are run for three rounds, and the mean and standard deviation of each metric are reported. We carefully calibrated the parameters of these models based on the original papers and adjusted them as needed for the characteristics of our datasets. Following prior work [7, 8], MAE (mean absolute error), MRE (mean relative error), and RMSE (root mean square error) are used for evaluation. We implement these models using the PyPOTS and fancyimpute packages and employ the Adam optimizer for optimization on an NVIDIA GeForce RTX 4060 GPU.
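The reshape-and-mask protocol described above can be sketched as follows. This is a toy illustration with random data and hypothetical dimensions, not the paper's preprocessing code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an MTS tensor of N samples, T time steps, F features,
# with NaNs marking the originally missing entries.
N, T, F = 4, 3, 2
X = rng.normal(size=(N, T, F))
X[rng.random((N, T, F)) < 0.3] = np.nan

# Reshape to the N x (T*F) matrix fed to the statistical imputers.
X_flat = X.reshape(N, T * F)

# Randomly hide E% of the *observed* entries to build evaluation ground truth.
E_pct = 0.2
observed = ~np.isnan(X_flat)
hide = observed & (rng.random(X_flat.shape) < E_pct)
ground_truth = X_flat[hide].copy()
X_eval = X_flat.copy()
X_eval[hide] = np.nan  # imputers only ever see this masked matrix

# After imputation, MAE/MRE/RMSE compare the model's estimates at the
# hidden positions against `ground_truth`.
```

Only entries that were actually observed are eligible for masking, so the evaluation never scores a model against values that were missing in the raw data.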

III-C Comparison with Competing Methods

TABLE III: Ablation studies on the proposed model. The best performance is highlighted in bold. Values in brackets are the standard deviation over three runs.
Model MIMIC PhysioNet2012 PhysioNet2019
MAE MRE RMSE MAE MRE RMSE MAE MRE RMSE
TGC-U 0.760 (0.001) 1.002 (0.000) 1.003 (0.002) 0.709 (0.002) 1.003 (0.000) 1.01 (0.024) 0.746 (0.002) 1.000 (0.000) 0.995 (0.006)
U-ISVD 0.529 (0.002) 0.698 (0.003) 0.752 (0.002) 0.520 (0.002) 0.735 (0.002) 0.814 (0.031) 0.519 (0.002) 0.696 (0.001) 0.753 (0.007)
TGC 0.377 (0.001) 0.497 (0.002) 0.610 (0.002) 0.309 (0.001) 0.437 (0.002) 0.639 (0.042) 0.389 (0.001) 0.521 (0.001) 0.638 (0.006)

We compare TGC with existing SOTA imputation algorithms by reporting the mean performance of the tested models across trials in which 20% to 80% of the observed data are randomly removed from the test set; the results are shown in Table II. From Table II, we can see that TGC significantly outperforms all other models, including both statistical and deep learning-based methods, on all datasets.

TGC constructs a fully connected covariance matrix linking every variable with every time point, allowing it to learn complex dependencies and achieve the best results in MTS imputation.

III-D Imputation With Varying Missing Ratios

To demonstrate the model's effectiveness in handling data with varying sampling densities, we further conduct experiments on the three datasets with different missing ratios, as shown in Figure 2. In this experiment, all models were trained on datasets with the original missing rate (Table I), and 20% to 80% of the test observations were randomly removed to simulate different missing levels. From the results, we observe that TGC adapts well to datasets with high missing ratios, indicating strong robustness. The statistical models perform poorly on large datasets with high missing rates and are overall inferior to the deep learning models, which in turn trail only TGC.

III-E Ablation Study

In this subsection, to empirically evaluate the effectiveness of the different modules of TGC, we devise two variants: (1) TGC-U, in which the unfolding is replaced with a transformation to the form $(NT) \times F$, designed to isolate the contribution of the unfolding; (2) U-ISVD, which uses ISVD to fill in missing values after the unfolding operation.
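The difference between the two reshapes is easiest to see on a small tensor; the following sketch (toy dimensions, illustrative only) contrasts the unfolding used by TGC with the stacking used by the TGC-U ablation:

```python
import numpy as np

N, T, F = 2, 3, 4
X = np.arange(N * T * F, dtype=float).reshape(N, T, F)

# TGC unfolding: each sample keeps all T*F entries in one row, so the
# copula's covariance spans every (time, variable) pair within a sample.
unfold = X.reshape(N, T * F)    # shape (N, T*F)

# TGC-U variant: time steps become independent rows, discarding the
# cross-time structure within a sample.
stacked = X.reshape(N * T, F)   # shape (N*T, F)
```

Under the unfolding, a single row exposes temporal as well as cross-variable dependence to the model, whereas the stacked form leaves only cross-variable dependence, which is consistent with TGC-U's failure to converge.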

The comparative analysis of these variants is summarized in Table III. The findings reveal that TGC-U fails to converge, underscoring the crucial role of unfolding in helping the model capture the complex interdependencies among variables across time points. Additionally, U-ISVD exhibits a considerably greater error than TGC, reinforcing the superior performance of the Gaussian copula.

IV Conclusion

In this paper, we propose the TGC model for estimating missing values in clinical MTS. We explore cross-variable and temporal relationships through a latent Gaussian representation and employ an EM algorithm to iteratively refine the covariance matrix and impute missing values, thereby enhancing the model's adaptability. Extensive experiments validate TGC's superiority in clinical data imputation. Moreover, TGC exhibits remarkable robustness when handling test datasets with high missing rates.

References

  • [1] R. Liu, K. M. Hunold, J. M. Caterino, and P. Zhang, “Estimating treatment effects for time-to-treatment antibiotic stewardship in sepsis,” Nature machine intelligence, vol. 5, no. 4, pp. 421–431, 2023.
  • [2] Z. Nowroozilarki, A. Pakbin, J. Royalty, D. K. Lee, and B. J. Mortazavi, “Real-time mortality prediction using mimic-iv icu data via boosted nonparametric hazards,” in 2021 IEEE EMBS international conference on biomedical and health informatics (BHI).   IEEE, 2021, pp. 1–4.
  • [3] N. Hou, M. Li, L. He, B. Xie, L. Wang, R. Zhang, Y. Yu, X. Sun, Z. Pan, and K. Wang, “Predicting 30-days mortality for mimic-iii patients with sepsis-3: a machine learning approach using xgboost,” Journal of translational medicine, vol. 18, pp. 1–14, 2020.
  • [4] T. Bergquist, T. Schaffter, Y. Yan, T. Yu, J. Prosser, J. Gao, G. Chen, Ł. Charzewski, Z. Nawalany, I. Brugere et al., “Evaluation of crowdsourced mortality prediction models as a framework for assessing artificial intelligence in medicine,” Journal of the American Medical Informatics Association, vol. 31, no. 1, pp. 35–44, 2024.
  • [5] L. Ren, T. Wang, A. S. Seklouli, H. Zhang, and A. Bouras, “A review on missing values for main challenges and methods,” Information Systems, p. 102268, 2023.
  • [6] M. Liu, A. Zeng, M. Chen, Z. Xu, Q. Lai, L. Ma, and Q. Xu, “Scinet: Time series modeling and forecasting with sample convolution and interaction,” Advances in Neural Information Processing Systems, vol. 35, pp. 5816–5828, 2022.
  • [7] W. Du, D. Côté, and Y. Liu, “Saits: Self-attention-based imputation for time series,” Expert Systems with Applications, vol. 219, p. 119619, 2023.
  • [8] Y. Liu, C. Li, J. Wang, and M. Long, “Koopa: Learning non-stationary time series dynamics with koopman predictors,” Advances in Neural Information Processing Systems, vol. 36, 2024.
  • [9] Y. Zhao and M. Udell, “Missing value imputation for mixed data via gaussian copula,” in Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, 2020, pp. 636–646.
  • [10] A. E. Johnson, L. Bulgarelli, L. Shen, A. Gayles, A. Shammout, S. Horng, T. J. Pollard, S. Hao, B. Moody, B. Gow et al., “Mimic-iv, a freely accessible electronic health record dataset,” Scientific data, vol. 10, no. 1, p. 1, 2023.
  • [11] I. Silva, G. Moody, D. J. Scott, L. A. Celi, and R. G. Mark, “Predicting in-hospital mortality of icu patients: The physionet/computing in cardiology challenge 2012,” in 2012 computing in cardiology.   IEEE, 2012, pp. 245–248.
  • [12] M. A. Reyna, C. S. Josef, R. Jeter, S. P. Shashikumar, M. B. Westover, S. Nemati, G. D. Clifford, and A. Sharma, “Early prediction of sepsis from clinical data: the physionet/computing in cardiology challenge 2019,” Critical care medicine, vol. 48, no. 2, pp. 210–217, 2020.
  • [13] J. E. Overall, S. Tonidandel, and R. R. Starbuck, “Last-observation-carried-forward (locf) and tests for difference in mean rates of change in controlled repeated measurements designs with dropouts,” Social Science Research, vol. 38, no. 2, pp. 492–503, 2009.
  • [14] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R. B. Altman, “Missing value estimation methods for dna microarrays,” Bioinformatics, vol. 17, no. 6, pp. 520–525, 2001.
  • [15] R. Mazumder, T. Hastie, and R. Tibshirani, “Spectral regularization algorithms for learning large incomplete matrices,” The Journal of Machine Learning Research, vol. 11, pp. 2287–2322, 2010.
  • [16] W. Cao, D. Wang, J. Li, H. Zhou, L. Li, and Y. Li, “Brits: Bidirectional recurrent imputation for time series,” Advances in neural information processing systems, vol. 31, 2018.
  • [17] U. Pujianto, A. P. Wibawa, M. I. Akbar et al., “K-nearest neighbor (k-nn) based missing data imputation,” in 2019 5th International Conference on Science in Information Technology (ICSITech).   IEEE, 2019, pp. 83–88.
  • [18] V. Fortuin, D. Baranchuk, G. Rätsch, and S. Mandt, “Gp-vae: Deep probabilistic time series imputation,” in International conference on artificial intelligence and statistics.   PMLR, 2020, pp. 1651–1661.
  • [19] T. Zhou, Z. Ma, Q. Wen, L. Sun, T. Yao, W. Yin, R. Jin et al., “Film: Frequency improved legendre memory model for long-term time series forecasting,” Advances in neural information processing systems, vol. 35, pp. 12677–12690, 2022.
  • [20] K. Yi, Q. Zhang, W. Fan, S. Wang, P. Wang, H. He, N. An, D. Lian, L. Cao, and Z. Niu, “Frequency-domain mlps are more effective learners in time series forecasting,” Advances in Neural Information Processing Systems, vol. 36, 2024.