Statistics Theory
Showing new listings for Thursday, 17 April 2025
- [1] arXiv:2504.11834 [pdf, other]
Title: Estimation and inference in error-in-operator model
Subjects: Statistics Theory (math.ST)
Many statistical problems can be reduced to a linear inverse problem in which only a noisy version of the operator is available. Particular examples include random design regression, the deconvolution problem, instrumental variable regression, functional data analysis, error-in-variable regression, drift estimation in stochastic diffusion, and many others. The pragmatic plug-in approach can be well justified in the classical asymptotic setup with a growing sample size. However, recent developments in high-dimensional inference reveal some new features of this problem. In high-dimensional linear regression with a random design, the plug-in approach is questionable, but the use of a simple ridge penalization yields a benign overfitting phenomenon; see \cite{baLoLu2020}, \cite{ChMo2022}, \cite{NoPuSp2024}. This paper revisits the general error-in-operator problem for finite samples and high-dimensional source and image spaces. A particular focus is on the choice of a proper regularization. We show that a simple ridge penalty (Tikhonov regularization) works properly when the operator is more regular than the signal. In the opposite case, a model reduction technique such as spectral truncation should be applied.
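As a loose illustration of the regularization choice discussed above, the following sketch (not from the paper; the operator, decay profile, and tuning constants are all ad hoc) estimates a signal from a noisy image and a noisy operator, using both a ridge penalty and spectral truncation:

```python
# Illustrative sketch: recover f from y = A f + noise when only a noisy
# operator A_hat = A + E is observed; compare ridge and spectral truncation.
import numpy as np

rng = np.random.default_rng(0)
p = 50

# Operator with polynomially decaying spectrum; signal expressed in the same basis.
U, _ = np.linalg.qr(rng.standard_normal((p, p)))
s = np.arange(1, p + 1) ** -1.0
A = U @ np.diag(s) @ U.T
f = U @ (np.arange(1, p + 1) ** -2.0)

y = A @ f + 0.01 * rng.standard_normal(p)        # noisy image
A_hat = A + 0.01 * rng.standard_normal((p, p))   # noisy operator

# Ridge / Tikhonov: (A_hat^T A_hat + lam I)^{-1} A_hat^T y
lam = 1e-3
f_ridge = np.linalg.solve(A_hat.T @ A_hat + lam * np.eye(p), A_hat.T @ y)

# Spectral truncation: invert A_hat only on its top-k singular directions.
k = 10
Uh, sh, Vth = np.linalg.svd(A_hat)
f_trunc = Vth[:k].T @ ((Uh[:, :k].T @ y) / sh[:k])

for name, est in [("ridge", f_ridge), ("truncation", f_trunc)]:
    print(name, np.linalg.norm(est - f) / np.linalg.norm(f))
```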
- [2] arXiv:2504.11993 [pdf, other]
Title: New Three Different Generators for Constructing New Three Different Bivariate Copulas
Subjects: Statistics Theory (math.ST); Probability (math.PR)
In this paper, the author introduces new methods to construct Archimedean copulas. The generator of each copula satisfies the sufficient conditions regarding the boundary behavior and being continuous, decreasing, and convex. Each inverse generator likewise satisfies the necessary conditions regarding the boundary conditions, marginal uniformity, and the 2-increasing property. Although these copulas satisfy these conditions, they have a limitation: they do not cover the entire dependence spectrum, from perfect negative dependence through independence to perfect positive dependence.
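For readers unfamiliar with the construction, here is a minimal sketch of the generic Archimedean recipe the abstract refers to, using the classical Clayton generator as a stand-in (the paper's three new generators are not reproduced here):

```python
# Archimedean copula from a generator: C(u, v) = phi_inv(phi(u) + phi(v)).
import numpy as np

theta = 2.0
phi = lambda t: (t ** -theta - 1.0) / theta              # generator: phi(1)=0, decreasing, convex
phi_inv = lambda s: (1.0 + theta * s) ** (-1.0 / theta)  # inverse generator

def copula(u, v):
    return phi_inv(phi(u) + phi(v))

# Sanity checks: uniform margins and the 2-increasing property on a rectangle.
u = np.linspace(0.05, 0.95, 5)
assert np.allclose(copula(u, np.ones_like(u)), u)        # C(u, 1) = u
a1, a2, b1, b2 = 0.2, 0.6, 0.3, 0.8
vol = copula(a2, b2) - copula(a1, b2) - copula(a2, b1) + copula(a1, b1)
assert vol >= 0                                          # C-volume is nonnegative
print("C(0.5, 0.5) =", copula(0.5, 0.5))
```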
- [3] arXiv:2504.12190 [pdf, other]
Title: Creating non-reversible rejection-free samplers by rebalancing skew-balanced Markov jump processes
Comments: 28 pages, 7 figures
Subjects: Statistics Theory (math.ST); Computation (stat.CO)
Markov chain sampling methods form the backbone of modern computational statistics. However, many popular methods are prone to random walk behavior, i.e., diffusion-like exploration of the sample space, leading to slow mixing that requires intricate tuning to alleviate. Non-reversible samplers can resolve some of these issues. We introduce a device that turns jump processes satisfying a skew-detailed balance condition for a reference measure into processes that sample a target measure absolutely continuous with respect to the reference measure. The resulting sampler is rejection-free, non-reversible, and continuous-time. As an example, we apply the device to Hamiltonian dynamics discretized by the leapfrog integrator, resulting in a rejection-free, non-reversible, continuous-time version of Hamiltonian Monte Carlo (HMC). We prove the geometric ergodicity of the resulting sampler under certain convexity conditions and demonstrate how its behavior differs qualitatively from that of HMC through numerical examples.
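The leapfrog integrator mentioned above is simple to state; here is a minimal sketch for a standard Gaussian target (the rebalancing device itself is not reproduced):

```python
# Leapfrog steps for the Hamiltonian H(q, p) = U(q) + |p|^2 / 2.
import numpy as np

def leapfrog(q, p, grad_U, step, n_steps):
    p = p - 0.5 * step * grad_U(q)          # initial half step for momentum
    for _ in range(n_steps - 1):
        q = q + step * p                    # full step for position
        p = p - step * grad_U(q)            # full step for momentum
    q = q + step * p
    p = p - 0.5 * step * grad_U(q)          # final half step for momentum
    return q, p

grad_U = lambda q: q                        # U(q) = |q|^2 / 2, i.e. a N(0, I) target
q, p = np.ones(2), np.ones(2)
q_new, p_new = leapfrog(q, p, grad_U, step=0.1, n_steps=20)

# Energy is approximately conserved, the key property HMC-type samplers exploit.
H = lambda q, p: 0.5 * q @ q + 0.5 * p @ p
print(H(q, p), H(q_new, p_new))
```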
New submissions (showing 3 of 3 entries)
- [4] arXiv:2504.11759 (cross-list from stat.ME) [pdf, html, other]
Title: Bringing closure to FDR control: beating the e-Benjamini-Hochberg procedure
Comments: 11 pages, 1 figure
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
The false discovery rate (FDR) has been a key metric for error control in multiple hypothesis testing, and many methods have been developed for FDR control across a diverse cross-section of settings and applications. We develop a closure principle for all FDR controlling procedures, i.e., we provide a characterization based on e-values for all admissible FDR controlling procedures. We leverage this idea to formulate the closed eBH procedure, a (usually strict) improvement over the eBH procedure for FDR control when provided with e-values. We demonstrate the practical performance of closed eBH in simulations.
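For context, here is a hedged sketch of the baseline eBH procedure the paper improves upon, i.e., the BH step-up rule applied to e-values (reject the k-hat hypotheses with the largest e-values, where k-hat is the largest k with e_[k] >= K / (alpha k)); the closed eBH procedure itself is not reproduced:

```python
# The eBH procedure: step-up rule on e-values.
import numpy as np

def ebh(e_values, alpha=0.05):
    """Return indices of hypotheses rejected by eBH at level alpha."""
    e = np.asarray(e_values, dtype=float)
    K = len(e)
    order = np.argsort(-e)                        # sort e-values, largest first
    thresholds = K / (alpha * np.arange(1, K + 1))
    passing = np.nonzero(e[order] >= thresholds)[0]
    if passing.size == 0:
        return np.array([], dtype=int)
    k_hat = passing.max() + 1                     # largest k with e_[k] >= K/(alpha k)
    return order[:k_hat]

e_vals = [300.0, 80.0, 40.0, 1.2, 0.5]
print(ebh(e_vals, alpha=0.05))                    # rejects the three large e-values
```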
- [5] arXiv:2504.11848 (cross-list from stat.ME) [pdf, html, other]
Title: Proximal Inference on Population Intervention Indirect Effect
Comments: 60 pages, 3 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
The population intervention indirect effect (PIIE) is a novel mediation effect representing the indirect component of the population intervention effect. Unlike traditional mediation measures, such as the natural indirect effect, the PIIE holds particular relevance in observational studies involving unethical exposures, where hypothetical interventions that impose harmful exposures are inappropriate. Although prior research has identified the PIIE under unmeasured confounding between exposure and outcome, it has not fully addressed the confounding that affects the mediator. This study extends PIIE identification to settings where unmeasured confounders influence the exposure-outcome, exposure-mediator, and mediator-outcome relationships. Specifically, we leverage observed covariates as proxy variables for unmeasured confounders, constructing three proximal identification frameworks. Additionally, we characterize the semiparametric efficiency bound and develop multiply robust and locally efficient estimators. To handle high-dimensional nuisance parameters, we propose a debiased machine learning approach that achieves $\sqrt{n}$-consistency and asymptotic normality for estimating the true PIIE, even when the machine learning estimators of the nuisance functions converge more slowly than the $\sqrt{n}$-rate. In simulations, our estimators demonstrate higher confidence interval coverage rates than conventional methods across various model misspecifications. In a real data application, our approaches reveal an indirect effect of alcohol consumption on depression risk mediated by depersonalization symptoms.
- [6] arXiv:2504.11978 (cross-list from cs.IT) [pdf, other]
Title: On the Intersection and Composition properties of conditional independence
Comments: 12 pages; submitted to WUPES '25
Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)
Compositional graphoids are fundamental discrete structures which appear in probabilistic reasoning, particularly in the area of graphical models. They are semigraphoids which satisfy the Intersection and Composition properties. These important properties, however, are not enjoyed by general probability distributions. We survey what is known in terms of sufficient conditions for Intersection and Composition and derive a set of new sufficient conditions in the context of discrete random variables based on conditional information inequalities for Shannon entropies.
Cross submissions (showing 3 of 3 entries)
- [7] arXiv:2210.06672 (replaced) [pdf, html, other]
Title: Variance-Aware Estimation of Kernel Mean Embedding
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
An important feature of kernel mean embeddings (KME) is that the rate of convergence of the empirical KME to the true distribution KME can be bounded independently of the dimension of the space, the properties of the distribution, and the smoothness of the kernel. We show how to speed up convergence by leveraging variance information in the reproducing kernel Hilbert space. Furthermore, we show that even when such information is a priori unknown, we can efficiently estimate it from the data, recovering the desiderata of a distribution-agnostic bound that enjoys acceleration in fortuitous settings. We further extend our results from independent data to stationary mixing sequences and illustrate our methods in the context of hypothesis testing and robust parametric estimation.
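For context, a minimal sketch of the basic object in this abstract: the empirical KME and the RKHS distance between two of them, computed via the kernel trick (the variance-aware acceleration is not reproduced; the Gaussian kernel and bandwidth are ad hoc choices):

```python
# Empirical kernel mean embeddings and their RKHS distance.
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def kme_distance_sq(X, Y):
    """||mu_hat_P - mu_hat_Q||_H^2 via the kernel trick."""
    return (gaussian_kernel(X, X).mean()
            - 2 * gaussian_kernel(X, Y).mean()
            + gaussian_kernel(Y, Y).mean())

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))          # sample from P
Y = rng.standard_normal((200, 2)) + 0.5    # sample from a shifted Q
print(kme_distance_sq(X, Y))               # grows with the P-Q discrepancy
```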
- [8] arXiv:2405.01368 (replaced) [pdf, html, other]
Title: Sub-uniformity of harmonic mean p-values
Subjects: Statistics Theory (math.ST)
We obtain several inequalities on the generalized means of dependent p-values. In particular, the weighted harmonic mean of p-values is strictly sub-uniform under several dependence assumptions of p-values, including independence, negative upper orthant dependence, the class of extremal mixture copulas, and some Clayton copulas. Sub-uniformity of the harmonic mean of p-values has an important implication in multiple hypothesis testing: It is statistically invalid (anti-conservative) to merge p-values using the harmonic mean unless a proper threshold or multiplier adjustment is used, and this applies across all significance levels. The required multiplier adjustment on the harmonic mean p-value grows sub-linearly to infinity as the number of p-values increases, and hence there does not exist a constant multiplier that works for any number of p-values, even under independence.
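A quick simulation illustrates the sub-uniformity phenomenon: even under independent uniform p-values, the unadjusted harmonic mean p-value rejects at level alpha noticeably more often than alpha (illustrative sketch only):

```python
# Anti-conservativeness of the unadjusted harmonic mean p-value under the null.
import numpy as np

rng = np.random.default_rng(2)
K, trials, alpha = 100, 10_000, 0.05
p = rng.uniform(size=(trials, K))           # K independent null p-values per trial
hmp = K / (1.0 / p).sum(axis=1)             # harmonic mean of the K p-values
print("nominal level:", alpha)
print("actual rejection rate:", (hmp <= alpha).mean())  # exceeds alpha
```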
- [9] arXiv:2410.00078 (replaced) [pdf, html, other]
Title: Shuffled Linear Regression via Spectral Matching
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Spectral Theory (math.SP); Machine Learning (stat.ML)
Shuffled linear regression (SLR) seeks to estimate latent features through a linear transformation, complicated by unknown permutations in the measurement dimensions. This problem extends traditional least-squares (LS) and Least Absolute Shrinkage and Selection Operator (LASSO) approaches by jointly estimating the permutation, resulting in shuffled LS and shuffled LASSO formulations. Existing methods, constrained by the combinatorial complexity of permutation recovery, often address small-scale cases with limited measurements. In contrast, we focus on large-scale SLR, particularly suited for environments with abundant measurement samples. We propose a spectral matching method that efficiently resolves permutations by aligning spectral components of the measurement and feature covariances. Rigorous theoretical analyses demonstrate that our method achieves accurate estimates in both shuffled LS and shuffled LASSO settings, given a sufficient number of samples. Furthermore, we extend our approach to address simultaneous pose and correspondence estimation in image registration tasks. Experiments on synthetic datasets and real-world image registration scenarios show that our method outperforms existing algorithms in both estimation accuracy and registration performance.
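The core idea is easy to prototype: permuting the measurement dimensions permutes the rows of the covariance eigenvectors in the same way, so matching eigenvector entries can recover the permutation. The sketch below is a toy reconstruction under strong assumptions (known linear map, isotropic features, well-separated eigenvalues, scipy available), not the paper's algorithm:

```python
# Toy permutation recovery by aligning covariance eigenvectors.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(3)
d, n = 8, 5000
A = rng.standard_normal((d, d))                 # known linear map
perm = rng.permutation(d)                       # unknown shuffle of measurement dims
P = np.eye(d)[perm]                             # (P v)[i] = v[perm[i]]

X = rng.standard_normal((n, d))                 # latent features, Cov = I
Y = X @ A.T @ P.T + 0.01 * rng.standard_normal((n, d))  # shuffled measurements

# Cov(Y) = P (A A^T) P^T, so its eigenvectors are the model's, row-permuted.
_, V_model = np.linalg.eigh(A @ A.T)            # model-implied spectral components
_, V_obs = np.linalg.eigh(np.cov(Y.T))          # observed spectral components

# Each dimension's |eigenvector entries| act as a fingerprint; match them up
# (absolute values absorb the arbitrary signs of the eigenvectors).
score = np.abs(V_obs) @ np.abs(V_model).T
row, col = linear_sum_assignment(-score)        # maximize total alignment
print("permutation recovered:", np.array_equal(col, perm))
```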
- [10] arXiv:2311.09446 (replaced) [pdf, html, other]
Title: Scalable simulation-based inference for implicitly defined models using a metamodel for Monte Carlo log-likelihood estimator
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
Models implicitly defined through a random simulator of a process have become widely used in scientific and industrial applications in recent years. However, simulation-based inference methods for such implicit models, like approximate Bayesian computation (ABC), often scale poorly as data size increases. We develop a scalable inference method for implicitly defined models using a metamodel for the Monte Carlo log-likelihood estimator derived from simulations. This metamodel characterizes both statistical and simulation-based randomness in the distribution of the log-likelihood estimator across different parameter values. Our metamodel-based method quantifies uncertainty in parameter estimation in a principled manner, leveraging the local asymptotic normality of the mean function of the log-likelihood estimator. We apply this method to construct accurate confidence intervals for parameters of partially observed Markov process models where the Monte Carlo log-likelihood estimator is obtained using the bootstrap particle filter. We numerically demonstrate that our method enables accurate and highly scalable parameter inference across several examples, including a mechanistic compartment model for infectious diseases.
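A loose sketch of the metamodel idea under a caricature of the setting: fit a quadratic mean function (consistent with local asymptotic normality) to noisy Monte Carlo log-likelihood estimates over a parameter grid, then read off a point estimate and a Wald-type interval from the fitted curvature. This is an illustrative stand-in, not the paper's procedure:

```python
# Quadratic metamodel for noisy log-likelihood estimates over a parameter grid.
import numpy as np

rng = np.random.default_rng(6)
theta_grid = np.linspace(-1.0, 1.0, 21)
true_ll = lambda t: -50 * (t - 0.2) ** 2                  # idealized log-likelihood
ll_hat = true_ll(theta_grid) + rng.standard_normal(21)    # Monte Carlo noise

# Fit ll(theta) ~ a theta^2 + b theta + c.
a, b, c = np.polyfit(theta_grid, ll_hat, 2)
theta_hat = -b / (2 * a)                                  # maximizer of the fit
se = np.sqrt(-1.0 / (2 * a))                              # curvature-based std. error
print(f"theta_hat = {theta_hat:.3f} +/- {1.96 * se:.3f}")
```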
- [11] arXiv:2404.02169 (replaced) [pdf, html, other]
Title: Invariant kernels on the space of complex covariance matrices
Comments: under review at the Journal of Machine Learning Research
Subjects: Functional Analysis (math.FA); Statistics Theory (math.ST)
The present work develops certain analytical tools required to construct and compute invariant kernels on the space of complex covariance matrices. The main result is the $\mathrm{L}^1$--Godement theorem, which states that any invariant kernel, which is (in a certain natural sense) also integrable, can be computed by taking the inverse spherical transform of a positive function. General expressions for inverse spherical transforms are then provided, which can be used to explore new families of invariant kernels, at a rather moderate computational cost. A further, alternative approach for constructing new invariant kernels is also introduced, based on Ramanujan's master theorem for symmetric cones. This leads to a novel closed-form invariant kernel, called the Beta-prime kernel. Numerical experiments highlight the computational and performance advantages of this kernel, especially in the context of two-sample hypothesis testing.
- [12] arXiv:2410.20068 (replaced) [pdf, html, other]
Title: Understanding the Effect of GCN Convolutions in Regression Tasks
Comments: 25 pages
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Graph Convolutional Networks (GCNs) have become a pivotal method in machine learning for modeling functions over graphs. Despite their widespread success across various applications, their statistical properties (e.g., consistency, convergence rates) remain ill-characterized. To begin addressing this knowledge gap, we consider networks for which the graph structure implies that neighboring nodes exhibit similar signals, and we provide statistical theory for the impact of convolution operators. Focusing on estimators based solely on neighborhood aggregation, we examine how two common convolutions, the original GCN and GraphSAGE convolutions, affect the learning error as a function of the neighborhood topology and the number of convolutional layers. We explicitly characterize the bias-variance type trade-off incurred by GCNs as a function of the neighborhood size and identify specific graph topologies where convolution operators are less effective. Our theoretical findings are corroborated by synthetic experiments and provide a starting point for a deeper quantitative understanding of convolutional effects in GCNs, toward rigorous guidelines for practitioners.
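A minimal sketch of the neighborhood-aggregation estimators being analyzed, showing the bias-variance trade-off in the number of aggregation layers on a path graph (illustrative setting, not the paper's exact model):

```python
# Repeated mean aggregation of noisy node signals over a graph.
import numpy as np

def gcn_smooth(adj, y, n_layers):
    """Apply n_layers rounds of mean aggregation over neighborhoods (incl. self)."""
    A = adj + np.eye(len(adj))                 # add self-loops
    W = A / A.sum(axis=1, keepdims=True)       # row-normalized aggregation operator
    out = y.copy()
    for _ in range(n_layers):
        out = W @ out
    return out

rng = np.random.default_rng(4)
n = 100
f = np.sin(np.linspace(0, 2 * np.pi, n))       # smooth signal on a path graph
y = f + 0.5 * rng.standard_normal(n)           # noisy node observations
adj = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # path graph

for L in [0, 1, 2, 8, 32]:                     # depth trades variance against bias
    err = np.mean((gcn_smooth(adj, y, L) - f) ** 2)
    print(f"{L} layers: MSE {err:.4f}")
```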
- [13] arXiv:2502.12999 (replaced) [pdf, html, other]
Title: Asymptotic Optimism of Random-Design Linear and Kernel Regression Models
Comments: 56 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
We derive the closed-form asymptotic optimism of linear regression models under random designs and generalize it to kernel ridge regression. Using scaled asymptotic optimism as a generic measure of predictive model complexity, we study the fundamentally different behaviors of the linear regression model, the neural tangent kernel (NTK) regression model, and three-layer fully connected neural networks (NNs). Our contribution is two-fold: we provide theoretical grounds for using scaled optimism as a measure of model predictive complexity, and we show empirically that NNs with ReLU activations behave differently from kernel models under this measure. With resampling techniques, we can also compute the optimism for regression models on real data.
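For intuition, optimism, the gap between expected test and training error, is easy to estimate by simulation; the sketch below does this for ordinary least squares with fresh noise on the same design, where the classical value is $2\sigma^2 p/n$ (illustrative only, not the paper's random-design asymptotics):

```python
# Monte Carlo estimate of the optimism of OLS.
import numpy as np

rng = np.random.default_rng(7)
n, p, sigma, reps = 50, 10, 1.0, 2000
gaps = []
for _ in range(reps):
    X = rng.standard_normal((n, p))
    beta = rng.standard_normal(p)
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    train = np.mean((y - X @ beta_hat) ** 2)
    y_new = X @ beta + sigma * rng.standard_normal(n)   # fresh noise, same design
    test = np.mean((y_new - X @ beta_hat) ** 2)
    gaps.append(test - train)

# Compare the simulated gap with the classical value 2 * sigma^2 * p / n.
print(np.mean(gaps), 2 * sigma ** 2 * p / n)
```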
- [14] arXiv:2503.21138 (replaced) [pdf, html, other]
Title: A Computational Framework for Efficient Model Evaluation with Causal Guarantees
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
To reduce the cost of experimentally evaluating models, we introduce a computational theory of evaluation for prediction and decision models: building an evaluation model to accelerate evaluation procedures. We prove upper bounds on the generalization error and the generalized causal-effect error of given evaluation models. We also prove the efficiency and consistency of the causal effect estimated by prediction for a deployed subject under an evaluation metric. To learn evaluation models, we propose a meta-learner that handles the problem of heterogeneous evaluation-subject spaces. Compared with existing evaluation approaches, our (conditional) evaluation model reduces evaluation error by 24.1\%-99.0\% across 12 scenarios, including individualized medicine, scientific simulation, social experiments, business activity, and quantum trade. Evaluation time is reduced by 3-7 orders of magnitude compared with experiments or simulations.
- [15] arXiv:2504.09029 (replaced) [pdf, html, other]
Title: A Hierarchical Decomposition of Kullback-Leibler Divergence: Disentangling Marginal Mismatches from Statistical Dependencies
Comments: 17 pages, 3 figures
Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)
The Kullback-Leibler (KL) divergence is a foundational measure for comparing probability distributions. Yet in multivariate settings, its single value often obscures the underlying reasons for divergence, conflating mismatches in individual variable distributions (marginals) with effects arising from statistical dependencies. We derive an algebraically exact, additive, and hierarchical decomposition of the KL divergence between a joint distribution $P(X_1, \ldots, X_n)$ and a standard product reference distribution $Q(X_1, \ldots, X_n) = \prod_i q(X_i)$, where variables are assumed independent and identically distributed according to a common reference $q$. The total divergence precisely splits into two primary components: (1) the summed divergence of each marginal distribution $P_i(X_i)$ from the common reference $q(X_i)$, quantifying marginal deviations; and (2) the total correlation (or multi-information), capturing the total statistical dependency among variables. Leveraging Möbius inversion on the subset lattice, we further decompose this total correlation term into a hierarchy of signed contributions from distinct pairwise, triplet, and higher-order statistical interactions, expressed using standard Shannon information quantities. This decomposition provides an algebraically complete and interpretable breakdown of KL divergence using established information measures, requiring no approximations or model assumptions. Numerical validation using hypergeometric sampling confirms the decomposition's exactness to machine precision across diverse system configurations. This framework enables a precise diagnosis of divergence origins, distinguishing marginal effects from interaction effects, with potential applications across machine learning, econometrics, and complex systems analysis.
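The first level of the decomposition is easy to verify numerically: the KL divergence from the product reference splits exactly into the summed marginal divergences plus the total correlation $\mathrm{TC}(P) = \mathrm{KL}(P \,\|\, \prod_i P_i)$. The following check (the higher-order Möbius terms are not reproduced) confirms the identity on a random discrete joint:

```python
# Check: KL(P || prod q) = sum_i KL(P_i || q) + TC(P).
import numpy as np

rng = np.random.default_rng(5)
P = rng.random((3, 3, 3)); P /= P.sum()        # random joint over 3 ternary variables
q = rng.random(3); q /= q.sum()                # common reference marginal

def kl(p, r):
    return np.sum(p * np.log(p / r))

marginals = [P.sum(axis=tuple(j for j in range(3) if j != i)) for i in range(3)]
Q = np.einsum('i,j,k->ijk', q, q, q)           # product reference distribution
P_ind = np.einsum('i,j,k->ijk', *marginals)    # product of P's marginals

lhs = kl(P, Q)
rhs = sum(kl(m, q) for m in marginals) + kl(P, P_ind)   # marginal terms + TC
print(lhs, rhs, np.isclose(lhs, rhs))          # equal to machine precision
```

Replacement submissions (showing 9 of 9 entries)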