CRC-SGAD: Conformal Risk Control for Supervised Graph Anomaly Detection
^†^†thanks:

Songran Bai12, Xiaolong Zheng12, Daniel Dajun Zeng12 1 State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
2School of Artificial Intelligence, University of Chinese Academy of Sciences
Beijing, China
{baisongran2020, xiaolong.zheng, dajun.zeng}@ia.ac.cn

Abstract

Graph Anomaly Detection (GAD) is critical in security-sensitive domains, yet faces reliability challenges: miscalibrated confidence estimation (underconfidence in normal nodes, overconfidence in anomalies), adversarial vulnerability of derived confidence score under structural perturbations, and limited efficacy of conventional calibration methods for sparse anomaly patterns. Thus we propose CRC-SGAD, a framework integrating statistical risk control into GAD via two innovations: (1) A Dual-Threshold Conformal Risk Control mechanism that provides theoretically guaranteed bounds for both False Negative Rate (FNR) and False Positive Rate (FPR) through providing prediction sets; (2) A Subgraph-aware Spectral Graph Neural Calibrator (SSGNC) that optimizes node representations through adaptive spectral filtering while reducing the size of prediction sets via hybrid loss optimization. Experiments on four datasets and five GAD models demonstrate statistically significant improvements in FNR and FPR control and prediction set size. CRC-SGAD establishes a paradigm for statistically rigorous anomaly detection in graph-structured security applications.

Index Terms:

Graph Anomaly Detection, Conformal Risk Control, Uncertainty Quantification, Calibration, Spectral Graph Neural Network.

I Introduction

Graph-based anomaly detection (GAD) has become pivotal in security-critical applications like fraud detection[1] and misinformation identification[2], leveraging graph neural networks (GNNs) through two paradigms: spatial methods enhance graph homophily through neighborhood redefinition or mitigate over-smoothing via adaptive message passing mechanisms[3, 4, 5], while spectral approaches design specialized filters to capture mid- and high-frequency signals induced by anomalous patterns[6, 7].

However, the deployment of GAD models in high-stakes operational environments faces critical reliability challenges. For instance, malicious entities evade detection by fabricating connections, causing costly misclassifications[8, 9, 10]. Our preliminary experiments reveal three critical insights: (1) Even state-of-the-art models like BWGNN[6] exhibit unreliable confidence scores, displaying overconfidence for anomalous nodes while underestimating certainty for normal nodes (see Fig1(a)(b)); (2) Subtle targeted structural perturbations[11] on correctly classified nodes can induce high-confidence misclassifications (see Fig1(d)); (3) Global structural perturbations[12] exacerbate confidence disparity between node classes(see Fig1(c)). The lack of precise class-dependent uncertainty quantification pose fundamental limitations for reliable GAD models. Traditional calibration methods like Expected Calibration Error (ECE) are unreliable due to insufficient anomalous observations in most confidence bins caused by the inherent rarity. And individual-based calibration method[13] lacks theoretical guarantees. Conformal Risk Control (CRC)[14] emerges as a promising alternative through finite-sample statistical guarantees.

Refer to caption — Figure 1: The unreliable prediction confidence of BWGNN on Amazon dataset. (a) and (b) Figures depict the calibration error plots for anomalous nodes and normal nodes, respectively. (c) Figure illustrates the confidence distribution of misclassified nodes (both normal and anomalous) following a global attack. (d) Figure demonstrates the confidence variation of attacked nodes under a targeted attack scenario .

We therefore propose the CRC-SGAD framework that generates prediction sets with theoretically guaranteed risk control - the expected FNR and FPR are provably bounded below pre-defined thresholds (e.g., 0.1). Our key contributions are threefold: First, we develop a dual-threshold conformal risk control mechanism tailored for GAD, establishing set-valued functions for normal and anomalous nodes respectively. We prove this adaptation preserves exchangeability assumptions required for valid CRC. Second, to address the observed local heterogeneity in prediction set sizes across neighborhoods, we design a subgraph-aware spectral graph neural calibrator (SSGNC). This component learns frequency-aware node representations through subgraph-specific spectral filters, jointly optimizing prediction inefficiency via a hybrid loss combining conformal risk and weighted cross-entropy. Third, we conduct comprehensive evaluations across four datasets using five classic GAD methods, demonstrating statistically significant improvements in Inefficiency and FNR and FPR control over existing conformal approaches[15, 16, 17].

II Related Work

II-A Graph Anomaly Detection

GAD methods are typically categorized into supervised (SGAD) and unsupervised (UGAD) paradigms based on label availability[18]. While UGAD primarily employs reconstruction errors or graph contrastive learning for anomaly scoring[19, 20], we focus on SGAD due to its superior practicality enabled by partially available trustable labels. Recent advances address heterophily in SGAD through spatial and spectral approaches[21]. Spatial methods enhance node representation by redefining neighborhoods or modifying message passing: H2-FDetector[5] differentiates homophilic and heterophilic connections for adaptive message aggregation, PC-GNN[3] employs imbalance-aware subgraph sampling for structural refinement, and PMP[22] develops a self-tuning mechanism for information fusion from heterogeneous neighbors. Spectral approaches tackle this through adaptive filtering. AMNet[7] captures dual-frequency signals while BWGNN[6] designs beta wavelet-based band-pass filters to preserve anomalous high-frequency patterns.

II-B Uncertainty Quantification for GNNs

Following the taxonomy in deep neural networks, uncertainty in GNNs can also be categorized into aleatoric and epistemic uncertainty[23]. Existing approaches primarily quantify uncertainty through maximum softmax probability or entropy[24], Bayesian GNNs with Dirichlet priors[25], frequentist post-hoc methods (e.g., conformal prediction)[26, 27], Bayesian stochastic parameter models[28], or ensemble methods[29]. Among these, conformal prediction stands out as a model-agnostic frequentist approach that provides theoretical guarantees with minimal additional parameters, high computational efficiency, and robustness. Its extension, conformal risk control[14], offers more flexible risk management. For imbalanced graph classification tasks, [13] extended distribution-based calibration metrics to node-level predictions and proposed Expected Individual Calibration Error (EICE) to address miscalibration issues in rare minority classes. However, while EICE achieves better calibrated probabilities for both normal and anomalous nodes, it remains a point estimation method without theoretical guarantees for controlling false negative and positive rates.

III Preliminaries

III-A Supervised Graph Anomaly Detection

Consider an undirected graph $G=(V,\mathbf{\Sigma},\mathbf{A})$ where $V=\{v_{i}\}_{i=1}^{n}$ denotes the node set. Each node $v_{i}\in V$ possesses a feature vector $\mathbf{x}_{i}\in\mathbb{R}^{d}$ , forming the feature matrix $\mathbf{X}=[\mathbf{x}_{1},\mathbf{x}_{2},...,\mathbf{x}_{n}]^{\top}\in\mathbb% {R}^{n\times d}$ . The adjacency matrix $\mathbf{A}\in\{0,1\}^{n\times n}$ encodes edge connections, where $\mathbf{A}_{ij}=1$ iff nodes $v_{i}$ and $v_{j}$ are connected. Let $\mathbf{Y}\in\{0,1\}^{n}$ denote the binary anomaly indicator vector.

The objective is to learn an estimator $f_{\theta}:\mathbb{R}^{d}\times\{0,1\}^{n\times n}\rightarrow[0,1]^{2}$ that predicts anomaly probabilities, where $f_{\theta}^{0}(\mathbf{x}_{i},\mathbf{A})$ and $f_{\theta}^{1}(\mathbf{x}_{i},\mathbf{A})$ denote the predicted probabilities for normal and anomalous classes, respectively.

III-B Conformal Risk Control

The conformal risk control framework operates by applying a post-hoc calibration mechanism to refine model $f$ ’s predictions through a set-valued mapping $C_{\lambda}:\mathcal{X}\to 2^{\mathcal{Y}}$ . This mapping generates prediction sets whose conservatism is governed by a threshold parameter $\lambda\in\Lambda$ , with larger values of $\lambda$ producing more inclusive sets.

To assess the predictive performance of $C_{\lambda}(\cdot)$ , CRC consider a bounded loss function $\ell:2^{\mathcal{Y}}\times\mathcal{Y}\to(-\infty,B]$ that is non-increasing with respect to $\lambda$ . The primary objective is to select an adaptive threshold $\hat{\lambda}$ from calibration data that guarantees:

\mathbb{E}_{(x_{n+1},y_{n+1})}\left[\ell(C_{\hat{\lambda}}(x_{n+1}),y_{n+1})% \right]\leq\alpha

(1)

where $\alpha\in(-\infty,B)$ denotes the pre-specified risk tolerance level.

Given $n+1$ exchangeable random functions $\{L_{i}\}_{i=1}^{n+1}$ where each $L_{i}:\Lambda\to(-\infty,B]$ is non-increasing and bounded, [14] define the maximal admissible threshold as:

\lambda_{\max}:=\sup\left\{\lambda\in\Lambda\mid L_{n+1}(\lambda)\leq\alpha\right\}

(2)

The calibration procedure utilizes the first $n$ functions to determine $\hat{\lambda}$ that satisfies:

\mathbb{E}[L_{n+1}(\hat{\lambda})]\leq\alpha

(3)

The empirical risk estimator is defined as $\hat{R}_{n}(\lambda):=\frac{1}{n}\sum_{i=1}^{n}L_{i}(\lambda)$ . For desired risk level $\alpha$ , compute:

\begin{split}\hat{\lambda}&=\inf\left\{\lambda\in\Lambda:\frac{n}{n+1}\hat{R}_% {n}(\lambda)+\frac{B}{n+1}\leq\alpha\right\}\\ &=\inf\left\{\lambda\in\Lambda:\hat{R}_{n}(\lambda)\leq\alpha-\frac{B-\alpha}{% n}\right\}\end{split}

(4)

The monotonicity of both individual $L_{i}(\cdot)$ functions and the empirical risk $\hat{R}_{n}(\cdot)$ permits efficient computation of $\hat{\lambda}$ via binary search with arbitrary precision.

IV Methodology

IV-A Dual-Threshold Conformal Risk Control (DTCRC)

We adopt the false positive rate (FPR) and the false negative rate (FNR) as our target risk functions. The distributional discrepancy between anomalous and normal nodes in GAD scenarios introduces inherent limitations to single-threshold approaches in simultaneously controlling both FNR and FPR.

IV-A1 Risk Functions and Set-Valued Predictors in SGAD

For normal samples, the set-valued predictor is defined as:

C_{\lambda_{\text{normal}}}(x_{i})=\begin{cases}\{1\}&\text{if }f_{\theta}^{1}% (x_{i},A)\geq\lambda_{\text{normal}}\\ \{0,1\}&\text{otherwise}\end{cases}

(5)

with FPR risk $L_{i}^{\text{FPR}}(\lambda_{\text{normal}})=\mathbb{P}(0\notin C_{\lambda_{% \text{normal}}}(x_{i})|y_{i}=0)=\mathbb{I}(0\notin C_{\lambda_{\text{normal}}}% (x_{i}))$ . For anomalous samples:

C_{\lambda_{\text{ano}}}(x_{i})=\begin{cases}\{0\}&\text{if }f_{\theta}^{1}(x_% {i},A)<1-\lambda_{\text{ano}}\\ \{0,1\}&\text{otherwise}\end{cases}

(6)

with FNR risk $L_{i}^{\text{FNR}}(\lambda_{\text{ano}})=\mathbb{P}(1\notin C_{\lambda_{\text{% ano}}}(x_{i})|y_{i}=1)=\mathbb{I}(1\notin C_{\lambda_{\text{ano}}}(x_{i}))$ . Both risk functions are non-increasing and bounded within [0,1], satisfying the requirements for conformal risk control.

IV-A2 Exchangeability of Risk Functions

A sequence $\{(X_{i},Y_{i})\}_{i=1}^{N}$ is exchangeable if for any permutation $\sigma$ , their joint distribution satisfies $P((X_{1},Y_{1}),\ldots,(X_{n},Y_{n}))=P((X_{\sigma(1)},Y_{\sigma(1)}),\ldots,(% X_{\sigma(n)},Y_{\sigma(n)}))$ [15]. While traditional deep learning assumes independent and identically distributed (i.i.d.) data, SGAD exhibits two distinct properties:

•

The node-wise random variables exhibit interdependencies with adjacent neighbors, violating the i.i.d. assumption due to their inherent relational dependencies.
•

The representation learning in GNNs inherently induces structural dependencies through aggregation mechanisms that combine ego-node features with neighborhood information, thereby violating the i.i.d. assumption in their output distributions.

Notably, when using permutation-invariant GNN architectures in GAD scenario (e.g., GCN, GAT, GraphSAGE) under random data splits, the model’s predictions maintain exchangeability[27]. Consequently, the joint distribution of risk functions satisfies:

\mathbb{P}(L_{1},\ldots,L_{n+1})=\mathbb{P}(L_{\sigma(1)},\ldots,L_{\sigma(n+1% )})

(7)

IV-A3 FNR and FPR Control

Our framework integrates dual-threshold conformal risk control for false positive rate (CRC-FPR) and false negative rate (CRC-FNR), with final predictions derived through their intersection to ensure joint error control. The implementation pipeline comprises four critical stages. First, we train an anomaly detection model $f_{\theta}:\mathcal{X}\rightarrow\mathbb{R}^{n\times 2}$ through weighted cross-entropy optimization to learn discriminative node representations. During calibration, we strategically partition the data by class membership: For normal nodes, we compute the empirical FPR risk $\hat{R}_{n_{\textnormal{normal}}}(\lambda_{\textnormal{normal}})=\frac{1}{n_{% \textnormal{normal}}}\sum L_{i}^{\textnormal{FPR}}$ across the calibration set, while for anomalous nodes, we calculate the corresponding FNR risk $\hat{R}_{n_{\textnormal{ano}}}(\lambda_{\textnormal{ano}})=\frac{1}{n_{% \textnormal{ano}}}\sum L_{i}^{\textnormal{FNR}}$ . Subsequently, we employ binary search optimization to determine the optimal thresholds that satisfy the probabilistic error bounds:

	$\displaystyle\hat{\lambda}_{\textnormal{normal}}$	$\displaystyle=\inf\left\{\lambda:\hat{R}_{n_{\textnormal{normal}}}(\lambda)% \leq\alpha-\frac{1-\alpha}{n_{\textnormal{normal}}}\right\},$		(8)
	$\displaystyle\hat{\lambda}_{\textnormal{ano}}$	$\displaystyle=\inf\left\{\lambda:\hat{R}_{n_{\textnormal{ano}}}(\lambda)\leq% \alpha-\frac{1-\alpha}{n_{\textnormal{ano}}}\right\}.$		(9)

For test instances, the final prediction set $C(x_{\textnormal{test}})$ is obtained through the intersection $C_{\hat{\lambda}_{\textnormal{normal}}}(x_{\textnormal{test}})\cap C_{\hat{% \lambda}_{\textnormal{ano}}}(x_{\textnormal{test}})$ , ensuring simultaneous control of both error types. This approach provides provable guarantees that the expected FPR and FNR remain below the predefined significance level $\alpha$ .

TABLE I: Performance Comparison on Amazon and Yelp Datasets

Amazon Yelp Model Cov Ine Amb Single FNR FPR Cov Ine Amb Single FNR FPR GCN GCN-TPS 0.902 1.292 0.292 0.610 0.134 0.094 0.900 1.868 0.868 0.032 0.000 0.116 GCN-APS 0.999 1.987 0.987 0.012 0.000 0.001 0.999 1.999 0.999 0.000 0.000 0.000 GCN-RAPS 0.902 1.455 0.478 0.436 0.102 0.084 0.901 1.873 0.873 0.028 0.008 0.115 GCN-Ours 0.902 1.313 0.313 0.589 0.098 0.098 0.902 1.693 0.693 0.209 0.099 0.098 BernNet BernNet-TPS 0.901 1.124 0.124 0.777 0.098 0.099 0.901 1.516 0.516 0.384 0.044 0.109 BernNet-APS 1.000 1.972 0.972 0.028 0.000 0.000 0.998 1.989 0.989 0.010 0.000 0.002 BernNet-RAPS 0.901 1.250 0.327 0.613 0.069 0.059 0.900 1.596 0.601 0.302 0.048 0.106 BernNet-Ours 0.901 0.935 0.065 0.901 0.079 0.029 0.901 1.349 0.349 0.553 0.099 0.099 AMNet AMNet-TPS 0.901 0.923 0.077 0.901 0.059 0.018 0.900 1.445 0.445 0.455 0.075 0.104 AMNet-APS 0.901 0.941 0.059 0.901 0.121 0.030 0.999 1.989 0.989 0.010 0.000 0.002 AMNet-RAPS 0.902 1.019 0.166 0.810 0.066 0.020 0.900 1.539 0.547 0.357 0.065 0.101 AMNet-Ours 0.900 0.976 0.034 0.895 0.078 0.070 0.901 1.369 0.369 0.532 0.099 0.098 BWGNN BWGNN-TPS 0.901 0.914 0.086 0.901 0.047 0.010 0.900 1.248 0.248 0.653 0.147 0.092 BWGNN-APS 0.936 1.393 0.485 0.497 0.067 0.012 0.999 1.981 0.981 0.018 0.002 0.001 BWGNN-RAPS 0.902 0.976 0.135 0.846 0.065 0.014 0.900 1.360 0.394 0.523 0.124 0.076 BWGNN-Ours 0.900 0.942 0.058 0.900 0.050 0.041 0.901 1.320 0.320 0.581 0.100 0.098 GHRN GHRN-TPS 0.901 0.918 0.082 0.901 0.043 0.014 0.900 1.335 0.335 0.566 0.096 0.100 GHRN-APS 0.999 1.975 0.975 0.024 0.006 0.001 0.998 1.984 0.984 0.013 0.003 0.002 GHRN-RAPS 0.902 0.992 0.147 0.832 0.051 0.017 0.900 1.432 0.454 0.457 0.091 0.089 GHRN-Ours 0.900 0.932 0.068 0.900 0.039 0.032 0.901 1.312 0.312 0.590 0.100 0.098

IV-B Subgraph-aware Spectral Graph Neural Calibrator

Although the framework theoretically maintains both FNR and FPR below $\alpha$ , its practical utility is limited by prediction sets containing $\{0,1\}$ or $\emptyset$ , which are uninformative outcomes indicating classification uncertainty. To enhance reliability, we prioritize maximizing singleton predictions (0 or 1) through calibration-aware learning. Our empirical analysis reveals that the prediction set sizes after DTCRC exhibit subgraph-dependent distributions, corresponding to heterogeneous node-level frequency patterns in graph signals (see Fig.3).

Thus we propose Subgraph-aware Spectral Graph Neural Calibrator (SSGNC), featuring: (1) dynamic routing for latent subgraph representation learning, (2) Chebyshev polynomial-based spectral convolution with subgraph-specific filters, and (3) multi-layer architecture with hierarchical feature aggregation. The design explicitly addresses frequency heterogeneity through adaptive spectral filtering across localized subgraph structures.

IV-B1 Dynamic Routing Mechanism

Given input node features $H\in\mathbb{R}^{n\times d}$ , we learn $K$ latent subgraph representations through an iterative routing process. Let $C\in\mathbb{R}^{K\times d}$ denote the trainable prototype vectors. The routing probabilities are computed as:

s_{ik}^{(0)}=\frac{\exp(\langle h_{i},c_{k}\rangle)}{\sum_{j=1}^{K}\exp(% \langle h_{i},c_{j}\rangle)}

(10)

where $\langle\cdot,\cdot\rangle$ denotes the dot product, $h_{i}$ is the $i$ -th row of $H$ , and $c_{k}$ is the $k$ -th prototype. We refine the routing probabilities through $T$ iterations of normalization and prototype adjustment:

c_{k}^{(t)}=\frac{\sum_{i=1}^{N}s_{ik}^{(t-1)}h_{i}}{\sum_{i=1}^{N}s_{ik}^{(t-% 1)}+\epsilon}

(11)

s_{ik}^{(t)}=\frac{\exp(\langle h_{i},c_{k}^{(t)}\rangle)}{\sum_{j=1}^{K}\exp(% \langle h_{i},c_{j}^{(t)}\rangle)}

(12)

The final prototype vectors are updated with momentum:

C\leftarrow\beta C+(1-\beta)C^{(T)}

(13)

where $\beta$ is the momentum coefficient set to 0.9 in our implementation.

IV-B2 Chebyshev Graph Convolution

For spectral graph filtering, we employ Chebyshev polynomial approximation of graph spectral convolution[30]. Given the normalized graph Laplacian $\tilde{L}=I-D^{-1/2}AD^{-1/2}$ , the $m$ -th order Chebyshev basis is computed recursively:

$\displaystyle T_{0}(\tilde{L})X$	$\displaystyle=X$	(14)
$\displaystyle T_{1}(\tilde{L})X$	$\displaystyle=\tilde{L}X$	(15)
$\displaystyle T_{m}(\tilde{L})X$	$\displaystyle=2\tilde{L}T_{m-1}(\tilde{L})X-T_{m-2}(\tilde{L})X$	(16)

The subgraph-specific spectral convolution is then formulated as:

Z_{k}=\sum_{m=0}^{M}\theta_{k,m}T_{m}(\tilde{L})X

(17)

where $\theta_{k,m}\in\mathbb{R}^{d\times d}$ are learnable parameters for subgraph $k$ and polynomial order $m$ .

IV-B3 Subgraph-aware Spectral Filtering

The final node representations are obtained by combining the subgraph-specific convolutions weighted by the routing probabilities:

H^{l+1}=\sigma\left(\sum_{k=1}^{K}\text{diag}(s_{:,k}^{l})Z_{k}^{l}\right)

(18)

where $\sigma$ denotes the activation function and $s_{:,k}^{l}$ represents the routing probabilities for subgraph $k$ at layer $l$ .

IV-B4 Multi-layer Architecture

The complete SSGNC model with $L$ layers can be formulated as:

H^{(0)}=\text{MLP}_{\text{in}}(X)

(19)

H^{(l)}=\text{Dropout}\left(\sigma\left(\text{SS-Conv}^{(l)}(G,H^{(l-1)})% \right)\right)

(20)

Y=\text{MLP}_{\text{out}}\left(\text{Concat}(H^{(1)},\dots,H^{(L)})\right)

(21)

where $\text{SS-Conv}^{(l)}$ denotes our subgraph-aware spectral convolution layer at depth $l$ , and Concat represents feature concatenation.

V Experiments

V-A Datasets and Baselines

Datasets. We utilize four benchmarks spanning social networks and e-commerce: Reddit, Weibo, YelpChi, and Amazon. These encompass diverse graph structures including user-subreddit interactions, fraud patterns, and multi-relational reviews. Node features incorporate linguistic attributes (LIWC), text embeddings (FastText), and behavioral metadata.[21].

Baseline Models. We experiment on five GAD models: GCN, BernNet[31], AMNet[7], BWGNN[6], and GHRN[32]. For uncertainty quantification, three conformal predictors are evaluated: CP-TPS [15], CP-APS [16], and CP-RAPS [17]. The variants differ in non-conformity scoring: TPS prioritizes top-class probabilities, APS accumulates softmax values, while RAPS penalizes low-confidence class inclusion.

V-B Evaluations

To assess CRC-SGAD, we adopt six metrics[26]: Coverage (true label inclusion probability), Inefficiency (average set size), and Ambiguity (frequency of indeterminate 0,1 outputs). Singleton Rate tracks single-point predictions, while set-based FPR and FNR quantify error types. These enable rigorous multi-dimensional assessment of statistical reliability and practical utility in uncertainty-aware systems.

V-C Main Results

V-C1 Performance Analysis

Our experiments reveal fundamental limitations in existing conformal prediction approaches. Notably, both the approaches anchored in TPS and RAPS are incapable of ensuring control over the FPR and FNR. While the APS-based method appears to achieve dual error rates below 0.01, this performance comes at the expense of prohibitively large prediction sets. Specifically, average prediction set sizes exceed 1.7 across benchmark datasets. Such inflated uncertainty quantification renders the method impractical for real-world applications.

Conversely, the proposed CRC-SGAD framework more effectively addresses this critical trade-off. The method achieves strict false negative rate control (FNR $<$ 0.01, FPR $<$ 0.01) while maintaining prediction Inefficiency with average set sizes reduced by 15.3-52.8% compared to APS baselines.

V-C2 Effectiveness of Sub-Modules

We rigorously validate the effectiveness of our proposed SSGNC framework across five benchmark datasets and five GAD methods. We find the proposed spectral calibration module achieves an average 13.4% reduction in Singleton ratio across all experimental configurations, while maintaining risk control guarantees. This statistically significant improvement demonstrates the efficacy of our subgraph-aware calibration framework in enhancing the operational efficiency of prediction sets without compromising risk control objectives.

TABLE II: The improvement of Singleton ratio after SSGNC

	amazon	yelp	reddit	weibo
GCN	9.48% $\uparrow$	45.1% $\uparrow$	1.17% $\uparrow$	2.00% $\uparrow$
BernNet	3.70% $\uparrow$	4.73% $\uparrow$	66.9% $\uparrow$	9.56% $\uparrow$
AMNet	5.80% $\uparrow$	3.70% $\uparrow$	8.31% $\uparrow$	0.22% $\uparrow$
BWGNN	0.00% $\uparrow$	2.29% $\uparrow$	44.9% $\uparrow$	4.43% $\uparrow$
GHRN	0.00% $\uparrow$	3.15% $\uparrow$	52.5% $\uparrow$	0.11% $\uparrow$

V-C3 Parameter Sensitivity Analysis

We conduct a systematic evaluation of the BWGNN model on the Yelp dataset to investigate the impact of predefined risk thresholds and the number of SSGNC prototypes on the empirical FPR, FNR, and singleton ratio in prediction sets. As shown in Table III, our framework demonstrates robust error rate control through adaptive calibration of prediction sets, consistently satisfying the target error rates across all predefined FNR and FPR configurations. The analysis of prototype quantity reveals a non-monotonic relationship with singleton ratio: as the number of prototypes increases from 3 to 7, the singleton ratio first rises (peaking at 5 prototypes) and then declines.

TABLE III: Parameter sensitivity analysis of BWGNN model on Yelp dataset.

(FNR, FPR)	(0.1, 0.1)			(0.05, 0.1)	(0.1, 0.05)
Prototype Number	3	5	7	5	5
FNR	0.100	0.100	0.100	0.048	0.100
FPR	0.098	0.098	0.098	0.098	0.049
Singleton Ratio	0.572	0.587	0.576	0.443	0.537

VI Conclusion

In conclusion, our study addresses the critical need for class-aware uncertainty quantification and calibration in imbalanced graph anomaly detection. Our findings reveal that existing GAD models often yield poorly calibrated confidence estimates, which further deteriorate under structural perturbations. Our CRC-SGAD framework achieves provable statistical guarantees on FNR and FPR while incorporating a subgraph-aware spectral graph neural calibrator that optimizes prediction set sizes without compromising statistical validity. These results collectively highlight the framework’s potential to advance reliable graph anomaly detection systems in safety-critical applications.

References

[1] X. Huang, Y. Yang, Y. Wang, C. Wang, Z. Zhang, J. Xu, L. Chen, and M. Vazirgiannis, “Dgraph: A large-scale financial dataset for graph anomaly detection,” in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 22 765–22 777.
[2] G. Zhang, S. Zhang, and G. Yuan, “Bayesian graph local extrema convolution with long-tail strategy for misinformation detection,” ACM Trans. Knowl. Discov. Data, vol. 18, no. 4, 2024.
[3] Y. Liu, X. Ao, Z. Qin, J. Chi, J. Feng, H. Yang, and Q. He, “Pick and choose: A gnn-based imbalanced learning approach for fraud detection,” in Proceedings of the Web Conference 2021, 2021, p. 3168–3177.
[4] Y. Dou, Z. Liu, L. Sun, Y. Deng, H. Peng, and P. S. Yu, “Enhancing graph neural network-based fraud detectors against camouflaged fraudsters,” in Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, p. 315–324.
[5] F. Shi, Y. Cao, Y. Shang, Y. Zhou, C. Zhou, and J. Wu, “H2-fdetector: A gnn-based fraud detector with homophilic and heterophilic connections,” in Proceedings of the ACM Web Conference 2022, 2022, p. 1486–1494.
[6] J. Tang, J. Li, Z. Gao, and J. Li, “Rethinking graph neural networks for anomaly detection,” in Proceedings of the 39th International Conference on Machine Learning, vol. 162, 2022, pp. 21 076–21 089.
[7] Z. Chai, S. You, Y. Yang, S. Pu, J. Xu, H. Cai, and W. Jiang, “Can abnormality be detected by graph neural networks?” in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, 2022, pp. 1945–1951.
[8] H. Wang, Y. Dou, C. Chen, L. Sun, P. S. Yu, and K. Shu, “Attacking fake news detectors via manipulating news social engagement,” in Proceedings of the ACM Web Conference 2023, 2023, p. 3978–3986.
[9] J. Wu, X. Liu, D. Cheng, Y. Ouyang, X. Wu, and Y. Zheng, “Safeguarding fraud detection from attacks: A robust graph learning approach,” in Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, 2024, pp. 7500–7508.
[10] P. Zhu, Z. Pan, Y. Liu, J. Tian, K. Tang, and Z. Wang, “A general black-box adversarial attack on graph-based fake news detectors,” in Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, 2024, pp. 568–576.
[11] J. Chen, Y. Wu, X. Xu, Y. Chen, H. Zheng, and Q. Xuan, “Fast gradient attack on network embedding,” arXiv preprint arXiv:1809.02797, 2018.
[12] M. Waniek, T. P. Michalak, M. J. Wooldridge, and T. Rahwan, “Hiding individuals and communities in a social network,” Nature Human Behaviour, vol. 2, no. 2, pp. 139–147, 2018.
[13] L. Wu, B. Lei, D. Xu, and D. Zhou, “Towards reliable rare category analysis on graphs via individual calibration,” in Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023, p. 2629–2638.
[14] A. N. Angelopoulos, S. Bates, A. Fisch, L. Lei, and T. Schuster, “Conformal risk control,” in The Twelfth International Conference on Learning Representations, 2024.
[15] A. N. Angelopoulos and S. Bates, “Conformal prediction: A gentle introduction,” Foundations and Trends® in Machine Learning, vol. 16, no. 4, pp. 494–591, 2023.
[16] Y. Romano, M. Sesia, and E. Candes, “Classification with valid and adaptive coverage,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 3581–3591.
[17] A. N. Angelopoulos, S. Bates, M. Jordan, and J. Malik, “Uncertainty sets for image classifiers using conformal prediction,” in International Conference on Learning Representations, 2021.
[18] X. Ma, J. Wu, S. Xue, J. Yang, C. Zhou, Q. Z. Sheng, H. Xiong, and L. Akoglu, “A comprehensive survey on graph anomaly detection with deep learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 12, pp. 12 012–12 038, 2023.
[19] M. Jin, Y. Liu, Y. Zheng, L. Chi, Y.-F. Li, and S. Pan, “Anemone: Graph anomaly detection with multi-scale contrastive learning,” in Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, p. 3122–3126.
[20] X. Ma, F. Liu, J. Wu, J. Yang, S. Xue, and Q. Z. Sheng, “Rethinking unsupervised graph anomaly detection with deep learning: Residuals and objectives,” IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 2, pp. 881–895, 2025.
[21] J. Tang, F. Hua, Z. Gao, P. Zhao, and J. Li, “Gadbench: Revisiting and benchmarking supervised graph anomaly detection,” in Advances in Neural Information Processing Systems, vol. 36, 2023, pp. 29 628–29 653.
[22] W. Zhuo, Z. Liu, B. Hooi, B. He, G. Tan, R. Fathony, and J. Chen, “Partitioning message passing for graph fraud detection,” in The Twelfth International Conference on Learning Representations, 2024.
[23] F. Wang, Y. Liu, K. Liu, Y. Wang, S. Medya, and P. S. Yu, “Uncertainty in graph neural networks: A survey,” arXiv preprint arXiv:2403.07185, 2024.
[24] X. Wang, H. Liu, C. Shi, and C. Yang, “Be confident! towards trustworthy graph neural networks via confidence calibration,” in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 23 768–23 779.
[25] M. Stadler, B. Charpentier, S. Geisler, D. Zügner, and S. Günnemann, “Graph posterior network: Bayesian predictive uncertainty for node classification,” in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 18 033–18 048.
[26] S. H. Zargarbashi, S. Antonelli, and A. Bojchevski, “Conformal prediction sets for graph neural networks,” in Proceedings of the 40th International Conference on Machine Learning, vol. 202, 2023, pp. 12 292–12 318.
[27] K. Huang, Y. Jin, E. Candes, and J. Leskovec, “Uncertainty quantification over graph with conformalized graph neural networks,” in Advances in Neural Information Processing Systems, vol. 36, 2023, pp. 26 699–26 721.
[28] Y. Zhang, S. Pal, M. Coates, and D. Ustebay, “Bayesian graph convolutional neural networks for semi-supervised classification,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 1, pp. 5829–5836, 2019.
[29] Q. Wang, S. Wang, D. Zhuang, H. Koutsopoulos, and J. Zhao, “Uncertainty quantification of spatiotemporal travel demand with probabilistic graph neural networks,” IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 8, pp. 8770–8781, 2024.
[30] M. He, Z. Wei, and J.-R. Wen, “Convolutional neural networks on graphs with chebyshev approximation, revisited,” in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 7264–7276.
[31] M. He, Z. Wei, z. Huang, and H. Xu, “Bernnet: Learning arbitrary graph spectral filters via bernstein approximation,” in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 14 239–14 251.
[32] Y. Gao, X. Wang, X. He, Z. Liu, H. Feng, and Y. Zhang, “Addressing heterophily in graph anomaly detection: A perspective of graph spectrum,” in Proceedings of the ACM Web Conference 2023, 2023, p. 1528–1538.

CRC-SGAD: Conformal Risk Control for Supervised Graph Anomaly Detection ††thanks: