CRC-SGAD: Conformal Risk Control for Supervised Graph Anomaly Detection
thanks:

Songran Bai12, Xiaolong Zheng12, Daniel Dajun Zeng12 1 State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
2School of Artificial Intelligence, University of Chinese Academy of Sciences
Beijing, China
{baisongran2020, xiaolong.zheng, dajun.zeng}@ia.ac.cn
Abstract

Graph Anomaly Detection (GAD) is critical in security-sensitive domains, yet faces reliability challenges: miscalibrated confidence estimation (underconfidence in normal nodes, overconfidence in anomalies), adversarial vulnerability of derived confidence score under structural perturbations, and limited efficacy of conventional calibration methods for sparse anomaly patterns. Thus we propose CRC-SGAD, a framework integrating statistical risk control into GAD via two innovations: (1) A Dual-Threshold Conformal Risk Control mechanism that provides theoretically guaranteed bounds for both False Negative Rate (FNR) and False Positive Rate (FPR) through providing prediction sets; (2) A Subgraph-aware Spectral Graph Neural Calibrator (SSGNC) that optimizes node representations through adaptive spectral filtering while reducing the size of prediction sets via hybrid loss optimization. Experiments on four datasets and five GAD models demonstrate statistically significant improvements in FNR and FPR control and prediction set size. CRC-SGAD establishes a paradigm for statistically rigorous anomaly detection in graph-structured security applications.

Index Terms:
Graph Anomaly Detection, Conformal Risk Control, Uncertainty Quantification, Calibration, Spectral Graph Neural Network.

I Introduction

Graph-based anomaly detection (GAD) has become pivotal in security-critical applications like fraud detection[1] and misinformation identification[2], leveraging graph neural networks (GNNs) through two paradigms: spatial methods enhance graph homophily through neighborhood redefinition or mitigate over-smoothing via adaptive message passing mechanisms[3, 4, 5], while spectral approaches design specialized filters to capture mid- and high-frequency signals induced by anomalous patterns[6, 7].

However, the deployment of GAD models in high-stakes operational environments faces critical reliability challenges. For instance, malicious entities evade detection by fabricating connections, causing costly misclassifications[8, 9, 10]. Our preliminary experiments reveal three critical insights: (1) Even state-of-the-art models like BWGNN[6] exhibit unreliable confidence scores, displaying overconfidence for anomalous nodes while underestimating certainty for normal nodes (see Fig1(a)(b)); (2) Subtle targeted structural perturbations[11] on correctly classified nodes can induce high-confidence misclassifications (see Fig1(d)); (3) Global structural perturbations[12] exacerbate confidence disparity between node classes(see Fig1(c)). The lack of precise class-dependent uncertainty quantification pose fundamental limitations for reliable GAD models. Traditional calibration methods like Expected Calibration Error (ECE) are unreliable due to insufficient anomalous observations in most confidence bins caused by the inherent rarity. And individual-based calibration method[13] lacks theoretical guarantees. Conformal Risk Control (CRC)[14] emerges as a promising alternative through finite-sample statistical guarantees.

Refer to caption
Figure 1: The unreliable prediction confidence of BWGNN on Amazon dataset. (a) and (b) Figures depict the calibration error plots for anomalous nodes and normal nodes, respectively. (c) Figure illustrates the confidence distribution of misclassified nodes (both normal and anomalous) following a global attack. (d) Figure demonstrates the confidence variation of attacked nodes under a targeted attack scenario .

We therefore propose the CRC-SGAD framework that generates prediction sets with theoretically guaranteed risk control - the expected FNR and FPR are provably bounded below pre-defined thresholds (e.g., 0.1). Our key contributions are threefold: First, we develop a dual-threshold conformal risk control mechanism tailored for GAD, establishing set-valued functions for normal and anomalous nodes respectively. We prove this adaptation preserves exchangeability assumptions required for valid CRC. Second, to address the observed local heterogeneity in prediction set sizes across neighborhoods, we design a subgraph-aware spectral graph neural calibrator (SSGNC). This component learns frequency-aware node representations through subgraph-specific spectral filters, jointly optimizing prediction inefficiency via a hybrid loss combining conformal risk and weighted cross-entropy. Third, we conduct comprehensive evaluations across four datasets using five classic GAD methods, demonstrating statistically significant improvements in Inefficiency and FNR and FPR control over existing conformal approaches[15, 16, 17].

II Related Work

II-A Graph Anomaly Detection

GAD methods are typically categorized into supervised (SGAD) and unsupervised (UGAD) paradigms based on label availability[18]. While UGAD primarily employs reconstruction errors or graph contrastive learning for anomaly scoring[19, 20], we focus on SGAD due to its superior practicality enabled by partially available trustable labels. Recent advances address heterophily in SGAD through spatial and spectral approaches[21]. Spatial methods enhance node representation by redefining neighborhoods or modifying message passing: H2-FDetector[5] differentiates homophilic and heterophilic connections for adaptive message aggregation, PC-GNN[3] employs imbalance-aware subgraph sampling for structural refinement, and PMP[22] develops a self-tuning mechanism for information fusion from heterogeneous neighbors. Spectral approaches tackle this through adaptive filtering. AMNet[7] captures dual-frequency signals while BWGNN[6] designs beta wavelet-based band-pass filters to preserve anomalous high-frequency patterns.

II-B Uncertainty Quantification for GNNs

Following the taxonomy in deep neural networks, uncertainty in GNNs can also be categorized into aleatoric and epistemic uncertainty[23]. Existing approaches primarily quantify uncertainty through maximum softmax probability or entropy[24], Bayesian GNNs with Dirichlet priors[25], frequentist post-hoc methods (e.g., conformal prediction)[26, 27], Bayesian stochastic parameter models[28], or ensemble methods[29]. Among these, conformal prediction stands out as a model-agnostic frequentist approach that provides theoretical guarantees with minimal additional parameters, high computational efficiency, and robustness. Its extension, conformal risk control[14], offers more flexible risk management. For imbalanced graph classification tasks, [13] extended distribution-based calibration metrics to node-level predictions and proposed Expected Individual Calibration Error (EICE) to address miscalibration issues in rare minority classes. However, while EICE achieves better calibrated probabilities for both normal and anomalous nodes, it remains a point estimation method without theoretical guarantees for controlling false negative and positive rates.

Refer to caption
Figure 2: The overall framework of our proposed CRC-SGAD.

III Preliminaries

III-A Supervised Graph Anomaly Detection

Consider an undirected graph G=(V,𝚺,𝐀)𝐺𝑉𝚺𝐀G=(V,\mathbf{\Sigma},\mathbf{A})italic_G = ( italic_V , bold_Σ , bold_A ) where V={vi}i=1n𝑉superscriptsubscriptsubscript𝑣𝑖𝑖1𝑛V=\{v_{i}\}_{i=1}^{n}italic_V = { italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT denotes the node set. Each node viVsubscript𝑣𝑖𝑉v_{i}\in Vitalic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ italic_V possesses a feature vector 𝐱idsubscript𝐱𝑖superscript𝑑\mathbf{x}_{i}\in\mathbb{R}^{d}bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT, forming the feature matrix 𝐗=[𝐱1,𝐱2,,𝐱n]n×d𝐗superscriptsubscript𝐱1subscript𝐱2subscript𝐱𝑛topsuperscript𝑛𝑑\mathbf{X}=[\mathbf{x}_{1},\mathbf{x}_{2},...,\mathbf{x}_{n}]^{\top}\in\mathbb% {R}^{n\times d}bold_X = [ bold_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , bold_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_d end_POSTSUPERSCRIPT. The adjacency matrix 𝐀{0,1}n×n𝐀superscript01𝑛𝑛\mathbf{A}\in\{0,1\}^{n\times n}bold_A ∈ { 0 , 1 } start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT encodes edge connections, where 𝐀ij=1subscript𝐀𝑖𝑗1\mathbf{A}_{ij}=1bold_A start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = 1 iff nodes visubscript𝑣𝑖v_{i}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and vjsubscript𝑣𝑗v_{j}italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT are connected. Let 𝐘{0,1}n𝐘superscript01𝑛\mathbf{Y}\in\{0,1\}^{n}bold_Y ∈ { 0 , 1 } start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT denote the binary anomaly indicator vector.

The objective is to learn an estimator fθ:d×{0,1}n×n[0,1]2:subscript𝑓𝜃superscript𝑑superscript01𝑛𝑛superscript012f_{\theta}:\mathbb{R}^{d}\times\{0,1\}^{n\times n}\rightarrow[0,1]^{2}italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT : blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT × { 0 , 1 } start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT → [ 0 , 1 ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT that predicts anomaly probabilities, where fθ0(𝐱i,𝐀)superscriptsubscript𝑓𝜃0subscript𝐱𝑖𝐀f_{\theta}^{0}(\mathbf{x}_{i},\mathbf{A})italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT ( bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_A ) and fθ1(𝐱i,𝐀)superscriptsubscript𝑓𝜃1subscript𝐱𝑖𝐀f_{\theta}^{1}(\mathbf{x}_{i},\mathbf{A})italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ( bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_A ) denote the predicted probabilities for normal and anomalous classes, respectively.

III-B Conformal Risk Control

The conformal risk control framework operates by applying a post-hoc calibration mechanism to refine model f𝑓fitalic_f’s predictions through a set-valued mapping Cλ:𝒳2𝒴:subscript𝐶𝜆𝒳superscript2𝒴C_{\lambda}:\mathcal{X}\to 2^{\mathcal{Y}}italic_C start_POSTSUBSCRIPT italic_λ end_POSTSUBSCRIPT : caligraphic_X → 2 start_POSTSUPERSCRIPT caligraphic_Y end_POSTSUPERSCRIPT. This mapping generates prediction sets whose conservatism is governed by a threshold parameter λΛ𝜆Λ\lambda\in\Lambdaitalic_λ ∈ roman_Λ, with larger values of λ𝜆\lambdaitalic_λ producing more inclusive sets.

To assess the predictive performance of Cλ()subscript𝐶𝜆C_{\lambda}(\cdot)italic_C start_POSTSUBSCRIPT italic_λ end_POSTSUBSCRIPT ( ⋅ ), CRC consider a bounded loss function :2𝒴×𝒴(,B]:superscript2𝒴𝒴𝐵\ell:2^{\mathcal{Y}}\times\mathcal{Y}\to(-\infty,B]roman_ℓ : 2 start_POSTSUPERSCRIPT caligraphic_Y end_POSTSUPERSCRIPT × caligraphic_Y → ( - ∞ , italic_B ] that is non-increasing with respect to λ𝜆\lambdaitalic_λ. The primary objective is to select an adaptive threshold λ^^𝜆\hat{\lambda}over^ start_ARG italic_λ end_ARG from calibration data that guarantees:

𝔼(xn+1,yn+1)[(Cλ^(xn+1),yn+1)]αsubscript𝔼subscript𝑥𝑛1subscript𝑦𝑛1delimited-[]subscript𝐶^𝜆subscript𝑥𝑛1subscript𝑦𝑛1𝛼\mathbb{E}_{(x_{n+1},y_{n+1})}\left[\ell(C_{\hat{\lambda}}(x_{n+1}),y_{n+1})% \right]\leq\alphablackboard_E start_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT [ roman_ℓ ( italic_C start_POSTSUBSCRIPT over^ start_ARG italic_λ end_ARG end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT ) , italic_y start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT ) ] ≤ italic_α (1)

where α(,B)𝛼𝐵\alpha\in(-\infty,B)italic_α ∈ ( - ∞ , italic_B ) denotes the pre-specified risk tolerance level.

Given n+1𝑛1n+1italic_n + 1 exchangeable random functions {Li}i=1n+1superscriptsubscriptsubscript𝐿𝑖𝑖1𝑛1\{L_{i}\}_{i=1}^{n+1}{ italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n + 1 end_POSTSUPERSCRIPT where each Li:Λ(,B]:subscript𝐿𝑖Λ𝐵L_{i}:\Lambda\to(-\infty,B]italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT : roman_Λ → ( - ∞ , italic_B ] is non-increasing and bounded, [14] define the maximal admissible threshold as:

λmax:=sup{λΛLn+1(λ)α}assignsubscript𝜆supremumconditional-set𝜆Λsubscript𝐿𝑛1𝜆𝛼\lambda_{\max}:=\sup\left\{\lambda\in\Lambda\mid L_{n+1}(\lambda)\leq\alpha\right\}italic_λ start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT := roman_sup { italic_λ ∈ roman_Λ ∣ italic_L start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT ( italic_λ ) ≤ italic_α } (2)

The calibration procedure utilizes the first n𝑛nitalic_n functions to determine λ^^𝜆\hat{\lambda}over^ start_ARG italic_λ end_ARG that satisfies:

𝔼[Ln+1(λ^)]α𝔼delimited-[]subscript𝐿𝑛1^𝜆𝛼\mathbb{E}[L_{n+1}(\hat{\lambda})]\leq\alphablackboard_E [ italic_L start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT ( over^ start_ARG italic_λ end_ARG ) ] ≤ italic_α (3)

The empirical risk estimator is defined as R^n(λ):=1ni=1nLi(λ)assignsubscript^𝑅𝑛𝜆1𝑛superscriptsubscript𝑖1𝑛subscript𝐿𝑖𝜆\hat{R}_{n}(\lambda):=\frac{1}{n}\sum_{i=1}^{n}L_{i}(\lambda)over^ start_ARG italic_R end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_λ ) := divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_λ ). For desired risk level α𝛼\alphaitalic_α, compute:

λ^=inf{λΛ:nn+1R^n(λ)+Bn+1α}=inf{λΛ:R^n(λ)αBαn}^𝜆infimumconditional-set𝜆Λ𝑛𝑛1subscript^𝑅𝑛𝜆𝐵𝑛1𝛼infimumconditional-set𝜆Λsubscript^𝑅𝑛𝜆𝛼𝐵𝛼𝑛\begin{split}\hat{\lambda}&=\inf\left\{\lambda\in\Lambda:\frac{n}{n+1}\hat{R}_% {n}(\lambda)+\frac{B}{n+1}\leq\alpha\right\}\\ &=\inf\left\{\lambda\in\Lambda:\hat{R}_{n}(\lambda)\leq\alpha-\frac{B-\alpha}{% n}\right\}\end{split}start_ROW start_CELL over^ start_ARG italic_λ end_ARG end_CELL start_CELL = roman_inf { italic_λ ∈ roman_Λ : divide start_ARG italic_n end_ARG start_ARG italic_n + 1 end_ARG over^ start_ARG italic_R end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_λ ) + divide start_ARG italic_B end_ARG start_ARG italic_n + 1 end_ARG ≤ italic_α } end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = roman_inf { italic_λ ∈ roman_Λ : over^ start_ARG italic_R end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_λ ) ≤ italic_α - divide start_ARG italic_B - italic_α end_ARG start_ARG italic_n end_ARG } end_CELL end_ROW (4)

The monotonicity of both individual Li()subscript𝐿𝑖L_{i}(\cdot)italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( ⋅ ) functions and the empirical risk R^n()subscript^𝑅𝑛\hat{R}_{n}(\cdot)over^ start_ARG italic_R end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( ⋅ ) permits efficient computation of λ^^𝜆\hat{\lambda}over^ start_ARG italic_λ end_ARG via binary search with arbitrary precision.

IV Methodology

IV-A Dual-Threshold Conformal Risk Control (DTCRC)

We adopt the false positive rate (FPR) and the false negative rate (FNR) as our target risk functions. The distributional discrepancy between anomalous and normal nodes in GAD scenarios introduces inherent limitations to single-threshold approaches in simultaneously controlling both FNR and FPR.

IV-A1 Risk Functions and Set-Valued Predictors in SGAD

For normal samples, the set-valued predictor is defined as:

Cλnormal(xi)={{1}if fθ1(xi,A)λnormal{0,1}otherwisesubscript𝐶subscript𝜆normalsubscript𝑥𝑖cases1if superscriptsubscript𝑓𝜃1subscript𝑥𝑖𝐴subscript𝜆normal01otherwiseC_{\lambda_{\text{normal}}}(x_{i})=\begin{cases}\{1\}&\text{if }f_{\theta}^{1}% (x_{i},A)\geq\lambda_{\text{normal}}\\ \{0,1\}&\text{otherwise}\end{cases}italic_C start_POSTSUBSCRIPT italic_λ start_POSTSUBSCRIPT normal end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) = { start_ROW start_CELL { 1 } end_CELL start_CELL if italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_A ) ≥ italic_λ start_POSTSUBSCRIPT normal end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL { 0 , 1 } end_CELL start_CELL otherwise end_CELL end_ROW (5)

with FPR risk LiFPR(λnormal)=(0Cλnormal(xi)|yi=0)=𝕀(0Cλnormal(xi))superscriptsubscript𝐿𝑖FPRsubscript𝜆normal0conditionalsubscript𝐶subscript𝜆normalsubscript𝑥𝑖subscript𝑦𝑖0𝕀0subscript𝐶subscript𝜆normalsubscript𝑥𝑖L_{i}^{\text{FPR}}(\lambda_{\text{normal}})=\mathbb{P}(0\notin C_{\lambda_{% \text{normal}}}(x_{i})|y_{i}=0)=\mathbb{I}(0\notin C_{\lambda_{\text{normal}}}% (x_{i}))italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT FPR end_POSTSUPERSCRIPT ( italic_λ start_POSTSUBSCRIPT normal end_POSTSUBSCRIPT ) = blackboard_P ( 0 ∉ italic_C start_POSTSUBSCRIPT italic_λ start_POSTSUBSCRIPT normal end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) | italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 0 ) = blackboard_I ( 0 ∉ italic_C start_POSTSUBSCRIPT italic_λ start_POSTSUBSCRIPT normal end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ). For anomalous samples:

Cλano(xi)={{0}if fθ1(xi,A)<1λano{0,1}otherwisesubscript𝐶subscript𝜆anosubscript𝑥𝑖cases0if superscriptsubscript𝑓𝜃1subscript𝑥𝑖𝐴1subscript𝜆ano01otherwiseC_{\lambda_{\text{ano}}}(x_{i})=\begin{cases}\{0\}&\text{if }f_{\theta}^{1}(x_% {i},A)<1-\lambda_{\text{ano}}\\ \{0,1\}&\text{otherwise}\end{cases}italic_C start_POSTSUBSCRIPT italic_λ start_POSTSUBSCRIPT ano end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) = { start_ROW start_CELL { 0 } end_CELL start_CELL if italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_A ) < 1 - italic_λ start_POSTSUBSCRIPT ano end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL { 0 , 1 } end_CELL start_CELL otherwise end_CELL end_ROW (6)

with FNR risk LiFNR(λano)=(1Cλano(xi)|yi=1)=𝕀(1Cλano(xi))superscriptsubscript𝐿𝑖FNRsubscript𝜆ano1conditionalsubscript𝐶subscript𝜆anosubscript𝑥𝑖subscript𝑦𝑖1𝕀1subscript𝐶subscript𝜆anosubscript𝑥𝑖L_{i}^{\text{FNR}}(\lambda_{\text{ano}})=\mathbb{P}(1\notin C_{\lambda_{\text{% ano}}}(x_{i})|y_{i}=1)=\mathbb{I}(1\notin C_{\lambda_{\text{ano}}}(x_{i}))italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT FNR end_POSTSUPERSCRIPT ( italic_λ start_POSTSUBSCRIPT ano end_POSTSUBSCRIPT ) = blackboard_P ( 1 ∉ italic_C start_POSTSUBSCRIPT italic_λ start_POSTSUBSCRIPT ano end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) | italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 1 ) = blackboard_I ( 1 ∉ italic_C start_POSTSUBSCRIPT italic_λ start_POSTSUBSCRIPT ano end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ). Both risk functions are non-increasing and bounded within [0,1], satisfying the requirements for conformal risk control.

IV-A2 Exchangeability of Risk Functions

A sequence {(Xi,Yi)}i=1Nsuperscriptsubscriptsubscript𝑋𝑖subscript𝑌𝑖𝑖1𝑁\{(X_{i},Y_{i})\}_{i=1}^{N}{ ( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT is exchangeable if for any permutation σ𝜎\sigmaitalic_σ, their joint distribution satisfies P((X1,Y1),,(Xn,Yn))=P((Xσ(1),Yσ(1)),,(Xσ(n),Yσ(n)))𝑃subscript𝑋1subscript𝑌1subscript𝑋𝑛subscript𝑌𝑛𝑃subscript𝑋𝜎1subscript𝑌𝜎1subscript𝑋𝜎𝑛subscript𝑌𝜎𝑛P((X_{1},Y_{1}),\ldots,(X_{n},Y_{n}))=P((X_{\sigma(1)},Y_{\sigma(1)}),\ldots,(% X_{\sigma(n)},Y_{\sigma(n)}))italic_P ( ( italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) , … , ( italic_X start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ) = italic_P ( ( italic_X start_POSTSUBSCRIPT italic_σ ( 1 ) end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_σ ( 1 ) end_POSTSUBSCRIPT ) , … , ( italic_X start_POSTSUBSCRIPT italic_σ ( italic_n ) end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_σ ( italic_n ) end_POSTSUBSCRIPT ) )[15]. While traditional deep learning assumes independent and identically distributed (i.i.d.) data, SGAD exhibits two distinct properties:

  • The node-wise random variables exhibit interdependencies with adjacent neighbors, violating the i.i.d. assumption due to their inherent relational dependencies.

  • The representation learning in GNNs inherently induces structural dependencies through aggregation mechanisms that combine ego-node features with neighborhood information, thereby violating the i.i.d. assumption in their output distributions.

Notably, when using permutation-invariant GNN architectures in GAD scenario (e.g., GCN, GAT, GraphSAGE) under random data splits, the model’s predictions maintain exchangeability[27]. Consequently, the joint distribution of risk functions satisfies:

(L1,,Ln+1)=(Lσ(1),,Lσ(n+1))subscript𝐿1subscript𝐿𝑛1subscript𝐿𝜎1subscript𝐿𝜎𝑛1\mathbb{P}(L_{1},\ldots,L_{n+1})=\mathbb{P}(L_{\sigma(1)},\ldots,L_{\sigma(n+1% )})blackboard_P ( italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_L start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT ) = blackboard_P ( italic_L start_POSTSUBSCRIPT italic_σ ( 1 ) end_POSTSUBSCRIPT , … , italic_L start_POSTSUBSCRIPT italic_σ ( italic_n + 1 ) end_POSTSUBSCRIPT ) (7)

IV-A3 FNR and FPR Control

Our framework integrates dual-threshold conformal risk control for false positive rate (CRC-FPR) and false negative rate (CRC-FNR), with final predictions derived through their intersection to ensure joint error control. The implementation pipeline comprises four critical stages. First, we train an anomaly detection model fθ:𝒳n×2:subscript𝑓𝜃𝒳superscript𝑛2f_{\theta}:\mathcal{X}\rightarrow\mathbb{R}^{n\times 2}italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT : caligraphic_X → blackboard_R start_POSTSUPERSCRIPT italic_n × 2 end_POSTSUPERSCRIPT through weighted cross-entropy optimization to learn discriminative node representations. During calibration, we strategically partition the data by class membership: For normal nodes, we compute the empirical FPR risk R^nnormal(λnormal)=1nnormalLiFPRsubscript^𝑅subscript𝑛normalsubscript𝜆normal1subscript𝑛normalsuperscriptsubscript𝐿𝑖FPR\hat{R}_{n_{\textnormal{normal}}}(\lambda_{\textnormal{normal}})=\frac{1}{n_{% \textnormal{normal}}}\sum L_{i}^{\textnormal{FPR}}over^ start_ARG italic_R end_ARG start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT normal end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_λ start_POSTSUBSCRIPT normal end_POSTSUBSCRIPT ) = divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUBSCRIPT normal end_POSTSUBSCRIPT end_ARG ∑ italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT FPR end_POSTSUPERSCRIPT across the calibration set, while for anomalous nodes, we calculate the corresponding FNR risk R^nano(λano)=1nanoLiFNRsubscript^𝑅subscript𝑛anosubscript𝜆ano1subscript𝑛anosuperscriptsubscript𝐿𝑖FNR\hat{R}_{n_{\textnormal{ano}}}(\lambda_{\textnormal{ano}})=\frac{1}{n_{% \textnormal{ano}}}\sum L_{i}^{\textnormal{FNR}}over^ start_ARG italic_R end_ARG start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT ano end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_λ start_POSTSUBSCRIPT ano end_POSTSUBSCRIPT ) = divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUBSCRIPT ano end_POSTSUBSCRIPT end_ARG ∑ italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT FNR end_POSTSUPERSCRIPT. Subsequently, we employ binary search optimization to determine the optimal thresholds that satisfy the probabilistic error bounds:

λ^normalsubscript^𝜆normal\displaystyle\hat{\lambda}_{\textnormal{normal}}over^ start_ARG italic_λ end_ARG start_POSTSUBSCRIPT normal end_POSTSUBSCRIPT =inf{λ:R^nnormal(λ)α1αnnormal},absentinfimumconditional-set𝜆subscript^𝑅subscript𝑛normal𝜆𝛼1𝛼subscript𝑛normal\displaystyle=\inf\left\{\lambda:\hat{R}_{n_{\textnormal{normal}}}(\lambda)% \leq\alpha-\frac{1-\alpha}{n_{\textnormal{normal}}}\right\},= roman_inf { italic_λ : over^ start_ARG italic_R end_ARG start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT normal end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_λ ) ≤ italic_α - divide start_ARG 1 - italic_α end_ARG start_ARG italic_n start_POSTSUBSCRIPT normal end_POSTSUBSCRIPT end_ARG } , (8)
λ^anosubscript^𝜆ano\displaystyle\hat{\lambda}_{\textnormal{ano}}over^ start_ARG italic_λ end_ARG start_POSTSUBSCRIPT ano end_POSTSUBSCRIPT =inf{λ:R^nano(λ)α1αnano}.absentinfimumconditional-set𝜆subscript^𝑅subscript𝑛ano𝜆𝛼1𝛼subscript𝑛ano\displaystyle=\inf\left\{\lambda:\hat{R}_{n_{\textnormal{ano}}}(\lambda)\leq% \alpha-\frac{1-\alpha}{n_{\textnormal{ano}}}\right\}.= roman_inf { italic_λ : over^ start_ARG italic_R end_ARG start_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT ano end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_λ ) ≤ italic_α - divide start_ARG 1 - italic_α end_ARG start_ARG italic_n start_POSTSUBSCRIPT ano end_POSTSUBSCRIPT end_ARG } . (9)

For test instances, the final prediction set C(xtest)𝐶subscript𝑥testC(x_{\textnormal{test}})italic_C ( italic_x start_POSTSUBSCRIPT test end_POSTSUBSCRIPT ) is obtained through the intersection Cλ^normal(xtest)Cλ^ano(xtest)subscript𝐶subscript^𝜆normalsubscript𝑥testsubscript𝐶subscript^𝜆anosubscript𝑥testC_{\hat{\lambda}_{\textnormal{normal}}}(x_{\textnormal{test}})\cap C_{\hat{% \lambda}_{\textnormal{ano}}}(x_{\textnormal{test}})italic_C start_POSTSUBSCRIPT over^ start_ARG italic_λ end_ARG start_POSTSUBSCRIPT normal end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT test end_POSTSUBSCRIPT ) ∩ italic_C start_POSTSUBSCRIPT over^ start_ARG italic_λ end_ARG start_POSTSUBSCRIPT ano end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT test end_POSTSUBSCRIPT ), ensuring simultaneous control of both error types. This approach provides provable guarantees that the expected FPR and FNR remain below the predefined significance level α𝛼\alphaitalic_α.

TABLE I: Performance Comparison on Amazon and Yelp Datasets

Amazon Yelp Model Cov Ine Amb Single FNR FPR Cov Ine Amb Single FNR FPR GCN GCN-TPS 0.902 1.292 0.292 0.610 0.134 0.094 0.900 1.868 0.868 0.032 0.000 0.116 GCN-APS 0.999 1.987 0.987 0.012 0.000 0.001 0.999 1.999 0.999 0.000 0.000 0.000 GCN-RAPS 0.902 1.455 0.478 0.436 0.102 0.084 0.901 1.873 0.873 0.028 0.008 0.115 GCN-Ours 0.902 1.313 0.313 0.589 0.098 0.098 0.902 1.693 0.693 0.209 0.099 0.098 BernNet BernNet-TPS 0.901 1.124 0.124 0.777 0.098 0.099 0.901 1.516 0.516 0.384 0.044 0.109 BernNet-APS 1.000 1.972 0.972 0.028 0.000 0.000 0.998 1.989 0.989 0.010 0.000 0.002 BernNet-RAPS 0.901 1.250 0.327 0.613 0.069 0.059 0.900 1.596 0.601 0.302 0.048 0.106 BernNet-Ours 0.901 0.935 0.065 0.901 0.079 0.029 0.901 1.349 0.349 0.553 0.099 0.099 AMNet AMNet-TPS 0.901 0.923 0.077 0.901 0.059 0.018 0.900 1.445 0.445 0.455 0.075 0.104 AMNet-APS 0.901 0.941 0.059 0.901 0.121 0.030 0.999 1.989 0.989 0.010 0.000 0.002 AMNet-RAPS 0.902 1.019 0.166 0.810 0.066 0.020 0.900 1.539 0.547 0.357 0.065 0.101 AMNet-Ours 0.900 0.976 0.034 0.895 0.078 0.070 0.901 1.369 0.369 0.532 0.099 0.098 BWGNN BWGNN-TPS 0.901 0.914 0.086 0.901 0.047 0.010 0.900 1.248 0.248 0.653 0.147 0.092 BWGNN-APS 0.936 1.393 0.485 0.497 0.067 0.012 0.999 1.981 0.981 0.018 0.002 0.001 BWGNN-RAPS 0.902 0.976 0.135 0.846 0.065 0.014 0.900 1.360 0.394 0.523 0.124 0.076 BWGNN-Ours 0.900 0.942 0.058 0.900 0.050 0.041 0.901 1.320 0.320 0.581 0.100 0.098 GHRN GHRN-TPS 0.901 0.918 0.082 0.901 0.043 0.014 0.900 1.335 0.335 0.566 0.096 0.100 GHRN-APS 0.999 1.975 0.975 0.024 0.006 0.001 0.998 1.984 0.984 0.013 0.003 0.002 GHRN-RAPS 0.902 0.992 0.147 0.832 0.051 0.017 0.900 1.432 0.454 0.457 0.091 0.089 GHRN-Ours 0.900 0.932 0.068 0.900 0.039 0.032 0.901 1.312 0.312 0.590 0.100 0.098

IV-B Subgraph-aware Spectral Graph Neural Calibrator

Although the framework theoretically maintains both FNR and FPR below α𝛼\alphaitalic_α, its practical utility is limited by prediction sets containing {0,1}01\{0,1\}{ 0 , 1 } or \emptyset, which are uninformative outcomes indicating classification uncertainty. To enhance reliability, we prioritize maximizing singleton predictions (0 or 1) through calibration-aware learning. Our empirical analysis reveals that the prediction set sizes after DTCRC exhibit subgraph-dependent distributions, corresponding to heterogeneous node-level frequency patterns in graph signals (see Fig.3).

Refer to caption
Figure 3: The distribution of Neighborhood Inefficiency entropy of the BWGNN model on the weibo dataset after CRC.

Thus we propose Subgraph-aware Spectral Graph Neural Calibrator (SSGNC), featuring: (1) dynamic routing for latent subgraph representation learning, (2) Chebyshev polynomial-based spectral convolution with subgraph-specific filters, and (3) multi-layer architecture with hierarchical feature aggregation. The design explicitly addresses frequency heterogeneity through adaptive spectral filtering across localized subgraph structures.

IV-B1 Dynamic Routing Mechanism

Given input node features Hn×d𝐻superscript𝑛𝑑H\in\mathbb{R}^{n\times d}italic_H ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_d end_POSTSUPERSCRIPT, we learn K𝐾Kitalic_K latent subgraph representations through an iterative routing process. Let CK×d𝐶superscript𝐾𝑑C\in\mathbb{R}^{K\times d}italic_C ∈ blackboard_R start_POSTSUPERSCRIPT italic_K × italic_d end_POSTSUPERSCRIPT denote the trainable prototype vectors. The routing probabilities are computed as:

sik(0)=exp(hi,ck)j=1Kexp(hi,cj)superscriptsubscript𝑠𝑖𝑘0subscript𝑖subscript𝑐𝑘superscriptsubscript𝑗1𝐾subscript𝑖subscript𝑐𝑗s_{ik}^{(0)}=\frac{\exp(\langle h_{i},c_{k}\rangle)}{\sum_{j=1}^{K}\exp(% \langle h_{i},c_{j}\rangle)}italic_s start_POSTSUBSCRIPT italic_i italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT = divide start_ARG roman_exp ( ⟨ italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ⟩ ) end_ARG start_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT roman_exp ( ⟨ italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ⟩ ) end_ARG (10)

where ,\langle\cdot,\cdot\rangle⟨ ⋅ , ⋅ ⟩ denotes the dot product, hisubscript𝑖h_{i}italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is the i𝑖iitalic_i-th row of H𝐻Hitalic_H, and cksubscript𝑐𝑘c_{k}italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is the k𝑘kitalic_k-th prototype. We refine the routing probabilities through T𝑇Titalic_T iterations of normalization and prototype adjustment:

ck(t)=i=1Nsik(t1)hii=1Nsik(t1)+ϵsuperscriptsubscript𝑐𝑘𝑡superscriptsubscript𝑖1𝑁superscriptsubscript𝑠𝑖𝑘𝑡1subscript𝑖superscriptsubscript𝑖1𝑁superscriptsubscript𝑠𝑖𝑘𝑡1italic-ϵc_{k}^{(t)}=\frac{\sum_{i=1}^{N}s_{ik}^{(t-1)}h_{i}}{\sum_{i=1}^{N}s_{ik}^{(t-% 1)}+\epsilon}italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT = divide start_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_s start_POSTSUBSCRIPT italic_i italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t - 1 ) end_POSTSUPERSCRIPT italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_s start_POSTSUBSCRIPT italic_i italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t - 1 ) end_POSTSUPERSCRIPT + italic_ϵ end_ARG (11)
sik(t)=exp(hi,ck(t))j=1Kexp(hi,cj(t))superscriptsubscript𝑠𝑖𝑘𝑡subscript𝑖superscriptsubscript𝑐𝑘𝑡superscriptsubscript𝑗1𝐾subscript𝑖superscriptsubscript𝑐𝑗𝑡s_{ik}^{(t)}=\frac{\exp(\langle h_{i},c_{k}^{(t)}\rangle)}{\sum_{j=1}^{K}\exp(% \langle h_{i},c_{j}^{(t)}\rangle)}italic_s start_POSTSUBSCRIPT italic_i italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT = divide start_ARG roman_exp ( ⟨ italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ⟩ ) end_ARG start_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT roman_exp ( ⟨ italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ⟩ ) end_ARG (12)

The final prototype vectors are updated with momentum:

CβC+(1β)C(T)𝐶𝛽𝐶1𝛽superscript𝐶𝑇C\leftarrow\beta C+(1-\beta)C^{(T)}italic_C ← italic_β italic_C + ( 1 - italic_β ) italic_C start_POSTSUPERSCRIPT ( italic_T ) end_POSTSUPERSCRIPT (13)

where β𝛽\betaitalic_β is the momentum coefficient set to 0.9 in our implementation.

IV-B2 Chebyshev Graph Convolution

For spectral graph filtering, we employ Chebyshev polynomial approximation of graph spectral convolution[30]. Given the normalized graph Laplacian L~=ID1/2AD1/2~𝐿𝐼superscript𝐷12𝐴superscript𝐷12\tilde{L}=I-D^{-1/2}AD^{-1/2}over~ start_ARG italic_L end_ARG = italic_I - italic_D start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT italic_A italic_D start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT, the m𝑚mitalic_m-th order Chebyshev basis is computed recursively:

T0(L~)Xsubscript𝑇0~𝐿𝑋\displaystyle T_{0}(\tilde{L})Xitalic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( over~ start_ARG italic_L end_ARG ) italic_X =Xabsent𝑋\displaystyle=X= italic_X (14)
T1(L~)Xsubscript𝑇1~𝐿𝑋\displaystyle T_{1}(\tilde{L})Xitalic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( over~ start_ARG italic_L end_ARG ) italic_X =L~Xabsent~𝐿𝑋\displaystyle=\tilde{L}X= over~ start_ARG italic_L end_ARG italic_X (15)
Tm(L~)Xsubscript𝑇𝑚~𝐿𝑋\displaystyle T_{m}(\tilde{L})Xitalic_T start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( over~ start_ARG italic_L end_ARG ) italic_X =2L~Tm1(L~)XTm2(L~)Xabsent2~𝐿subscript𝑇𝑚1~𝐿𝑋subscript𝑇𝑚2~𝐿𝑋\displaystyle=2\tilde{L}T_{m-1}(\tilde{L})X-T_{m-2}(\tilde{L})X= 2 over~ start_ARG italic_L end_ARG italic_T start_POSTSUBSCRIPT italic_m - 1 end_POSTSUBSCRIPT ( over~ start_ARG italic_L end_ARG ) italic_X - italic_T start_POSTSUBSCRIPT italic_m - 2 end_POSTSUBSCRIPT ( over~ start_ARG italic_L end_ARG ) italic_X (16)

The subgraph-specific spectral convolution is then formulated as:

Zk=m=0Mθk,mTm(L~)Xsubscript𝑍𝑘superscriptsubscript𝑚0𝑀subscript𝜃𝑘𝑚subscript𝑇𝑚~𝐿𝑋Z_{k}=\sum_{m=0}^{M}\theta_{k,m}T_{m}(\tilde{L})Xitalic_Z start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_m = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT italic_θ start_POSTSUBSCRIPT italic_k , italic_m end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( over~ start_ARG italic_L end_ARG ) italic_X (17)

where θk,md×dsubscript𝜃𝑘𝑚superscript𝑑𝑑\theta_{k,m}\in\mathbb{R}^{d\times d}italic_θ start_POSTSUBSCRIPT italic_k , italic_m end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_d × italic_d end_POSTSUPERSCRIPT are learnable parameters for subgraph k𝑘kitalic_k and polynomial order m𝑚mitalic_m.

IV-B3 Subgraph-aware Spectral Filtering

The final node representations are obtained by combining the subgraph-specific convolutions weighted by the routing probabilities:

Hl+1=σ(k=1Kdiag(s:,kl)Zkl)superscript𝐻𝑙1𝜎superscriptsubscript𝑘1𝐾diagsuperscriptsubscript𝑠:𝑘𝑙superscriptsubscript𝑍𝑘𝑙H^{l+1}=\sigma\left(\sum_{k=1}^{K}\text{diag}(s_{:,k}^{l})Z_{k}^{l}\right)italic_H start_POSTSUPERSCRIPT italic_l + 1 end_POSTSUPERSCRIPT = italic_σ ( ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT diag ( italic_s start_POSTSUBSCRIPT : , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT ) italic_Z start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT ) (18)

where σ𝜎\sigmaitalic_σ denotes the activation function and s:,klsuperscriptsubscript𝑠:𝑘𝑙s_{:,k}^{l}italic_s start_POSTSUBSCRIPT : , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT represents the routing probabilities for subgraph k𝑘kitalic_k at layer l𝑙litalic_l.

IV-B4 Multi-layer Architecture

The complete SSGNC model with L𝐿Litalic_L layers can be formulated as:

H(0)=MLPin(X)superscript𝐻0subscriptMLPin𝑋H^{(0)}=\text{MLP}_{\text{in}}(X)italic_H start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT = MLP start_POSTSUBSCRIPT in end_POSTSUBSCRIPT ( italic_X ) (19)
H(l)=Dropout(σ(SS-Conv(l)(G,H(l1))))superscript𝐻𝑙Dropout𝜎superscriptSS-Conv𝑙𝐺superscript𝐻𝑙1H^{(l)}=\text{Dropout}\left(\sigma\left(\text{SS-Conv}^{(l)}(G,H^{(l-1)})% \right)\right)italic_H start_POSTSUPERSCRIPT ( italic_l ) end_POSTSUPERSCRIPT = Dropout ( italic_σ ( SS-Conv start_POSTSUPERSCRIPT ( italic_l ) end_POSTSUPERSCRIPT ( italic_G , italic_H start_POSTSUPERSCRIPT ( italic_l - 1 ) end_POSTSUPERSCRIPT ) ) ) (20)
Y=MLPout(Concat(H(1),,H(L)))𝑌subscriptMLPoutConcatsuperscript𝐻1superscript𝐻𝐿Y=\text{MLP}_{\text{out}}\left(\text{Concat}(H^{(1)},\dots,H^{(L)})\right)italic_Y = MLP start_POSTSUBSCRIPT out end_POSTSUBSCRIPT ( Concat ( italic_H start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , … , italic_H start_POSTSUPERSCRIPT ( italic_L ) end_POSTSUPERSCRIPT ) ) (21)

where SS-Conv(l)superscriptSS-Conv𝑙\text{SS-Conv}^{(l)}SS-Conv start_POSTSUPERSCRIPT ( italic_l ) end_POSTSUPERSCRIPT denotes our subgraph-aware spectral convolution layer at depth l𝑙litalic_l, and Concat represents feature concatenation.

V Experiments

V-A Datasets and Baselines

Datasets. We utilize four benchmarks spanning social networks and e-commerce: Reddit, Weibo, YelpChi, and Amazon. These encompass diverse graph structures including user-subreddit interactions, fraud patterns, and multi-relational reviews. Node features incorporate linguistic attributes (LIWC), text embeddings (FastText), and behavioral metadata.[21].

Baseline Models. We experiment on five GAD models: GCN, BernNet[31], AMNet[7], BWGNN[6], and GHRN[32]. For uncertainty quantification, three conformal predictors are evaluated: CP-TPS [15], CP-APS [16], and CP-RAPS [17]. The variants differ in non-conformity scoring: TPS prioritizes top-class probabilities, APS accumulates softmax values, while RAPS penalizes low-confidence class inclusion.

V-B Evaluations

To assess CRC-SGAD, we adopt six metrics[26]: Coverage (true label inclusion probability), Inefficiency (average set size), and Ambiguity (frequency of indeterminate 0,1 outputs). Singleton Rate tracks single-point predictions, while set-based FPR and FNR quantify error types. These enable rigorous multi-dimensional assessment of statistical reliability and practical utility in uncertainty-aware systems.

V-C Main Results

V-C1 Performance Analysis

Our experiments reveal fundamental limitations in existing conformal prediction approaches. Notably, both the approaches anchored in TPS and RAPS are incapable of ensuring control over the FPR and FNR. While the APS-based method appears to achieve dual error rates below 0.01, this performance comes at the expense of prohibitively large prediction sets. Specifically, average prediction set sizes exceed 1.7 across benchmark datasets. Such inflated uncertainty quantification renders the method impractical for real-world applications.

Conversely, the proposed CRC-SGAD framework more effectively addresses this critical trade-off. The method achieves strict false negative rate control (FNR <<< 0.01, FPR <<< 0.01) while maintaining prediction Inefficiency with average set sizes reduced by 15.3-52.8% compared to APS baselines.

V-C2 Effectiveness of Sub-Modules

We rigorously validate the effectiveness of our proposed SSGNC framework across five benchmark datasets and five GAD methods. We find the proposed spectral calibration module achieves an average 13.4% reduction in Singleton ratio across all experimental configurations, while maintaining risk control guarantees. This statistically significant improvement demonstrates the efficacy of our subgraph-aware calibration framework in enhancing the operational efficiency of prediction sets without compromising risk control objectives.

TABLE II: The improvement of Singleton ratio after SSGNC
amazon yelp reddit weibo
GCN 9.48%\uparrow 45.1%\uparrow 1.17%\uparrow 2.00%\uparrow
BernNet 3.70%\uparrow 4.73%\uparrow 66.9%\uparrow 9.56%\uparrow
AMNet 5.80%\uparrow 3.70%\uparrow 8.31%\uparrow 0.22%\uparrow
BWGNN 0.00%\uparrow 2.29%\uparrow 44.9%\uparrow 4.43%\uparrow
GHRN 0.00%\uparrow 3.15%\uparrow 52.5%\uparrow 0.11%\uparrow

V-C3 Parameter Sensitivity Analysis

We conduct a systematic evaluation of the BWGNN model on the Yelp dataset to investigate the impact of predefined risk thresholds and the number of SSGNC prototypes on the empirical FPR, FNR, and singleton ratio in prediction sets. As shown in Table III, our framework demonstrates robust error rate control through adaptive calibration of prediction sets, consistently satisfying the target error rates across all predefined FNR and FPR configurations. The analysis of prototype quantity reveals a non-monotonic relationship with singleton ratio: as the number of prototypes increases from 3 to 7, the singleton ratio first rises (peaking at 5 prototypes) and then declines.

TABLE III: Parameter sensitivity analysis of BWGNN model on Yelp dataset.
(FNR, FPR) (0.1, 0.1) (0.05, 0.1) (0.1, 0.05)
Prototype Number 3 5 7 5 5
FNR 0.100 0.100 0.100 0.048 0.100
FPR 0.098 0.098 0.098 0.098 0.049
Singleton Ratio 0.572 0.587 0.576 0.443 0.537

VI Conclusion

In conclusion, our study addresses the critical need for class-aware uncertainty quantification and calibration in imbalanced graph anomaly detection. Our findings reveal that existing GAD models often yield poorly calibrated confidence estimates, which further deteriorate under structural perturbations. Our CRC-SGAD framework achieves provable statistical guarantees on FNR and FPR while incorporating a subgraph-aware spectral graph neural calibrator that optimizes prediction set sizes without compromising statistical validity. These results collectively highlight the framework’s potential to advance reliable graph anomaly detection systems in safety-critical applications.

References

  • [1] X. Huang, Y. Yang, Y. Wang, C. Wang, Z. Zhang, J. Xu, L. Chen, and M. Vazirgiannis, “Dgraph: A large-scale financial dataset for graph anomaly detection,” in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 22 765–22 777.
  • [2] G. Zhang, S. Zhang, and G. Yuan, “Bayesian graph local extrema convolution with long-tail strategy for misinformation detection,” ACM Trans. Knowl. Discov. Data, vol. 18, no. 4, 2024.
  • [3] Y. Liu, X. Ao, Z. Qin, J. Chi, J. Feng, H. Yang, and Q. He, “Pick and choose: A gnn-based imbalanced learning approach for fraud detection,” in Proceedings of the Web Conference 2021, 2021, p. 3168–3177.
  • [4] Y. Dou, Z. Liu, L. Sun, Y. Deng, H. Peng, and P. S. Yu, “Enhancing graph neural network-based fraud detectors against camouflaged fraudsters,” in Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, p. 315–324.
  • [5] F. Shi, Y. Cao, Y. Shang, Y. Zhou, C. Zhou, and J. Wu, “H2-fdetector: A gnn-based fraud detector with homophilic and heterophilic connections,” in Proceedings of the ACM Web Conference 2022, 2022, p. 1486–1494.
  • [6] J. Tang, J. Li, Z. Gao, and J. Li, “Rethinking graph neural networks for anomaly detection,” in Proceedings of the 39th International Conference on Machine Learning, vol. 162, 2022, pp. 21 076–21 089.
  • [7] Z. Chai, S. You, Y. Yang, S. Pu, J. Xu, H. Cai, and W. Jiang, “Can abnormality be detected by graph neural networks?” in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, 2022, pp. 1945–1951.
  • [8] H. Wang, Y. Dou, C. Chen, L. Sun, P. S. Yu, and K. Shu, “Attacking fake news detectors via manipulating news social engagement,” in Proceedings of the ACM Web Conference 2023, 2023, p. 3978–3986.
  • [9] J. Wu, X. Liu, D. Cheng, Y. Ouyang, X. Wu, and Y. Zheng, “Safeguarding fraud detection from attacks: A robust graph learning approach,” in Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, 2024, pp. 7500–7508.
  • [10] P. Zhu, Z. Pan, Y. Liu, J. Tian, K. Tang, and Z. Wang, “A general black-box adversarial attack on graph-based fake news detectors,” in Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, 2024, pp. 568–576.
  • [11] J. Chen, Y. Wu, X. Xu, Y. Chen, H. Zheng, and Q. Xuan, “Fast gradient attack on network embedding,” arXiv preprint arXiv:1809.02797, 2018.
  • [12] M. Waniek, T. P. Michalak, M. J. Wooldridge, and T. Rahwan, “Hiding individuals and communities in a social network,” Nature Human Behaviour, vol. 2, no. 2, pp. 139–147, 2018.
  • [13] L. Wu, B. Lei, D. Xu, and D. Zhou, “Towards reliable rare category analysis on graphs via individual calibration,” in Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023, p. 2629–2638.
  • [14] A. N. Angelopoulos, S. Bates, A. Fisch, L. Lei, and T. Schuster, “Conformal risk control,” in The Twelfth International Conference on Learning Representations, 2024.
  • [15] A. N. Angelopoulos and S. Bates, “Conformal prediction: A gentle introduction,” Foundations and Trends® in Machine Learning, vol. 16, no. 4, pp. 494–591, 2023.
  • [16] Y. Romano, M. Sesia, and E. Candes, “Classification with valid and adaptive coverage,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 3581–3591.
  • [17] A. N. Angelopoulos, S. Bates, M. Jordan, and J. Malik, “Uncertainty sets for image classifiers using conformal prediction,” in International Conference on Learning Representations, 2021.
  • [18] X. Ma, J. Wu, S. Xue, J. Yang, C. Zhou, Q. Z. Sheng, H. Xiong, and L. Akoglu, “A comprehensive survey on graph anomaly detection with deep learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 12, pp. 12 012–12 038, 2023.
  • [19] M. Jin, Y. Liu, Y. Zheng, L. Chi, Y.-F. Li, and S. Pan, “Anemone: Graph anomaly detection with multi-scale contrastive learning,” in Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, p. 3122–3126.
  • [20] X. Ma, F. Liu, J. Wu, J. Yang, S. Xue, and Q. Z. Sheng, “Rethinking unsupervised graph anomaly detection with deep learning: Residuals and objectives,” IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 2, pp. 881–895, 2025.
  • [21] J. Tang, F. Hua, Z. Gao, P. Zhao, and J. Li, “Gadbench: Revisiting and benchmarking supervised graph anomaly detection,” in Advances in Neural Information Processing Systems, vol. 36, 2023, pp. 29 628–29 653.
  • [22] W. Zhuo, Z. Liu, B. Hooi, B. He, G. Tan, R. Fathony, and J. Chen, “Partitioning message passing for graph fraud detection,” in The Twelfth International Conference on Learning Representations, 2024.
  • [23] F. Wang, Y. Liu, K. Liu, Y. Wang, S. Medya, and P. S. Yu, “Uncertainty in graph neural networks: A survey,” arXiv preprint arXiv:2403.07185, 2024.
  • [24] X. Wang, H. Liu, C. Shi, and C. Yang, “Be confident! towards trustworthy graph neural networks via confidence calibration,” in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 23 768–23 779.
  • [25] M. Stadler, B. Charpentier, S. Geisler, D. Zügner, and S. Günnemann, “Graph posterior network: Bayesian predictive uncertainty for node classification,” in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 18 033–18 048.
  • [26] S. H. Zargarbashi, S. Antonelli, and A. Bojchevski, “Conformal prediction sets for graph neural networks,” in Proceedings of the 40th International Conference on Machine Learning, vol. 202, 2023, pp. 12 292–12 318.
  • [27] K. Huang, Y. Jin, E. Candes, and J. Leskovec, “Uncertainty quantification over graph with conformalized graph neural networks,” in Advances in Neural Information Processing Systems, vol. 36, 2023, pp. 26 699–26 721.
  • [28] Y. Zhang, S. Pal, M. Coates, and D. Ustebay, “Bayesian graph convolutional neural networks for semi-supervised classification,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 1, pp. 5829–5836, 2019.
  • [29] Q. Wang, S. Wang, D. Zhuang, H. Koutsopoulos, and J. Zhao, “Uncertainty quantification of spatiotemporal travel demand with probabilistic graph neural networks,” IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 8, pp. 8770–8781, 2024.
  • [30] M. He, Z. Wei, and J.-R. Wen, “Convolutional neural networks on graphs with chebyshev approximation, revisited,” in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 7264–7276.
  • [31] M. He, Z. Wei, z. Huang, and H. Xu, “Bernnet: Learning arbitrary graph spectral filters via bernstein approximation,” in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 14 239–14 251.
  • [32] Y. Gao, X. Wang, X. He, Z. Liu, H. Feng, and Y. Zhang, “Addressing heterophily in graph anomaly detection: A perspective of graph spectrum,” in Proceedings of the ACM Web Conference 2023, 2023, p. 1528–1538.