DE-HNN: An effective neural model for Circuit Netlist representation


 


Zhishang Luo1                        Truong Son Hy2                        Puoya Tabaghi1                        Donghyeon Koh4                        Michael Defferrard4

Elahe Rezaei3                        Ryan Carey3                        Rhett Davis5                        Rajeev Jain3                        Yusu Wang1

1 University of California San Diego                        2 Indiana State University

3 Qualcomm Technologies, Inc.                        4 Qualcomm Wireless GmbH                        5 North Carolina State University

Abstract

The run-time of the optimization tools used in chip design has grown with the complexity of designs, to the point where it can take several days to go through one design cycle, which has become a bottleneck. Designers want fast tools that can quickly give feedback on a design. Using the input and output data of the tools from past designs, one can attempt to build a machine learning model that predicts the outcome of a design in significantly less time than running the tool. The accuracy of such models is affected by the representation of the design data, which is usually a netlist that describes the elements of the digital circuit and how they are connected. Graph representations of the netlist, together with graph neural networks, have been investigated for such models. However, the characteristics of netlists pose several challenges for existing graph learning frameworks, due to the large number of nodes and the importance of long-range interactions between nodes. To address these challenges, we represent the netlist as a directed hypergraph and propose a Directional Equivariant Hypergraph Neural Network (DE-HNN) for the effective learning of (directed) hypergraphs. Theoretically, we show that our DE-HNN can universally approximate any node- or hyperedge-based function that satisfies certain permutation equivariant and invariant properties natural for directed hypergraphs. We compare the proposed DE-HNN with several state-of-the-art (SOTA) machine learning models for (hyper)graphs and netlists, and show that DE-HNN significantly outperforms them in predicting the outcome of optimized place-and-route tools directly from the input netlists. Our source code and the netlist data used are publicly available at https://github.com/YusuLab/chips.git.

1 Introduction

Chip design is a complicated process involving numerous steps, many of which involve solving hard optimization problems. Consider just the place-and-route stage for a synthesized netlist: here the input is a netlist consisting of cells and nets, where cells refer to functional units such as logic gates, and nets refer to connections between cells. The goal is to produce a layout of this netlist in a specific 2D region, where gates are placed and connections among them are realized by wires laid out across multiple layers (called "routing"), all while aiming to optimize multiple key properties (e.g., minimizing total wirelength and reducing congested "hotspots"). This place-and-route stage is highly nontrivial to solve for large netlists, and in practice requires a time-consuming process with multiple stages and iterations.

There thus arises the need for data-driven methods that predict properties of a design directly, without the time-consuming place-and-route process. To this end, graph neural networks are a natural choice, given that netlists are often represented as graphs or hypergraphs. In this paper, we aim to develop an efficient and effective graph learning architecture that accurately predicts post-routing properties (e.g., wirelength or congestion) of a synthesized netlist.

The past decade has witnessed a tremendous growth of graph learning models. The two most popular families are (i) message-passing neural networks (MPNNs) (Gilmer et al., 2017; Jegelka, 2022), and (ii) transformer-based approaches (e.g., the survey of Müller et al., 2023). However, netlists present several challenges for existing graph learning architectures: (i) their size can be massive, from hundreds of thousands to millions of nets, (ii) long-range interactions are important (e.g., properties of interest might be caused by long paths and other long-range interactions), and (iii) properties of post-routing netlists seem to depend on complex graph-topological information beyond simple statistics such as in-/out-degree distributions.

Unfortunately, it is challenging for popular MPNNs to capture long-range interactions, due to issues such as over-smoothing of graph signals (Chen et al., 2020a) and over-squashing bottlenecks (Topping et al., 2022). MPNNs' ability to capture graph motifs (e.g., cycles and trees) and higher-order structures is also limited (Xu et al., 2019; Jegelka, 2022; Hy et al., 2019). Transformer-based graph neural models appear to be more effective at capturing long-range interactions (Dwivedi et al., 2022; Ngo et al., 2023). However, each transformer layer typically takes time quadratic in the number of nodes. While there are sparse transformers (Katharopoulos et al., 2020; Tay et al., 2022), their representation power is also reduced. Furthermore, the primary ways for a graph transformer to capture the input graph topology have been either via initial positional/structural encodings of nodes, or via the use of certain pairwise graph distances to "reweight" the attention (Zhang et al., 2023). In general, it is not clear how sensitive a transformer-based model is to features of the input graph topology (e.g., specific paths, which can be important to netlist properties).

Our work.  In this paper, similar to (Xie et al., 2021), we model a netlist as a directed hypergraph and present a novel Directional Equivariant Hypergraph Neural Network (DE-HNN) for the effective learning of (directed) hypergraphs. In particular, DE-HNN can be used to predict properties (e.g., congestion or net wirelength) of a post-routed netlist directly from an input netlist, before performing the lengthy place-and-route process. Our DE-HNN incorporates several new ideas to address the aforementioned challenges posed by netlists. Our contributions are as follows:

  • We advocate for modeling a netlist as a directed hypergraph. Indeed, a net usually consists of a driver gate/cell $c$, together with a set $S$ of "sinks"; see Section 2 and Figure 1. Recognizing the difference between the driver gate and the sinks in the timing of a routed net, and inspired by (Xie et al., 2021), we represent a net as a directed hyperedge $(c, S)$ to separate the roles of the driver and sink cells.

  • We propose a learning model DE-HNN for directed hypergraphs. Theoretically, we show (Theorem 1) that our DE-HNN can universally approximate any node or hyperedge based function that satisfies certain permutation equivariant and invariant properties natural for directed hypergraphs.

  • On the practical front, to mitigate the issues caused by the large size of input netlists and the importance of long-range interactions within them, we use a hierarchy of virtual nodes (VNs), which provides additional "bridges" that allow the integration of both local and global information while still maintaining the original graph topology (unlike in a graph pooling approach); see Figure 2 and Section 3.

  • To make our initial node features more informative, in addition to using Laplacian eigenvectors to provide positional encodings as in (Kim et al., 2022; Rampášek et al., 2022), we also use a topological summary called persistent homology (Edelsbrunner and Harer, 2010; Dey and Wang, 2022), which can encode the "shape" of graph motifs around each node in a multi-scale manner (e.g., (Zhao et al., 2020; Yan et al., 2021)).

  • We compare our DE-HNN with several SOTA machine learning models for (hyper)graphs and netlists. Our model significantly outperforms them in predicting different properties of post-routed netlists directly from input netlists. We provide careful ablation studies to demonstrate the utility of directed hyperedges, (hierarchical) VNs and persistence-based topological summaries. Finally, we remark that ML research for chip design currently suffers from a scarcity of open-source benchmark datasets. (Previous netlist property prediction work sometimes releases the input netlist designs, which our paper also uses; however, it does not release the resulting properties or the post place-and-route netlists due to the use of commercial tools.) We hope our datasets (which will be made publicly available at https://github.com/YusuLab/chips.git) can help bridge this gap. These netlists (with 400K to 1.3M nodes) can also serve as benchmarks for long-range graph interactions for machine learning researchers.

While in this paper we design DE-HNN with the goal of netlist representation and learning, our architecture as well as its theoretical results are general and applicable to any directed hypergraph.

Related work on machine learning models for netlists.  Earlier machine learning (ML) approaches for netlist property (e.g., congestion) prediction assume that the placement of cells (logic gates) is already given; that is, the input is placed but not yet routed. They then convert the input to a 2D image or other 2D grid-based representations and predict the final congestion map over this region using models such as convolutional neural networks (Tabrizi et al., 2018; Chen et al., 2020b; Xie et al., 2018; Al-Hyari et al., 2021; Liang et al., 2020; Chai et al., 2023; Alawieh et al., 2020; Chen et al., 2022; Wang et al., 2022). However, the placement information is itself very time-consuming to obtain. Furthermore, representing the input as a 2D image makes it hard to capture local and global connectivity information in the netlists.

Since a circuit is represented more accurately as a (hyper)graph, recent work deploys graph neural networks (GNNs) for congestion prediction. The work of (Kirby et al., 2019) constructs a homogeneous graph representation of the netlist, in which each node corresponds to a cell and two nodes are connected by an edge if the corresponding cells share a net, and then applies Graph Attention Networks (GAT) (Veličković et al., 2018). Later follow-up work includes using node embeddings computed from partitioned subgraphs (to capture more global graph structure) (Ghose et al., 2021) and using a dual graph with both cell and net features (Xie et al., 2021). Note that this conversion of a net into a clique can lead to very large cliques and also loses net-specific topological information. The SOTA approach for netlist representation is (Yang et al., 2022), which introduces a heterogeneous (i.e., different edge types) graph construction, called a circuit graph, in which both cells and nets are represented as nodes of a bipartite graph.

All these approaches still struggle to capture long-range interactions, which are essential for netlists. As described in our contributions above, our DE-HNN deploys a suitable architecture, hierarchical virtual nodes, as well as informative persistent homology features, to build an effective graph learning model for netlists (and other directed hypergraphs). Note that, similar to (Kirby et al., 2019; Yang et al., 2022), our DE-HNN performs learning on netlists without placement information. If placement information is available, it can easily be added to the initial node positional encodings.

2 Modeling netlists as directed hypergraphs

Circuit netlists.

A circuit netlist is a textual representation of electronic components, such as logical boolean gates, and the connections between them. A (pre-placed, also called synthesized) netlist $\mathcal{H}$ consists of a collection of cells (logic gates) $\mathcal{C}=\{c_1,\ldots,c_n\}$ and a set of nets $\mathcal{N}=\{\sigma_1,\ldots,\sigma_m\}$. Each cell (gate) has a certain number of input pins and an output pin; the number of input pins is determined by the type of the gate (e.g., an AND gate takes two inputs). The output of each gate flows into the input pins of a collection of other gates. This information is captured by the concept of a net: a net $\sigma\in\mathcal{N}$ consists of the output pin of a certain cell, called its driver cell and denoted by $v_\sigma\in\mathcal{C}$, together with all the sink cells, denoted by $S_\sigma\subseteq\mathcal{C}$, into which the signal from this output pin flows. In other words, a net can be represented by a tuple $\sigma=(v_\sigma, S_\sigma)$. See Figure 1 (a).

Given such a netlist, standard chip design pipelines first lay it out in physical space (placement). Then, in the routing stage, the connections from the output pin of one cell to the inputs of other cells are mapped to routing channels within the chip's physical floorplan. Among other properties, one wishes to minimize the total wirelength of each net and to reduce routing "congestion", which occurs when the number of edges to be routed in a specific region of the floorplan exceeds the available routing capacity. See Figure 4 in the Supplement for an illustration of a placed-and-routed netlist.

Directed hypergraphs.  A standard hypergraph $H$ is a tuple $(V,\Sigma)$, where $V$ is a set of nodes, $\Sigma$ is a set of hyperedges, and each $e\in\Sigma$ is a subset of $V$, i.e., $e\subseteq V$. A directed hypergraph $\overrightarrow{H}$ is a tuple $(V,\overrightarrow{\Sigma})$, where each directed hyperedge $\sigma\in\overrightarrow{\Sigma}$ is an ordered pair $\sigma=(v_\sigma, S_\sigma)$ with $v_\sigma\in V$ and $S_\sigma\subseteq V$. It is easy to see that a netlist $(\mathcal{C},\mathcal{N})$ can be naturally viewed as a directed hypergraph with the identifications cell $\Leftrightarrow$ node and net $\Leftrightarrow$ directed hyperedge. See Figure 1 (b).

In what follows, we use cells / nodes, as well as nets / directed hyperedges, interchangeably. In fact, for simplicity, we often use the terms "nodes" and "nets" as they are more concise. We also refer to $v_\sigma$ and $S_\sigma$ of a directed hyperedge $\sigma=(v_\sigma, S_\sigma)$ as its driver and its sinks, respectively, just as for a net $\sigma$.

Given a net $\sigma\in\overrightarrow{\Sigma}$, we say that it contains a node $v\in V$, denoted by $v\in\sigma$, if $v$ is either the driver or a sink of $\sigma$. Given a node $v\in V$, we say that a net $\sigma$ is incident on $v$ if $\sigma$ contains $v$. The collection of nets incident to $v$ is called the incident-net-set of $v$, denoted by $\mathcal{I}(v)=\{\sigma'\in\overrightarrow{\Sigma}\mid v\in\sigma'\}$. For example, in Figure 1 (b), the incident-net-set of $v_3$ is $\mathcal{I}(v_3)=\{\sigma_1,\sigma_2,\sigma_5\}$.
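For concreteness, the following minimal Python sketch shows one way to store a netlist as a directed hypergraph and to compute the incident-net-set $\mathcal{I}(v)$ defined above; the variable names and the tiny example nets are illustrative assumptions, not part of our released code.

```python
from collections import defaultdict

# Each net sigma = (driver, sinks): the driver cell index plus the set of sink cell indices.
# Cells/nodes are integers 0, ..., n-1 (0-based, so v_1 is index 0, v_3 is index 2, etc.).
nets = [
    (0, {2, 3}),     # sigma_1 = (v_1, {v_3, v_4}) as in Figure 1 (b)
    (1, {2, 4, 6}),  # the net (c_2, {c_3, c_5, c_7}) from Figure 1 (a)
]

def incident_net_set(nets):
    """I(v): for every node v, the indices of the nets containing v as driver or sink."""
    inc = defaultdict(list)
    for j, (driver, sinks) in enumerate(nets):
        inc[driver].append(j)
        for v in sinks:
            inc[v].append(j)
    return inc

inc = incident_net_set(nets)
print(inc[2])  # nets incident to v_3 (index 2): here [0, 1]
```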

Figure 1: (a) A netlist with 7 cells $\mathcal{C}=\{c_1,\ldots,c_7\}$ and 5 nets. For example, the output of gate $c_2$ flows into cells $c_3$, $c_5$, and $c_7$, giving rise to the net $\sigma=(c_2,\{c_3,c_5,c_7\})$. That is, the driver cell of $\sigma$ is $v_\sigma=c_2$, while its sink-set is $S_\sigma=\{c_3,c_5,c_7\}$. (b) The corresponding directed hypergraph with 7 nodes and 5 hyperedges. Each node $v_i$ corresponds to cell $c_i$, and each hyperedge is marked as a shaded region.

In the chip design literature, a netlist is often represented either (1) as a (directed) graph in which two cells are connected if they belong to a common net; or (2) as a standard hypergraph in which a net is a hyperedge consisting of the union of the driver cell and all sink cells. The former can lead to huge cliques (as some nets can consist of a large number of cells) and also loses sensitivity to the net topology. In the learning context, the work of Xie et al. (2021) first separated the roles of the driver and the sinks of a net in the representation of a netlist and provided justification for this choice. Their final representation is still a graph, intuitively a directed version of the so-called line graph of a hypergraph. However, converting a hypergraph to a line graph is a lossy process. The directed hypergraph provides a more informative and cleaner representation: it both preserves the full net information and differentiates between drivers and sinks.

3 DE-HNN: a neural network for directed hypergraphs

In this section, we first describe a basic neural network (NN) model in Section 3.1, which we refer to as base-DE-HNN, for the representation learning of directed hypergraphs. We provide theoretical justification of this model in Section 3.2. Then in Section 3.3, we describe how to augment this base model to make it more effective at capturing long-range interactions as well as the multi-scale graph topology.

3.1 Base-DE-HNN

Our base-DE-HNN uses message-passing mechanisms. However, it differs from standard MPNNs (message-passing neural networks) (Gilmer et al., 2017) in how the messages are aggregated and updated, so as to process both node- and net-based features and to respect the direction of hyperedges.

More specifically, base-DE-HNN consists of $L$ layers. Consider an input directed hypergraph $\overrightarrow{H}=(V,\overrightarrow{\Sigma})$. At the $\ell$-th layer, each node $v\in V$ (resp. each net / directed hyperedge $\sigma\in\overrightarrow{\Sigma}$) maintains a node feature (resp. net feature) denoted by $m^{\ell}(v)$ (resp. $M^{\ell}(\sigma)$). For simplicity, assume that $m^{\ell}(v), M^{\ell}(\sigma)\in\mathbb{R}^{d_\ell}$ are $d_\ell$-dimensional vectors. Assume first that our final goal is to predict net properties. The base-DE-HNN computes $m^{\ell}(v)$ and $M^{\ell}(\sigma)$ from the feature representations of the $(\ell-1)$-th layer by the following two steps:

[Node Update]: First, the features of each node (cell) $v\in V$ are updated using the features of the nets containing it, that is, via the features of the nets in the incident-net-set $\mathcal{I}(v)$ of $v$, as follows:

$$m^{\ell}(v)=\mathrm{Agg}^{\ell}_{\sigma\rightarrow v}\big(\{\!\{M^{\ell-1}(\sigma')\}\!\}_{\sigma'\in\mathcal{I}(v)}\big), \qquad (1)$$

where $\{\!\{\cdot\}\!\}$ denotes a multiset, as some neighboring nets could have identical feature representations. The function $\mathrm{Agg}_{\sigma\rightarrow v}$ operates on a multiset and should be invariant to the order of the neighboring nets of $v$ in $\mathcal{I}(v)$. Similar to the Deep Sets architecture (Zaheer et al., 2017), which can handle such permutation invariance over multisets, we implement $\mathrm{Agg}_{\sigma\rightarrow v}$ as:

$$m^{\ell}(v)=\sum_{\sigma'\in\mathcal{I}(v)}\mathrm{MLP}_{1}^{\ell}\big(M^{\ell-1}(\sigma')\big), \qquad (2)$$

where $\mathrm{MLP}_{1}$ stands for a multi-layer perceptron. For example, the node feature of $v_4$ in Figure 1 (b) is updated as $m^{\ell}(v_4)=\mathrm{MLP}_{1}^{\ell}\big(M^{\ell-1}(\sigma_1)\big)+\mathrm{MLP}_{1}^{\ell}\big(M^{\ell-1}(\sigma_3)\big)$, since $\mathcal{I}(v_4)=\{\sigma_1,\sigma_3\}$. It is easy to see that the update in Eqn (2) satisfies the needed permutation invariance.

[Net Update]: Next, the feature of each net $\sigma=(v_\sigma, S_\sigma)$ is updated using the new features of the nodes contained in $\sigma$. Since the net (hyperedge) $\sigma$ is directed, we wish to separate the roles of its driver node $v_\sigma$ and its set of sinks $S_\sigma$; that is,

$$M^{\ell}(\sigma)=\mathrm{Agg}^{\ell}_{v\rightarrow\sigma}\big(m^{\ell}(v_\sigma),\{\!\{m^{\ell}(v')\}\!\}_{v'\in S_\sigma}\big) \qquad (3)$$

However, the update should not depend on the ordering of nodes in the sink set $S_\sigma$; i.e., $\mathrm{Agg}_{v\rightarrow\sigma}$ needs to be permutation invariant w.r.t. its second argument, the multiset $\{\!\{m^{\ell}(v')\}\!\}_{v'\in S_\sigma}$. We use the following implementation of Eqn (3) to guarantee this permutation invariance of $\mathrm{Agg}_{v\rightarrow\sigma}$ w.r.t. the ordering of nodes in $S_\sigma$:

$$M^{\ell}(\sigma)=\mathrm{MLP}^{\ell}_{3}\Big[m^{\ell}(v_\sigma)\oplus\Big(\sum_{v'\in S_\sigma}\mathrm{MLP}^{\ell}_{2}\big(m^{\ell}(v')\big)\Big)\Big] \qquad (4)$$

where $\mathrm{MLP}_{2}$ and $\mathrm{MLP}_{3}$ are multi-layer perceptrons, and $\oplus$ denotes vector concatenation. For example, in Figure 1, the update of $\sigma_1=(v_1,\{v_3,v_4\})$ is $M^{\ell}(\sigma_1)=\mathrm{MLP}^{\ell}_{3}\big(m^{\ell}(v_1)\oplus(\mathrm{MLP}^{\ell}_{2}(m^{\ell}(v_3))+\mathrm{MLP}^{\ell}_{2}(m^{\ell}(v_4)))\big)$.
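To make the two update steps concrete, below is a minimal PyTorch sketch of one base-DE-HNN layer implementing Eqns (2) and (4). The incidence lists (`inc_node`, `inc_net`, `driver`, `sink_node`, `sink_net`), the MLP widths, and the class name are illustrative assumptions rather than our released implementation; residual connections and the virtual-node updates of Section 3.3 would be added on top of this skeleton.

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(), nn.Linear(d_out, d_out))

class BaseDEHNNLayer(nn.Module):
    """One base-DE-HNN layer: node update (Eqn 2) followed by net update (Eqn 4)."""

    def __init__(self, d):
        super().__init__()
        self.mlp1 = mlp(d, d)       # MLP_1 in Eqn (2)
        self.mlp2 = mlp(d, d)       # MLP_2 in Eqn (4)
        self.mlp3 = mlp(2 * d, d)   # MLP_3 in Eqn (4), applied to driver-feature (+) sink-sum

    def forward(self, M_prev, inc_node, inc_net, driver, sink_node, sink_net, n_nodes):
        # Eqn (2): m(v) = sum over nets sigma' in I(v) of MLP_1(M^{l-1}(sigma')).
        # (inc_node[i], inc_net[i]) enumerates every (node, incident net) pair.
        msg = self.mlp1(M_prev)[inc_net]
        m = M_prev.new_zeros(n_nodes, M_prev.size(1)).index_add_(0, inc_node, msg)

        # Eqn (4): M(sigma) = MLP_3[ m(driver(sigma)) (+) sum over sinks v' of MLP_2(m(v')) ].
        # (sink_net[j], sink_node[j]) enumerates every (net, sink node) pair;
        # driver[s] is the driver node index of net s.
        sink_sum = torch.zeros_like(M_prev).index_add_(0, sink_net, self.mlp2(m)[sink_node])
        M_new = self.mlp3(torch.cat([m[driver], sink_sum], dim=-1))
        return m, M_new
```

All index tensors are expected to be LongTensors; stacking $L$ such layers gives the base model.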

We note that if the target task is to predict node-level properties (instead of net-level properties), then we apply dual update rules in which the roles of nets and nodes are swapped. In our implementation, we also add residuals (i.e., node / net features from the previous layer) to the node / net features during the updates. There are $L$ such layers, and in the end, if the given task is a regression task, another linear layer $\psi$ is applied to the values $m^{L}(\cdot)$ (resp. $M^{L}(\cdot)$) to generate the final node-based (resp. net-based) regression function. During training, the loss function in this case is the standard mean squared error; that is, for a node-based regression with ground-truth function $\mathrm{y}:V\to\mathbb{R}$, we have:

$$\mathcal{L}(\bm{\theta})=\frac{1}{|V|}\sum_{v\in V}\big[\psi(m^{L}(v))-\mathrm{y}(v)\big]^{2}, \qquad (5)$$

where $\bm{\theta}$ denotes the set of all learnable parameters in the entire base-DE-HNN model. Node or net classification tasks are handled similarly, but with the cross-entropy loss.
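As a small illustration of this readout, the sketch below applies a linear layer $\psi$ to the last-layer node features and computes the mean-squared-error loss of Eqn (5); the hidden width and tensor names are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

d = 64                 # hidden width of the last DE-HNN layer (illustrative)
psi = nn.Linear(d, 1)  # the final linear readout layer

def node_regression_loss(m_L, y):
    """Eqn (5): mean over nodes of [psi(m^L(v)) - y(v)]^2."""
    pred = psi(m_L).squeeze(-1)
    return ((pred - y) ** 2).mean()

# m_L: [num_nodes, d] output of the last layer; y: [num_nodes] ground-truth targets.
loss = node_regression_loss(torch.randn(100, d), torch.randn(100))
```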

3.2 Theoretical analysis of DE-HNN

We now provide a theoretical guarantee for our base-DE-HNN model. For simplicity, in what follows we assume that our target (regression) functions are net-valued functions (or simply net-functions) of the form $F:\overrightarrow{\Sigma}\to\mathbb{R}$; net-valued functions are more natural for properties such as net wirelength and congestion. Our main result below also holds for node-valued functions via a symmetric argument.

Let us consider one update stage at a fixed layer $\ell\in[1,L]$. From the net-function perspective, the goal of this stage is the following: at the beginning of the stage, we start with input net features $\mu_\sigma:=M^{\ell-1}(\sigma)$ for all $\sigma\in\overrightarrow{\Sigma}$; at the end, we obtain a new feature $\mu^{*}_{\sigma}:=M^{\ell}(\sigma)$ for each $\sigma\in\overrightarrow{\Sigma}$. If we view this feature update as a function on net features, then ideally, for any fixed $\sigma\in\overrightarrow{\Sigma}$, its new feature should depend on the features of all its neighboring nets, i.e., those nets that share at least one node with $\sigma$. Note, however, that the neighbors of $\sigma$ naturally fall into two families:

(type-A) those in the set $\mathcal{I}(v_\sigma)$ (recall that $\mathcal{I}(v)$ is the set of nets containing node $v$), which are connected to $\sigma$ via its driver node $v_\sigma$; and

(type-B) those in $\bigcup_{v'\in S_\sigma}\mathcal{I}(v')$, where each set $\mathcal{I}(v')$ consists of the nets connected to $\sigma$ via a sink node $v'$ of $\sigma$ (i.e., $v'\in S_\sigma$). Note that $\{\mathcal{I}(v')\}_{v'\in S_\sigma}$ is a set of sets of (type-B) neighboring nets of $\sigma$.

For example, in Figure 1, for $\sigma_5=(v_3,\{v_5,v_7\})$, the (type-A) neighbors are $\mathcal{I}(v_3)=\{\sigma_1,\sigma_2,\sigma_5\}$, while the (type-B) neighbors form the set of sets $\{\{\sigma_2,\sigma_4,\sigma_5\},\{\sigma_2,\sigma_5\}\}$. It is natural to model the desired update function for directed hyperedges as follows, differentiating the (type-A) and (type-B) neighbors of $\sigma$:

$$\mu^{*}_{\sigma}=\mathcal{F}\Big(\{\!\{\mu_{\sigma'}\}\!\}_{\sigma'\in\mathcal{I}(v_\sigma)},\ \big\{\!\big\{\{\!\{\mu_{\sigma'}\}\!\}_{\sigma'\in\mathcal{I}(v')}\big\}\!\big\}_{v'\in S_\sigma}\Big) \qquad (6)$$

Note that the first argument, $\{\!\{\mu_{\sigma'}\}\!\}_{\sigma'\in\mathcal{I}(v_\sigma)}$, consists of the input feature representations of the (type-A) neighbors of $\sigma$, while the second argument, $\big\{\!\big\{\{\!\{\mu_{\sigma'}\}\!\}_{\sigma'\in\mathcal{I}(v')}\big\}\!\big\}_{v'\in S_\sigma}$, is a set of sets (of net features; for concision we write "set" instead of "multiset"), constituting the multisets of feature representations of the (type-B) neighbors of $\sigma$. The function $\mathcal{F}$ should be invariant not only to the order within each individual set $\mathcal{I}(u)$ for a cell $u$, but also to the order of nodes in the sink-set $S_\sigma$, i.e., the order of the sets $\{\mu_{\sigma'}\}_{\sigma'\in\mathcal{I}(v')}$ within the outer set in the second argument of $\mathcal{F}$. We refer to such a function $\mathcal{F}$ as nested-permutation invariant.

Now recall that in our base-DE-HNN, at each layer we update the node features using the features of their incident nets as described in Eqn (2), followed by updating the net features using the new features of the nodes contained in $\sigma$ as specified in Eqn (4). In other words, one can think of the update of net features from $M^{\ell-1}(\cdot)$ to $M^{\ell}(\cdot)$ as the composition of the updates in Eqns (2) and (4). By construction, it is easy to see that the composition of these two update steps indeed gives rise to a nested-permutation invariant function. However, can any nested-permutation invariant net-function be represented (or approximated) by such a composition of two steps? A priori, the answer to this converse direction is not clear at all, as iterative updates factored through the node features appear more restrictive. Our main theorem below shows that the answer is in fact positive.

Theorem 1 (Simplified).

Let $\mathcal{F}$ be any continuous, nested-permutation invariant, net-valued function as in Eqn (6). For simplicity, assume that both the input net features and the outputs take values in $\mathcal{B}$, a connected compact subset of $\mathbb{R}^{d}$. Then $\mathcal{F}$ can be rewritten as the following sum-decomposition:

$$\forall\sigma:\quad \mathcal{F}\Big(\{\mu_{\sigma'}\}_{\sigma'\in\mathcal{I}(v_\sigma)},\ \big\{\{\mu_{\sigma'}\}_{\sigma'\in\mathcal{I}(v')}\big\}_{v'\in S_\sigma}\Big) = \rho\Big(\sum_{\sigma'\in\mathcal{I}(v_\sigma)}\phi_{1}(\mu_{\sigma'}),\ \sum_{v'\in S_\sigma}\phi_{2}\Big(\sum_{\sigma'\in\mathcal{I}(v')}\phi_{1}(\mu_{\sigma'})\Big)\Big) \qquad (7)$$

where $\phi_{1}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d'}$, $\phi_{2}:\mathbb{R}^{d'}\rightarrow\mathbb{R}^{d''}$, and $\rho$ are continuous functions.

Recall that at the $\ell$-th layer, the input feature of any net $\sigma'$ is $\mu_{\sigma'}=M^{\ell-1}(\sigma')$. Now compare the right-hand side of Eqn (7) with Eqns (2) and (4): by using $\mathrm{MLP}_{1}$ from Eqn (2) to approximate the continuous function $\phi_{1}$, and $\mathrm{MLP}_{2}$ and $\mathrm{MLP}_{3}$ from Eqn (4) to approximate the continuous functions $\phi_{2}$ and $\rho$, respectively, our iterative updates via Eqns (2) and (4) can approximate any desired update of the net features by a nested-permutation invariant function as in Eqn (6). In other words, the iterative message-passing update steps in our base-DE-HNN provide a universal approximation of the desired nested-permutation invariant update functions over nets. The proof of this theorem and further discussion can be found in the Supplement.

3.3 Augmenting base-DE-HNN to DE-HNN

The base-DE-HNN provides an effective way to update features of both nodes and hyperedges in an iterative manner. We now describe further augmentation strategies that allow the resulting DE-HNN to capture long-range interactions and to be more sensitive to multi-scale graph topology.

Hierarchy of virtual nodes.

Standard message-passing GNNs have difficulty capturing long-range interactions due to issues such as over-smoothing, over-squashing and under-reaching. Most GNNs used in practice have few layers. Transformer-type models for graphs may alleviate the issue (Dwivedi et al., 2022), but each self-attention layer requires quadratic computation, which does not scale to large graphs. Furthermore, it is not clear how to effectively encode graph structure for transformers, even with the use of initial positional encodings (Müller et al., 2023) or reweighting based on graph distances (Zhang et al., 2023). Intuitively, we want to keep the simple and efficient message-passing framework over the (very sparse) input hypergraph, while also having a way to propagate long-range information among nodes that are far away from each other (in terms of graph distance). We do so via virtual nodes.

Specifically, a virtual node (VN) is an additional node that is connected to all input nodes. This effectively reduces the graph diameter to 2 and has been a popular strategy in the graph learning literature, e.g., (Gilmer et al., 2017). Note, however, that messages cannot be passed directly between two nodes: information from all nodes has to be aggregated at the virtual node before being passed back to all nodes. Nevertheless, (Cai et al., 2023) shows that a message-passing GNN augmented with a single VN can already approximate the Linear Transformer (Katharopoulos et al., 2020) and Performer (Choromanski et al., 2021) (as well as general transformers to some extent) and bring significant empirical gains.

Figure 2: A two-level hierarchy of virtual nodes (VNs).

While adding a single VN only linearly increases the number of edges, the VN itself will have a very high degree and can become a computational bottleneck. Furthermore, since the features of all nodes have to be aggregated at the virtual node, the aggregated messages lose sensitivity to individual node features. The benefits of adding a single VN thus diminish as the graph becomes larger. We instead propose to use a hierarchy of VNs, as follows (see Figure 2).

  • Given an input hypergraph $\overrightarrow{H}=(V,\overrightarrow{\Sigma})$, we first use METIS (Karypis and Kumar, 1998) to partition its node set into $k$ disjoint subsets $V=V_1\cup V_2\cup\cdots\cup V_k$. In our experiments, we keep the sizes of the $V_i$'s roughly balanced, and therefore the value of $k$ varies with the input graph size.

  • We introduce a VN $\omega_i$ for each subset $V_i$, $i\in[1,k]$, and connect $\omega_i$ to all nodes in $V_i$. These VNs are the first-level VNs. Note that the total number of edges added this way is only $n=|V|$.

  • We further create a super-VN, denoted $\omega_0$, which connects to all first-level VNs. This introduces only $k$ additional edges.

In what follows, we refer to the nodes of the input hypergraph as standard nodes. When updating node/edge features, we use heterogeneous updates, in which the aggregation functions at the first-level VNs and the super-VN differ from those at standard nodes. Unlike graph pooling, the additional VNs keep the original hypergraph topology intact while acting as extra "bridges" that allow information to flow both globally (among far-apart nodes via VNs) and locally (along original edges).

As the input hypergraph becomes even larger, we could use more levels in our hierarchy of VNs. Nevertheless, we observe that two levels already yield good performance in our current test cases. If nodes are already placed, one could also partition the node set by spatial location (e.g., by decomposing the rectangular region) instead of using METIS. A small sketch of the construction is given below.
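The following is a minimal sketch of how the two-level VN hierarchy could be attached to a netlist graph, assuming a node-to-part assignment `part` (e.g., produced by METIS) is already available; the function name and edge-list layout are illustrative assumptions.

```python
import numpy as np

def add_virtual_node_hierarchy(n_nodes, part):
    """Attach first-level VNs (one per partition) and a single super-VN.

    Returns the extra undirected edges as a 2 x E array of (source, target) pairs
    plus the new total node count; only n_nodes + k undirected edges are added."""
    part = np.asarray(part)
    k = int(part.max()) + 1
    first_level_vn = n_nodes + part          # VN id assigned to each node's partition
    super_vn = n_nodes + k                   # the single second-level (super) VN
    src = np.concatenate([np.arange(n_nodes), n_nodes + np.arange(k)])
    dst = np.concatenate([first_level_vn, np.full(k, super_vn)])
    # store both directions so messages can flow node -> VN and VN -> node
    edges = np.stack([np.concatenate([src, dst]), np.concatenate([dst, src])])
    return edges, n_nodes + k + 1

# example: 6 standard nodes split into k = 2 parts
extra_edges, total_nodes = add_virtual_node_hierarchy(6, [0, 0, 0, 1, 1, 1])
```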

Initial positional / structural encodings.

Finally, we aim to encode meaningful multi-scale features for each node / hyperedge. To this end, for each node we add a Laplacian positional encoding consisting of the values at this node of the first $s$ eigenvectors ($s=10$ in our experiments) of the undirected version of the input hypergraph.

Furthermore, to better capture the "shape" of the neighborhood of each node $v$ in a more discriminative and multi-scale manner, similar to (Yan et al., 2021; Zhao et al., 2020), we use the so-called extended persistence diagram (PD) summary induced by the shortest-path distance function within the $r$-hop neighborhood of each node ($r=6$ in our experiments) as an initial structural encoding for each node. It is known (Tian and Wang, 2019) that PDs constructed this way encode rich information about the $r$-hop neighborhood of each node $v$ (viewed as a local motif around $v$), such as clustering coefficients, the number of nodes at distance $a\leq r$ from $v$, the number of independent cycles, etc. See the Supplement for more details. Indeed, as our ablation study and the results in the Supplement show, adding PDs improves both our models and previous graph learning models.
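For the Laplacian part of these encodings, a minimal sketch is shown below, assuming the hypergraph has already been converted to a sparse undirected adjacency matrix `A`; the eigensolver settings are illustrative, and the persistence-diagram features would come from a separate topological-data-analysis pipeline not shown here.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def laplacian_pe(A, s=10):
    """Per-node positional encodings: the first s non-trivial eigenvectors of the
    symmetric normalized Laplacian of a sparse undirected adjacency matrix A."""
    deg = np.asarray(A.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = sp.eye(A.shape[0]) - d_inv_sqrt @ A @ d_inv_sqrt
    vals, vecs = eigsh(lap, k=s + 1, which="SA")  # smallest eigenvalues first
    return vecs[:, 1:]  # drop the trivial (constant) eigenvector -> [n_nodes, s]
```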

4 Experiments

4.1 Datasets

We used 12 of the Superblue circuits from (Viswanathan et al., 2011, 2012) to evaluate our proposed models and the baselines. The Superblue circuits are among the most complex publicly available circuits used in previous work on VLSI placement and routing. These netlists have between 400K and 1.3M nodes, with a similar number of nets. Note that these hypergraphs are very sparse, although a very small fraction of nets have large sizes. See the Supplement for more detailed statistics of these netlists, including the size of each design and the statistics of the target properties.

We pulled our designs from the RosettaStone (Kahng et al., 2022) repository's Skywater 130 nm technology (Edwards, 2020) benchmark set. We then used a commercial physical design tool to perform placement optimization for each design with a utilization ratio of 0.7. We exported global-routing congestion information in the form of demand and capacity for each global-route cell (GRC) in each routing layer.

4.2 Setup

Baselines.

We compare with a range of SOTA baselines: (i) Graph Convolutional Networks (GCN) (Kipf and Welling, 2017); (ii) a SOTA variant of Graph Attention Networks (GATv2) (Brody et al., 2022); (iii) the Hypergraph Message Passing Neural Network (HMPNN) (Heydari and Livi, 2022); (iv) Hypergraph (Neural) Networks with Hyperedge Neurons (HNHN) (Dong et al., 2020); (v) a multiset function framework for hypergraph neural networks (AllSet) (Chien et al., 2022); and (vi) the SOTA graph-based model specifically designed for netlist property prediction, NetlistGNN (Yang et al., 2022). See the Supplement for more details on their architectures, including model complexity. We also compare with the hypergraph convolutional operator (HyperConv) (Bai et al., 2021) and the Linear Transformer (Katharopoulos et al., 2020); as both models' average performance is not as good as that of the other baselines, we include their results only in the Supplement.

We compare these baselines with our DE-HNN model, which we refer to as full-DE-HNN in the results to differentiate it from the base-DE-HNN model, whose performance we also include for reference. Note that base-DE-HNN uses neither persistence diagrams (PDs) as initial features nor virtual nodes (VNs). We implement our DE-HNN and the baselines with PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey and Lenssen, 2019).

Features.  For each cell, its initial features include: type, width, height of the gate, degree, the degree distribution of a local neighborhood (summarized into a vector), the top-10 Laplacian eigenvectors, and persistence diagram (PD) features (see Supplement). For all methods other than the Linear Transformer, the initial net features are the nets' degrees. For the Linear Transformer, net features are obtained by averaging the features of the nodes contained in each net, so as to provide more topological information (see Supplement). For a fair comparison, the same initial cell/net features are used for all models (other than our base-DE-HNN, which does not have PDs as it is a base model without any augmentation). One might wonder whether adding persistence diagram (PD) features is beneficial for both our method and the baselines. Indeed, as we show later in the ablation study and in the Supplement, using PDs improves model performance for both our methods and the baselines. For example, it improves NetlistGNN by 3.6% on average in terms of demand regression in the single-design setting.

Prediction tasks.  We test three different tasks in our experiments, including both net- and node-based tasks. These tasks cover the types of experiments performed by previous netlist prediction approaches.

  • Net-based wirelength regression: We use the half-perimeter wirelength (HPWL), a common estimator for wirelength, to compute the wirelength of each net (Mirhoseini et al., 2021). Similar to (Yang et al., 2022), we take the $\log_2$ of the wirelength to reduce its range (a minimal sketch of this target follows this list).

  • Net-based demand regression: We predict the net-based demand. Congestion happens when demand exceeds capacity. There is no consensus on how to define congestion (as a difference or a ratio of demand and capacity), so we directly predict demand.

  • Cell-based congestion classification: Similar to (Yang et al., 2022) and (Wang et al., 2022), we classify the cell-based congestion values (computed as the ratio demand/capacity) into (a) [0, 0.9], not congested; and (b) [0.9, $\infty$), congested.
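To make the wirelength target concrete, the following is a minimal sketch of the HPWL of a single net and the $\log_2$ transform applied to it; the cell coordinates and the helper name net_hpwl are illustrative.

```python
# Minimal sketch of the half-perimeter wirelength (HPWL) regression target.
import numpy as np

def net_hpwl(cell_xy):
    """HPWL of one net = width + height of the bounding box of its cells."""
    (x_min, y_min), (x_max, y_max) = cell_xy.min(axis=0), cell_xy.max(axis=0)
    return (x_max - x_min) + (y_max - y_min)

net = np.array([[0.0, 0.0], [3.0, 1.0], [1.0, 4.0]])   # one net with 3 placed cells
target = np.log2(net_hpwl(net))                        # regression target
print(net_hpwl(net), target)                           # 7.0  ~2.807
```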

Our experiments have two settings:

  • Single-design: We train and evaluate on each individual design separately. For the graph of each design, we apply 4-fold cross-validation, randomly splitting the nodes into 4 subsamples of equal size (25%/25%/25%/25%); a minimal sketch of this split is given after this list. We report the average performance across all 12 designs, with cross-validation applied to each design. The distribution of target values and the average performance from cross-validation for each design can be found in the Supplement.

  • Cross-design: We aim to evaluate the ability of the models to generalize to unseen netlist topologies. Following previous work (Yang et al., 2022), we use 10 designs, Superblue 1, 2, 3, 5, 6, 7, 9, 11, 14, and 16, for training, Superblue 18 for validation, and Superblue 19 for testing.
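A minimal sketch of the single-design node split follows, using scikit-learn's KFold purely for illustration; the exact split procedure in the released code may differ.

```python
# Minimal sketch: 4-fold cross-validation over the nodes of one design.
import numpy as np
from sklearn.model_selection import KFold

num_nodes = 1000                      # hypothetical design size
node_idx = np.arange(num_nodes)
kfold = KFold(n_splits=4, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kfold.split(node_idx)):
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test nodes")
```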

4.3 Results

Table 1: Net-based wirelength regression. The last row ("Improvement") reports the improvement of our full DE-HNN model over the best baseline approach for each metric.

Model          Single-Design                    Cross-Design
               RMSE↓    MAE↓     Pearson↑       RMSE↓    MAE↓     Pearson↑
GCN            1.762    1.276    0.750          1.691    1.276    0.746
GATv2          1.812    1.330    0.687          1.717    1.281    0.737
AllSet         1.718    1.264    0.760          1.837    1.348    0.695
HMPNN          1.841    1.368    0.710          1.785    1.335    0.710
HNHN           1.852    1.368    0.717          1.754    1.333    0.701
NetlistGNN     1.773    1.320    0.740          1.762    1.324    0.718
base DE-HNN    1.751    1.269    0.748          1.731    1.291    0.730
full DE-HNN    1.689    1.245    0.770          1.677    1.242    0.754
Improvement    1.7%     1.6%     1.3%           1.9%     2.6%     1.8%
Table 2: Net-based demand regression for each design.

Model          Single-Design                    Cross-Design
               RMSE↓    MAE↓     Pearson↑       RMSE↓    MAE↓     Pearson↑
GCN            9.321    6.163    0.570          6.571    5.024    0.365
GATv2          9.342    6.118    0.561          6.623    5.137    0.363
AllSet         9.072    5.745    0.632          6.120    4.820    0.345
HMPNN          9.342    6.118    0.561          6.979    5.356    0.306
HNHN           9.342    6.118    0.561          6.390    4.870    0.358
NetlistGNN     9.063    5.839    0.623          8.328    6.839    0.367
base DE-HNN    8.997    5.764    0.630          6.778    5.085    0.337
full DE-HNN    8.381    5.334    0.683          6.037    4.670    0.372
Improvement    7.5%     7.2%     8.1%           1.4%     4.1%     1.4%
Table 3: Cell-based congestion classification.

Model          Single-Design                    Cross-Design
               Precision↑  Recall↑  F-score↑    Precision↑  Recall↑  F-score↑
GCN            0.761       0.857    0.802       0.633       0.997    0.773
GATv2          0.810       0.864    0.835       0.630       0.999    0.765
AllSet         0.782       0.837    0.804       0.645       0.964    0.773
HMPNN          0.774       0.826    0.792       0.633       0.999    0.772
HNHN           0.792       0.869    0.826       0.648       0.939    0.767
NetlistGNN     0.812       0.860    0.831       0.647       0.953    0.771
base DE-HNN    0.824       0.860    0.840       0.653       0.990    0.774
full DE-HNN    0.833       0.876    0.853       0.660       0.986    0.780
Improvement    2.6%        0.8%     2.2%        1.7%        -        1.0%

The test performance of all baselines and our methods (base-DE-HNN and full-DE-HNN) is shown in Tables 1, 2 and 3. For the regression tasks, to be comprehensive, we use three metrics: the Root Mean Squared Error (RMSE), the Mean Absolute Error (MAE), and the Pearson correlation (Pearson). For the classification task, we report Precision, Recall, and F-score. Note that for RMSE and MAE, smaller values are better, while for Pearson, Precision, Recall, and F-score, larger values are better. In the last row of each table, we show the improvement of our full-DE-HNN model over the best of all baselines for each metric; note that our improvement over any individual baseline can only be larger than or equal to this improvement. Our full-DE-HNN model outperforms all baselines, sometimes significantly. For example, compared to NetlistGNN (Yang et al., 2022), the previous SOTA for netlist representation learning, our improvement on average is around 5.3% for wirelength prediction and 8.6% for demand prediction, both in terms of MAE. See the Supplement for the full results, including the test performance for each design in the single-design setting.

Ablation study.  We carried out an ablation study to understand the effects of the different strategies employed in our DE-HNN. In particular, the factors we wish to test are: (a) the use of direction in modeling nets, (b) the use of persistence diagrams (PDs) as features, and (c) the use of single and hierarchical virtual nodes (VNs). To this end, we compare the performance of the following versions: (a) base-E-HNN treats a net as a standard hyperedge (thus no direction), and uses neither PD features nor VNs. (b) base-DE-HNN is the base model for directed hypergraphs (described in Section 3.1) with neither PDs nor VNs; the difference between base-DE-HNN and base-E-HNN therefore shows the effect of adding directions. (c) base-DE-HNN+PD is the base model with only PDs added; the difference between (c) and (b) shows the effect of PDs. (d) base-DE-HNN+PD+single VN is the base model with PDs and a single global VN. (e) Finally, full-DE-HNN is our full model with PDs and a two-level hierarchy of VNs. The results for net-based demand regression and cell-based congestion classification are shown in Figure 3; full results are found in the Supplement, and other metrics and tasks show similar behavior. For example, for demand prediction, going from base-E-HNN to adding directions, PDs, a single VN, and finally two-level VNs improves performance over the previous version by 2.5%, 2.6%, 1.0% and 3.4%, respectively. Overall, full-DE-HNN improves over base-DE-HNN by around 6.8%, while its improvement over base-E-HNN (the base model with no direction) is 9.2%.

Figure 3: Ablation study for net-based demand regression (left, RMSE) and cell-based congestion classification (right, F-score).

5 Conclusion

In this paper, we presented an effective model for representation learning on directed hypergraphs. We considered learning on netlists, the hypergraph representation of circuits in chip design. This has great importance in practice, but so far ML approaches for netlist representation learning have been limited, partly due to the huge sizes of netlists, long-range interactions, as well as the scarcity of benchmark datasets. We introduced several strategies to capture long-range interactions and graph motifs, and to account for direction in a hyperedge. Our model significantly outperforms a range of SOTA methods over a collection of chip designs. Our datasets will be publicly available, which we hope will facilitate further research on ML for chip design applications, pushing the ability of methods to capture long-range interactions. Finally, while we significantly outperformed other approaches, we note that in general, improvements in the cross-design setting are less prominent, potentially due to large variations across circuit designs. It will be interesting to further explore this direction.

Acknowledgements

This work is partially supported by NSF under grants CCF-2112665 and CCF-2310411, as well as by a gift fund from Qualcomm. The first, second and last authors would like to extend their sincere thanks to Prof. Andrew B. Kahng for the many insightful discussions and contributions in the early stage of this work, especially in exploring and experimenting with the use of topological summaries for property prediction of netlists.

References

  • Adams et al., (2017) Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., Chepushtanova, S., Hanson, E., Motta, F., and Ziegelmeier, L. (2017). Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research, 18(8):1–35.
  • Al-Hyari et al., (2021) Al-Hyari, A., Szentimrey, H., Shamli, A., Martin, T., Gréwal, G., and Areibi, S. (2021). A deep learning framework to predict routability for fpga circuit placement. ACM Trans. Reconfigurable Technol. Syst., 14(3).
  • Alawieh et al., (2020) Alawieh, M. B., Li, W., Lin, Y., Singhal, L., Iyer, M. A., and Pan, D. Z. (2020). High-definition routing congestion prediction for large-scale fpgas. In Asia and South Pacific Design Automation Conference, pages 26–31.
  • Bai et al., (2021) Bai, S., Zhang, F., and Torr, P. H. (2021). Hypergraph convolution and hypergraph attention. Pattern Recognition, 110:107637.
  • Brody et al., (2022) Brody, S., Alon, U., and Yahav, E. (2022). How attentive are graph attention networks? In Int. Conf. on Learning Representations.
  • Cai et al., (2023) Cai, C., Hy, T. S., Yu, R., and Wang, Y. (2023). On the connection between MPNN and graph transformer. In Int. Conf. on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 3408–3430. PMLR.
  • Chai et al., (2023) Chai, Z., Zhao, Y., Liu, W., Lin, Y., Wang, R., and Huang, R. (2023). Circuitnet: An open-source dataset for machine learning in vlsi cad applications with improved domain-specific evaluation metric and learning strategies. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pages 1–1.
  • (8) Chen, D., Lin, Y., Li, W., Li, P., Zhou, J., and Sun, X. (2020a). Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. Proceedings of the AAAI Conference on Artificial Intelligence, 34(04):3438–3445.
  • (9) Chen, J., Kuang, J., Zhao, G., Huang, D. J. H., and Young, E. F. Y. (2020b). Pros: A plug-in for routability optimization applied in the state-of-the-art commercial eda tool using deep learning. In Int. Conf on Computer-Aided Design.
  • Chen et al., (2022) Chen, X., Di, Z., Wu, W., Wu, Q., Shi, J., and Feng, Q. (2022). Detailed routing short violation prediction using graph-based deep learning model. IEEE Trans. Circuits and Systems II: Express Briefs, 69(2):564–568.
  • Chien et al., (2022) Chien, E., Pan, C., Peng, J., and Milenkovic, O. (2022). You are allset: A multiset function framework for hypergraph neural networks. In International Conference on Learning Representations.
  • Choromanski et al., (2021) Choromanski, K. M., Likhosherstov, V., Dohan, D., Song, X., Gane, A., Sarlos, T., Hawkins, P., Davis, J. Q., Mohiuddin, A., Kaiser, L., Belanger, D. B., Colwell, L. J., and Weller, A. (2021). Rethinking attention with performers. In Int. Conf. on Learning Representations.
  • Cohen-Steiner et al., (2009) Cohen-Steiner, D., Edelsbrunner, H., and Harer, J. (2009). Extending persistence using poincaré and lefschetz duality. Foundations of Computational Mathematics, 9(1):79–103.
  • Dey and Wang, (2022) Dey, T. K. and Wang, Y. (2022). Computational Topology for Data Analysis. Cambridge University Press. 452 pages.
  • Dong et al., (2020) Dong, Y., Sawin, W., and Bengio, Y. (2020). Hnhn: Hypergraph networks with hyperedge neurons.
  • Dwivedi et al., (2022) Dwivedi, V. P., Rampášek, L., Galkin, M., Parviz, A., Wolf, G., Luu, A. T., and Beaini, D. (2022). Long range graph benchmark. In Advances in Neural Information Processing Systems, volume 35.
  • Edelsbrunner and Harer, (2010) Edelsbrunner, H. and Harer, J. (2010). Computational Topology : an Introduction. American Mathematical Society.
  • Edwards, (2020) Edwards, R. T. (2020). Google/skywater and the promise of the open pdk. In Workshop on Open-Source EDA Technology.
  • Fey and Lenssen, (2019) Fey, M. and Lenssen, J. E. (2019). Fast graph representation learning with pytorch geometric. Int. Conf. on Learning Representations.
  • Fitzpatrick, (1989) Fitzpatrick, P. (1989). Klaus Deimling, Nonlinear Functional Analysis.
  • Ghose et al., (2021) Ghose, A., Zhang, V., Zhang, Y., Li, D., Liu, W., and Coates, M. (2021). Generalizable cross-graph embedding for gnn-based congestion prediction. In IEEE/ACM Int. Conf. Computer Aided Design, ICCAD, page 1–9.
  • Gilmer et al., (2017) Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017). Neural message passing for quantum chemistry. In Int. Conf. on Machine Learning, ICML, page 1263–1272.
  • Heydari and Livi, (2022) Heydari, S. and Livi, L. (2022). Message Passing Neural Networks for Hypergraphs, page 583–592. Springer Nature Switzerland.
  • Hy et al., (2019) Hy, T. S., Trivedi, S., Pan, H., Anderson, B. M., and Kondor, R. (2019). Covariant compositional networks for learning graphs. In Proc. Int. Workshop on Mining and Learning with Graphs (MLG).
  • Jegelka, (2022) Jegelka, S. (2022). Theory of graph neural networks: Representation and learning. Int. Congress of Mathematicians, ICM.
  • Kahng et al., (2022) Kahng, A. B., Kim, M., Kim, S., and Woo, M. (2022). Rosettastone: Connecting the past, present, and future of physical design research. IEEE Design and Test, 39(5):70–78.
  • Karypis et al., (1999) Karypis, G., Aggarwal, R., Kumar, V., and Shekhar, S. (1999). Multilevel hypergraph partitioning: applications in vlsi domain. IEEE Trans. Very Large Scale Integr. Syst., 7(1):69–79.
  • Karypis and Kumar, (1998) Karypis, G. and Kumar, V. (1998). A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392.
  • Katharopoulos et al., (2020) Katharopoulos, A., Vyas, A., Pappas, N., and Fleuret, F. (2020). Transformers are RNNs: Fast autoregressive transformers with linear attention. In Int. Conf. on Machine Learning, ICML.
  • Kim et al., (2022) Kim, J., Nguyen, D., Min, S., Cho, S., Lee, M., Lee, H., and Hong, S. (2022). Pure transformers are powerful graph learners. In Advances in Neural Information Processing Systems, volume 35.
  • Kipf and Welling, (2017) Kipf, T. N. and Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In Int. Conf. on Learning Representations.
  • Kirby et al., (2019) Kirby, R., Godil, S., Roy, R., and Catanzaro, B. (2019). Congestionnet: Routing congestion prediction using deep graph neural networks. In IFIP/IEEE Int. Conf. Very Large Scale Integration, VLSI-SoC, pages 217–222.
  • Liang et al., (2020) Liang, R., Xiang, H., Pandey, D., Reddy, L., Ramji, S., Nam, G.-J., and Hu, J. (2020). Drc hotspot prediction at sub-10nm process nodes using customized convolutional network. In Int. Symposium on Physical Design, ISPD, page 135–142.
  • Mirhoseini et al., (2021) Mirhoseini, A., Goldie, A., Yazgan, M., Jiang, J. W., Songhori, E. M., Wang, S., Lee, Y.-J., Johnson, E., Pathak, O., Nazi, A., Pak, J., Tong, A., Srinivasa, K., Hang, W., Tuncer, E., Le, Q. V., Laudon, J., Ho, R., Carpenter, R., and Dean, J. (2021). A graph placement methodology for fast chip design. Nature, 594:207 – 212.
  • Müller et al., (2023) Müller, L., Morris, C., Galkin, M., and Rampášek, L. (2023). Attending to Graph Transformers. Arxiv preprint.
  • Ngo et al., (2023) Ngo, N. K., Hy, T. S., and Kondor, R. (2023). Multiresolution graph transformers and wavelet positional encoding for learning long-range and hierarchical structures. The Journal of Chemical Physics, 159(3):034109.
  • Paszke et al., (2019) Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, volume 32.
  • Rampášek et al., (2022) Rampášek, L., Galkin, M., Dwivedi, V. P., Luu, A. T., Wolf, G., and Beaini, D. (2022). Recipe for a general, powerful, scalable graph transformer. In Advances in Neural Information Processing Systems, volume 35, pages 14501–14515.
  • Sutherland, (2009) Sutherland, W. A. (2009). Introduction to metric and topological spaces. Oxford University Press.
  • Tabaghi and Wang, (2023) Tabaghi, P. and Wang, Y. (2023). Universal Representation of Permutation-Invariant Functions on Vectors and Tensors. arXiv preprint arXiv:2310.13829.
  • Tabrizi et al., (2018) Tabrizi, A. F., Rakai, L., Darav, N. K., Bustany, I., Behjat, L., Xu, S., and Kennings, A. (2018). A machine learning framework to identify detailed routing short violations from a placed netlist. In ACM/ESDA/IEEE Design Automation Conference, DAC), pages 1–6.
  • Tay et al., (2022) Tay, Y., Dehghani, M., Bahri, D., and Metzler, D. (2022). Efficient transformers: A survey. ACM Comput. Surv., 55(6).
  • Tian and Wang, (2019) Tian, M. and Wang, Y. (2019). A limit theorem for the 1st Betti number of layer-1 subgraphs in random graphs. ArXiv preprint.
  • Topping et al., (2022) Topping, J., Giovanni, F. D., Chamberlain, B. P., Dong, X., and Bronstein, M. M. (2022). Understanding over-squashing and bottlenecks on graphs via curvature. In Int. Conf. on Learning Representations.
  • Veličković et al., (2018) Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. (2018). Graph attention networks. In Int. Conf. on Learning Representations.
  • Viswanathan et al., (2012) Viswanathan, N., Alpert, C., Sze, C., Li, Z., and Wei, Y. (2012). The DAC 2012 routability-driven placement contest and benchmark suite. In Annual Design Automation Conference, DAC, page 774–782.
  • Viswanathan et al., (2011) Viswanathan, N., Alpert, C. J., Sze, C., Li, Z., Nam, G.-J., and Roy, J. A. (2011). The ISPD-2011 routability-driven placement contest and benchmark suite. In Int. Symposium on Physical Design, ISPD, page 141–146.
  • Wagstaff et al., (2019) Wagstaff, E., Fuchs, F., Engelcke, M., Posner, I., and Osborne, M. A. (2019). On the limitations of representing functions on sets. In Int. Conf. on Machine Learning, pages 6487–6494. PMLR.
  • Wang et al., (2022) Wang, B., Shen, G., Li, D., Hao, J., Liu, W., Huang, Y., Wu, H., Lin, Y., Chen, G., and Heng, P. A. (2022). Lhnn: Lattice hypergraph neural network for vlsi congestion prediction. In ACM/IEEE Design Automation Conference, DAC, page 1297–1302.
  • Xie et al., (2018) Xie, Z., Huang, Y.-H., Fang, G.-Q., Ren, H., Fang, S.-Y., Chen, Y., and Hu, J. (2018). Routenet: Routability prediction for mixed-size designs using convolutional neural network. In IEEE/ACM Int. Conf on Computer-Aided Design, ICCAD, pages 1–8.
  • Xie et al., (2021) Xie, Z., Liang, R., Xu, X., Hu, J., Duan, Y., and Chen, Y. (2021). Net2: A graph attention network method customized for pre-placement net length estimation. In Asia and South Pacific Design Automation Conference, ASPDAC, page 671–677.
  • Xu et al., (2019) Xu, K., Hu, W., Leskovec, J., and Jegelka, S. (2019). How powerful are graph neural networks? In Int. Conf. on Learning Representations.
  • Yan et al., (2021) Yan, Z., Ma, T., Gao, L., Tang, Z., and Chen, C. (2021). Link prediction with persistent homology: An interactive view. In Int. Conf. on Machine Learning, pages 11659–11669. PMLR.
  • Yang et al., (2022) Yang, S., Yang, Z., Li, D., Zhang, Y., Zhang, Z., Song, G., and Hao, J. (2022). Versatile multi-stage graph neural network for circuit representation. In Advances in Neural Information Processing Systems, volume 35.
  • Zaheer et al., (2017) Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R. R., and Smola, A. J. (2017). Deep sets. In Advances in Neural Information Processing Systems, volume 30.
  • Zhang et al., (2023) Zhang, B., Luo, S., Wang, L., and He, D. (2023). Rethinking the expressive power of GNNs via graph biconnectivity. In The Eleventh Int. Conf. on Learning Representations.
  • Zhao et al., (2020) Zhao, Q., Ye, Z., Chen, C., and Wang, Y. (2020). Persistence enhanced graph neural network. In Int. Conf. on Artificial Intelligence and Statistics, pages 2896–2906. PMLR.

 

Supplement of the AISTATS submission #1797,
Title: “DE-HNN: An effective neural model for Circuit Netlist representation”


 


Appendix A More Background

Figure 4: Visualization of a circuit netlist in the post place-and-route stage: each cell ($c_i$) is positioned within the physical layout of the chip and interconnected with other components following the nets ($\sigma_i$).

A.1 Circuit netlists

A circuit netlist is a textual representation of electronic components, such as logical boolean gates, and the connections between them. Figure 4 provides an example illustrating a circuit netlist consisting of five components interconnected by five nets. After laying out the netlist in the physical space (placement), during the routing stage, the edges of the netlist are mapped to the routing channels within the chip’s physical floorplan (indicated by dashed lines in Figure 4). Routing congestion occurs when the number of edges to be routed in a specific region of the floorplan exceeds the available routing capacity.





A.2 Persistent homology based features

Persistent homology is one of the most important developments in the field of topological data analysis in the past two decades. It can encode meaningful topological features in a multi-scale manner and has already been applied to numerous applications; see the books (Edelsbrunner and Harer, 2010; Dey and Wang, 2022). Intuitively, given a domain $X$, a $p$-th homology class (or informally, a $p$-th homological feature) essentially captures a family of equivalent $p$-dimensional “holes” in $X$; for example, 0-, 1- and 2-dimensional homology captures connected components, equivalence classes of loops, and 2-dimensional voids, respectively. The $p$-th homology group $\mathrm{H}_p(X)$ (using $\mathbb{Z}_2$ coefficients) characterizes the space of $p$-th homological features of $X$, and its rank $rank(\mathrm{H}_p(X))$ corresponds to the number of independent $p$-th homological features. For example, $rank(\mathrm{H}_0(X))$ is the number of connected components of the space $X$. If the domain $X$ is a graph, then its 1st homology group is isomorphic to the space of cycles (loops), and $rank(\mathrm{H}_1(X))$ is simply the number of independent cycles.

Persistent homology is a modern extension of homology, where instead of a single space $X$, we inspect a sequence of growing spaces $X_1 \subseteq X_2 \subseteq \cdots \subseteq X_m = X$, called a filtration. (Note that persistent homology has been extended to much broader families of filtrations than the growing sequence we describe here; see (Edelsbrunner and Harer, 2010; Dey and Wang, 2022).) Intuitively, we can view this filtration as an evolution of the space $X$. As the space grows, we track its corresponding topological features (as captured by the homology groups introduced above). Sometimes new topological features (e.g., a new component or a hole) appear, and sometimes they disappear (e.g., a hole is filled). Persistent homology (PH) captures such creation and death of topological features, and outputs a feature summary called a persistence diagram (PD), consisting of the birth-time and death-time of classes of topological features during this evolution. In short, the PDs provide a multiscale summary of the topological features of $X$ from the perspective of the filtration $X_1 \subseteq X_2 \subseteq \cdots \subseteq X_m$. In the past few years, there has been a series of approaches to vectorize PDs or to kernelize them for ML applications; see Chapter 13 of (Dey and Wang, 2022). In our work, we use the so-called persistence images (Adams et al., 2017) to convert a PD into a finite-dimensional vector to be used as part of the node features.
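For concreteness, the following is a minimal sketch of the persistence-image vectorization in the spirit of Adams et al. (2017); the grid resolution, bandwidth and weighting below are illustrative and are not the exact parameters used in our experiments.

```python
# Minimal sketch: persistence diagram -> persistence-image feature vector.
# Each (birth, death) pair becomes a Gaussian bump on a (birth, persistence)
# grid, weighted by its persistence, and the grid is flattened into a vector.
import numpy as np

def persistence_image(diagram, resolution=8, sigma=0.5,
                      birth_range=(0.0, 4.0), pers_range=(0.0, 4.0)):
    births, pers = diagram[:, 0], diagram[:, 1] - diagram[:, 0]
    bs = np.linspace(*birth_range, resolution)
    ps = np.linspace(*pers_range, resolution)
    bb, pp = np.meshgrid(bs, ps)
    image = np.zeros_like(bb)
    for b, p in zip(births, pers):
        weight = p  # persistence-weighted, a common choice
        image += weight * np.exp(-((bb - b) ** 2 + (pp - p) ** 2) / (2 * sigma ** 2))
    return image.flatten()

# Toy diagram with two (birth, death) pairs.
pd_points = np.array([[0.0, 1.0], [1.0, 3.0]])
print(persistence_image(pd_points).shape)  # (64,)
```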

Persistence diagram (PD) induced from Netlists. We compute the persistence diagram (PD) based on the following directed graph representation of the netlist (instead of using the directed hypergraph representation, for easier computation of persistence).

In particular, given a netlist, we create a directed graph $G=(V,E)$ where each node in $V$ corresponds to a cell in the netlist, while a directed edge is formed between the driver cell of a net and each of the sinks in that net; that is, each net $\sigma = (\mathsf{v}_\sigma, \mathsf{S}_\sigma)$ gives rise to a set of directed edges $\{(\mathsf{v}_\sigma, v)\}_{v \in \mathsf{S}_\sigma}$. We refer to this graph as the star-graph induced by the input netlist.

Now, for each node $v$ in the star-graph $G$, we create its $k$-ring out-flow neighborhood $G^{out}_v$, which is simply the subgraph of $G$ spanned by all nodes reachable from $v$ by a path of at most $k$ hops. Symmetrically, the $k$-ring in-flow neighborhood $G^{in}_v$ of $v$ is the subgraph of $G$ spanned by all nodes from which $v$ is reachable within $k$ hops. These two subgraphs form the local motifs around $v$. We wish to obtain a feature-vector summary for $G^{in}_v$ and $G^{out}_v$. To this end, similar to (Yan et al., 2021; Zhao et al., 2020), we use the so-called 0th extended persistence diagram $PD^*_v$ (Cohen-Steiner et al., 2009) of $G^*_v$ (where $* \in \{in, out\}$), induced by the shortest path distance function to $v$, as its summary. For our specific graph setting, using results of (Tian and Wang, 2019), one can show that $PD^*_v$ computed this way is a concise summary that encodes rich information around $v$, including the number of triangles incident to $v$, the clustering coefficient of $v$, the number of nodes at distance $r \leq k$ from $v$, the number of crossing edges from distance-$r$ nodes to distance-$(r+1)$ nodes, a certain “shortest” system of cycle basis of $G^*_v$ passing through $v$, and so on. Finally, each PD (for the in-flow and out-flow neighborhoods of $v$) is vectorized into a persistence image as in (Adams et al., 2017), and the resulting image matrices are flattened and concatenated into a single vector that serves as the persistence representation of each node.
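A minimal sketch of the star-graph construction and the $k$-ring out-flow / in-flow neighborhoods, using networkx on a toy netlist; the net dictionary and function name are illustrative.

```python
# Minimal sketch: star graph of a netlist and k-ring in/out-flow neighborhoods.
# `nets` maps each net to its driver cell and its sink cells (toy example).
import networkx as nx

nets = {
    "n1": ("c1", ["c2", "c3"]),
    "n2": ("c2", ["c4"]),
    "n3": ("c3", ["c4", "c5"]),
}

G = nx.DiGraph()
for driver, sinks in nets.values():
    G.add_edges_from((driver, s) for s in sinks)   # star edges: driver -> sink

def k_ring_neighborhoods(G, v, k):
    out_nodes = nx.single_source_shortest_path_length(G, v, cutoff=k)                    # reachable from v
    in_nodes = nx.single_source_shortest_path_length(G.reverse(copy=False), v, cutoff=k)  # can reach v
    return G.subgraph(out_nodes), G.subgraph(in_nodes)

G_out, G_in = k_ring_neighborhoods(G, "c1", k=2)
print(sorted(G_out.nodes()), sorted(G_in.nodes()))
```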

A.3 Metis Partitioning

Metis (Karypis and Kumar, 1998) is currently the state-of-the-art (SOTA) algorithm in chip design applications for efficiently producing balanced k-way partitions of large graphs. We choose Metis due to its practical performance, the fact that it produces multiple clusters of balanced sizes, and the ease with which the cluster sizes can be controlled. Besides Metis, hMetis (Karypis et al., 1999), developed on top of Metis for hypergraph partitioning, is an alternative choice. In this paper we used Metis because we also need partitioning for the other graph-based methods we compare against, and Metis gives us consistent partitions across methods. Nevertheless, we expect that hMetis could be used without much impact on the performance of our pipeline.

Appendix B Theoretical Analysis

B.1 Preliminaries

Let $\mathbb{D}$ be a domain, such as $\mathbb{Q}$, $\mathbb{R}$, or $\mathbb{R}^d$. Consider a set function $f: 2^{\mathbb{D}} \rightarrow \mathrm{codom}(f)$ where $\mathbb{D}$ is a countable domain. Then, we have

\[
\forall X \subseteq \mathbb{D}: \quad f(X) = \rho \circ \Phi(X), \qquad \Phi(X) = \sum_{x \in X} \phi(x), \tag{8}
\]

where $\phi: \mathbb{D} \rightarrow \mathrm{codom}(\phi) \subset \mathbb{R}$, and $\rho: \mathrm{codom}(\phi) \rightarrow \mathrm{codom}(f)$. This is the so-called sum-decomposable representation of $f$ via $\mathbb{R}$ in terms of the basis functions $(\phi, \rho)$. We refer to the ambient space of $\mathrm{codom}(\phi)$ (in Eqn (8), it is $\mathbb{R}$) as the model's latent space (Zaheer et al., 2017). There is an extended sum-decomposable representation of $f$ on multisets whose elements are drawn from an uncountable domain (e.g., $\mathbb{R}^d$), obtained by restricting the size of the input multisets. Recently, Tabaghi and Wang (2023) proposed an encoder $\phi: \mathbb{R}^d \rightarrow \mathrm{codom}(\phi) \subset \mathbb{R}^{2dN}$ such that for any continuous multiset function $f: \mathbb{X}_{N,\mathcal{B}} \rightarrow \mathrm{codom}(f)$ we have

\[
\forall X \in \mathbb{X}_{N,\mathcal{B}}: f(X) = \rho \circ \Phi(X),
\]

where $\mathbb{X}_{N,\mathcal{B}}$ is the set of multisets of size $N$ with elements drawn from $\mathcal{B}$ (a compact subset of $\mathbb{R}^d$), and $\Phi(X) = \sum_{x \in X} \phi(x)$. The latent space dimension of this representation is $2dN$. As a result of this representation, $\Phi: \mathbb{X}_{N,\mathcal{B}} \rightarrow \mathrm{codom}(\Phi)$ is an injective map, where $\mathrm{codom}(\Phi) \subset \mathbb{R}^{2dN}$. Furthermore, Tabaghi and Wang (2023) show that $\rho$ is continuous over the latent space $\mathbb{R}^{2dN}$.

At their core, sum-decomposable models rely on a bijection between multisets and encoded features, that is, $X = \alpha \circ \Phi(X)$ for any multiset $X \in \mathbb{X}_{N,\mathcal{B}}$ and a bijective map $\alpha$. In the following proposition, we first generalize this result to multisets of varying sizes.
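For intuition, the following is a small numerical illustration of the sum-decomposable form $f(X) = \rho(\sum_{x \in X} \phi(x))$: the inner sum is permutation invariant, so any reordering of the multiset yields the same output. The $\phi$ and $\rho$ below are toy hand-picked functions, not learned networks.

```python
# Minimal sketch of a sum-decomposable (permutation-invariant) multiset function.
import numpy as np

def phi(x):                       # element encoder
    return np.array([x, x ** 2])

def rho(z):                       # decoder applied to the pooled latent vector
    return z[0] + 0.5 * z[1]

def f(multiset):
    return rho(sum(phi(x) for x in multiset))

print(f([1.0, 2.0, 3.0]))         # 13.0
print(f([3.0, 1.0, 2.0]))         # same value: invariant to reordering
```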

Proposition 1.

Let $\mathcal{B}$ be a connected compact subset of $\mathbb{R}^d$. There exists a function $\phi: \mathbb{R}^d \rightarrow \mathbb{R}^{2dN}$ such that

\[
\forall X \in \mathbb{X}_{\leq N,\mathcal{B}}: X = \alpha\Big(\sum_{x \in X} \phi(x)\Big) = \alpha \circ \Phi(X),
\]

where $\mathbb{X}_{\leq N,\mathcal{B}}$ is the set of all multisets of size $\leq N$ with elements from $\mathcal{B}$, and $\alpha$ is a continuous map over $\mathbb{R}^{2dN}$.

Proof.

As mentioned earlier, Tabaghi and Wang (2023) proposed a continuous encoding function $\phi': \mathbb{R}^d \rightarrow \mathbb{R}^{2dN}$ such that $\Phi'(X) = \sum_{x \in X} \phi'(x)$ is an injective map over multisets with exactly $N$ elements, that is, $(\Phi')^{-1} \circ \Phi'(X) = X$ where the elements of the multiset $X$ are drawn from $\mathbb{R}^d$ and $|X| = N$. (Note that we have trivially changed the notation from $\phi$ to $\phi'$.)

Lemma 1.

The function $\Phi'$ is a homeomorphism.

Proof.

The function $\Phi'$ is continuous and injective by construction. We want to show that $(\Phi')^{-1}$ is continuous over $\mathrm{codom}(\Phi') = \{\Phi'(X): X \in \mathbb{X}_{N,\mathcal{B}}\}$, where $\mathcal{B}$ is a compact and connected subset of $\mathbb{R}^d$. The set $\mathbb{X}_{N,\mathcal{B}}$ is compact; refer to Lemma 5 in (Tabaghi and Wang, 2023). Also, $\mathrm{codom}(\Phi') \subseteq \mathbb{R}^{2dN}$ forms a metric space with the $\ell_2$ metric; hence, it is also a Hausdorff space. By the inverse function theorem, $\Phi'$, a continuous bijection from a compact space to a Hausdorff space, has a continuous inverse; see Proposition 13.26 in (Sutherland, 2009). ∎

To extend the result of Lemma 1 to multisets of variable sizes, we follow the same proof sketch as the one used for the one-dimensional case (Wagstaff et al., 2019). In particular, let $x_\circ \in \mathbb{R}^d \setminus \mathcal{B}$ with $\inf_{x \in \mathcal{B}} \|x - x_\circ\|_2 > 0$. Then, we define $\phi(x) = \phi'(x) - \phi'(x_\circ)$. For any multiset $X$ with $M \leq N$ elements from $\mathcal{B}$, we have

\[
\forall X \in \mathbb{X}_{\leq N,\mathcal{B}}: \quad \Phi(X) = \sum_{x \in X} \phi(x) = \sum_{x \in X} \phi'(x) - M\phi'(x_\circ) = \Phi'\big(X \cup \{\{\underbrace{x_\circ,\ldots,x_\circ}_{N-M}\}\}\big) + \mathrm{const},
\]

where $\mathrm{const} = -N\phi'(x_\circ)$. This is in the sum-decomposable form. Since $\Phi'$ is an injective map over $\mathbb{X}_{N,\mathcal{B}}$, $\Phi$ is also an injective map over $\mathbb{X}_{\leq N,\mathcal{B}}$. Specifically, we can derive a closed-form expression for its inverse as follows:

\begin{align*}
\Big({\Phi'}^{-1} \circ \big(\Phi(X) - \mathrm{const}\big)\Big) \cap \mathcal{B}
&= \Big({\Phi'}^{-1} \circ \Phi'\big(X \cup \{\{\underbrace{x_\circ,\ldots,x_\circ}_{N-M}\}\}\big)\Big) \cap \mathcal{B} \\
&= \big(X \cup \{\{\underbrace{x_\circ,\ldots,x_\circ}_{N-M}\}\}\big) \cap \mathcal{B} \\
&= X.
\end{align*}

In other words, we have $\Phi^{-1}(U) = {\Phi'}^{-1}(U - \mathrm{const}) \cap \mathcal{B}$ for all $U \in \mathrm{codom}(\Phi) = \{\Phi(X): X \in \mathbb{X}_{\leq N,\mathcal{B}}\}$.

The function $\Phi: \mathbb{X}_{\leq N,\mathcal{B}} \rightarrow \mathrm{codom}(\Phi)$ is a continuous bijection. Its domain is compact because it is a finite union of compact spaces, that is, $\mathbb{X}_{\leq N,\mathcal{B}} = \cup_{n \in [N]} \mathbb{X}_{n,\mathcal{B}}$. Furthermore, its codomain forms a metric space with the $\ell_2$ distance; hence, it is also a Hausdorff space. Therefore, by the inverse function theorem, $\Phi$ has a continuous inverse (Sutherland, 2009). Letting $\alpha = \Phi^{-1}$, we arrive at the Proposition's statement. ∎

Remark 1.

Proposition 1 guarantees the continuity of $\alpha$ over the compact set $\mathrm{codom}(\Phi) \subset \mathbb{R}^{2dN}$. This function can be continuously extended to the ambient space $\mathbb{R}^{2dN}$; refer to the continuous extension theorem (Fitzpatrick, 1989).

B.2 Proof of Theorem 1

As discussed in the main text, the general net-value function is nested-permutation invariant and takes the form of Eqn (6) in the main text. In what follows, we provide the detailed version of Theorem 1 and show that such a nested-permutation invariant function admits a nested sum-decomposition aligned with our DE-HNN's node and net update rules.

Theorem 1 (Detailed).

Let $\mathcal{F}$ be any continuous, nested-permutation invariant, net-value function as in Eqn (6). For simplicity, assume both the input nets and the output of $M$ take values in $\mathcal{B} \subset \mathbb{R}^d$, a connected compact subset of $\mathbb{R}^d$. We also let $N_v = \max\{|\mathcal{I}(v)| : \forall v\}$ and $N_\sigma = \max\{|\mathsf{S}(\sigma)| : \forall \sigma\}$. Then $\mathcal{F}$ can be expressed as the following sum-decomposition:

\[
\forall \sigma: \quad \mathcal{F}\Big(\{\mu_{\sigma'}\}_{\sigma' \in \mathcal{I}(\mathsf{v}_\sigma)}, \big\{\{\mu_{\sigma'}\}_{\sigma' \in \mathcal{I}(v')}\big\}_{v' \in \mathsf{S}_\sigma}\Big) = \rho\Big(\sum_{\sigma' \in \mathcal{I}(\mathsf{v}_\sigma)} \phi_1(\mu_{\sigma'}), \; \sum_{v' \in \mathsf{S}_\sigma} \phi_2\big(\sum_{\sigma' \in \mathcal{I}(v')} \phi_1(\mu_{\sigma'})\big)\Big),
\]

where $\phi_1: \mathbb{R}^d \rightarrow \mathbb{R}^{d'(d,N_v)}$, $\phi_2: \mathbb{R}^{d'(d,N_v)} \rightarrow \mathbb{R}^{d''(d,N_v,N_\sigma)}$, and $\rho: \mathbb{R}^{d'(d,N_v)} \times \mathbb{R}^{d''(d,N_v,N_\sigma)} \rightarrow \mathbb{R}^d$ are continuous functions.

Proof.

The general net-value function takes the following form:

\[
\forall \sigma: \quad \mathcal{F}\Big(\{\mu_{\sigma'}\}_{\sigma' \in \mathcal{I}(\mathsf{v}_\sigma)}, \big\{\{\mu_{\sigma'}\}_{\sigma' \in \mathcal{I}(v')}\big\}_{v' \in \mathsf{S}_\sigma}\Big).
\]

We assume $\mathcal{F}$ is a continuous function that takes values in $\mathcal{B}$, a connected compact subset of $\mathbb{R}^{d}$, and let $N_v=\max_{v}|\mathcal{I}(v)|$. From Proposition 1, there exist continuous functions $\phi_1$ and $\alpha_1$ such that

$$\forall X\in\mathbb{X}_{\leq N_v,\mathcal{B}}:\ X=\alpha_1\Big(\sum_{x\in X}\phi_1(x)\Big),$$

where $\mathrm{codom}(\phi_1)\subseteq\mathbb{R}^{2dN_v}$; that is, the latent dimension depends on $d$ and $N_v$, which we denote as $d'(d,N_v)$. Therefore, we apply the result of Proposition 1 to sets of net values (of size at most $N_v$) and express $\mathcal{F}$ as follows:

$$\forall\sigma:\ \mathcal{F}\Big(\{\mu_{\sigma'}\}_{\sigma'\in\mathcal{I}(\mathsf{v}_{\sigma})},\ \big\{\{\mu_{\sigma'}\}_{\sigma'\in\mathcal{I}(v')}\big\}_{v'\in\mathsf{S}_{\sigma}}\Big)
=\mathcal{F}\Big(\alpha_1\Big(\sum_{\sigma'\in\mathcal{I}(\mathsf{v}_{\sigma})}\phi_1(\mu_{\sigma'})\Big),\ \Big\{\alpha_1\Big(\sum_{\sigma'\in\mathcal{I}(v')}\phi_1(\mu_{\sigma'})\Big)\Big\}_{v'\in\mathsf{S}_{\sigma}}\Big)
=\mathcal{F}_1\Big(\sum_{\sigma'\in\mathcal{I}(\mathsf{v}_{\sigma})}\phi_1(\mu_{\sigma'}),\ \Big\{\sum_{\sigma'\in\mathcal{I}(v')}\phi_1(\mu_{\sigma'})\Big\}_{v'\in\mathsf{S}_{\sigma}}\Big),$$

where $\mathcal{F}_1(x,Y)\stackrel{\mathrm{def}}{=}\mathcal{F}\big(\alpha_1(x),\{\alpha_1(y)\}_{y\in Y}\big)$ for all $x\in\mathrm{codom}(\Phi_1)=\{\Phi_1(Y)=\sum_{y\in Y}\phi_1(y):Y\in\mathbb{X}_{\leq N_v,\mathcal{B}}\}$ and any set $Y$ with elements in $\mathrm{codom}(\Phi_1)$. Since both $\alpha_1$ and $\mathcal{F}$ are continuous, $\mathcal{F}_1$ is also continuous. (We can use the matching distance to define a metric on multisets; refer to (Tabaghi and Wang, 2023) for details.)

Next, we apply the result of Proposition 1 to $\mathcal{F}_1$. The elements of the set $\big\{\sum_{\sigma'\in\mathcal{I}(v')}\phi_1(\mu_{\sigma'})\big\}_{v'\in\mathsf{S}_{\sigma}}$ take values in $\mathrm{codom}(\Phi_1)$, a connected compact subset of $\mathbb{R}^{d'(d,N_v)}$. If we assume $|\mathsf{S}_{\sigma}|\leq N_{\sigma}$ for all $\sigma$, Proposition 1 guarantees that there exist continuous functions $\alpha_2$ and $\phi_2$ such that the following holds:

$$\forall\sigma:\ \mathcal{F}_1\Big(\sum_{\sigma'\in\mathcal{I}(\mathsf{v}_{\sigma})}\phi_1(\mu_{\sigma'}),\ \Big\{\sum_{\sigma'\in\mathcal{I}(v')}\phi_1(\mu_{\sigma'})\Big\}_{v'\in\mathsf{S}_{\sigma}}\Big)
=\mathcal{F}_1\Big(\sum_{\sigma'\in\mathcal{I}(\mathsf{v}_{\sigma})}\phi_1(\mu_{\sigma'}),\ \alpha_2\Big(\sum_{v'\in\mathsf{S}_{\sigma}}\phi_2\Big(\sum_{\sigma'\in\mathcal{I}(v')}\phi_1(\mu_{\sigma'})\Big)\Big)\Big)
=\rho\Big(\sum_{\sigma'\in\mathcal{I}(\mathsf{v}_{\sigma})}\phi_1(\mu_{\sigma'}),\ \sum_{v'\in\mathsf{S}_{\sigma}}\phi_2\Big(\sum_{\sigma'\in\mathcal{I}(v')}\phi_1(\mu_{\sigma'})\Big)\Big),$$

where $\rho(x,y)\stackrel{\mathrm{def}}{=}\mathcal{F}_1(x,\alpha_2(y))$ for all $x\in\mathrm{codom}(\Phi_1)$ and $y\in\mathrm{codom}(\Phi_2)\stackrel{\mathrm{def}}{=}\{\sum_{z\in Z}\phi_2(z):Z\ \text{a set with elements in}\ \mathrm{codom}(\Phi_1),\ |Z|\leq N_{\sigma}\}$. The function $\rho$ is a composition of the continuous functions $\mathcal{F}_1$ and $\alpha_2$, and is therefore continuous. Also, $\mathrm{codom}(\phi_2)\subseteq\mathbb{R}^{2d'(d,N_v)N_{\sigma}}$; that is, the latent dimension depends on $d$, $N_v$, and $N_{\sigma}$, which we denote as $d''(d,N_v,N_{\sigma})$. Finally, $\phi_2$ and $\rho$ are continuous over the compact domains $\mathrm{codom}(\Phi_1)$ and $\mathrm{codom}(\Phi_1)\times\mathrm{codom}(\Phi_2)$, respectively.
Therefore, they can be continuously extended to $\mathbb{R}^{d'(d,N_v)}$ and $\mathbb{R}^{d'(d,N_v)}\times\mathbb{R}^{d''(d,N_v,N_{\sigma})}$. ∎
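The final decomposition is a nested DeepSets-style aggregation: an inner sum over the nets incident to each cell, followed by an outer sum over the sink cells of a net. For illustration, a minimal PyTorch sketch of such a parameterization is given below; the module and argument names are ours, it assumes Python lists of incidence index tensors, and it is not a transcription of the released DE-HNN code.

```python
import torch
import torch.nn as nn

class NetValueLayer(nn.Module):
    """Sketch of the rho( sum phi1, sum phi2( sum phi1 ) ) decomposition.

    Hypothetical module; illustrates the proof's decomposition, not the
    released DE-HNN implementation.
    """

    def __init__(self, d, d_hidden):
        super().__init__()
        self.phi1 = nn.Sequential(nn.Linear(d, d_hidden), nn.ReLU(),
                                  nn.Linear(d_hidden, d_hidden))
        self.phi2 = nn.Sequential(nn.Linear(d_hidden, d_hidden), nn.ReLU(),
                                  nn.Linear(d_hidden, d_hidden))
        self.rho = nn.Sequential(nn.Linear(2 * d_hidden, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d))

    def forward(self, mu, driver_nets, sink_nets):
        # mu: (num_nets, d) current net representations mu_sigma.
        # driver_nets[i]: LongTensor of nets incident to the driver cell of net i.
        # sink_nets[i]: list of LongTensors, nets incident to each sink cell of net i.
        out = []
        for drv, sinks in zip(driver_nets, sink_nets):
            # Inner aggregation over nets incident to the driver cell.
            drv_feat = self.phi1(mu[drv]).sum(dim=0)
            # Inner aggregation per sink cell, then outer aggregation over sinks.
            if sinks:
                sink_feat = torch.stack(
                    [self.phi2(self.phi1(mu[s]).sum(dim=0)) for s in sinks]
                ).sum(dim=0)
            else:
                sink_feat = drv_feat.new_zeros(drv_feat.shape)
            out.append(self.rho(torch.cat([drv_feat, sink_feat])))
        return torch.stack(out)
```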

Appendix C Experimental details

C.1 Dataset Statistics

We report the size of each design and its cell/net-degree distribution in Table 5, and the net-based demand/wirelength distributions in Table 6. The (hyper)graphs in our dataset are large and sparse, ranging from roughly 400K to 1.3M nodes, with a few outliers that have very high degrees. We also summarize the more detailed distributions of net-based demand, net-based wirelength, and cell-based congestion in Figure 5.

C.2 Experiment Setup

We engineer the input features as cell features and net features. None of the following features contain placement information or are computed from placement information. We engineer the cell features as follows:

  • We collect per-design statistics, including the minimum and maximum width and height over all cells, and normalize these quantities to the range $[0,1]$. We then concatenate them with the cell's discrete orientation and its degree (the number of nets the cell connects to), which yields the cell's input feature vector.

  • We calculate the top-10 eigenvectors of the graph Laplacian of the heterogeneous graph as the positional encoding (denoted LapPE) for every cell and net (a minimal sketch of this computation is given after this list).

  • We compute the persistence diagram (denoted PD) for each cell-node $v$ on a directed graph $G=(V,E)$, as described in A.2, and use it as a common cell-node input feature for all baselines to capture local topological information.

  • We also compute the degree distribution for each cell-node $v$ based on its $k$-ring undirected neighborhood $G_v$ in the star graph introduced in A.2 (the same graph used to compute the persistence diagrams); note that we ignore edge directions when computing the degree distribution.
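As a concrete illustration of the LapPE step above, the following sketch builds a symmetric-normalized Laplacian of the cell-net star graph and extracts the leading non-trivial eigenvectors with SciPy. The helper name, the choice of normalization, and the use of the smallest eigenvalues are our assumptions for illustration, not a transcription of the released pipeline.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def laplacian_pe(num_cells, num_nets, cell_net_pairs, k=10):
    """k Laplacian eigenvectors of the bipartite cell-net (star) graph.

    cell_net_pairs: iterable of (cell_id, net_id) incidences.
    Returns a (num_cells + num_nets, k) array; rows 0..num_cells-1 are cells,
    the remaining rows are nets.  Hypothetical helper for illustration.
    """
    n = num_cells + num_nets
    rows, cols = [], []
    for c, s in cell_net_pairs:
        rows += [c, num_cells + s]          # cell -> net and net -> cell edges
        cols += [num_cells + s, c]
    adj = sp.coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n)).tocsr()
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    lap = sp.eye(n) - sp.diags(d_inv_sqrt) @ adj @ sp.diags(d_inv_sqrt)
    # Smallest eigenpairs of the normalized Laplacian; drop the trivial
    # (near-constant) eigenvector and keep the next k as positional encodings.
    vals, vecs = spla.eigsh(lap, k=k + 1, which="SM")
    order = np.argsort(vals)
    return vecs[:, order[1:k + 1]]
```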

For the net features, for all models other than the Linear Transformer, we initialize the feature of each net $\sigma=(\mathsf{v}_{\sigma},\mathsf{S}_{\sigma})$ as the net's degree (i.e., the number of cells the net connects to). For the Linear Transformer, in order to provide better topology information, we instead initialize the feature of each net $\sigma=(\mathsf{v}_{\sigma},\mathsf{S}_{\sigma})$ (denoted $M(\sigma)$) as the average of the features of all nodes this net contains, computed as

$$M(\sigma)=\frac{1}{|\mathsf{S}_{\sigma}|+1}\Big(m(\mathsf{v}_{\sigma})+\sum_{v'\in\mathsf{S}_{\sigma}}m(v')\Big). \qquad (9)$$
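A minimal sketch of this initialization, assuming a per-cell feature tensor $m(v)$ and hypothetical incidence lists, could look as follows; it simply averages the driver and sink cell features of each net, as in Equation (9).

```python
import torch

def init_net_features(cell_feats, driver_of_net, sinks_of_net):
    """Initialize each net feature as the mean over its driver and sink cells.

    cell_feats: (num_cells, f) tensor of cell features m(v).
    driver_of_net: (num_nets,) LongTensor with the driver cell of each net.
    sinks_of_net: list of LongTensors, the sink cells of each net.
    Hypothetical helper for illustration.
    """
    net_feats = []
    for drv, sinks in zip(driver_of_net.tolist(), sinks_of_net):
        members = torch.cat([torch.tensor([drv]), sinks])   # driver plus sinks
        net_feats.append(cell_feats[members].mean(dim=0))   # Equation (9)
    return torch.stack(net_feats)
```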

As also described in the main text, for GCN and GATv2 we use the bipartite graph representation of the input netlist, in which there are two types of graph nodes: those corresponding to cells and those corresponding to nets. If a cell is contained in a net, then there is an edge between their respective graph nodes. We use heterogeneous message passing, where nodes corresponding to cells use a different message-passing mechanism from nodes corresponding to nets.
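For concreteness, a minimal sketch of how such a bipartite graph could be assembled, here using PyTorch Geometric's HeteroData container, is shown below; the node- and edge-type names are our choices for illustration and are not necessarily those used in the released code.

```python
import torch
from torch_geometric.data import HeteroData

def build_bipartite_netlist(cell_feats, net_feats, incidences):
    """Bipartite cell-net graph for heterogeneous message passing.

    incidences: LongTensor of shape (2, num_incidences); row 0 holds cell
    indices and row 1 holds net indices.  Type names are illustrative.
    """
    data = HeteroData()
    data['cell'].x = cell_feats                      # per-cell input features
    data['net'].x = net_feats                        # per-net input features
    # One edge per (cell, net) incidence, plus the reverse direction so that
    # cell nodes and net nodes can use different message-passing functions.
    data['cell', 'to', 'net'].edge_index = incidences
    data['net', 'to', 'cell'].edge_index = incidences.flip(0)
    return data
```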

For GCN, GATv2, HyperConv, and all other HNN-based models, we used 4 layers with a hidden dimension of 64 per layer. For the Linear Transformer, we used 2 layers with a hidden dimension of 64 per layer. For NetlistGNN, we used 4 layers with node dimension 64 and net dimension 64. For the E-HNN/DE-HNN-based models, we used 3 layers with node dimension 64.

We report the number of parameters, training time per epoch, and total training epochs in Table 4. We aim to use a similar number of parameters across models, although the full DE-HNN currently has the most parameters, with NetlistGNN having the second most. Interestingly, while all models take a similar number of epochs to converge, our DE-HNN is much faster per epoch and therefore takes less total time to train. We used two NVIDIA A100-SXM4-80GB GPUs on a Linux CentOS Stream 8 system for both single-design and cross-design experiments.

Model GCN GATv2 HyperConv Lin. Transformer NetlistGNN AllSet HMPNN HNHN base-DE-HNN full-DE-HNN
Num. of parameters 218113 251905 218625 203201 364743 348289 235137 217729 272165 383105
Time per epoch (single design) 2.77 2.91 1.06 3.72 1.13 0.79 0.69 0.62 0.49 0.65
Total num. of epochs (single design) 878 775 670 636 716 689 691 699 673 810
Time per epoch (cross design) 33.20 36.28 12.28 55.37 16.51 11.35 4.14 5.37 5.26 5.50
Total num. of epochs (cross design) 478 482 473 471 455 460 472 468 479 473
Table 4: Comparison of model complexity for all the models we used. We compare the number of parameters, the training time per epoch for single design (on average), the total number of training epochs to converge for single design (on average), the training time per epoch for cross design, and the total number of training epochs to converge for cross design.
Design Cells Nets Cell-degree Net-degree
Min Max Mean STD Min Max Mean STD
Superblue1 797,938 821,523 0.0 1243 3.70 5.66 0.0 140605 3.58 155.85
Superblue2 951,166 985,117 0.0 1317 3.51 5.66 0.0 190487 3.39 192.22
Superblue3 901,254 925,667 0.0 2245 3.65 5.23 0.0 168630 3.55 175.91
Superblue5 727,341 803,681 0.0 1381 3.46 5.82 0.0 114259 3.13 128.03
Superblue6 998,122 1,049,225 0.0 1689 3.57 4.03 0.0 179410 3.39 175.69
Superblue7 1,319,052 1,339,522 0.0 849 3.91 3.03 0.0 265765 3.85 230.24
Superblue9 810,812 830,308 0.0 1265 3.83 4.78 0.0 129541 3.74 202.42
Superblue11 923,355 954,144 0.0 1983 3.74 6.97 0.0 203194 3.63 294.36
Superblue14 604,921 627,036 0.0 1023 3.90 4.66 0.0 167911 3.76 300.30
Superblue16 671,284 696,983 0.0 1016 3.77 6.12 0.0 140741 3.63 238.58
Superblue18 459,495 468,888 0.0 1192 4.22 4.30 0.0 102047 4.14 150.74
Superblue19 495,234 510,258 0.0 1507 3.58 6.02 0.0 94682 3.48 135.31
Table 5: Dataset details & statistics. The 1st column shows the name of the design, the 2nd and 3rd columns show the number of cells and nets in each design, the 4th-7th columns show the distribution of cell-degrees for each design, and the 8th-11th columns show the distribution of net-degrees for each design.
Design Net-based demand Net-based wirelength Cell-based congestion
Min Max Mean STD Min Max Mean STD Min Max Mean STD
Superblue1 0.0 103.50 26.24 8.01 8.91 23.92 14.80 2.45 0.0 12.00 1.21 0.55
Superblue2 0.0 139.37 26.61 8.52 8.91 24.61 15.21 2.69 0.0 5.00 0.70 0.45
Superblue3 0.0 103.50 24.07 7.31 8.91 23.83 14.91 2.45 0.0 5.19 0.90 0.47
Superblue5 0.0 1203.25 41.62 32.00 8.91 24.06 15.68 2.96 0.0 78.92 1.27 0.94
Superblue6 0.0 980.33 33.83 27.45 8.91 23.83 14.88 2.57 0.0 74.15 1.35 1.15
Superblue7 0.0 495.25 23.89 5.34 8.91 23.93 14.89 2.48 0.0 43.38 1.09 0.33
Superblue9 0.0 79.50 22.47 8.02 8.91 23.83 14.71 2.42 0.0 5.0 0.74 0.46
Superblue11 0.0 75.00 20.05 6.54 8.91 24.17 15.11 2.38 0.0 8.0 0.93 0.36
Superblue14 0.0 401.41 23.42 9.11 8.91 23.67 15.06 2.48 0.0 46.85 1.06 0.65
Superblue16 0.0 1091.0 28.96 14.09 8.91 23.51 15.01 2.54 0.0 65.53 1.29 0.91
Superblue18 0.0 50.0 20.39 4.13 8.91 23.26 14.98 2.28 0.0 4.0 0.91 0.27
Superblue19 0.0 87.83 23.05 6.40 8.91 23.53 14.86 2.36 0.0 6.0 0.96 0.50
Table 6: Dataset details & statistics. This table shows, for each design, the distribution of net-based demand, the distribution of net-based log wirelength, and the distribution of cell-based congestion values. Recall that, as described in the paper, the cell-based congestion values (computed as the demand/capacity ratio) are classified into (a) [0, 0.9], not congested; and (b) [0.9, $\infty$), congested.
Figure 5: Net-based wirelength, net-based demand and cell-based congestion distributions of each design.

C.3 More experimental results

net-based wirelength regression
Model RMSE \downarrow MAE \downarrow Pearson \uparrow
base E-HNN 1.818 1.344 0.731
base DE-HNN 1.751 1.269 0.748
base DE-HNN+PD 1.731 1.257 0.754
base DE-HNN+PD+S. VN 1.724 1.253 0.758
full DE-HNN 1.689 1.245 0.770
net-based demand regression
Model RMSE \downarrow MAE \downarrow Pearson \uparrow
base E-HNN 9.228 5.959 0.591
base DE-HNN 8.997 5.764 0.630
base DE-HNN+PD 8.765 5.526 0.647
base DE-HNN+PD+S. VN 8.687 5.519 0.658
full DE-HNN 8.381 5.334 0.683
cell-based congestion classification
Model Precision \uparrow Recall \uparrow F_score \uparrow
base E-HNN 0.816 0.864 0.829
base DE-HNN 0.824 0.860 0.840
base DE-HNN+PD 0.830 0.869 0.847
base DE-HNN+PD+S. VN 0.832 0.874 0.851
full DE-HNN 0.833 0.876 0.853
Table 7: Ablation Study: Full experimental results to show the improvements from base E-HNN to full DE-HNN.
net-based wirelength regression
Model RMSE \downarrow MAE \downarrow Pearson \uparrow
NetlistGNN with no PD 1.818 1.344 0.731
NetlistGNN+PD 1.773 1.320 0.740
Improvement 2.5% 1.8% 1.2%
GCN with no PD 1.809 1.326 0.735
GCN+PD 1.762 1.276 0.750
Improvement 1.9% 3.6% 5.2%
GATv2 with no PD 1.920 1.401 0.659
GATv2+PD 1.812 1.330 0.687
Improvement 0.7% 0.6% 1.6%
net-based demand regression
Model RMSE \downarrow MAE \downarrow Pearson \uparrow
NetlistGNN with no PD 9.237 6.060 0.592
NetlistGNN+PD 9.063 5.839 0.623
Improvement 1.9% 3.6% 5.2%
GCN with no PD 9.698 6.453 0.547
GCN+PD 9.321 6.163 0.570
Improvement 3.9% 4.5% 4.2%
GATv2 with no PD 9.710 6.392 0.539
GATv2+PD 9.342 6.118 0.561
Improvement 3.8% 4.3% 4.1%
cell-based congestion classification
Model Precision \uparrow Recall \uparrow F_score \uparrow
NetlistGNN with no PD 0.806 0.855 0.818
NetlistGNN+PD 0.812 0.860 0.831
Improvement 0.7% 0.6% 1.6%
GCN with no PD 0.746 0.837 0.784
GCN+PD 0.761 0.857 0.802
Improvement 2.0% 2.4% 2.3%
GATv2 with no PD 0.802 0.856 0.811
GATv2+PD 0.810 0.864 0.835
Improvement 1.0% 1.0% 3.0%
Table 8: Ablation Study: the effect of using persistence diagrams (PDs) on the three best-performing baselines. For each method, the third row shows the percentage improvement after using PDs as part of the input features. Note that in the main text, the reported results for these baselines are the baselines+PD variants.

In the main text, we reported plots for ablation studies of the different strategies used in our model. Here, Table 7 reports the detailed numbers for all three tasks, net-based wirelength regression, net-based demand regression, and cell-based congestion classification, aggregated across all designs.

Besides the ablation study of our own models, as mentioned in the main paper, adding persistence diagram (PD) features is beneficial for both our method and the baselines. We have already shown the benefits for our model in the ablation studies in the main text and above. Here, in Table 8, we show the benefits for the three best-performing baseline models: NetlistGNN, GCN, and GATv2. As we can see, using PDs improves all of these models. Note that in the main text, when we report results for these baseline models, we already report the results with PDs.

We show more detailed single-design results for net-based wirelength regression in Table 9, net-based demand regression in Table 10, and cell-based congestion classification in Table 11 for each design. We also show more cross-design results for net-based wirelength regression, net-based demand regression, and cell-based congestion classification in Table 13. As in the main paper, we highlight the best-performing results in red and the second-best results among models other than ours in blue. Interestingly, in the single-design case, AllSet and NetlistGNN often perform the best among the baselines on the net-based wirelength and demand regression tasks, while for cell-based congestion classification other models (e.g., GATv2) sometimes perform the best among the baselines. In the cross-design experiments, GCN and GATv2 sometimes emerge as winners. In the cases where a baseline model outperforms our model, we highlight that baseline's results in red instead.

Preliminary exploration with placement information.

In this paper, we focus on the case where the input does not include placement information (i.e., coordinates of cells). We also conducted some preliminary experiments to examine the effect of adding cell placement information to the initial node features, and of using placement-based partitioning instead of METIS. We first partition the nodes of the placed netlist into $k$ bounding boxes $V_i$ with a fixed width and height of 0.8 millimeters each. The number $k$ depends on the width and height of each netlist circuit. With cell coordinates added and bounding-box-based partitioning (for the full DE-HNN), we reran both the single-design and cross-design net-based demand regression experiments on Superblue19; see Table 12 below. Placement information improves RMSE and MAE by around 10% on average, and the improvement in Pearson correlation is more substantial. We aim to explore how to effectively leverage placement information in future work.
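As an illustration of this partitioning step, the sketch below assigns each placed cell to a fixed 0.8 mm x 0.8 mm grid box; the coordinate units and the helper name are assumptions for illustration only.

```python
import numpy as np

def partition_by_bounding_box(xy, box_size=0.8):
    """Assign each placed cell to a fixed-size bounding box.

    xy: (num_cells, 2) array of placed cell coordinates, assumed to be in
    millimeters; box_size: box width/height in the same units.
    Returns one integer partition id per cell.  Hypothetical helper.
    """
    mins = xy.min(axis=0)
    grid = np.floor((xy - mins) / box_size).astype(int)   # (col, row) per cell
    num_cols = grid[:, 0].max() + 1
    return grid[:, 1] * num_cols + grid[:, 0]             # flatten to one id
```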

Design Model RMSE \downarrow MAE \downarrow Pearson \uparrow
Superblue1 GCN 1.731 1.243 0.751
GATv2 1.742 1.253 0.742
HyperConv 1.988 1.413 0.645
AllSet 1.700 1.235 0.757
HMPNN 1.779 1.291 0.727
HNHN 1.813 1.319 0.720
NetlistGNN 1.761 1.276 0.735
base DE-HNN 1.774 1.274 0.731
full DE-HNN 1.657 1.203 0.771
Improvement 2.5% 2.6% 1.8%
Superblue2 GCN 2.011 1.487 0.714
GATv2 1.931 1.485 0.723
HyperConv 2.217 1.661 0.641
AllSet 1.817 1.353 0.775
HMPNN 1.981 1.480 0.723
HNHN 1.992 1.488 0.724
NetlistGNN 1.857 1.410 0.748
base DE-HNN 1.933 1.438 0.741
full DE-HNN 1.810 1.344 0.778
Improvement 0.4% 0.7% 0.4%
Superblue3 GCN 1.741 1.273 0.738
GATv2 1.716 1.270 0.748
HyperConv 1.941 1.438 0.661
AllSet 1.685 1.239 0.759
HMPNN 1.752 1.300 0.734
HNHN 1.769 1.303 0.729
NetlistGNN 1.689 1.239 0.754
base DE-HNN 1.765 1.271 0.741
full DE-HNN 1.679 1.234 0.761
Improvement 0.4% 0.4% 0.3%
Superblue5 GCN 1.892 1.428 0.801
GATv2 1.909 1.438 0.793
HyperConv 2.114 1.606 0.742
AllSet 1.842 1.361 0.810
HMPNN 2.224 1.754 0.706
HNHN 2.069 1.562 0.756
NetlistGNN 1.915 1.406 0.790
base DE-HNN 1.881 1.393 0.799
full DE-HNN 1.795 1.330 0.822
Improvement 2.6% 2.3% 1.5%
Superblue6 GCN 1.811 1.341 0.740
GATv2 1.782 1.322 0.751
HyperConv 1.928 1.435 0.702
AllSet 1.733 1.260 0.766
HMPNN 1.851 1.360 0.730
HNHN 1.859 1.365 0.727
NetlistGNN 1.940 1.452 0.725
base DE-HNN 1.786 1.296 0.749
full DE-HNN 1.689 1.233 0.780
Improvement 2.5% 2.1% 1.8%
Superblue7 GCN 1.842 1.315 0.710
GATv2 1.847 1.371 0.699
HyperConv 2.010 1.479 0.628
AllSet 1.753 1.295 0.733
HMPNN 1.889 1.409 0.681
HNHN 1.858 1.370 0.694
NetlistGNN 1.801 1.316 0.721
base DE-HNN 1.760 1.291 0.701
full DE-HNN 1.719 1.267 0.747
Improvement 1.9% 2.2% 1.9%
Design Model RMSE \downarrow MAE \downarrow Pearson \uparrow
Superblue9 GCN 1.819 1.338 0.697
GATv2 1.853 1.381 0.683
HyperConv 1.966 1.435 0.639
AllSet 1.684 1.238 0.750
HMPNN 1.827 1.355 0.695
HNHN 1.821 1.335 0.701
NetlistGNN 1.805 1.325 0.701
base DE-HNN 1.765 1.287 0.720
full DE-HNN 1.663 1.226 0.757
Improvement 1.2% 1.0% 0.9%
Superblue11 GCN 1.831 1.356 0.694
GATv2 1.814 1.349 0.702
HyperConv 1.986 1.468 0.633
AllSet 1.717 1.274 0.740
HMPNN 1.830 1.369 0.696
HNHN 1.834 1.353 0.697
NetlistGNN 1.780 1.291 0.707
base DE-HNN 1.741 1.290 0.731
full DE-HNN 1.690 1.257 0.751
Improvement 1.6% 1.3% 1.5%
Superblue14 GCN 1.792 1.331 0.739
GATv2 1.794 1.361 0.738
HyperConv 2.116 1.547 0.605
AllSet 1.756 1.299 0.750
HMPNN 1.873 1.415 0.707
HNHN 1.895 1.404 0.703
NetlistGNN 1.812 1.363 0.731
base DE-HNN 1.816 1.336 0.730
full DE-HNN 1.728 1.282 0.760
Improvement 1.6% 1.3% 1.3%
Superblue16 GCN 1.763 1.265 0.741
GATv2 1.741 1.267 0.751
HyperConv 2.047 1.446 0.639
AllSet 1.688 1.207 0.772
HMPNN 1.896 1.362 0.701
HNHN 1.816 1.312 0.731
NetlistGNN 1.773 1.274 0.736
base DE-HNN 1.705 1.218 0.767
full DE-HNN 1.656 1.194 0.782
Improvement 1.9% 1.1% 1.3%
Superblue18 GCN 1.635 1.249 0.739
GATv2 1.701 1.246 0.714
HyperConv 1.937 1.345 0.681
AllSet 1.664 1.246 0.730
HMPNN 1.769 1.335 0.686
HNHN 1.752 1.314 0.696
NetlistGNN 1.625 1.213 0.752
base DE-HNN 1.653 1.263 0.735
full DE-HNN 1.632 1.219 0.743
Improvement 0.4% 0.5% 1.2%
Superblue19 GCN 1.637 1.214 0.762
GATv2 1.634 1.198 0.765
HyperConv 1.910 1.405 0.666
AllSet 1.582 1.171 0.783
HMPNN 1.705 1.255 0.739
HNHN 1.752 1.288 0.727
NetlistGNN 1.635 1.205 0.765
base DE-HNN 1.641 1.208 0.763
full DE-HNN 1.557 1.153 0.792
Improvement 1.6% 1.5% 1.1%
Table 9: Average results of single-design net-based HPWL (wirelength) regression for each design, based on 4-fold cross-validation. The last row, "Improvement", refers to the improvement of our full DE-HNN model over the best baseline approach for each metric.
Design Model RMSE \downarrow MAE \downarrow Pearson \uparrow
Superblue1 GCN 6.469 4.979 0.595
GATv2 6.409 4.964 0.612
HyperConv 6.662 4.951 0.224
AllSet 6.100 4.587 0.650
HMPNN 6.770 5.206 0.541
HNHN 6.394 4.825 0.610
Lin. Transformers 7.991 6.046 0.089
NetlistGNN 6.039 4.623 0.660
base DE-HNN 6.093 4.670 0.653
full DE-HNN 5.674 4.263 0.709
Improvement 6.0% 7.1% 7.4%
Superblue2 GCN 6.556 5.245 0.641
GATv2 6.736 5.288 0.616
HyperConv 7.654 6.151 0.374
AllSet 6.430 4.981 0.659
HMPNN 6.982 5.501 0.579
HNHN 6.699 5.217 0.625
Lin. Transformers 8.251 6.356 0.313
NetlistGNN 6.259 4.932 0.682
base DE-HNN 6.399 5.035 0.663
full DE-HNN 5.966 4.637 0.718
Improvement 4.7% 6.0% 5.3%
Superblue3 GCN 5.789 4.512 0.612
GATv2 5.837 4.558 0.565
HyperConv 6.468 5.149 0.265
AllSet 5.265 4.014 0.695
HMPNN 6.022 4.713 0.572
HNHN 5.686 4.338 0.631
Lin. Transformers 7.046 5.493 0.264
NetlistGNN 5.414 4.214 0.673
base DE-HNN 5.423 4.188 0.670
full DE-HNN 5.041 3.837 0.725
Improvement 4.3% 4.4% 4.3%
Superblue5 GCN 27.169 14.867 0.504
GATv2 27.343 14.754 0.508
HyperConv 29.563 15.817 0.107
AllSet 27.881 14.632 0.490
HMPNN 28.753 15.485 0.439
HNHN 28.314 15.309 0.473
Lin. Transformers 32.614 16.889 0.076
NetlistGNN 27.586 14.470 0.515
base DE-HNN 27.205 14.129 0.536
full DE-HNN 26.684 13.512 0.565
Improvement 1.8% 6.6% 9.7%
Superblue6 GCN 21.963 12.692 0.607
GATv2 21.492 12.119 0.629
HyperConv 25.615 13.356 0.128
AllSet 17.945 10.156 0.759
HMPNN 21.868 12.633 0.611
HNHN 17.735 10.094 0.767
Lin. Transformers 28.807 14.583 0.119
NetlistGNN 20.238 11.696 0.697
base DE-HNN 19.935 11.227 0.694
full DE-HNN 16.946 9.680 0.790
Improvement 4.4% 4.1% 3.0%
Superblue7 GCN 4.243 3.064 0.600
GATv2 4.403 3.247 0.561
HyperConv 4.689 3.327 0.225
AllSet 4.201 2.991 0.621
HMPNN 4.527 3.246 0.541
HNHN 4.458 3.165 0.557
Lin. Transformers 5.245 3.752 0.139
NetlistGNN 4.115 2.986 0.634
base DE-HNN 4.110 2.957 0.631
full DE-HNN 3.971 2.860 0.662
Improvement 3.5% 4.2% 4.4%
Design Model RMSE \downarrow MAE \downarrow Pearson \uparrow
Superblue9 GCN 6.871 5.156 0.520
GATv2 6.893 5.139 0.512
HyperConv 7.014 5.241 0.289
AllSet 6.184 4.576 0.640
HMPNN 7.152 5.367 0.467
HNHN 6.544 4.884 0.582
Lin. Transformers 8.007 5.934 0.092
NetlistGNN 6.511 4.796 0.589
base DE-HNN 2.990 2.228 0.696
full DE-HNN 5.685 4.237 0.709
Improvement 8.1% 7.4% 10.8%
Superblue11 GCN 5.693 4.224 0.502
GATv2 5.684 4.203 0.504
HyperConv 6.243 4.523 0.105
AllSet 5.115 3.792 0.625
HMPNN 5.974 4.402 0.418
HNHN 5.277 3.882 0.592
Lin. Transformers 6.576 4.678 0.034
NetlistGNN 5.176 3.830 0.617
base DE-HNN 5.214 3.855 0.608
full DE-HNN 4.918 3.677 0.666
Improvement 3.9% 3.0% 6.6%
Superblue14 GCN 7.261 4.827 0.584
GATv2 7.370 4.825 0.545
HyperConv 8.210 5.241 0.210
AllSet 7.162 4.565 0.619
HMPNN 7.687 4.992 0.540
HNHN 7.327 4.693 0.597
Lin. Transformers 8.853 6.168 0.176
NetlistGNN 6.872 4.444 0.642
base DE-HNN 6.874 4.453 0.639
full DE-HNN 6.533 4.211 0.684
Improvement 4.9% 5.2% 6.5%
Superblue16 GCN 11.774 8.242 0.391
GATv2 12.853 8.283 0.377
HyperConv 16.501 9.486 0.175
AllSet 12.558 7.837 0.469
HMPNN 13.539 8.582 0.312
HNHN 12.720 8.082 0.446
Lin. Transformers 14.020 8.827 0.003
NetlistGNN 12.982 8.385 0.353
base DE-HNN 12.282 7.946 0.465
full DE-HNN 11.867 7.644 0.520
Improvement - 2.5% 10.9%
Superblue18 GCN 3.061 2.262 0.681
GATv2 3.102 2.285 0.672
HyperConv 4.013 2.915 0.255
AllSet 3.057 2.294 0.674
HMPNN 3.246 2.446 0.624
HNHN 3.208 2.377 0.637
Lin. Transformers 4.090 2.913 0.154
NetlistGNN 2.882 2.173 0.726
base DE-HNN 2.990 2.228 0.696
full DE-HNN 2.855 2.136 0.730
Improvement 0.9% 1.7% 0.6%
Superblue19 GCN 5.034 3.734 0.616
GATv2 4.949 3.691 0.636
HyperConv 5.746 3.974 0.312
AllSet 4.682 3.474 0.685
HMPNN 5.294 3.980 0.571
HNHN 5.063 3.750 0.620
Lin. Transformers 6.315 4.565 0.127
NetlistGNN 4.683 3.520 0.681
base DE-HNN 4.946 3.720 0.632
full DE-HNN 4.429 3.317 0.723
Improvement 5.4% 4.5% 5.5%
Table 10: Results of single-design net-based demand regression for each design.
Design Model Precision \uparrow Recall \uparrow F_score \uparrow
Superblue1 GCN 0.839 0.944 0.888
GATv2 0.867 0.944 0.904
HyperConv 0.873 0.966 0.917
AllSet 0.880 0.955 0.916
HMPNN 0.866 0.968 0.916
HNHN 0.868 0.969 0.916
Lin. Transformers 0.853 0.941 0.895
NetlistGNN 0.862 0.936 0.920
base DE-HNN 0.876 0.967 0.920
full DE-HNN 0.885 0.969 0.925
Improvement 1.6% 0.5% 0.5%
Superblue2 GCN 0.741 0.657 0.697
GATv2 0.782 0.739 0.760
HyperConv 0.779 0.706 0.741
AllSet 0.727 0.664 0.694
HMPNN 0.730 0.587 0.649
HNHN 0.718 0.633 0.670
Lin. Transformers 0.752 0.530 0.621
NetlistGNN 0.765 0.614 0.682
base DE-HNN 0.796 0.717 0.755
full DE-HNN 0.797 0.767 0.782
Improvement 1.2% 9.7% 2.9%
Superblue3 GCN 0.731 0.837 0.780
GATv2 0.768 0.840 0.798
HyperConv 0.770 0.815 0.792
AllSet 0.728 0.773 0.747
HMPNN 0.711 0.777 0.739
HNHN 0.706 0.777 0.737
Lin. Transformers 0.749 0.757 0.753
NetlistGNN 0.786 0.814 0.799
base DE-HNN 0.791 0.819 0.805
full DE-HNN 0.817 0.816 0.816
Improvement 0.7% - 2.1%
Superblue5 GCN 0.745 0.932 0.834
GATv2 0.783 0.923 0.848
HyperConv 0.827 0.935 0.878
AllSet 0.795 0.939 0.861
HMPNN 0.788 0.932 0.853
HNHN 0.786 0.932 0.852
Lin. Transformers 0.798 0.911 0.851
NetlistGNN 0.844 0.933 0.885
base DE-HNN 0.842 0.938 0.887
full DE-HNN 0.852 0.940 0.894
Improvement 4.9% 1.3% 1.0%
Superblue6 GCN 0.837 0.921 0.877
GATv2 0.876 0.920 0.897
HyperConv 0.851 0.916 0.891
AllSet 0.817 0.940 0.874
HMPNN 0.809 0.965 0.879
HNHN 0.815 0.947 0.875
Lin. Transformers 0.833 0.906 0.868
NetlistGNN 0.819 0.928 0.889
base DE-HNN 0.859 0.928 0.892
full DE-HNN 0.885 0.930 0.906
Improvement 0.7% - 1.0%
Superblue7 GCN 0.839 0.980 0.904
GATv2 0.863 0.981 0.920
HyperConv 0.899 0.967 0.932
AllSet 0.888 0.956 0.921
HMPNN 0.874 0.962 0.916
HNHN 0.875 0.955 0.913
Lin. Transformers 0.792 0.870 0.891
NetlistGNN 0.868 0.918 0.923
base DE-HNN 0.900 0.938 0.887
full DE-HNN 0.908 0.969 0.937
Improvement 5.6% - 0.5%
Design Model Precision \uparrow Recall \uparrow F_score \uparrow
Superblue9 GCN 0.684 0.556 0.613
GATv2 0.719 0.613 0.666
HyperConv 0.716 0.605 0.656
AllSet 0.675 0.505 0.577
HMPNN 0.612 0.418 0.495
HNHN 0.664 0.447 0.529
Lin. Transformers 0.649 0.551 0.596
NetlistGNN 0.778 0.568 0.656
base DE-HNN 0.740 0.647 0.690
full DE-HNN 0.695 0.653 0.673
Improvement - 6.5% 1.1%
Superblue11 GCN 0.634 0.896 0.743
GATv2 0.706 0.844 0.769
HyperConv 0.644 0.910 0.755
AllSet 0.620 0.946 0.749
HMPNN 0.627 0.927 0.748
HNHN 0.628 0.897 0.738
Lin. Transformers 0.671 0.691 0.680
NetlistGNN 0.691 0.914 0.787
base DE-HNN 0.677 0.863 0.759
full DE-HNN 0.719 0.850 0.789
Improvement 1.1% - 0.3%
Superblue14 GCN 0.763 0.891 0.822
GATv2 0.834 0.889 0.860
HyperConv 0.830 0.886 0.857
AllSet 0.809 0.863 0.835
HMPNN 0.819 0.858 0.838
HNHN 0.800 0.855 0.826
Lin. Transformers 0.735 0.764 0.749
NetlistGNN 0.827 0.870 0.787
base DE-HNN 0.856 0.876 0.866
full DE-HNN 0.856 0.902 0.878
Improvement 3.7% 10.6% 2.1%
Superblue16 GCN 0.713 0.926 0.807
GATv2 0.864 0.928 0.894
HyperConv 0.855 0.833 0.844
AllSet 0.844 0.827 0.833
HMPNN 0.838 0.821 0.829
HNHN 0.831 0.819 0.822
Lin. Transformers 0.810 0.831 0.815
NetlistGNN 0.779 0.912 0.865
base DE-HNN 0.874 0.823 0.847
full DE-HNN 0.895 0.910 0.903
Improvement 8.5% - 1.0%
Superblue18 GCN 0.779 0.845 0.811
GATv2 0.798 0.848 0.822
HyperConv 0.790 0.888 0.836
AllSet 0.753 0.892 0.816
HMPNN 0.763 0.888 0.821
HNHN 0.749 0.881 0.810
Lin. Transformers 0.775 0.852 0.812
NetlistGNN 0.868 0.939 0.902
base DE-HNN 0.798 0.890 0.842
full DE-HNN 0.807 0.886 0.845
Improvement - - -
Superblue19 GCN 0.812 0.894 0.851
GATv2 0.857 0.899 0.878
HyperConv 0.879 0.877 0.878
AllSet 0.870 0.866 0.868
HMPNN 0.861 0.862 0.862
HNHN 0.865 0.835 0.848
Lin. Transformers 0.809 0.880 0.843
NetlistGNN 0.869 0.946 0.856
base DE-HNN 0.883 0.885 0.884
full DE-HNN 0.895 0.910 0.903
Improvement 0.4% - 2.8%
Table 11: Results of single-design cell-based congestion classification for each design.
Single-Design without placement Single-Design with placement Cross-Design without placement Cross-Design with placement
Model RMSE MAE Pearson RMSE (imp.) MAE (imp.) Pearson (imp.) RMSE MAE Pearson RMSE (imp.) MAE (imp.) Pearson (imp.)
GCN 5.034 3.734 0.616 4.495 (10.7%) 3.334 (10.7%) 0.717 (16.4%) 6.571 5.024 0.365 6.126 (6.8%) 4.709 (6.3%) 0.440 (20.5%)
GATv2 4.949 3.691 0.636 4.382 (11.4%) 3.112 (15.7%) 0.758 (19.2%) 6.623 5.137 0.363 5.812 (10.8%) 4.695 (8.6%) 0.442 (21.8%)
full DE-HNN 4.429 3.317 0.723 4.005 (9.6%) 2.987 (10.0%) 0.785 (8.6%) 6.037 4.670 0.372 5.795 (4.0%) 4.337 (7.1%) 0.452 (21.5%)
Table 12: Results of net-based demand regression for Superblue19. For each metric, (imp.) refers to the improvement when placement information is added.
Wirelength Regression
Design Model RMSE \downarrow MAE \downarrow Pearson \uparrow
Superblue19 GCN 1.691 1.276 0.746
GATv2 1.717 1.281 0.737
Lin. Transformer 2.159 1.588 0.521
NetlistGNN 1.762 1.324 0.718
HyperConv 2.390 1.788 0.558
Allset 1.837 1.348 0.695
HMPNN 1.785 1.335 0.710
HNHN 1.754 1.333 0.701
base DE-HNN 1.731 1.291 0.730
full DE-HNN 1.677 1.242 0.754
Improvement 1.9% 2.6% 1.8%
Demand Regression
Design Model RMSE \downarrow MAE \downarrow Pearson \uparrow
Superblue19 GCN 6.571 5.024 0.365
GATv2 6.623 5.137 0.363
Lin. Transformer 6.564 4.819 0.086
NetlistGNN 8.328 6.839 0.367
HyperConv 8.569 5.294 0.241
Allset 6.120 4.820 0.345
HMPNN 6.979 5.356 0.306
HNHN 6.390 4.870 0.358
base DE-HNN 6.778 5.085 0.337
full DE-HNN 6.037 4.670 0.372
Improvement 1.4% 4.1% 1.4%
Congestion Classification
Design Model Precision \uparrow Recall \uparrow F_score \uparrow
Superblue19 GCN 0.633 0.997 0.773
GATv2 0.630 0.999 0.765
Lin. Transformer 0.618 0.859 0.772
NetlistGNN 0.647 0.953 0.771
HyperConv 0.655 0.923 0.778
Allset 0.645 0.964 0.773
HMPNN 0.633 0.999 0.773
HNHN 0.648 0.939 0.767
base DE-HNN 0.653 0.990 0.774
full DE-HNN 0.660 0.986 0.780
Improvement 0.7% - 0.3%
Table 13: Results of cross-design net-based HPWL (wirelength) regression, net-based demand regression, and cell-based congestion classification, including comparisons with other HNN models.