\NewEnviron

doweprint[1]

Edge Classification on Graphs:
New Directions in Topological Imbalance

Xueqi Cheng Vanderbilt University xueqi.cheng@vanderbilt.edu Yu Wang11footnotemark: 1 Vanderbilt University yu.wang.1@vanderbilt.edu Yunchao (Lance) Liu Vanderbilt University yunchao.liu@vanderbilt.edu Yuying Zhao Vanderbilt University yuying.zhao@vanderbilt.edu Charu C. Aggarwal IBM T. J. Watson Research Center charu@us.ibm.com  and  Tyler Derr Vanderbilt University tyler.derr@vanderbilt.edu
(June 2024)
Abstract.

Recent years have witnessed the remarkable success of applying Graph machine learning (GML) to node/graph classification and link prediction. However, edge classification task that enjoys numerous real-world applications such as social network analysis and cybersecurity, has not seen significant advancement with the progress of GML. To address this gap, our study pioneers a comprehensive approach to edge classification. We identify a novel ‘Topological Imbalance Issue’, which arises from the skewed distribution of edges across different classes, affecting the local subgraph of each edge and harming the performance of edge classifications. Inspired by the recent studies in node classification that the performance discrepancy exists with varying local structural patterns, we aim to investigate if the performance discrepancy in topological imbalanced edge classification tasks can also be mitigated by characterizing the local class distribution variance. To overcome this challenge, we introduce Topological Entropy (TE), a novel topological-based metric that measures the topological imbalance for each edge. Our empirical studies confirm that TE effectively measures local class distribution variance, and indicate that prioritizing edges with high TE values can help address the issue of topological imbalance. Inspired by this observation, we develop two strategies - Topological Reweighting and TE Wedge-based Mixup - to adaptively focus training on (synthetic) edges based on their TEs. While topological reweighting directly manipulates training edge weights according to TE, our wedge-based mixup interpolates synthetic edges between high TE wedges. To further enhance performance, we integrate these strategies into a novel topological imbalance strategy for edge classification: TopoEdge. Extensive experiments on real-world datasets demonstrate the efficacy of our proposed strategies. Additionally, our curated datasets and designed experimental settings establish a new benchmark for future edge classification research, particularly in addressing imbalance issues. Our code and data are available at https://github.com/XueqiC/TopoEdge.

Edge classification, topological imbalance, graph neural network
doi: XXXXXXX.XXXXXXXccs: Computing methodologies Supervised learning

1. Introduction

Graph Machine Learning (GML) has recently achieved unprecedented success in numerous real-world tasks over graph-structured data, including node/graph classification  (Yu et al., 2020; Wu et al., 2021; Errica et al., 2019) and link prediction (Zhang and Chen, 2018; Ai et al., 2022; Wang et al., 2023b). However, the edge classification task, which is crucial for many practical applications, such as categorizing protein interactions or social media connections (Kwak et al., 2010; Pandey et al., 2019; Jha et al., 2022), remains extensively under-explored within the GML community.

Previous approaches for edge classification in GML aim to classify edges based on the graph structure and node/edge features. Evolving from the pioneering heuristic-based edge classification method (Aggarwal et al., 2016) which assigns edge labels based on the labels of their similar counterparts, recent work has focused on learning powerful edge embeddings to better leverage the structural and attribute information in graphs. One direction is the shallow embedding techniques (Bielak et al., 2022; Wang et al., 2020b, 2023a) where the random walk and deep auto-encoding methods have been adopted to capture graph structural information. Another direction lies in applying Graph Neural Network (GNN) (Kipf and Welling, 2016; Veličković et al., 2017; Tang et al., 2019) to learn edge embeddings by either firstly learning node-level embeddings aggregated from their local subgraphs or directly generating edge embeddings (Kim et al., 2019; Gong and Cheng, 2019) Despite their effectiveness, they all neglect the imbalance in edge classification, which could significantly compromise the classification performance and restrict their practical utility. For example, failure to recognize fraudulent behaviors in e-commerce networks, which are much less commonly seen, may result in enormous monetary losses.

Even though numerous strategies have been developed for addressing the imbalance problem, most of them are designed for general machine learning tasks (Guo et al., 2008; Chawla et al., 2002) or specifically for node/graph classification (Wang et al., 2022; Zhao et al., 2021; Wang et al., 2021a) in GML with no tailored modification towards edge classification. As seen in Figure 1, applying one of the most well-established imbalance mitigation strategies, quantity reweighting, surprisingly results in inferior performance on three datasets across two GNN backbones to the original one. This motivates us to go beyond considering the edge imbalance issue merely from the quantitative perspective and more focus on the topological perspective. Actually, in node classification, previous works (Zhu et al., 2021; Mao et al., 2023) have discovered a strong correlation between the local topology pattern of a node and its GNNs’ classification performance, which leads to the performance discrepancy among nodes with varying local topology. In parallel, research in imbalanced node/graph classification (Chen et al., 2021; Song et al., 2022; Wang et al., 2022) also discovered a novel imbalance at the topology level and found its close correlation to the node/graph classification performance. Arising from the surprising findings in Figure 1 and further based on experiences from prior works (Zhu et al., 2021; Mao et al., 2023; Chen et al., 2021; Song et al., 2022; Wang et al., 2022), we hypothesize that the topology imbalance issue also exists in edge classification tasks and edges with varying local topology patterns would also exhibit varying classification performance.

To verify the above hypothesis, we first propose a metric, Topological Entropy (TE), to quantify the topological imbalance for edges based on the label distribution of incident edges in their local subgraphs. After empirically verifying the correlation between the training difficulty of edges and their TE values (Figure 2-3), we design a topologically reweighting strategy to adaptively weigh the training of edges based on their TE. In addition, to address the limited supervision provided by few minority edges and further enhance the generalizability of our proposed method, following the mixup strategy in (Zhai et al., 2022), we develop a novel topological wedge-based mixup strategy that generates synthetic edges within wedges that are selected based on high TE edges. Finally, we adaptively integrate the two strategies into a novel topological imbalance strategy for edge classification, TopoEdge, to further boost the performance. To demonstrate the effectiveness of our proposed strategies, we first construct a comprehensive imbalance edge classification evaluation benchmark consisting of diverse real-world edge classification datasets that exhibit skewed label distributions. The results reveal TopoEdge is able to significantly improve edge classification performance across various GNN backbones in various settings.

Our main contributions are summarized as follows:

  • Uncovering a Novel Research Paradigm in Edge Classification: Through a meticulous exploration into the domain of edge classification, we identified a notable and previously unexplored challenge: topological imbalance. To measure this imbalance with precision, we introduced the innovative Topological Entropy (TE) method, which quantitatively assesses the extent of topological imbalance present within edges’ local subgraphs.

  • Algorithmic Design to Address Topological Imbalance: Given the challenges posed by the topological imbalance in edge classification, we pioneered a comprehensive algorithmic solution. This first involves a topological reweighting technique, aiming to enhance the training of high TE edges. Complementing this, we devised a unique TE wedge-based mixup approach, which integrates the generation of synthetic training examples derived from high-entropy edges, thereby reinforcing the training efficacy and alleviating the identified imbalance. Ultimately, we combine the above two strategies into a novel Topological imbalance strategy, TopoEdge, for edge classification.

  • SOTA Performance on Real-world Datasets: To validate the practical implications of our method, we assembled an edge classification benchmark suite with datasets coming from diverse real-world scenarios/applications. Our experiments showcase the ability of the proposed TE-based solutions to help mitigate the imbalance issues in edge classification, and the proposed TopoEdge nearly always surpasses existing methods.

2. Notations and Preliminary

Let G=(𝒱,,𝐗,𝐒)𝐺𝒱𝐗𝐒G=(\mathcal{V},\mathcal{E},\mathbf{X},\mathbf{S})italic_G = ( caligraphic_V , caligraphic_E , bold_X , bold_S ) be an undirected attributed graph, where 𝒱={vi}i=1n𝒱superscriptsubscriptsubscript𝑣𝑖𝑖1𝑛\mathcal{V}=\{v_{i}\}_{i=1}^{n}caligraphic_V = { italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT is the set of n𝑛nitalic_n nodes (i.e., n=|𝒱|𝑛𝒱n=|\mathcal{V}|italic_n = | caligraphic_V |) and 𝒱×𝒱𝒱𝒱\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}caligraphic_E ⊆ caligraphic_V × caligraphic_V is the set of m𝑚mitalic_m observed training edges (i.e., m=||𝑚m=|\mathcal{E}|italic_m = | caligraphic_E |) with eijsubscript𝑒𝑖𝑗e_{ij}italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT denoting the edge between the node visubscript𝑣𝑖v_{i}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and vjsubscript𝑣𝑗v_{j}italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT. 𝐗n×d𝐗superscript𝑛𝑑\mathbf{X}\in\mathbb{R}^{n\times d}bold_X ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_d end_POSTSUPERSCRIPT represents the node feature matrix with dimension d𝑑ditalic_d and 𝐒m×d𝐒superscript𝑚superscript𝑑\mathbf{S}\in\mathbb{R}^{m\times d^{\prime}}bold_S ∈ blackboard_R start_POSTSUPERSCRIPT italic_m × italic_d start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT represents the edge feature matrix with dimension dsuperscript𝑑d^{\prime}italic_d start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. The observed m𝑚mitalic_m training edges compose the adjacency matrix 𝐀{0,1}n×n𝐀superscript01𝑛𝑛\mathbf{A}\in\{0,1\}^{n\times n}bold_A ∈ { 0 , 1 } start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT with 𝐀ij=1subscript𝐀𝑖𝑗1\mathbf{A}_{ij}=1bold_A start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = 1 if an observed edge exists between node visubscript𝑣𝑖v_{i}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and vjsubscript𝑣𝑗v_{j}italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and 𝐀ij=0subscript𝐀𝑖𝑗0\mathbf{A}_{ij}=0bold_A start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = 0 otherwise, which serves for the computational graph for message-passing. The diagonal matrix of node degree is notated as 𝐃n×n𝐃superscript𝑛𝑛\mathbf{D}\in\mathbb{Z}^{n\times n}bold_D ∈ blackboard_Z start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT with the degree of node visubscript𝑣𝑖v_{i}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT being di=𝐃ii=j=1n𝐀ijsubscript𝑑𝑖subscript𝐃𝑖𝑖superscriptsubscript𝑗1𝑛subscript𝐀𝑖𝑗d_{i}=\mathbf{D}_{ii}=\sum_{j=1}^{n}\mathbf{A}_{ij}italic_d start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = bold_D start_POSTSUBSCRIPT italic_i italic_i end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT bold_A start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT. The neighborhood node set of node visubscript𝑣𝑖v_{i}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is denoted as 𝒩isubscript𝒩𝑖\mathcal{N}_{i}caligraphic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. In edge classification, given the edge class label matrix 𝐘{0,1}n×n×C𝐘superscript01𝑛𝑛𝐶\mathbf{Y}\in\{0,1\}^{n\times n\times C}bold_Y ∈ { 0 , 1 } start_POSTSUPERSCRIPT italic_n × italic_n × italic_C end_POSTSUPERSCRIPT with C𝐶Citalic_C being the total number of edge classes and denote {k}k=1Csuperscriptsubscriptsubscript𝑘𝑘1𝐶\{\mathcal{E}_{k}\}_{k=1}^{C}{ caligraphic_E start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT as the set of labeled edges in class k𝑘kitalic_k, we aim to train an edge classifier based on edges in the training set TrsuperscriptTr\mathcal{E}^{\text{Tr}}caligraphic_E start_POSTSUPERSCRIPT Tr end_POSTSUPERSCRIPT to correctly classify edges in the validation/test set Val/TestsuperscriptVal/Test\mathcal{E}^{\text{Val/Test}}caligraphic_E start_POSTSUPERSCRIPT Val/Test end_POSTSUPERSCRIPT across all the classes.

Refer to caption
(a)
Refer to caption
(b)
Figure 1. After equipping GNNs with off-the-shelf quantity reweight (Rqsubscript𝑅𝑞R_{q}italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT), GCN achieves worse overall Macro-F1 shown in (a) and similarly observed with GAT in (b).

3. Topological Imbalance and Topological Entropy

In this section, we first empirically demonstrate that considering the edge imbalance issue merely from the quantitative perspective may result in inferior performance in edge classification. This observation motivates the investigate of edge-level imbalance from the topology perspective. Inspired by prior works that have shown varying local subgraph patterns of nodes cause varying classification performance(Mao et al., 2023; Zhu et al., 2021; Chen et al., 2021; Song et al., 2022), we propose a new type of imbalance in edge classification, topological imbalance, which is defined as the imbalanced local subgraph patterns around each edge that can be quantified by local class distribution variance. Hence we develop a metric, Topological Entropy (TE), to quantify the local class distribution variance, and further verify its relationship with performance discrepancy. Based on the observations, we develop Topological Reweight to emphasize edges with higher TE, and empirical experiments show the effectiveness of our proposed method.

Refer to caption
Figure 2. We visualize the validation F1 score for majority and minority class of GCN and GCN+Rqsubscript𝑅𝑞R_{q}italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT across different edge categories on Epinions dataset. Specifically, edge categories are defined by the two node endpoints being categorized as mostly majority (M), mostly minority (m), or uncertain (U), and we visualize the distribution of edges within the dataset.

3.1. Motivation

To mitigate the imbalance issue in edge classification, the most direct approach is to implement strategies to mitigate quantity imbalance issues. Hence, we first integrate one of the well-established quantity imbalance techniques, quantity reweight (Rqsubscript𝑅𝑞R_{q}italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT(Huang et al., 2016), with a well-known Graph Neural Network (GNN) architecture, the Graph Convolutional Network (GCN) (Kipf and Welling, 2016), and then conduct edge classification across three datasets. More specifically, for edge eijTrsubscript𝑒𝑖𝑗superscript𝑇𝑟e_{ij}\in\mathcal{E}^{Tr}italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ∈ caligraphic_E start_POSTSUPERSCRIPT italic_T italic_r end_POSTSUPERSCRIPT that belongs to class k𝑘kitalic_k (i.e., 𝐘ijk=1subscript𝐘𝑖𝑗𝑘1\mathbf{Y}_{ijk}=1bold_Y start_POSTSUBSCRIPT italic_i italic_j italic_k end_POSTSUBSCRIPT = 1), the quantity reweight wijqsuperscriptsubscript𝑤𝑖𝑗𝑞w_{ij}^{q}italic_w start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT that is applied to it during the training phase to emphasize the minority class is defined as follows:

(1) wijq=|Tr||{k}k=1C|superscriptsubscript𝑤𝑖𝑗𝑞superscriptTrsuperscriptsubscriptsubscript𝑘𝑘1𝐶w_{ij}^{q}=\frac{|\mathcal{E}^{\text{Tr}}|}{|\{\mathcal{E}_{k}\}_{k=1}^{C}|}italic_w start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT = divide start_ARG | caligraphic_E start_POSTSUPERSCRIPT Tr end_POSTSUPERSCRIPT | end_ARG start_ARG | { caligraphic_E start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT | end_ARG

where |Tr|superscriptTr|\mathcal{E}^{\text{Tr}}|| caligraphic_E start_POSTSUPERSCRIPT Tr end_POSTSUPERSCRIPT | is the number of training edges, and |{k}k=1C|superscriptsubscriptsubscript𝑘𝑘1𝐶|\{\mathcal{E}_{k}\}_{k=1}^{C}|| { caligraphic_E start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT | is the number of labeled edges in class k𝑘kitalic_k.

As shown in Figure 1(a) and  1(b), the overall Macro-F1 decreases after equipping quantity reweight strategy to backbones, indicating that this strategy alone which merely addresses the global class distribution is insufficient to achieve consistent performance improvement across all classes. This observation compels a thorough examination of the performance disparity among edges through a local lens. To undertake this detailed investigation, as edges are participated by their end nodes, we propose categorizing edges according to the local class distribution (LCD) of their end nodes.

The categorization process begins with the identification of the edge’s end nodes and the computation of the ratio of majority edges within their immediate neighborhoods. Utilizing a predefined threshold, the process categorized each node as mostly majority (M), mostly minority (m), or uncertain (U) based on the corresponding majority ratio. Subsequently, the edge’s category is a combination of its end nodes’ categories, giving a concise representation of the edge’s local class context in the graph. For example, for edge eijsubscript𝑒𝑖𝑗e_{ij}italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT, if visubscript𝑣𝑖v_{i}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT’s category is M, vjsubscript𝑣𝑗v_{j}italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT category is m, then edge eijsubscript𝑒𝑖𝑗e_{ij}italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT’s category is set as Mm. This nuanced approach effectively captures the edge’s local topological context and allows us to analyze the graph’s structure in detail, revealing insights into the underlying connectivity patterns. To simplify, the algorithm is designed only for binary classification.

Utilizing the categorization framework delineated earlier, we conduct an in-depth analysis to assess the classification performance across different edge categories, examining both the original GCN model and its extension with Rqsubscript𝑅𝑞R_{q}italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT. As illustrated in Figure 2, the incorporation of Rqsubscript𝑅𝑞R_{q}italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT predominantly benefits the minority class, albeit at the expense of diminishing the performance within the majority class for most categories.

More specifically, for the majority class, the most insignificant performance decrease is observed in MM, where the backbone model’s performance is nearly perfect, thus applying Rqsubscript𝑅𝑞R_{q}italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT for edges in this category will negligibly impact overall efficacy. However, in categories characterized by substantial variation in local class distribution (MU, Mm, UU, Um) — with the exception of UU, which comprises a minor quantity of majority edges (counts 10absent10\approx 10≈ 10) — we observe a pronounced decline in performance. Intriguingly, for category mm, the application of Rqsubscript𝑅𝑞R_{q}italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT is conducive to performance enhancement. For the minority class, the most significant improvement is gained when the end nodes’ categories are both m𝑚mitalic_m, and then the improvement is decreased across categories MU, Mm, UU, Um, where the end nodes’ local distribution varies a lot. In mm, the Rqsubscript𝑅𝑞R_{q}italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT can no longer improve its performance since the model has reached its latent capacity and performs best.

Based on the above, to strategically design a method on top of Rqsubscript𝑅𝑞R_{q}italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT to increase the overall edge classification performance, we need to further emphasize the edges with large local class distribution variance. This leads to the next section where we design a novel metric to quantify the level of edges’ local class distribution variance.

3.2. Topological Entropy (TE)

Inspired by the recent work in imbalanced node classification (Chen et al., 2021; Song et al., 2022), the local distribution variance can be characterized using the label diffusion method. As label diffusion is based on the receptive field in the computational graph, it is a better approximation of the GNNs message passing scheme, hence the local distribution variance characterized by label diffusion will provide more insight than characterized purely based on label distribution in end nodes’ one-hop neighborhood.

For the metric to quantify the local distribution variance, first we obtain label encodings of each node visubscript𝑣𝑖v_{i}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT from its incident edges {eij|vj𝒩i}conditional-setsubscript𝑒𝑖𝑗subscript𝑣𝑗subscript𝒩𝑖\{e_{ij}|v_{j}\in\mathcal{N}_{i}\}{ italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT | italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ caligraphic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } by:

(2) 𝐡i=𝔼vj𝒩i𝐘ij,subscript𝐡𝑖subscript𝔼similar-tosubscript𝑣𝑗subscript𝒩𝑖subscript𝐘𝑖𝑗\mathbf{h}_{i}=\mathbb{E}_{v_{j}\sim\mathcal{N}_{i}}{\mathbf{Y}_{ij}},bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = blackboard_E start_POSTSUBSCRIPT italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∼ caligraphic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_Y start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ,

where 𝐡iCsubscript𝐡𝑖superscript𝐶\mathbf{h}_{i}\in\mathbb{R}^{C}bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT denotes the probability distribution of the node visubscript𝑣𝑖v_{i}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT’s association to the edge classes according to their incident edges. To consider higher-order neighborhood impact, we obtain the Lthsuperscript𝐿thL^{\text{th}}italic_L start_POSTSUPERSCRIPT th end_POSTSUPERSCRIPT-layer label encodings 𝐇Lsuperscript𝐇𝐿\mathbf{H}^{L}bold_H start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT by message-passing:

(3) 𝐇L=𝐀~𝐇L1,superscript𝐇𝐿~𝐀superscript𝐇𝐿1\mathbf{H}^{L}=\widetilde{\mathbf{A}}\mathbf{H}^{L-1},bold_H start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT = over~ start_ARG bold_A end_ARG bold_H start_POSTSUPERSCRIPT italic_L - 1 end_POSTSUPERSCRIPT ,

where 𝐀~=𝐃1𝐀~𝐀superscript𝐃1𝐀\widetilde{\mathbf{A}}=\mathbf{D}^{-1}\mathbf{A}over~ start_ARG bold_A end_ARG = bold_D start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_A is the normalized adjacency matrix and specifically, 𝐇0=𝐇superscript𝐇0𝐇\mathbf{H}^{0}=\mathbf{H}bold_H start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT = bold_H, coming from Eq. (2). Since we have the label distribution of L𝐿Litalic_L-hop subgraphs of the ending nodes vi,vjsubscript𝑣𝑖subscript𝑣𝑗v_{i},v_{j}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT of each edge eijsubscript𝑒𝑖𝑗e_{ij}italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT, we then can directly average 𝐇iL,𝐇jLsubscriptsuperscript𝐇𝐿𝑖subscriptsuperscript𝐇𝐿𝑗\mathbf{H}^{L}_{i},\mathbf{H}^{L}_{j}bold_H start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_H start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT to obtain the label distribution 𝐆ijLCsuperscriptsubscript𝐆𝑖𝑗𝐿superscript𝐶\mathbf{G}_{ij}^{L}\in\mathbb{R}^{C}bold_G start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT of the local subgraph around that edge eijsubscript𝑒𝑖𝑗e_{ij}italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT.

(4) 𝐆ijL=12(𝐡iL+𝐡jL),superscriptsubscript𝐆𝑖𝑗𝐿12subscriptsuperscript𝐡𝐿𝑖subscriptsuperscript𝐡𝐿𝑗\mathbf{G}_{ij}^{L}=\frac{1}{2}(\mathbf{h}^{L}_{i}+\mathbf{h}^{L}_{j}),bold_G start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG 2 end_ARG ( bold_h start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + bold_h start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) ,

where 𝐆ijkLsuperscriptsubscript𝐆𝑖𝑗𝑘𝐿\mathbf{G}_{ijk}^{L}bold_G start_POSTSUBSCRIPT italic_i italic_j italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT indicates the proportion of the L𝐿Litalic_L-hop local subgraph around the edge eijsubscript𝑒𝑖𝑗e_{ij}italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT belonging to the class k𝑘kitalic_k. In this way, if the end nodes of an edge have identical and certain class distributions (i.e., MM and mm), the distribution will be highly skewed, and the uncertainty will be small. For the edge with large class distribution variance (i.e., edges in MU, Mm, UU, and Um), the distribution will be more uniform, and the uncertainty will be large.

Hence, after we have a well-established metric 𝐆ijLsuperscriptsubscript𝐆𝑖𝑗𝐿\mathbf{G}_{ij}^{L}bold_G start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT to quantify the local topology pattern around each edge, we can further quantify its entropy to measure its class uncertainty, defined as:

Definition 0.

Topological Entropy (TE): The uncertainty of the class distribution of the Lthsuperscript𝐿thL^{\text{th}}italic_L start_POSTSUPERSCRIPT th end_POSTSUPERSCRIPT-layer local subgraph around each edge eijsubscript𝑒𝑖𝑗e_{ij}italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT is calculated as:

(5) TEijL=k=1CGijkLlog(GijkL)superscriptsubscriptTE𝑖𝑗𝐿superscriptsubscript𝑘1𝐶superscriptsubscriptG𝑖𝑗𝑘𝐿superscriptsubscriptG𝑖𝑗𝑘𝐿\text{TE}_{ij}^{L}=-\sum_{k=1}^{C}\textbf{G}_{ijk}^{L}\log(\textbf{G}_{ijk}^{L})TE start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT = - ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT G start_POSTSUBSCRIPT italic_i italic_j italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT roman_log ( G start_POSTSUBSCRIPT italic_i italic_j italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT )

In this way, the edges with large class distribution variance would have a higher Topological Entropy (TE) value.

Refer to caption
Figure 3. We visualize the empirical discrepancy (in terms of training accuracy) between majority and minority edges when grouped by their TE (along with the number of edges in each group) and using GCN on the Epinions dataset.

To demonstrate the accuracy of Topological Entropy (TE) in reflecting local class distribution variance, we undertook an empirical study on the Epinions dataset. The experiment aimed to confirm that edges within categories marked by low local distribution variance—specifically MM and mm from Figure 2—would exhibit lower TE values. As shown in Figure 3, majority edges are easier to train correctly than minority edges, which aligns with the patterns in Figure 2. Figure 3 also reveals that majority edges with lower TE values (i.e., TE 0.28absent0.28\leq 0.28≤ 0.28) are indeed easy to train correctly, affirming the relatively high validation F1 scores for majority edges in MM and mm as observed in Figure 2. Conversely, minority edges with low TE struggled in training, mirroring their lower validation F1 scores in Figure 2. Additionally, the distribution trends of majority and minority edges across varying TE values further corresponded with the patterns observed (the distribution of majority edges skewed towards lower TE values, and the distribution of minority edges is rather uniform) in Figure 2, confirming the effectiveness of TE in capturing the anticipated variance. These findings affirm the methodological soundness of TE, laying a solid foundation for subsequent methodological development.

4. Topological Reweight

Building upon Section 3.1, we expect to enhance the training weights of labeled edges that have larger TE values, thereby enabling those edges that are more representative of their corresponding classes and hence lead to better performance for unlabeled edges.

For any edge eijTrsubscript𝑒𝑖𝑗superscript𝑇𝑟e_{ij}\in\mathcal{E}^{Tr}italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ∈ caligraphic_E start_POSTSUPERSCRIPT italic_T italic_r end_POSTSUPERSCRIPT, we design a TE reweight method to assign edge weight wijtsubscriptsuperscript𝑤𝑡𝑖𝑗w^{t}_{ij}italic_w start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT as:

(6) wijt,L=exp(TEijLt),subscriptsuperscript𝑤𝑡𝐿𝑖𝑗𝑒𝑥𝑝subscriptsuperscriptTE𝐿𝑖𝑗𝑡w^{t,L}_{ij}=exp(\frac{\text{TE}^{L}_{ij}}{t}),italic_w start_POSTSUPERSCRIPT italic_t , italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = italic_e italic_x italic_p ( divide start_ARG TE start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT end_ARG start_ARG italic_t end_ARG ) ,

where t𝑡titalic_t is the temperature controlling the sharpness of the weight distribution and as t0𝑡0t\rightarrow 0italic_t → 0, the weight distribution becomes more uniform and hence it approaches the unweighted version. Then the training loss L𝐿Litalic_L for edge classification equipped with TE reweight is:

(7) TE=1|Tr|eijTrwijt,LCE(𝐄ij,𝐘ij)superscriptTE1superscript𝑇𝑟subscriptsubscript𝑒𝑖𝑗superscript𝑇𝑟superscriptsubscript𝑤𝑖𝑗𝑡𝐿superscriptCEsubscript𝐄𝑖𝑗subscript𝐘𝑖𝑗\mathcal{L}^{\text{TE}}=\frac{1}{|\mathcal{E}^{Tr}|}\sum_{e_{ij}\in\mathcal{E}% ^{Tr}}{w_{ij}^{t,L}\mathcal{L}^{\text{CE}}(\mathbf{E}_{ij},\mathbf{Y}_{ij})}caligraphic_L start_POSTSUPERSCRIPT TE end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG | caligraphic_E start_POSTSUPERSCRIPT italic_T italic_r end_POSTSUPERSCRIPT | end_ARG ∑ start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ∈ caligraphic_E start_POSTSUPERSCRIPT italic_T italic_r end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t , italic_L end_POSTSUPERSCRIPT caligraphic_L start_POSTSUPERSCRIPT CE end_POSTSUPERSCRIPT ( bold_E start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT , bold_Y start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT )

where 𝐄ijsubscript𝐄𝑖𝑗\mathbf{E}_{ij}bold_E start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT is the embedding obtained from any GML model (e.g., a GNN) followed by a classifier and having CEsuperscriptCE\mathcal{L}^{\text{CE}}caligraphic_L start_POSTSUPERSCRIPT CE end_POSTSUPERSCRIPT denote the cross entropy loss. This is then the loss function of TE reweight (Rtsubscript𝑅𝑡R_{t}italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT).

Refer to caption
(a)
Refer to caption
(b)
Figure 4. After applying Rtqsubscript𝑅𝑡𝑞R_{tq}italic_R start_POSTSUBSCRIPT italic_t italic_q end_POSTSUBSCRIPT which strategically emphasizes edges with high local distribution variance on top of Rqsubscript𝑅𝑞R_{q}italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT, significant performance improvement can be observed for majority class (a) and minority class (b), with the most significant improvement can be observed for edges in relatively large class distribution variance categories (i.e., edges in MU, Mm, UU, and Um categories).

Based on the insight in Section 3.1, we adaptively integrate TE reweight with the previously introduced quantity reweight method into a novel Topological Reweight (Rtqsubscript𝑅𝑡𝑞R_{tq}italic_R start_POSTSUBSCRIPT italic_t italic_q end_POSTSUBSCRIPT) strategy to include both the global class distribution and local class distribution variance into consideration. The final weight can be computed as:

(8) wij=θwijt,L+(1θ)wijqsubscript𝑤𝑖𝑗𝜃superscriptsubscript𝑤𝑖𝑗𝑡𝐿1𝜃superscriptsubscript𝑤𝑖𝑗𝑞w_{ij}=\theta w_{ij}^{t,L}+(1-\theta)w_{ij}^{q}italic_w start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = italic_θ italic_w start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t , italic_L end_POSTSUPERSCRIPT + ( 1 - italic_θ ) italic_w start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT

where θ𝜃\thetaitalic_θ is a hyperparameter that controls the trade-off in the contribution between topological reweight and quantitative reweight. Correspondingly the training loss becomes:

(9) T=1|Tr|eijTrwijCE(𝐄ij,𝐘ij)superscriptT1superscript𝑇𝑟subscriptsubscript𝑒𝑖𝑗superscript𝑇𝑟subscript𝑤𝑖𝑗superscriptCEsubscript𝐄𝑖𝑗subscript𝐘𝑖𝑗\mathcal{L}^{\text{T}}=\frac{1}{|\mathcal{E}^{Tr}|}\sum_{e_{ij}\in\mathcal{E}^% {Tr}}w_{ij}\mathcal{L}^{\text{CE}}(\mathbf{E}_{ij},\mathbf{Y}_{ij})caligraphic_L start_POSTSUPERSCRIPT T end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG | caligraphic_E start_POSTSUPERSCRIPT italic_T italic_r end_POSTSUPERSCRIPT | end_ARG ∑ start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ∈ caligraphic_E start_POSTSUPERSCRIPT italic_T italic_r end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT caligraphic_L start_POSTSUPERSCRIPT CE end_POSTSUPERSCRIPT ( bold_E start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT , bold_Y start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT )

To validate the effectiveness of the proposed methods, we empirically compare the edge classification performance of quantitative reweight (Rqsubscript𝑅𝑞R_{q}italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT), TE reweight (Rtsubscript𝑅𝑡R_{t}italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT), and their combined version Topological reweight (Rtqsubscript𝑅𝑡𝑞R_{tq}italic_R start_POSTSUBSCRIPT italic_t italic_q end_POSTSUBSCRIPT). First, we visualize the classification performance for majority and minority edges across different categories in Epinions dataset. As shown in Figure 4(a) and Figure 4(b), Rtqsubscript𝑅𝑡𝑞R_{tq}italic_R start_POSTSUBSCRIPT italic_t italic_q end_POSTSUBSCRIPT can indeed improve the performance for edges in MU, Mm, UU, and Um, which aligns with our intuition to improve performance on edges with high local distribution variance.

To empirically measure the classification performance of different methods across all the classes, we follow other imbalance classification work (Chen et al., 2021; Liu et al., 2023; Sun et al., 2022; Ma et al., 2023) and adopt balanced accuracy (average accuracy across different classes) and Macro-F1 (average F1 score across different classes) as the performance metric. As shown in Table 1, both Rtsubscript𝑅𝑡R_{t}italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and Rtqsubscript𝑅𝑡𝑞R_{tq}italic_R start_POSTSUBSCRIPT italic_t italic_q end_POSTSUBSCRIPT consistently yield enhancements in terms of balanced accuracy and Macro-F1 scores. This outcome empirically substantiates the efficacy of the proposed reweight method against Rqsubscript𝑅𝑞R_{q}italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT where only performance improvement in minority classes can be observed. The fact that applying topology reweight that addresses the local structure disparity performs better than quantity reweight suggests that local class distribution is more important for performance improvement, which is reasonable because in GNN message passing nodes and edges only receive messages from the local subgraph. Also, combining quantity reweight with topology reweight adaptively can take advantage of the information in both global class distribution and local class distribution and mitigate both quantity and topological imbalance issues, hence it depicts the optimal performance.

5. Mixup based on TE

In Section 4, we present the Topological Reweight (Rtqsubscript𝑅𝑡𝑞R_{tq}italic_R start_POSTSUBSCRIPT italic_t italic_q end_POSTSUBSCRIPT) mechanism to emphasize the edges according to their TEs. However, merely incorporating topological reweight may not address the limited supervision provided by few minority edges, and thus there is no guaranteed performance improvement (Zhai et al., 2022; Byrd and Lipton, 2019; Sagawa et al., 2019; Gulrajani and Lopez-Paz, 2020; Koh et al., 2021). Moreover, reweighting methods can lead to overfitting (Sagawa et al., 2019), with test performance potentially degrading over numerous epochs without intervention. One method that may address these issues and further enhance the generalizability of our proposed method is mixup, as it generates synthetic samples to change the original training distribution and provides sufficient supervision. Currently, there are some mixup methods in GML tasks  (Wang et al., 2021c; Kim et al., 2023; Han et al., 2022; Ma et al., 2024; Li et al., 2022), but none of them are designed to mitigate topological imbalance issues in edge classification. Therefore, to improve the generalizability and effectiveness of our proposed method in topological imbalance edge classification tasks, we designed a TE wedge-based mixup method, where we selected the edges with high entropy and its high entropy neighborhood edges to form a “high TE wedge”, then we generate new sample node by mixup between end nodes of wedges. The overview of the TE wedge-based mixup is shown in Figure 5.

Table 1. Balanced accuracy (B_acc) and Macro-F1 (Macro_F1) across three datasets using the GCN backbone model showing performance of Rqsubscript𝑅𝑞R_{q}italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT, Rtsubscript𝑅𝑡R_{t}italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, and Rtqsubscript𝑅𝑡𝑞R_{tq}italic_R start_POSTSUBSCRIPT italic_t italic_q end_POSTSUBSCRIPT.
Dataset Metric GCN GCN+Rqsubscript𝑅𝑞+{R_{q}}+ italic_R start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT GCN+Rtsubscript𝑅𝑡+{R_{t}}+ italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT GCN+Rtqsubscript𝑅𝑡𝑞+{R_{tq}}+ italic_R start_POSTSUBSCRIPT italic_t italic_q end_POSTSUBSCRIPT
Bitcoin-alpha B_acc 0.797 ±plus-or-minus\pm± 0.005 0.857 ±plus-or-minus\pm± 0.020 0.818 ±plus-or-minus\pm± 0.004 0.834 ±plus-or-minus\pm± 0.008
Macro_F1 0.840 ±plus-or-minus\pm± 0.005 0.790 ±plus-or-minus\pm± 0.005 0.846 ±plus-or-minus\pm± 0.007 0.852 ±plus-or-minus\pm± 0.003
Epinions B_acc 0.731 ±plus-or-minus\pm± 0.008 0.768 ±plus-or-minus\pm± 0.007 0.756 ±plus-or-minus\pm± 0.008 0.769 ±plus-or-minus\pm± 0.001
Macro_F1 0.745 ±plus-or-minus\pm± 0.002 0.720 ±plus-or-minus\pm± 0.001 0.756 ±plus-or-minus\pm± 0.000 0.753 ±plus-or-minus\pm± 0.001
Reddit B_acc 0.715 ±plus-or-minus\pm± 0.003 0.779 ±plus-or-minus\pm± 0.012 0.727 ±plus-or-minus\pm± 0.035 0.762 ±plus-or-minus\pm± 0.006
Macro_F1 0.733 ±plus-or-minus\pm± 0.000 0.707 ±plus-or-minus\pm± 0.002 0.737 ±plus-or-minus\pm± 0.007 0.740 ±plus-or-minus\pm± 0.000

5.1. Motivation

To circumvent the potential pitfalls of the reweighting method that we stated above, and to accentuate edges with elevated entropy, our strategy mandates refinement beyond mere Topological reweight. A corpus of discussions, encapsulated in (Wiles et al., 2021; Sagawa et al., 2021; Zhai et al., 2022), advocate for the incorporation of additional training samples via data augmentation as a viable enhancement to topological reweighting. Wang et al.(Wang et al., 2021b) further contend that modifications to the loss function can potentially amplify the efficacy of our proposed approach. Given these considerations, we posit:

Refer to caption
Figure 5. Overview of the Topological Wedge-based mixup

Can we develop strategies that combine these elements and incorporate our understanding of topological imbalance in edge classification?

Our exploration led us to the random mixup technique. At its core, random mixup formulate synthetic samples via linear interpolation between two randomly selected distinct samples. The central tenet postulates that linear interpolations between feature vectors should logically precipitate linear interpolations of their respective targets (Zhang et al., 2017). However, direct adoptions of this technique between any two random samples present a problem for edge classification since random edge interpolations lack semantic coherence (Zhang et al., 2017), and one way to solve that is to perform mixup in wedges. However, randomly selecting wedges to perform mixup cannot address the observations made in the previous section, as we want to emphasize the high TE edges while preserving semantic coherence. Therefore, we extend our wedge-based mixup to develop a novel Topological wedge-based mixup and formally present it in the next section.

5.2. TE Wedge-based Mixup

In this section, we delineate the underlying intuition and technical intricacies of the wedge-based mixup we propose. We commence by elucidating the rationale and driving forces behind the adoption of the topological wedge-based mixup in edge classification. Subsequently, the mechanics of this approach are elaborated upon. Our methodology hinges on two primary modules. The High TE Wedge Selection Module is employed to curate high TE wedges. Within these selected wedges, a Wedge-mixup Module facilitates node mixup, engendering synthetic nodes and edges and integrating these edges with topological weight, culminating in the final loss function. This evolved loss function supplants the original variant presented in Eq. (9), paving the way for the training of the edge classifier. Below we present the details of each module.

5.2.1. High TE Wedge Selection Module:

We first define the concepts and notions about wedge:

Definition 0.

A wedge 𝒲i,c,jsubscript𝒲𝑖𝑐𝑗\mathcal{W}_{i,c,j}caligraphic_W start_POSTSUBSCRIPT italic_i , italic_c , italic_j end_POSTSUBSCRIPT is a set comprising two distinct undirected edges ecisubscript𝑒𝑐𝑖e_{ci}italic_e start_POSTSUBSCRIPT italic_c italic_i end_POSTSUBSCRIPT and ecjsubscript𝑒𝑐𝑗e_{cj}italic_e start_POSTSUBSCRIPT italic_c italic_j end_POSTSUBSCRIPT intersecting at a singular shared node vcsubscript𝑣𝑐v_{c}italic_v start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT, which is referred to as the “centric node”, the node i𝑖iitalic_i and node j𝑗jitalic_j are called “end nodes:”

(10) 𝒲i,c,j={eci,ecj|eci,ecj,vi,vc,vj𝒱,ij}subscript𝒲𝑖𝑐𝑗conditional-setsubscript𝑒𝑐𝑖subscript𝑒𝑐𝑗formulae-sequencesubscript𝑒𝑐𝑖subscript𝑒𝑐𝑗subscript𝑣𝑖subscript𝑣𝑐subscript𝑣𝑗𝒱𝑖𝑗\small\mathcal{W}_{i,c,j}=\{e_{ci},e_{cj}|e_{ci},e_{cj}\in\mathcal{E},v_{i},v_% {c},v_{j}\in\mathcal{V},i\neq j\}caligraphic_W start_POSTSUBSCRIPT italic_i , italic_c , italic_j end_POSTSUBSCRIPT = { italic_e start_POSTSUBSCRIPT italic_c italic_i end_POSTSUBSCRIPT , italic_e start_POSTSUBSCRIPT italic_c italic_j end_POSTSUBSCRIPT | italic_e start_POSTSUBSCRIPT italic_c italic_i end_POSTSUBSCRIPT , italic_e start_POSTSUBSCRIPT italic_c italic_j end_POSTSUBSCRIPT ∈ caligraphic_E , italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ caligraphic_V , italic_i ≠ italic_j }

To address the observation we made in the previous section and emphasize edges with high TE, we begin by determining the TE values for edges within the training set. Then we select K𝐾Kitalic_K number of training edges based on their TE values. Specifically, edges with the highest TE are most likely to be selected. This methodology gives rise to a subset of selected edges, denoted as 1Trsubscript1subscript𝑇𝑟\mathcal{E}_{1}\subseteq\mathcal{E}_{Tr}caligraphic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⊆ caligraphic_E start_POSTSUBSCRIPT italic_T italic_r end_POSTSUBSCRIPT, and its associated edge label set 𝐘1subscript𝐘1\mathbf{Y}_{1}bold_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. For each edge ecisubscript𝑒𝑐𝑖e_{ci}italic_e start_POSTSUBSCRIPT italic_c italic_i end_POSTSUBSCRIPT within 1subscript1\mathcal{E}_{1}caligraphic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, we discern an edge ecjsubscript𝑒𝑐𝑗e_{cj}italic_e start_POSTSUBSCRIPT italic_c italic_j end_POSTSUBSCRIPT that is incident with it. In cases where multiple incident edges are present, preference is accorded to those with elevated TE values. Consequently, we identify the incident edge set 2Trsubscript2subscript𝑇𝑟\mathcal{E}_{2}\subseteq\mathcal{E}_{Tr}caligraphic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⊆ caligraphic_E start_POSTSUBSCRIPT italic_T italic_r end_POSTSUBSCRIPT, its pertinent edge labels 𝐘2subscript𝐘2\mathbf{Y}_{2}bold_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, and high TE wedge set 𝒲𝒲\mathcal{W}caligraphic_W.

5.2.2. Wedge-Mixup Module:

With the aid of the GNN encoder, node embeddings, denoted as Z𝑍Zitalic_Z, are procured. Then for every wedge 𝒲i,c,jsubscript𝒲𝑖𝑐𝑗\mathcal{W}_{i,c,j}caligraphic_W start_POSTSUBSCRIPT italic_i , italic_c , italic_j end_POSTSUBSCRIPT, we can generate a new synthetic node vssubscript𝑣𝑠v_{s}italic_v start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT between node visubscript𝑣𝑖v_{i}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and vjsubscript𝑣𝑗v_{j}italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT with embedding as:

(11) 𝐙vs=λ𝐙vi+(1λ)𝐙vjsubscript𝐙subscript𝑣𝑠𝜆subscript𝐙subscript𝑣𝑖1𝜆subscript𝐙subscript𝑣𝑗\small\mathbf{Z}_{v_{s}}=\lambda\mathbf{Z}_{v_{i}}+(1-\lambda)\mathbf{Z}_{v_{j}}bold_Z start_POSTSUBSCRIPT italic_v start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT = italic_λ bold_Z start_POSTSUBSCRIPT italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT + ( 1 - italic_λ ) bold_Z start_POSTSUBSCRIPT italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT

where λBeta(α,α)similar-to𝜆𝐵𝑒𝑡𝑎𝛼𝛼\lambda\sim Beta(\alpha,\alpha)italic_λ ∼ italic_B italic_e italic_t italic_a ( italic_α , italic_α ), and α𝛼\alphaitalic_α is a hyperparameter that controls the Beta distribution. In this paper, we set α𝛼\alphaitalic_α to be 4.04.04.04.0. Then we can get the new synthetic edge ecssubscript𝑒𝑐𝑠e_{cs}italic_e start_POSTSUBSCRIPT italic_c italic_s end_POSTSUBSCRIPT’s edge embedding as 𝐄ecs=𝐙vc𝐙vssubscriptsuperscript𝐄subscript𝑒𝑐𝑠conditionalsubscript𝐙subscript𝑣𝑐subscript𝐙subscript𝑣𝑠\mathbf{E}^{{}^{\prime}}_{e_{cs}}=\mathbf{Z}_{v_{c}}\|\mathbf{Z}_{v_{s}}bold_E start_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT ′ end_FLOATSUPERSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_c italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT = bold_Z start_POSTSUBSCRIPT italic_v start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ bold_Z start_POSTSUBSCRIPT italic_v start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT and the synthetic edge feature embedding can be generated as 𝐒ecs=λ𝐒eci+(1λ)𝐒ecjsubscript𝐒subscript𝑒𝑐𝑠𝜆subscript𝐒subscript𝑒𝑐𝑖1𝜆subscript𝐒subscript𝑒𝑐𝑗\mathbf{S}_{e_{cs}}=\lambda\mathbf{S}_{e_{ci}}+(1-\lambda)\mathbf{S}_{e_{cj}}bold_S start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_c italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT = italic_λ bold_S start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_c italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT + ( 1 - italic_λ ) bold_S start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_c italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT. Then the final embedding for the synthetic edge is defined as 𝐄ecs=𝐄ecs𝐒ecssubscript𝐄subscript𝑒𝑐𝑠conditionalsubscriptsuperscript𝐄subscript𝑒𝑐𝑠subscript𝐒subscript𝑒𝑐𝑠\mathbf{E}_{e_{cs}}=\mathbf{E}^{{}^{\prime}}_{e_{cs}}\|\mathbf{S}_{e_{cs}}bold_E start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_c italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT = bold_E start_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT ′ end_FLOATSUPERSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_c italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ bold_S start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_c italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT. Let the synthetic edge set to be 3subscript3\mathcal{E}_{3}caligraphic_E start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT. At this point, we can send 𝐄3subscript𝐄subscript3\mathbf{E}_{\mathcal{E}_{3}}bold_E start_POSTSUBSCRIPT caligraphic_E start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT into MLP classifier f𝑓fitalic_f to get the outcome f(𝐄3)𝑓subscript𝐄subscript3f({\mathbf{E}_{\mathcal{E}_{3}}})italic_f ( bold_E start_POSTSUBSCRIPT caligraphic_E start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) that can be used for the following-up module. However, directly using this in practice will bring trouble as the golden label can not be linearly interpolated. Since the objective of the TE wedge-based mixup loss is to minimize the differences between synthetic edge prediction outcome and synthetic edge label, and the synthetic edges are generated through linear interpolation between edges in 1subscript1\mathcal{E}_{1}caligraphic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and 2subscript2\mathcal{E}_{2}caligraphic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. For implementation, as suggested in  (Zhang et al., 2017), the loss can be computed by linearly combining the loss of edges in 𝐘1subscript𝐘1\mathbf{Y}_{1}bold_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and 𝐘2subscript𝐘2\mathbf{Y}_{2}bold_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. Thus, the loss function for our TE wedge-based mixup is:

(12) X=λCE(f(𝐄3),𝐘1)+(1λ)CE(f(𝐄3,𝐘2)\mathcal{L}^{\text{X}}=\lambda\mathcal{L}^{\text{CE}}(f(\mathbf{E}_{\mathcal{E% }_{3}}),\mathbf{Y}_{1})+(1-\lambda)\mathcal{L}^{\text{CE}}(f(\mathbf{E}_{% \mathcal{E}_{3}},\mathbf{Y}_{2})caligraphic_L start_POSTSUPERSCRIPT X end_POSTSUPERSCRIPT = italic_λ caligraphic_L start_POSTSUPERSCRIPT CE end_POSTSUPERSCRIPT ( italic_f ( bold_E start_POSTSUBSCRIPT caligraphic_E start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) , bold_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) + ( 1 - italic_λ ) caligraphic_L start_POSTSUPERSCRIPT CE end_POSTSUPERSCRIPT ( italic_f ( bold_E start_POSTSUBSCRIPT caligraphic_E start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , bold_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT )

where CEsuperscriptCE\mathcal{L}^{\text{CE}}caligraphic_L start_POSTSUPERSCRIPT CE end_POSTSUPERSCRIPT is the cross entropy loss. Since incorporating a linear combination of Eq. (12) with original loss into the final loss function deviates from the traditional ERM paradigm, which is congruent with our earlier discourse and is anticipated to bolster the efficiency of the methodology, we can get the final loss function as:

(13) =Original+hXsuperscriptOriginalsuperscriptX\mathcal{L}=\mathcal{L}^{\text{Original}}+h\mathcal{L}^{\text{X}}caligraphic_L = caligraphic_L start_POSTSUPERSCRIPT Original end_POSTSUPERSCRIPT + italic_h caligraphic_L start_POSTSUPERSCRIPT X end_POSTSUPERSCRIPT

where hhitalic_h balances the TE wedge-based mixup and original losses.

Table 2. Classification performance of backbones before and after applying random edge-based mixup (Xesubscript𝑋𝑒X_{e}italic_X start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT), TE edge-based mixup (Xtesubscript𝑋𝑡𝑒X_{te}italic_X start_POSTSUBSCRIPT italic_t italic_e end_POSTSUBSCRIPT), random wedge-based mixup (Xwsubscript𝑋𝑤X_{w}italic_X start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT), and TE wedge-based mixup (Xtwsubscript𝑋𝑡𝑤X_{tw}italic_X start_POSTSUBSCRIPT italic_t italic_w end_POSTSUBSCRIPT) strategies.
Dataset Bitcoin-alpha Epinions Reddit
Metric B_acc Macro_F1 B_acc Macro_F1 B_acc Macro_F1
GCN 0.797 ±plus-or-minus\pm± 0.005 0.840 ±plus-or-minus\pm± 0.005 0.731 ±plus-or-minus\pm± 0.008 0.745 ±plus-or-minus\pm± 0.002 0.715 ±plus-or-minus\pm± 0.003 0.733 ±plus-or-minus\pm± 0.000
GCN+Xesubscript𝑋𝑒X_{e}italic_X start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT 0.813 ±plus-or-minus\pm± 0.004 0.851 ±plus-or-minus\pm± 0.001 0.753 ±plus-or-minus\pm± 0.008 0.744 ±plus-or-minus\pm± 0.001 0.719 ±plus-or-minus\pm± 0.006 0.732 ±plus-or-minus\pm± 0.001
GCN+Xtesubscript𝑋𝑡𝑒X_{te}italic_X start_POSTSUBSCRIPT italic_t italic_e end_POSTSUBSCRIPT 0.818 ±plus-or-minus\pm± 0.005 0.851 ±plus-or-minus\pm± 0.006 0.744 ±plus-or-minus\pm± 0.004 0.748 ±plus-or-minus\pm± 0.002 0.720 ±plus-or-minus\pm± 0.016 0.732 ±plus-or-minus\pm± 0.004
GCN+Xwsubscript𝑋𝑤X_{w}italic_X start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT 0.811 ±plus-or-minus\pm± 0.005 0.850 ±plus-or-minus\pm± 0.002 0.751 ±plus-or-minus\pm± 0.004 0.742 ±plus-or-minus\pm± 0.002 0.710 ±plus-or-minus\pm± 0.011 0.730 ±plus-or-minus\pm± 0.005
GCN+Xtwsubscript𝑋𝑡𝑤X_{tw}italic_X start_POSTSUBSCRIPT italic_t italic_w end_POSTSUBSCRIPT 0.828 ±plus-or-minus\pm± 0.010 0.852 ±plus-or-minus\pm± 0.008 0.754 ±plus-or-minus\pm± 0.007 0.749 ±plus-or-minus\pm± 0.002 0.728 ±plus-or-minus\pm± 0.016 0.737 ±plus-or-minus\pm± 0.002
Refer to caption
Figure 6. The overall framework of TopoEdge. The training process begins with utilizing the original graph to compute both edge embedding 𝐄𝐄\mathbf{E}bold_E and edge TE values. These components facilitate the execution of a TE wedge-based mixup strategy, leading to the creation of synthetic edges and their respective embeddings (i.e., 𝐄27subscript𝐄27\mathbf{E}_{27}bold_E start_POSTSUBSCRIPT 27 end_POSTSUBSCRIPT in the figure). Concurrently, TE values inform a topological reweighting process, producing edge weights. The ensemble of edge weights and embeddings forms the input for the final classifier, which will generate output as the edge classification results.

5.3. Empirical Analysis

In this section, we provide an empirical analysis of the effectiveness of TE wedge-based mixups (Xtwsubscript𝑋𝑡𝑤X_{tw}italic_X start_POSTSUBSCRIPT italic_t italic_w end_POSTSUBSCRIPT), which do not incorporate a topological reweight term in the final loss function, enabling a fair comparison. We contrast this method with random edge-based mixups (Xesubscript𝑋𝑒X_{e}italic_X start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT), TE edge-based mixups (Xtesubscript𝑋𝑡𝑒X_{te}italic_X start_POSTSUBSCRIPT italic_t italic_e end_POSTSUBSCRIPT) where edges with the highest TE scores are preferentially selected for mixups, and standard wedge-based mixups (Xwsubscript𝑋𝑤X_{w}italic_X start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT).

The results presented in Table 2 underscore the superior performance of our TE wedge-based mixup strategy, which consistently outperforms alternative methods in both balanced accuracy and Macro-F1. This confirms the hypothesis that prioritizing high TE edges can effectively address the issue of topological imbalance.

Moreover, the observation that TE edge-based mixups surpass random edge-based mixups, yet fall short of TE wedge-based mixups, reinforces the value of our approach. While emphasizing high TE edges alone does yield performance improvements, the integration of these edges with a wedge-based strategy — which also tackles semantic coherence — proves to be more advantageous. This affirms the validity of our method and aligns with our initial objectives, demonstrating the effectiveness of combining high TE edge prioritization with random wedge-based mixup techniques.

6. Overall TopoEdge Framework

To further boost the performance, we can adaptively integrate the proposed topological reweight with TE wedge-based mixup as:

(14) tX=λw3CE(f(𝐄3),𝐘1)+(1λ)w3CE(f(𝐄3,𝐘2)\mathcal{L}^{\text{tX}}=\lambda w_{\mathcal{E}_{3}}\mathcal{L}^{\text{CE}}(f(% \mathbf{E}_{\mathcal{E}_{3}}),\mathbf{Y}_{1})+(1-\lambda)w_{\mathcal{E}_{3}}% \mathcal{L}^{\text{CE}}(f(\mathbf{E}_{\mathcal{E}_{3}},\mathbf{Y}_{2})caligraphic_L start_POSTSUPERSCRIPT tX end_POSTSUPERSCRIPT = italic_λ italic_w start_POSTSUBSCRIPT caligraphic_E start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT caligraphic_L start_POSTSUPERSCRIPT CE end_POSTSUPERSCRIPT ( italic_f ( bold_E start_POSTSUBSCRIPT caligraphic_E start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) , bold_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) + ( 1 - italic_λ ) italic_w start_POSTSUBSCRIPT caligraphic_E start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT caligraphic_L start_POSTSUPERSCRIPT CE end_POSTSUPERSCRIPT ( italic_f ( bold_E start_POSTSUBSCRIPT caligraphic_E start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , bold_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT )

Note that the weight for the synthetic edges will be the linear integration between the weights of TE wedges’ end edges. Then similarly we can linearly combine Eq. (14) with Eq. (9) to get the final loss function as:

(15) =T+htXsuperscriptTsuperscripttX\mathcal{L}=\mathcal{L}^{\text{T}}+h\mathcal{L}^{\text{tX}}caligraphic_L = caligraphic_L start_POSTSUPERSCRIPT T end_POSTSUPERSCRIPT + italic_h caligraphic_L start_POSTSUPERSCRIPT tX end_POSTSUPERSCRIPT

Where hhitalic_h is a hyperparameter that scales the topological reweight loss and topological wedge-based mixup loss. This is our final topological imbalance strategy for edge classification: TopoEdge. The pseudo-code of TopoEdge is shown in Algorithm 1, and the overall framework of TopoEdge is shown in Figure 6.

Algorithm 1 Overall TopoEdge Framework
1:Input Edge embedding 𝐄𝐄\mathbf{E}bold_E, training edges Trsuperscript𝑇𝑟\mathcal{E}^{Tr}caligraphic_E start_POSTSUPERSCRIPT italic_T italic_r end_POSTSUPERSCRIPT, training edge labels 𝐘𝐘\mathbf{Y}bold_Y,
classifier f𝑓fitalic_f, hyperparameters K,α,t,θ,h𝐾𝛼𝑡𝜃K,\alpha,t,\theta,hitalic_K , italic_α , italic_t , italic_θ , italic_h
2:Get the quantity weight from Eq.  (1)
3:Get the TE weights for edges in Trsuperscript𝑇𝑟\mathcal{E}^{Tr}caligraphic_E start_POSTSUPERSCRIPT italic_T italic_r end_POSTSUPERSCRIPT from Eq.  (5) and  (6)
4:Get the topological weight from Eq.  (8)
5:Get the topological training loss TsuperscriptT\mathcal{L}^{\text{T}}caligraphic_L start_POSTSUPERSCRIPT T end_POSTSUPERSCRIPT from Eq. (9)
6:λBeta(α,α)similar-to𝜆𝐵𝑒𝑡𝑎𝛼𝛼\lambda\sim Beta(\alpha,\alpha)italic_λ ∼ italic_B italic_e italic_t italic_a ( italic_α , italic_α )
7:High TE Wedge Selection Module:
  • Select K𝐾Kitalic_K edges based on TE value to get 1subscript1\mathcal{E}_{1}caligraphic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, 𝐘1subscript𝐘1\mathbf{Y}_{1}bold_Y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT

  • Get incident edge set 2subscript2\mathcal{E}_{2}caligraphic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, 𝐘2subscript𝐘2\mathbf{Y}_{2}bold_Y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, based on TE and 1subscript1\mathcal{E}_{1}caligraphic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT,
    to form high TE wedge set 𝒲𝒲\mathcal{W}caligraphic_W

8:Wedge-Mixup Module:
  • Get synthetic loss tXsuperscripttX\mathcal{L}^{\text{tX}}caligraphic_L start_POSTSUPERSCRIPT tX end_POSTSUPERSCRIPT from Eq.  (14)

9:Wedge-loss Module:
  • Get final loss \mathcal{L}caligraphic_L from Eq.  (15)

10:return L𝐿Litalic_L

7. Experiments

In this section, we provide a comprehensive study of the effectiveness of our methods on real-world edge classification tasks.

7.1. Datasets

In this section, we present the experimental datasets employed in this study. As highlighted earlier, it is evident that issues related to topology imbalance can be observed in edge classification across various applications. Thus, we collect and formulate six benchmark imbalanced real-world graph datasets with relevant features. Below are brief descriptions of each dataset, and the basic statistics of the datasets are provided in Table 3.

  • Bitcoin-alpha (Derr et al., 2017) is a cryptocurrency transaction trust network, where a node represents the user and a directed edge represents ratings (how much one user trusts/distrusts another user). The edge features are text embeddings of user comments. Note that ratings ranging -10 (total distrust) to +10 (total trust) can be discretized into different edge types. In binary classification, trust (ratings 1 to 10) is the majority class (93.8%percent93.893.8\%93.8 %), while distrust (ratings -10 to -1) is the minority (6.3%percent6.36.3\%6.3 %). For multi-class classification, weak trust (90.3%percent90.390.3\%90.3 %), strong trust (3.5%percent3.53.5\%3.5 %), weak distrust (3.3%percent3.33.3\%3.3 %), and strong distrust (3.0%percent3.03.0\%3.0 %) are considered.

  • HSPPI is a collection of homo sapiens protein-protein interaction networks based on String database (Szklarczyk et al., 2023). Nodes represent proteins, and node features are protein sequences (acquired from EBI protein API (Nightingale et al., 2017)) that reflect the protein amino acids sequence information. The two edge types are indirect functional interactions (76.3%percent76.376.3\%76.3 %) and direct physical interactions (23.7%percent23.723.7\%23.7 %).

    Table 3. Basic statistics of the imbalanced benchmark datasets. The 4 largest frequency ratios in each dataset are represented as L0subscript𝐿0L_{0}italic_L start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT through L3subscript𝐿3L_{3}italic_L start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT.
    Datasets # Nodes # Edges L0subscript𝐿0L_{0}italic_L start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT L1subscript𝐿1L_{1}italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT L3subscript𝐿3L_{3}italic_L start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT # Labels
    Bitcoin-alpha 3,784 24,207 90.3% 3.5% 3.3% 3.0% 2 or 4
    HSPPI 17,895 1,008,006 76.1% 23.9% - - 2
    NID 109,802 431,372 62.0% 14.7% 14.3% 3.0% 9
    Epinions 81,350 763,538 85.4% 14.6% - - 2
    Reddit 67,180 901,056 90.5% 9.5% - - 2
    MAG 40,000 141,725 58.6% 14.2% 13.7% 7.3% 10
  • NID is formulated from NetFlow-based IoT network dataset NF-ToN-IoT (Sarhan et al., 2021), where node represents user’s IP address, and edge represents interactions between users. There are 9999 classes in total, and 62.0%percent62.062.0\%62.0 % of the interactions are injection, 14.7%percent14.714.7\%14.7 % are DDos, 14.3%percent14.314.3\%14.3 % are benign (normal unmalicious flows), and 3.2%percent3.23.2\%3.2 % are scanning. Edge features are network flow statistics (flow duration time, flow bytes, packets, etc) between users.

  • Epinions (Guha et al., 2004) is a product review website where nodes represent each user and directed edges represent the trust/ distrust relationship from one user to another. Each user has reviews for different products. During the data filtering processing, we first remove the users who never give reviews, and then we concatenate the text embeddings of the source node with the target node to form the edge feature. We eventually have 85.4%percent85.485.4\%85.4 % trust and 14.6%percent14.614.6\%14.6 % distrust edges.

  • Reddit (Kumar et al., 2018) is a hyperlink network with communities as nodes and posts between them as edges. The edge labels describe the sentiments of the posts in source communities towards the posts in target communities, where 90.5%percent90.590.5\%90.5 % are neutral or positive, and 9.5%percent9.59.5\%9.5 % sentiments are negative. The edge features are text property vectors between the communities.

  • MAG (Sinha et al., 2015) is a citation network where nodes and edges represent the scholars and papers coauthored by scholars, respectively. Edge labels correspond to the field of study for the paper. Edge features are bag-of-words of paper abstracts. In this paper, we adopt the setting in  (Wang et al., 2023a), and the majority class ratio is 58.6%percent58.658.6\%58.6 %.

Table 4. Binary classification results on four real-world datasets indicate that our proposed method can achieve steady improvement in terms of both balanced accuracy and Macro-F1 compared with backbone GNNs, and applying TopoEdge can achieve comparable performance as SOTA baselines.
Bitcoin-alpha HSPPI Epinions Reddit
B_acc Macro_F1 B_acc Macro_F1 B_acc Macro_F1 B_acc Macro_F1
ExtWF 0.500 ±plus-or-minus\pm± 0.000 0.484 ±plus-or-minus\pm± 0.000 0.497 ±plus-or-minus\pm± 0.000 0.432 ±plus-or-minus\pm± 0.000 0.500 ±plus-or-minus\pm± 0.000 0.476 ±plus-or-minus\pm± 0.000 0.552 ±plus-or-minus\pm± 0.000 0.570 ±plus-or-minus\pm± 0.000
TER+AER (Geometric) 0.871 ±plus-or-minus\pm± 0.013 0.870 ±plus-or-minus\pm± 0.007 0.598 ±plus-or-minus\pm± 0.043 0.514 ±plus-or-minus\pm± 0.063 0.764 ±plus-or-minus\pm± 0.003 0.783 ±plus-or-minus\pm± 0.002 0.789 ±plus-or-minus\pm± 0.002 0.778 ±plus-or-minus\pm± 0.002
TER+AER (Possion) 0.868 ±plus-or-minus\pm± 0.015 0.869 ±plus-or-minus\pm± 0.004 0.603 ±plus-or-minus\pm± 0.037 0.520 ±plus-or-minus\pm± 0.051 0.768 ±plus-or-minus\pm± 0.003 0.783 ±plus-or-minus\pm± 0.002 0.786 ±plus-or-minus\pm± 0.006 0.773 ±plus-or-minus\pm± 0.002
GCN w/o TopoEdge 0.797 ±plus-or-minus\pm± 0.005 0.840 ±plus-or-minus\pm± 0.005 0.655 ±plus-or-minus\pm± 0.003 0.667 ±plus-or-minus\pm± 0.001 0.731 ±plus-or-minus\pm± 0.008 0.745 ±plus-or-minus\pm± 0.002 0.715 ±plus-or-minus\pm± 0.003 0.733 ±plus-or-minus\pm± 0.000
GCN w/ TopoEdge 0.845 ±plus-or-minus\pm± 0.004 0.858 ±plus-or-minus\pm± 0.004 0.688 ±plus-or-minus\pm± 0.009 0.695 ±plus-or-minus\pm± 0.004 0.759 ±plus-or-minus\pm± 0.002 0.756 ±plus-or-minus\pm± 0.003 0.763 ±plus-or-minus\pm± 0.001 0.745 ±plus-or-minus\pm± 0.003
GAT w/o TopoEdge 0.819 ±plus-or-minus\pm± 0.010 0.840 ±plus-or-minus\pm± 0.003 0.612 ±plus-or-minus\pm± 0.030 0.623 ±plus-or-minus\pm± 0.035 0.742 ±plus-or-minus\pm± 0.007 0.751 ±plus-or-minus\pm± 0.005 0.702 ±plus-or-minus\pm± 0.018 0.716 ±plus-or-minus\pm± 0.005
GAT w/ TopoEdge 0.861 ±plus-or-minus\pm± 0.014 0.858 ±plus-or-minus\pm± 0.002 0.641 ±plus-or-minus\pm± 0.021 0.646 ±plus-or-minus\pm± 0.022 0.744 ±plus-or-minus\pm± 0.007 0.753 ±plus-or-minus\pm± 0.005 0.758 ±plus-or-minus\pm± 0.008 0.739 ±plus-or-minus\pm± 0.002
ChebNet w/o TopoEdge 0.803 ±plus-or-minus\pm± 0.002 0.830 ±plus-or-minus\pm± 0.002 0.530 ±plus-or-minus\pm± 0.010 0.507 ±plus-or-minus\pm± 0.026 0.751 ±plus-or-minus\pm± 0.009 0.771 ±plus-or-minus\pm± 0.002 0.707 ±plus-or-minus\pm± 0.010 0.728 ±plus-or-minus\pm± 0.006
ChebNet w/ TopoEdge 0.865 ±plus-or-minus\pm± 0.006 0.865 ±plus-or-minus\pm± 0.006 0.695 ±plus-or-minus\pm± 0.002 0.693 ±plus-or-minus\pm± 0.005 0.782 ±plus-or-minus\pm± 0.009 0.783 ±plus-or-minus\pm± 0.001 0.758 ±plus-or-minus\pm± 0.008 0.746 ±plus-or-minus\pm± 0.003
Table 5. Multi-class classification results suggest that applying our proposed TopoEdge can achieve steady improvement (for both balanced accuracy and Macro-F1) with various GNN backbones over SOTA baselines.
Bitcoin-alpha NID MAG
B_acc Macro_F1 B_acc Macro_F1 B_acc Macro_F1
ExtWF 0.250 ±plus-or-minus\pm± 0.000 0.237 ±plus-or-minus\pm± 0.000 0.238 ±plus-or-minus\pm± 0.000 0.246 ±plus-or-minus\pm± 0.000 0.197 ±plus-or-minus\pm± 0.000 0.175 ±plus-or-minus\pm± 0.000
TER+AER (Geometric) 0.348 ±plus-or-minus\pm± 0.060 0.329 ±plus-or-minus\pm± 0.057 0.270 ±plus-or-minus\pm± 0.071 0.167 ±plus-or-minus\pm± 0.063 0.541 ±plus-or-minus\pm± 0.031 0.647 ±plus-or-minus\pm± 0.050
TER+AER (Possion) 0.381 ±plus-or-minus\pm± 0.100 0.352 ±plus-or-minus\pm± 0.100 0.316 ±plus-or-minus\pm± 0.006 0.242 ±plus-or-minus\pm± 0.009 0.534 ±plus-or-minus\pm± 0.026 0.627 ±plus-or-minus\pm± 0.039
GCN w/o TopoEdge 0.521 ±plus-or-minus\pm± 0.009 0.559 ±plus-or-minus\pm± 0.013 0.490 ±plus-or-minus\pm± 0.005 0.501 ±plus-or-minus\pm± 0.008 0.570 ±plus-or-minus\pm± 0.003 0.606 ±plus-or-minus\pm± 0.007
GCN w/ TopoEdge 0.558 ±plus-or-minus\pm± 0.005 0.559 ±plus-or-minus\pm± 0.005 0.516 ±plus-or-minus\pm± 0.025 0.503 ±plus-or-minus\pm± 0.004 0.619 ±plus-or-minus\pm± 0.009 0.623 ±plus-or-minus\pm± 0.009
GAT w/o TopoEdge 0.517 ±plus-or-minus\pm± 0.012 0.558 ±plus-or-minus\pm± 0.011 0.479 ±plus-or-minus\pm± 0.010 0.462 ±plus-or-minus\pm± 0.014 0.601 ±plus-or-minus\pm± 0.010 0.640 ±plus-or-minus\pm± 0.008
GAT w/ TopoEdge 0.556 ±plus-or-minus\pm± 0.001 0.583 ±plus-or-minus\pm± 0.001 0.528 ±plus-or-minus\pm± 0.030 0.490 ±plus-or-minus\pm± 0.010 0.678 ±plus-or-minus\pm± 0.007 0.652 ±plus-or-minus\pm± 0.004
ChebNet w/o TopoEdge 0.514 ±plus-or-minus\pm± 0.004 0.547 ±plus-or-minus\pm± 0.007 0.532 ±plus-or-minus\pm± 0.019 0.510 ±plus-or-minus\pm± 0.007 0.624 ±plus-or-minus\pm± 0.008 0.658 ±plus-or-minus\pm± 0.004
ChebNet w/ TopoEdge 0.542 ±plus-or-minus\pm± 0.009 0.567 ±plus-or-minus\pm± 0.010 0.556 ±plus-or-minus\pm± 0.014 0.534 ±plus-or-minus\pm± 0.006 0.690 ±plus-or-minus\pm± 0.006 0.698 ±plus-or-minus\pm± 0.001

7.2. Experiment Setting

Our training, validation, and testing sets, were set to 1:2:2 for all datasets. ExtWF (Aggarwal et al., 2016) and TER+AER (Wang et al., 2023a) are selected as baselines. ExtWF is a classic heuristic method that performs edge classification based on the dominant labels in similar edge sets, and TER+AER is a newly proposed edge classification method that is based on both high-order proximities in local subgraphs and edge attributes. GCN (Kipf and Welling, 2016), GAT (Veličković et al., 2017), and ChebNet (Tang et al., 2019) are selected as the backbones. We follow the setting in (Huang et al., 2021), where we first use these methods to get node representations. For edge eijsubscript𝑒𝑖𝑗e_{ij}italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT, we concatenate these two learned end node representations zisubscript𝑧𝑖z_{i}italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and zjsubscript𝑧𝑗z_{j}italic_z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT to compose an edge representation zijsubscript𝑧𝑖𝑗z_{ij}italic_z start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT. After that, we train an MLP classifier on the training set and use it to predict the edge label in the testing set. As previous empirical studies do, balanced accuracy and Macro-F1 are selected as the evaluation metrics, which gives insight into the model’s performance over different classes. The temperature hyperparameter t𝑡titalic_t is finely tuned within the range (0,5]05(0,5]( 0 , 5 ]. The parameter K𝐾Kitalic_K is adjusted from 00 up to the size of the training data, while both θ𝜃\thetaitalic_θ and hhitalic_h are tuned within the interval (0,1)01(0,1)( 0 , 1 ).

7.3. Classification Performance Evaluation

The classification results are shown in Table 4 and Table 5. Overall the proposed TopoEdge depicts significant improvement against backbones. Also, applying our strategy on backbones that are not explicitly designed for learning edge representation can achieve a similar level of performance with the most advanced baseline (TER+AER) in binary classification tasks and significantly beat it in multi-class classification. This suggests the effectiveness of our proposed strategies rather than empathizing edge with high distribution variance can mitigate the topological imbalance issues. The balanced accuracy and Macro-F1 score in binary classification is higher than that in multi-class classification, which is because the edges are influenced by more classes during message passing and the aggregated message is of relatively low quality. This again shows the need to address topological imbalance in edge classification. ExtWF performs poorly across datasets because it is based on the dominant edge class in similar edge sets, edges will always tend to be predicted as the majority of edges in a highly imbalanced dataset. TER+AER can depict comparable performance in binary classification as it involves generating edge embedding based on higher-order proximity in local subgraphs, which shows that leveraging local structural patterns can indeed boost performance. However, as this method does not consider the influence of quantity and topological imbalance, hence when uncertainty in the local subgraph arises by switching to multi-class datasets, its performance degrades a lot. This also suggests that compared with focusing on local proximity, emphasizing the variance will help increase the edge classification performance more on the imbalanced dataset.

7.4. Ablation Study

In this section, we detail the ablation studies conducted to evaluate the impact of different components of the TopoEdge framework on edge classification tasks. The results, as illustrated in Figure 7, employing the full TopoEdge framework significantly enhances classification performance compared to using these components separately or the baseline models alone in both tasks. More specifically, the integration of both components in TopoEdge yields the most substantial improvements, demonstrating a synergistic effect that optimizes classification outcomes. Also, both Topological Reweight and TE Wedge-Based Mixup independently improve performance over baseline models, confirming their efficacy. These findings underscore the effectiveness of each component and highlight the benefits of their integration for imbalanced edge classification.

Refer to caption
(a)
Refer to caption
(b)
Figure 7. Ablation studies for edge classification tasks, with (a) showing the performance on binary classification tasks, and (b) showing the results for multi-class classifications. Overall, TopoEdge depicts the best performance, and topological reweight and TE wedge-based mixup can independently improve the performance.

7.5. Influence of Labeled Data Ratio

Generally speaking, the labeling ratios in graph datasets are low, so in this section, we give a further analysis of the classification performance of our proposed method under various labeled training ratios. As shown in Figure 8, our proposed strategies can steadily beat the baseline even when low-labeled training data is available, and TopoEdge (+ RX𝑅𝑋RXitalic_R italic_X ) depicts the best performance under an extremely low labeled ratio. This indicates that capturing local distribution variance works even when scare data is available. as scare data may still depict large variance and we can take advantage of that to get better edge representation.

8. Related Work

In this section, we present a summary of the related work in edge classification and imbalanced graph machine learning, which are the two closest directions to the presented work.

Edge Classification: The problem of edge classification in networks was formulated by Aggarwal et al. (Aggarwal et al., 2016) in 2016, where the authors address the edge classification tasks by proposing a structural similarity model using the weighted Jaccard coefficient as the underlying structural proximity metric. In recent years, most work in the field of edge classification has focused on generating embeddings for edges. One direction uses shallow embedding techniques. For example, AttrE2vec (Bielak et al., 2022) aggregates features through random walk, Edge2vec (Wang et al., 2020b) uses deep auto-encoders, and TER+AER (Wang et al., 2023a) generate edge embeddings by encoding local topology structure information while augmenting edge attributes through a feature aggregation scheme. Another direction lies in GNNs, where the most commonly seen approach is to use the end nodes’ embeddings and form edge embeddings via operations such as averaging, Hadamard product, concatenation, or deep neural network (Kipf and Welling, 2016; Veličković et al., 2017; Tang et al., 2019). Converting the original graphs into line graphs and then using GNNs to generate edge representation has also been studied (Wang et al., 2020a). What’s more, even though methods like EGNN (Kim et al., 2019) and EGAT (Gong and Cheng, 2019) are used for node classification, they can generate embeddings for edges directly, without first converting edges into nodes. However, none of these above methods focused on mitigating the topological imbalance in edge classification.

Imbalanced Graph Machine Learning: Imbalanced graph machine learning is an emerging research area addressing imbalance issues in graph data. In recent years, several methods have been proposed (Chen et al., 2021; Liu et al., 2021; Park et al., 2021; Zhao et al., 2021; Liu et al., 2023; Wang et al., 2022; Sun et al., 2022) in this field. One of the early works GraphSMOTE  (Zhao et al., 2021) adopts SMOTE (Chawla et al., 2002) oversampling in the node embedding space to synthesize minority nodes and complements the topology with a learnable edge predictor. In the field of topology imbalance, ReNode (Chen et al., 2021) first addresses the topology imbalance issue, i.e., the unequal structure role of labeled nodes in the topology, by down-weighting close-to-boundary labeled nodes. PASTEL (Sun et al., 2022) addresses the under-reaching and over-squashing issue in node classification through structural learning. TOBA  (Liu et al., 2023) approaches the source of the class-imbalance bias from a topology-centric perspective. However, none of the existing research focused on an edge-centric perspective. To the best of our knowledge, there are currently no extensive studies related to topology imbalance issues in edge classification. In this paper, we approach the topology imbalance issue in edge classification using a reweighting and resampling method, and consequently future boost the current benchmark performance on edge classification.

Refer to caption
(a) Bitcoin-alpha (multi-class)
Refer to caption
(b) NID
Figure 8. The classification performance of (a) Bitcoin-alpha (multi-class) and (b) NID under various labeled training ratios (e.g., 0.5 means only leveraging half of the training dataset)

9. Conclusion

In this paper, we addressed the previously overlooked issue of topological imbalance in edge classification within graph machine learning (GML). We found that this imbalance significantly hinders model performance across various graphs. To measure these imbalances, we introduced a new metric, Topological Entropy (TE), which quantifies class distribution variance in local subgraphs. Building on this, we developed a topological reweighting approach to adjust the training weights of labeled edges based on their topological properties. Additionally, we introduced a topological wedge-based mixup strategy that generates synthetic training edges through interpolation among high-TE edges. These strategies are integrated into our novel TopoEdge approach for edge classification. Our comprehensive empirical analyses confirm the effectiveness of TopoEdge in various classification tasks. Acknowledging the importance of topological imbalance, we anticipate further dedicated research in this direction. Given the widespread nature of topological imbalances, exploring resolutions across more applications is a promising direction for future research.

References

  • (1)
  • Aggarwal et al. (2016) Charu Aggarwal, Gewen He, and Peixiang Zhao. 2016. Edge classification in networks. In 2016 IEEE 32nd ICDE. IEEE.
  • Ai et al. (2022) Baole Ai, Zhou Qin, Wenting Shen, and Yong Li. 2022. Structure enhanced graph neural networks for link prediction. arXiv preprint arXiv:2201.05293 (2022).
  • Bielak et al. (2022) Piotr Bielak, Tomasz Kajdanowicz, and Nitesh V Chawla. 2022. Attre2vec: Unsupervised attributed edge representation learning. Information Sciences 592 (2022).
  • Byrd and Lipton (2019) Jonathon Byrd and Zachary Lipton. 2019. What is the effect of importance weighting in deep learning?. In ICML. PMLR.
  • Chawla et al. (2002) Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. 2002. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research 16 (2002).
  • Chen et al. (2021) Deli Chen, Yankai Lin, Guangxiang Zhao, Xuancheng Ren, Peng Li, Jie Zhou, and Xu Sun. 2021. Topology-imbalance learning for semi-supervised node classification. NeurIPS 34 (2021).
  • Derr et al. (2017) Tyler Derr, Chenxing Wang, Suhang Wang, and Jiliang Tang. 2017. Signed node relevance measurements. arXiv preprint arXiv:1710.07236 (2017).
  • Errica et al. (2019) Federico Errica, Marco Podda, Davide Bacciu, and Alessio Micheli. 2019. A fair comparison of graph neural networks for graph classification. arXiv preprint arXiv:1912.09893 (2019).
  • Gong and Cheng (2019) Liyu Gong and Qiang Cheng. 2019. Exploiting edge features for graph neural networks. In CVPR.
  • Guha et al. (2004) Ramanthan Guha, Ravi Kumar, Prabhakar Raghavan, and Andrew Tomkins. 2004. Propagation of trust and distrust. In Proceedings of the 13th international conference on World Wide Web. 403–412.
  • Gulrajani and Lopez-Paz (2020) Ishaan Gulrajani and David Lopez-Paz. 2020. In search of lost domain generalization. arXiv preprint arXiv:2007.01434 (2020).
  • Guo et al. (2008) Xinjian Guo, Yilong Yin, Cailing Dong, Gongping Yang, and Guangtong Zhou. 2008. On the class imbalance problem. In 2008 Fourth international conference on natural computation, Vol. 4. IEEE.
  • Han et al. (2022) Xiaotian Han, Zhimeng Jiang, Ninghao Liu, and Xia Hu. 2022. G-mixup: Graph data augmentation for graph classification. In ICML. PMLR.
  • Huang et al. (2016) Chen Huang, Yining Li, Chen Change Loy, and Xiaoou Tang. 2016. Learning deep representation for imbalanced classification. In CVPR.
  • Huang et al. (2021) Junjie Huang, Huawei Shen, Liang Hou, and Xueqi Cheng. 2021. SDGNN: Learning Node Representation for Signed Directed Networks. In AAAI, Vol. 35.
  • Jha et al. (2022) Kanchan Jha, Sriparna Saha, and Hiteshi Singh. 2022. Prediction of protein–protein interaction using graph neural networks. Scientific Reports 12, 1 (2022).
  • Kim et al. (2019) Jongmin Kim, Taesup Kim, Sungwoong Kim, and Chang D Yoo. 2019. Edge-labeling graph neural network for few-shot learning. In CVPR.
  • Kim et al. (2023) Junghurn Kim, Sukwon Yun, and Chanyoung Park. 2023. S-Mixup: Structural Mixup for Graph Neural Networks. In CIKM 2023.
  • Kipf and Welling (2016) Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
  • Koh et al. (2021) Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, et al. 2021. Wilds: A benchmark of in-the-wild distribution shifts. In ICML. PMLR.
  • Kumar et al. (2018) Srijan Kumar, William L Hamilton, Jure Leskovec, and Dan Jurafsky. 2018. Community interaction and conflict on the web. In WWW 2018.
  • Kwak et al. (2010) Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In WWW 2010.
  • Li et al. (2022) Jiaxing Li, Ke Zhang, Xinyan Pu, and Youyong Kong. 2022. Graph Attention Mixup Transformer for Graph Classification. In NeurIPS. Springer.
  • Liu et al. (2021) Yang Liu, Xiang Ao, Zidi Qin, Jianfeng Chi, Jinghua Feng, Hao Yang, and Qing He. 2021. Pick and choose: a GNN-based imbalanced learning approach for fraud detection. In WWW 2021.
  • Liu et al. (2023) Zhining Liu, Zhichen Zeng, Ruizhong Qiu, Hyunsik Yoo, David Zhou, Zhe Xu, Yada Zhu, Kommy Weldemariam, Jingrui He, and Hanghang Tong. 2023. Topological Augmentation for Class-Imbalanced Node Classification. arXiv preprint arXiv:2308.14181 (2023).
  • Ma et al. (2024) Xinyu Ma, Xu Chu, Yasha Wang, Yang Lin, Junfeng Zhao, Liantao Ma, and Wenwu Zhu. 2024. Fused Gromov-Wasserstein Graph Mixup for Graph-level Classifications. NeurIPS 36 (2024).
  • Ma et al. (2023) Yihong Ma, Yijun Tian, Nuno Moniz, and Nitesh V Chawla. 2023. Class-Imbalanced Learning on Graphs: A Survey. arXiv preprint arXiv:2304.04300 (2023).
  • Mao et al. (2023) Haitao Mao, Zhikai Chen, Wei Jin, Haoyu Han, Yao Ma, Tong Zhao, Neil Shah, and Jiliang Tang. 2023. Demystifying Structural Disparity in Graph Neural Networks: Can One Size Fit All? arXiv preprint arXiv:2306.01323 (2023).
  • Nightingale et al. (2017) Andrew Nightingale, Ricardo Antunes, Emanuele Alpi, Borisas Bursteinas, Leonardo Gonzales, Wudong Liu, Jie Luo, Guoying Qi, Edd Turner, and Maria Martin. 2017. The Proteins API: accessing key integrated protein and genome information. Nucleic acids research 45, W1 (2017).
  • Pandey et al. (2019) Babita Pandey, Praveen Kumar Bhanodia, Aditya Khamparia, and Devendra Kumar Pandey. 2019. A comprehensive survey of edge prediction in social networks: Techniques, parameters and challenges. Expert Systems with Applications 124 (2019).
  • Park et al. (2021) Joonhyung Park, Jaeyun Song, and Eunho Yang. 2021. Graphens: Neighbor-aware ego network synthesis for class-imbalanced node classification. In ICLR.
  • Sagawa et al. (2019) Shiori Sagawa, Pang Wei Koh, Tatsunori B Hashimoto, and Percy Liang. 2019. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. arXiv preprint arXiv:1911.08731 (2019).
  • Sagawa et al. (2021) Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, et al. 2021. Extending the WILDS benchmark for unsupervised adaptation. arXiv preprint arXiv:2112.05090 (2021).
  • Sarhan et al. (2021) Mohanad Sarhan, Siamak Layeghy, Nour Moustafa, and Marius Portmann. 2021. Netflow datasets for machine learning-based network intrusion detection systems. In Big Data Technologies and Applications: 10th EAI International Conference, BDTA 2020, and 13th EAI International Conference on Wireless Internet, WiCON 2020, Virtual Event, December 11, 2020, Proceedings 10. Springer.
  • Sinha et al. (2015) Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June Hsu, and Kuansan Wang. 2015. An overview of microsoft academic service (mas) and applications. In WWW 2015.
  • Song et al. (2022) Jaeyun Song, Joonhyung Park, and Eunho Yang. 2022. TAM: topology-aware margin loss for class-imbalanced node classification. In ICML. PMLR.
  • Sun et al. (2022) Qingyun Sun, Jianxin Li, Haonan Yuan, Xingcheng Fu, Hao Peng, Cheng Ji, Qian Li, and Philip S Yu. 2022. Position-aware structure learning for graph topology-imbalance by relieving under-reaching and over-squashing. In CIKM.
  • Szklarczyk et al. (2023) Damian Szklarczyk, Rebecca Kirsch, Mikaela Koutrouli, et al. 2023. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic acids research 51, D1 (2023).
  • Tang et al. (2019) Shanshan Tang, Bo Li, and Haijun Yu. 2019. ChebNet: Efficient and stable constructions of deep neural networks with rectified power units using chebyshev approximations. arXiv preprint arXiv:1911.05467 (2019).
  • Veličković et al. (2017) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
  • Wang et al. (2020b) Changping Wang, Chaokun Wang, Zheng Wang, Xiaojun Ye, and Philip S Yu. 2020b. Edge2vec: Edge-based social network embedding. TKDD 14, 4 (2020).
  • Wang et al. (2023a) Hewen Wang, Renchi Yang, Keke Huang, and Xiaokui Xiao. 2023a. Efficient and Effective Edge-wise Graph Representation Learning. In KDD 2023.
  • Wang et al. (2021b) Ke Alexander Wang, Niladri S Chatterji, Saminul Haque, and Tatsunori Hashimoto. 2021b. Is importance weighting incompatible with interpolating classifiers? arXiv preprint arXiv:2112.12986 (2021).
  • Wang et al. (2020a) Pengyang Wang, Jiaping Gui, Zhengzhang Chen, Junghwan Rhee, Haifeng Chen, and Yanjie Fu. 2020a. A generic edge-empowered graph convolutional network via node-edge mutual enhancement. In WWW 2020.
  • Wang et al. (2021a) Yu Wang, Charu Aggarwal, and Tyler Derr. 2021a. Distance-wise prototypical graph neural network in node imbalance classification. arXiv preprint arXiv:2110.12035 (2021).
  • Wang et al. (2021c) Yiwei Wang, Wei Wang, Yuxuan Liang, Yujun Cai, and Bryan Hooi. 2021c. Mixup for node and graph classification. In WWW 2021.
  • Wang et al. (2023b) Yu Wang, Tong Zhao, Yuying Zhao, Yunchao Liu, Xueqi Cheng, Neil Shah, and Tyler Derr. 2023b. A Topological Perspective on Demystifying GNN-Based Link Prediction Performance. In ICLR.
  • Wang et al. (2022) Yu Wang, Yuying Zhao, Neil Shah, and Tyler Derr. 2022. Imbalanced graph classification via graph-of-graph neural networks. In CIKM 2022.
  • Wiles et al. (2021) Olivia Wiles, Sven Gowal, Florian Stimberg, Sylvestre Alvise-Rebuffi, Ira Ktena, Krishnamurthy Dvijotham, and Taylan Cemgil. 2021. A fine-grained analysis on distribution shift. arXiv preprint arXiv:2110.11328 (2021).
  • Wu et al. (2021) Bang Wu, Xiangwen Yang, Shirui Pan, and Xingliang Yuan. 2021. Adapting membership inference attacks to GNN for graph classification: Approaches and implications. In ICDM 2021. IEEE.
  • Yu et al. (2020) Zeping Yu, Rui Cao, Qiyi Tang, Sen Nie, Junzhou Huang, and Shi Wu. 2020. Order matters: Semantic-aware neural networks for binary code similarity detection. In AAAI, Vol. 34.
  • Zhai et al. (2022) Runtian Zhai, Chen Dan, Zico Kolter, and Pradeep Ravikumar. 2022. Understanding why generalized reweighting does not improve over erm. arXiv preprint arXiv:2201.12293 (2022).
  • Zhang et al. (2017) Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. 2017. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412 (2017).
  • Zhang and Chen (2018) Muhan Zhang and Yixin Chen. 2018. Link prediction based on graph neural networks. NeurIPS 31 (2018).
  • Zhao et al. (2021) Tianxiang Zhao, Xiang Zhang, and Suhang Wang. 2021. Graphsmote: Imbalanced node classification on graphs with graph neural networks. In WSDM 2021.
  • Zhu et al. (2021) Jiong Zhu, Ryan A Rossi, Anup Rao, Tung Mai, Nedim Lipka, Nesreen K Ahmed, and Danai Koutra. 2021. Graph neural networks with heterophily. In AAAI.