Key node identification for a network topology using hierarchical comprehensive importance coefficients (2024)

The identification of key nodes in complex networks has always been a hot topic in network scientific research^1,2. The stability of the network topology determines the performance of the entire network system, while the key nodes affect stability of the network^3,4,5. Therefore, designing effective algorithms to identify key nodes in the network topology can significantly improve the stability and robustness of a network^6,7,8.

So far, for the key node identification problem of single-layer networks, a variety of identification methods have been proposed. These methods can be classified according to their essential ideas, including eigenvectors-based method⁹, node removal shrinkage-based method¹⁰, and graph entropy theory-based method¹¹. Considering that identification methods based on a single attribute may ignore other characteristics, a variety of key node identification methods based on multi-attribute fusion have also been proposed¹². In addition, the rapid development in the field of machine learning in recent years has also stimulated new ideas for identifying key nodes in complex networks, such as graph neural networks¹³.

The eigenvector-based method assumes that the importance of a node is determined by the information of its neighbors. Eigenvector centrality uses the number of neighbors to determine the importance of a node. This method works well, but it converges slowly. The cumulative nomination method uses the times of a node invoked by other nodes to measure the importance of a node. This method mitigates the convergence issue of eigenvector centrality, but it is only applicable to social networks. Brin et al.¹⁴ proposed the PageRank algorithm to rank web pages in the search interface, which uses the relationship between web pages to calculate the PageRank values of web pages. The Hits method¹⁵, which was proposed almost at the same time as PageRank, considers the authority value and hub value of each node. This method can effectively discover hub nodes in the network with high computational complexity. The SALSA algorithm¹⁶ introduces the random walk theory to overcome the shortcomings of the Hits algorithm.

The method based on node removal and shrinkage identifies key nodes by measuring the changes of the structure and the function of the network after removing and shrinking nodes. Xu¹⁷ proposed the "core and core degree theory", which uses the relationship between the core degree and the number of nodes/edges to identify key nodes. Li et al.¹⁸ measured the node importance by calculating the sum of the inverse shortest distances of newly generated node pairs after removing the node set. Tan et al.¹⁹ used node shrinkage instead of deletion, where those directly connected nodes are treated as a node and the network cohesion after node shrinkage is used to measure the node importance.

In the method based on graph entropy theory, Qiao et al.²⁰ built a model that decomposes a graph into subgraphs and then computed the entropies of neighboring nodes. Furthermore, Hu et al.²¹ used this method to identify key nodes, and experiments showed that this method can be applied to various types of complex networks. Lin et al.²² used both the information entropy weight method and the analytic hierarchy process to measure the node importance.

In recent years, some methods based on multi-attribute combination have also been proposed. TOPSIS²³ combines multiple centralities with equal weights to evaluate the importance of nodes, which may not be practical. To deal with this problem, an ideal solution ranking weighted algorithm proposed by Hu et al.²⁴ assigns different weights to individual centralities. Yang et al.²⁵ proposed a dynamic TOPSIS weighted ranking method based on the infection recovery model and gray correlation analysis, which can dynamically adjust the weight of each centrality. In addition, Sun et al.²⁶ compared different methodologies such as influential node ranking and influence maximization to identify key nodes in social networks and introduced Shapley centrality as a potentially more general approach. Zhang et al.²⁷ proposed a new semi-local centrality metric based on the relative change in the average shortest path, enhancing the efficiency of identifying influential nodes. Zhu et al.²⁸ introduced a gravity model centrality method, termed HVGC, that outperforms existing methods in evaluating node importance in complex networks. Ren et al.²⁹ discussed methods that consider multiplex influences to identify key nodes in complex networks. Zhao et al.³⁰ presented a novel algorithm called NEGM that excels in measuring the relative importance of nodes in various network types, integrating network embedding with a gravity model for enhanced accuracy. Therefore, they are development trends in the field of complex networks in the future.

There are various topology analysis methods for existing networks, but key node identification methods often only focus on local attributes or global attributes; however, it is difficult to take into account both at the same time. According to Burt's structural hole theory³¹, the structural position of a node in a social network is more important than the corresponding strength of external relationships, since better structural positions have more information, resources, and power. Location advantages in social networks include local advantages and global advantages. The former can be quantified using local structural information, while the latter is determined by global topological connections. For this reason, comprehensive analysis of local and global attributes is crucial to evaluate the importance of complex network nodes. For this purpose, we propose a comprehensive importance indicator as a powerful tool for evaluating the importance of network nodes. The main contributions of this paper are summarized as follows:

(1)
Based on the Salton indicator, a weakly connected network constraint coefficient is constructed, and the local influence indicator is then refined.
(2)
Based on the improved K-Shell decomposition method, a hierarchical tenacity global coefficient is constructed, and the global influence indicator is refined.
(3)
By integrating the constraint coefficient of weakly connected network and hierarchical tenacity global coefficient, a comprehensive identification algorithm for local and global attributes is proposed. Experimental results show that the proposed algorithm outperforms many existing algorithms on real network datasets.

This paper is organized as follows. In Part 2, a hierarchical comprehensive node importance identification algorithm is proposed and classic node importance identification algorithms are briefly introduced. In Part 3, evaluation indicators are introduced to measure the performance of each algorithm. In Part 4, real network datasets are introduced for experiments. In Part 5, the comprehensive importance identification algorithm and other classic identification algorithms are tested on real network datasets. In Part 6, conclusion is drawn.

Construction of hierarchical tenacity global coefficient

Consider an undirected topological graph $G=(V,E)$, where the total number of nodes is $N=\left|V\right|$ and the total number of edges is $M=\left|E\right|$. Define A as the adjacency matrix of the undirected network and ${a}_{ij}$ as the (i,j)-th entry of A. If node i is connected to node j, then the element ${a}_{ij}=1$; otherwise, ${a}_{ij}=0$. For an undirected graph, it has ${a}_{ij}={a}_{ji},{a}_{ii}=0$. Define ${\Gamma }_{i}$ as the set of neighbors of node i. Let ${k}_{i}$ represent the degree of node i, and ${e}_{ij}$ represents the edge between node i and node j. For the undirected graph shown in Fig.1, it has $N=15, M=19,{k}_{G}=6,{a}_{KM}=1$ and ${a}_{FD}=0$.

An undirected topological graph containing 15 nodes and 19 edges.

Full size image

The importance of network nodes will be analyzed by designing a method for identifying key nodes using local and global attributes. The identification algorithm is summarized into the following three steps:

(1)
Construct weakly connected network constraint coefficient based on Salton indicator.
(2)
Construct hierarchical tenacity global coefficient based on the improved K-Shell decomposition method.
(3)
Construct a comprehensive indicator of local and global attributes based on the normalization technique. Calculate the comprehensive indicator of each node in the network and identify the importance of all the nodes.

A flow chart for realizing node importance identification in a network. The input is a certain network topology, and the output is the ranking result of node importance.

Full size image

Construction of weakly connected network constraint coefficient based on Salton indicator

This section quantifies the local attributes of each node. Structural hole theory provides a new perspective for understanding the local behavior of individuals. In fact, a structural hole is a gap between two disconnected nodes. When these two unconnected nodes are connected by a third node, the bridging node usually has more information advantages and control advantages.

To quantify the control advantages of bridge nodes, Burt introduced the network constraint coefficient NCC³¹. The NCC of node i is described as

$${NCC}_{i}=\sum_{j\in\Gamma (i)}{({p}_{ij}+\sum_{l\in\Gamma (i)\cap\Gamma (j)}{p}_{il}{p}_{lj})}^{2}$$

(1)

where ${p}_{ij}$ is the ratio of energy investment directly related to the given node i and node j, defined as

$${p}_{ij}=\frac{{a}_{ij}}{{k}_{i}}$$

(2)

As a local evaluation indicator of key nodes, NCC is usually negatively correlated with its importance in a given network. As the NCC decreases, the formation of structural holes is enhanced, and the importance of nodes increases.

The NCC of a node is calculated based on the node's neighborhood topology, including the number of neighbors of the node and the corresponding closeness between neighbors. However, NCC only collects the information of nearest neighbors and ignores the structural information of farther neighbors. In fact, NCC is ineffective when faced with nodes bridging the same number of non-redundant contacts.

For example, in Fig. 1, nodes C and F serve as bridges for node pairs (G, H) and (L, G) respectively. Nodes C and F have the same NCC, i.e.

$${NCC}_{C}={NCC}_{F}=0.46$$

That is, the two nodes have the same local influence. However, it can be seen from the figure that, although nodes C and F have the same NCC, node C has higher-order neighbors and stronger propagation ability. Therefore, NCC cannot accurately quantify the difference between node C and node F in this network.

The above analysis shows that NCC only collects information from the nearest neighbors, which results in less accurate identification of local features of nodes. In order to improve the accuracy of the method, more local structural information needs to be considered. Therefore, we propose an improved weakly connected network constraint coefficient WNCC.

Kleinberg³² points out that the strength of the connection between two people depends on the size of their shared social circle. When two social circles overlap, the power between them increases. Onnela et al.³³ studied that weak connections often serve as connectors among different communities and are of great significance to the overall connectivity of the network. Commonly used indicators to measure the effect of weak connections include Salton indicator ${S}_{ij}$ and Jaccard indicator ${J}_{ij}$³⁴, which are defined respectively as

$${S}_{ij}=\frac{\left|\Gamma (i)\cap\Gamma (j)\right|}{\sqrt{{k}_{i}\times {k}_{j}}}$$

(3)

$${J}_{ij}=\frac{\left|\Gamma (i)\cap\Gamma (j)\right|}{\left|\Gamma (i)\cup\Gamma (j)\right|}=\frac{\left|\Gamma (i)\cap\Gamma (j)\right|}{\left|\Gamma (i)\right|+\left|\Gamma (j)\right|-\left|\Gamma (i)\cap\Gamma (j)\right|}.$$

(4)

The Salton indicator ${S}_{ij}$ and the Jaccard indicator ${J}_{ij}$ represent the degree of local overlap of adjacent nodes. The lower the degree of overlap, the stronger the weak connectivity. Obviously, the greater the number of weak connections associated with a node, the more important the node's role in maintaining network connectivity.

For example, as shown in the left diagram in Fig. 3, node M is located on the shortest path between its neighbors A, B and C, and there is no direct connection between its three neighbor nodes. Therefore, the information transferred between nodes A, B, C and the cluster to which they belong will strongly depend on the link to which they are connected to node M. For node N in the right figure, its importance in maintaining network connectivity is significantly lower than that of node M due to the existence of alternative communication channels within its neighborhood.

Example network illustrating the effect of weak connections. In the left figure, the information transmission between the neighbors of node M strongly depends on the path connecting them to node M. In the figure on the right, the neighbors of node N can communicate directly through the connection paths between them.

Full size image

Inspired by Salton indicator and Jaccard indicator, the weak connection coefficient $w$ is designed as an indicator to measure the impact of node's high-order neighbor structure information on the node propagation ability, which is defined as

$${w}_{ij}=\frac{{S}_{ij}\sqrt{({k}_{i}-1)\times ({k}_{j}-1)}+1}{\left|\Gamma (i)\cup\Gamma (j)\right|-1}$$

(5)

where nodes i and j are neighbor nodes of each other. ${S}_{ij}\sqrt{({k}_{i}-1)\times ({k}_{j}-1)}+1=0$ when ${S}_{ij}=0$, that is, when the intersection of node i and neighbor node j is an empty set, the propagation ability is evaluated based on the degrees of the two nodes. $\left|\Gamma (i)\cup\Gamma (j)\right|-1$ is to eliminate the influence of nodes i and j themselves on the union of their neighbor nodes, and to eliminate the possibility of the denominator being 0. ${k}_{i}-1$ and ${k}_{j}-1$ are to eliminate the influence of nodes i and j on each other's neighbor nodes. The weak connection coefficient $w$ satisfies ${w}_{ij}={w}_{ji}$.

Construction of hierarchical tenacity global coefficient based on improved K-Shell decomposition method

This section quantifies the global attributes of each node. Often, influential nodes also play a crucial role in maintaining network connectivity. If these most influential nodes are removed or do not participate in the propagation process, the final propagation scope and propagation efficiency will be reduced. Therefore, the global performance of nodes should be considered in terms of maintaining network connectivity and facilitating information flow.

Generally speaking, if removing a node results in more components and smaller connected components in the network, then the removed node is important to maintain network connectivity. To measure the vulnerability of a given network, Cozzens et al.³⁵ proposed the concept of tenacity. Before defining tenacity, the concept of cut set will be firstly explained.

Suppose $S$ is a subset of the edge set $E$ of the graph $G$, and the deletion of all the edges of $S$ cause the connected graph $G$, $G-S$ to be unconnected. If there is no subset of $S$ that can cause the unconnection of $G-S$, then the edge set $S$ is said to be a cut set of the graph $G$.

In the graph $G$ shown in Fig.4, ${S}_{1}=\{c,d,f,g\}$ and ${S}_{2}=\{b,c,f\}$ are two different subsets of the edge set E. For subset ${S}_{1}$, since $G-{S}_{1}$ is unconnected after deleting all edges in the set, and there is no proper subset of ${S}_{1}$ that makes $G-{S}_{1}$ disconnected, edge set ${S}_{1}$ is a cut set of graph $G$. After deleting all edges of subset ${S}_{2}$, $G-{S}_{2}$ is still connected, so the edge set ${S}_{2}$ is not a cut set of graph $G$.

Schematic diagram illustrating the concept of cut sets. For the subsets ${S}_{1}$ and ${S}_{2}$ of the edge set $E$ of graph $G$, after deleting the edges contained in the subsets respectively, it can be judged according to the definition whether they can become cut sets of graph $G$.

Full size image

By combining the criteria of network damage cost, number of components and maximum connected component size, tenacity $T$ is defined as

$${T}_{G,A}=min\{\frac{\left|A\right|+\tau (G-A)}{\omega (G-A)}\}$$

(7)

where A is the cut set of graph $G$, and $\tau (G-A)$ is the number of nodes of the maximum connected subgraph of undirected graph $G-A$, which represents the size of the connected component after removing the edge set. $\omega (G-A)$ is the number of connected subgraphs of the undirected graph $G-A$, which represents the number of connected components after removing the edge set.

Tenacity $T$ can intuitively represent the decomposition ability of a connected graph after removing a certain part. When the number of removed edges is small, for some important nodes at the edge, even though they are directly connected to many nodes in the network, the topology is not destroyed after removing the connecting edges of the node. This result is consistent with the removal of many isolated nodes at the edge.

Following the calculation method of tenacity by removing edge sets, we define the tenacity of node i as

$${T}_{G,i}=\frac{1+\tau (G-i)}{\omega (G-i)}$$

(8)

where $\tau (G-i)$ is the number of nodes of the largest connected subgraph of the undirected graph $G-i$, and $\omega (G-i)$ is the number of connected subgraphs of the undirected graph $G-i$ . Obviously, when $\tau (G-i)$ is smaller and $\omega (G-i)$ is larger, the removed edge set becomes more important in maintaining network connectivity.

For example, in Fig.1, nodes A and M serve as boundary nodes in the undirected topology network. After removing two nodes from the original network, nodes A and M have the same $T$ value, that is

$${T}_{G,A}={T}_{G,M}=15$$

That is, both nodes have the same global impact. However, it can be seen from the figure that the number of nodes connected to node A is significantly higher than that of node M. Therefore, although nodes A and M have the same $T$ value, node A has a stronger propagation ability. Therefore, tenacity $T$ cannot accurately quantify the difference between node A and node M in the above sample network.

The above analysis shows that tenacity $T$ only considers the ability of node removal to split the network, which leads to inaccurate identification of the global characteristics. In order to improve the accuracy of the method, more global structural information needs to be considered. Therefore, we propose an improved hierarchical tenacity global coefficient HTGC.

The K-Shell decomposition method³⁶ is a coarse-grained node importance classification method that divides the network layer by layer from boundary to core based on node location information. The K-Shell value reflects the global position of the node in the network. The larger the K-Shell value, the more central the node's position and the more important the node is. The steps of K-Shell decomposition method are as follows:

Step 1 Calculate the degrees of all nodes in the network, take the degree of the smallest node and record it as KS, which is the K-Shell value.
Step 2 Delete all nodes with degree KS in the network, update the network and recalculate the degree value, recursively delete nodes with degree less than or equal to KS until the node degrees in the network are greater than KS. Mark all deleted nodes as KS.
Step 3 Repeat the above steps until all nodes in the network are stripped. Mark the K-Shell value.

Figure 5 shows a network containing 17 nodes and 21 edges which will be used to explain the steps of the K-Shell decomposition method. In this network, in the process of KS rising from 1 to 3, the nodes from the outermost layer to the innermost layer in the network are marked respectively. It is not difficult to see that as the core status of a node increases in the network, its K-Shell value also increases accordingly.

Schematic diagram illustrating the steps of the K-Shell decomposition method. The deeper the node is in the network, the later it will be stripped, and it will have a larger K-Shell value.

Full size image

However, using K-Shell value to represent the importance of a node is too rough, and a large number of nodes with obvious structural and functional differences have the same K-Shell value. In the refinement process of nodes with the same K-Shell value, the actual degree of the node can be used to determine the position information of the node in the same shell. As an improvement of the K-Shell decomposition process, the improved K-Shell value IKS is defined as

$${IKS}_{i}={KS}_{i}+\frac{{k}_{i}}{{k}_{i|max}+1}({KS}_{i|next}-{KS}_{i})$$

(9)

where ${KS}_{i}$ is the K-Shell value of node i, ${KS}_{i|next}$ is the K-Shell value of the node in the next layer of i (if i is in the deepest layer, the default is ${KS}_{i|next}={KS}_{i}+1$), ${k}_{i}$ is the degree of node i, ${k}_{i|max}$ is the maximum degree of the nodes in the same layer as node i.

According to Table 2, it is not difficult to conclude that ${KS}_{i}<{IKS}_{i}<{KS}_{i|next}$, so the improved K-Shell value IKS is a further refinement of the global attributes of nodes with the same K-Shell value, which can further distinguish the importance of nodes. For the K-Shell value KS and the improved K-Shell value IKS, the larger the value, the deeper the node's hierarchical position, and the higher the global importance of the node.

Full size table

Based on the improved K-Shell value IKS, the tenacity $T$ is improved by proposing the hierarchical tenacity global coefficient HTGC as follows

$${HTGC}_{i}=\frac{{T}_{G,i}}{{IKS}_{i}}$$

(10)

HTGC considers the hierarchical structure information of different nodes and refines the global influence indicator. For Fig. 1, HTGC indicators of node A and M satisfy

$${HTGC}_{A}<{HTGC}_{M}$$

Construction of comprehensive indicator of local and global attributes based on the normalization method

In complex network analysis, assessing the importance of nodes is a multidimensional problem. It is often impossible to fully reveal the true role and status of nodes from a single local or global perspective. Local attributes reflect a node's direct influence within its neighborhood and micro-position in the network, while global attributes reveal a node's influence and macro-position in the entire network structure. Therefore, developing a comprehensive indicator that integrates local and global attributes is crucial for a deep understanding of the comprehensive importance of nodes.

An effective comprehensive indicator should be able to combine these two aspects. To this end, the weakly connected network constraint coefficient WNCC and the hierarchical tenacity global coefficient HTGC are combined to yield the hierarchical comprehensive importance coefficient HCIC, which is defined as

$${HCIC}_{i}=\frac{{CL}_{i}\times {CG}_{i}}{\sqrt{\frac{1}{N-1}{\sum }_{j=1}^{N}{CL}_{j}}\times \sqrt{\frac{1}{N-1}{\sum }_{j=1}^{N}{CG}_{j}}}$$

(11)

where ${CL}_{i}$ and ${CG}_{i}$ are the normalized weakly connected network constraint coefficient and hierarchical tenacity global coefficient of the node i, which are defined respectively as follows

$${CL}_{i}=\frac{{WNCC}_{i}-\underset{j=\text{1,2},\dots ,N}{\mathit{min}}{WNCC}_{j}}{\underset{j=\text{1,2},\dots ,N}{\mathit{max}}{WNCC}_{j}-\underset{j=\text{1,2},\dots ,N}{\mathit{min}}{WNCC}_{j}}$$

(12)

$${CG}_{i}=\frac{{HTGC}_{i}-\underset{j=\text{1,2},\dots ,N}{\mathit{min}}{HTGC}_{j}}{\underset{j=\text{1,2},\dots ,N}{\mathit{max}}{HTGC}_{j}-\underset{j=\text{1,2},\dots ,N}{\mathit{min}}{HTGC}_{j}}$$

(13)

Algorithm 1 shows the pseudo code for calculating the hierarchical comprehensive importance coefficient HCIC of node i.

HCIC algorithm.

Full size image

According to the above algorithm, nodes with lower HCIC values have greater impact on maintaining network connectivity, so that the corresponding nodes are more important.

Using such a comprehensive indicator, we can not only evaluate the importance of nodes more comprehensively, but also better understand and predict the dynamic behavior and evolution trends of complex networks. This is of great significance to many fields of network science, such as social network analysis, bioinformatics, and information dissemination.

Classic benchmark algorithm

We use several classic benchmark algorithms to compare the performance of the proposed method, including:

(1) Degree centrality (DC) algorithm

Degree centrality³⁷ is a basic identification algorithm for identifying the importance of nodes. The degree of node i is defined as

$${DC}_{i}=\sum_{j=1}^{N}{a}_{ij}$$

(14)

(2) Collective influence (CI) algorithm

The collective influence³⁸ of node i is defined as

$${CI}_{i}=({k}_{i}-1)\sum_{j\in set(i,l)}({k}_{j}-1)$$

(15)

where $set(i,l)$ represents the set of all nodes whose distances from node i are less than $l$.

(3) WL algorithm

WL algorithm³⁹ is an identification method based on node degree and adjacent node degree, which is defined as

$${WL}_{i}=\sum_{j\in\Gamma (i)}({k}_{i}\times {k}_{j})$$

(16)

(4) DWT algorithm

The DWT algorithm⁴⁰ is a method that quantifies link strength based on local information of network topology and evaluates the importance of nodes based on the number of connections and overlap degree of neighbor nodes, which is defined as

$${DWT}_{i}=\sum_{j\in\Gamma (i)}{(\frac{1+{S}_{ij}}{{k}_{i}})}^{2}$$

(17)

where ${S}_{ij}$ is the Salton indicator of node i and node j.

(5) K-Shell decomposition method

The K-Shell decomposition method is a coarse-grained node importance identification algorithm that divides the network layer by layer from boundary to core based on node location information. The implementation steps of this method have been introduced above.

(6) KPD algorithm

The KPD algorithm⁴¹ is an improved algorithm based on the K-Shell decomposition method, which is defined as

$${KPD}_{i}={KS}_{i}+{k}_{i}+\frac{{l}_{i}}{{l}_{max,i}+1}$$

(18)

where ${KS}_{i}$ is the K-Shell value of node i, ${l}_{i}$ is the stripping order of node i in the same layer, and ${l}_{max,i}$ is the maximum stripping order of node i in the same layer.

(7) INCC algorithm

The INCC algorithm⁴² combines the direct and indirect effects of the nearest neighbors and second-nearest neighbors, which is defined as

$${INCC}_{i}=\sum_{j\in\Gamma (i)}{p}_{ij}+\sum_{k\in\Gamma (i)\cap\Gamma (j)}{p}_{ik}$$

(19)

where ${p}_{ij}$ is the proportion of energy investment directly related to the node i and node j.

(8) Random algorithm

A random algorithm ranks the importance of network nodes through random scoring.

(9) CIM algorithm

The CIM algorithm⁴³ is a method for identifying key nodes in complex networks based on the global structure. It constructs a comprehensive influence matrix from three aspects: shortest path length, shortest path number and non-shortest path number to reflect the influence between nodes, which is defined as

$${CIM}_{i}=\sum_{j=1}^{n}{CM}_{ji}$$

(20)

where $CM$ is the comprehensive influence matrix.

(10) GLS algorithm

The GLS algorithm⁴⁴ also considers both the local and global structures of the network, which is defined as

$${GLS}_{i}={GI}_{i}\times {LI}_{i}$$

(21)

where ${GI}_{i}$ and ${LI}_{i}$ are respectively the global influence and local influence of node i.

Evaluation indicators

In the above content, we analyzed the local attributes and global attributes of complex network topology nodes, and designed two evaluation indicators: weakly connected network constraint coefficient WNCC and hierarchical tenacity global coefficient HTGC. The above two types of indicators are normalized and integrated, and the hierarchical comprehensive importance coefficient HCIC is proposed as an evaluation indicator for the network node importance.

In order to verify the rationality of the HCIC identification algorithm, other classic importance identification algorithms will be compared, and a comparative experiment will be designed to validate the HCIC algorithm based on different evaluation indicators.

The node importance is sorted in descending order according to the node importance ranking values generated by different algorithms. The experiment evaluates the advantages and disadvantages of different node importance identification algorithms by comparing the connectivity properties of the remaining subgraphs after removing nodes with a certain importance proportion by different algorithms.

To indicate the connectivity of the remaining subgraph after removing a number of important nodes, commonly used evaluation indicators include:

(1) Maximum connectivity coefficient

The maximum connectivity coefficient ${P}_{Subset}$ is an important indicator for evaluating the performance of the identification algorithm, which is defined as

$${P}_{Subset}=\frac{R}{N}\times 100\%$$

(22)

where $R$ is the number of nodes in the maximum connected subgraph after removing a number of nodes, and $N$ is the number of nodes in the original connected network. Therefore, the smaller the ${P}_{Subset}$ value is, the smaller the largest connected subgraph will be after removing nodes from the original graph; as a result, the removing nodes are more important.

(2) Average remaining edges of the network

The average remaining edges of the network reflects the remaining connectivity of the network after a node is destroyed. It can be used as an important indicator of network repair cost and is defined as:

$${P}_{Edges}=\frac{E}{M}\times 100\%$$

(23)

where $E$ is the total number of edges remaining in the network after each removal of nodes, and $M$ is the number of edges in the initial network. Therefore, the smaller the value of ${P}_{Edges}$, the smaller the number of remaining edges in the network; as a consequence, the removing nodes are more important.

(3) Network sensitivity

Network sensitivity is defined as:

$$S=\sum_{s<\sigma }\frac{{n}_{s}{s}^{2}}{N}$$

(24)

where ${n}_{s}$ is the number of connected subgraphs with $s$ nodes, $N$ is the number of network nodes, and $\sigma $ is the node threshold.

As network nodes are gradually removed, the network is decomposed into many disconnected subgraphs. When the number of removed nodes reaches a ratio p, the peak sensitivity $S$ often appears. The peak value also means that the original network is decomposed to the greatest extent into fragment groups that are less than or equal to the threshold $\sigma $. Therefore, the smaller $p$ is, the better the identification algorithm is.

(4) Network monotonicity

A suitable node importance identification algorithm should assign a unique ranking indicator to each node. If there are multiple nodes having the same ranking indicator, the algorithm needs to be improved. In order to quantitatively measure the resolution of different methods, the monotonicity indicator $m$ of the ranking table is used, which is defined as

$$m=\frac{r-1}{N-1}\times 100\%$$

(25)

where $r$ is the number of different sortings. When nodes have the same indicator value, they have the same rank. When $m=100\%$, it means that the algorithm is completely monotonic (each node is classified into a different indicator value), while $m=0$ means that all nodes are classified into the same score (or grade).

The monotonicity indicator reflects whether the identification algorithm can distinguish nodes well. Therefore, the larger $m$ is, the stronger the ability of the method to determine a unique ranking.

Datasets

In order to show the impact of different node-removing strategies on different networks, we selected 7 network datasets with sizes ranging from ${10}^{3}$ to ${10}^{4}$ as experimental objects:

(1) bio-DM-HT
The bio-DM-HT dataset is a gene network of Drosophila melanogaster, containing 3098 protein nodes and 4750 gene regulatory edges. This dataset can be used for gene functional association analysis.
(2) bn-fly-drosophila_medulla
The bn-fly-drosophila_medulla dataset is a brain neural network containing 1802 neuron nodes and 33549 fiber bundle edges. This dataset can be used to understand neural processes such as cognition, learning, and memory.
(3) CL-10000-2d0-trial3
The CL-10000-2d0-trial3 dataset is an artificial network containing 9260 nodes and 30173 edges. This dataset can be used to evaluate the impact of network structure on its performance.
(4) p2p-Gnutella08
The p2p-Gnutella08 dataset is a P2P file sharing network, containing 6301 user nodes and 20878 file transfer edges. This dataset can be used for P2P network modeling.
(5) tech-routers-rf
The tech-routers-rf dataset is a router radio network containing 2154 device nodes and 6695 communication link edges. This dataset can be used to design connection paths for communication networks.
(6) ukerbe1
The ukerbe1 dataset is a two-dimensional finite element problem network, containing 19207 discrete nodes in space and 3522 line segment set edges. This dataset can be used to solve differential equations or variational problems.
(7) Hamrle2
The Hamrle2 dataset is a simulated circuit network, containing 5952 electrical nodes and 22162 circuit element edges. This dataset can be used to determine the voltage and current relationships over time at various points in the circuit.

The basic attributes of the network corresponding to each dataset are shown in Table 4. Where $N$ is the number of nodes in the network, $M$ is the number of edges in the network, $<k>$ is the average degree of network nodes, ${k}_{max}$ is the maximum degree of network nodes, ${C}_{avg}$ is the average clustering coefficient of the network, and $d$ is the network density.

Full size table

In this experiment, some classic identification algorithms are used as the reference objects of the HCIC algorithm, such as DC algorithm, CI algorithm, K-Shell decomposition method, INCC algorithm, random algorithm, etc. By implementing the above identification algorithms, the importance of each node in the undirected topology network can be intuitively compared. Usually, the importance is arranged in ascending or descending order according to different sorting indicators, and the specific ranking method associates with a specific ranking indicator.