Abstract
Graph pooling methods provide mechanisms for structure reduction that are intended to ease the diffusion of context between nodes farther apart in the graph, and that typically leverage community discovery mechanisms or node and edge pruning heuristics. In this paper, we introduce a novel pooling technique which borrows from classical results in graph theory, is non-parametric, and generalizes well to graphs of different nature and connectivity patterns. Our pooling method, named KPlexPool, builds on the concepts of graph covers and k-plexes, i.e. pseudo-cliques where each node can miss up to k links. The experimental evaluation on molecular and social graph classification benchmarks shows that KPlexPool achieves state-of-the-art performances against both parametric and non-parametric pooling methods in the literature, despite generating pooled graphs based solely on topological information.
Introduction
Graph Neural Networks (GNNs) allow for the adaptive processing of topology-varying structures representing complex data which comprises atomic information entities (the nodes) and their relationships (the edges).
The processing of graphs by neural networks typically leverages message passing between neighboring nodes (Bacciu et al. 2020) to collect and exchange information on the context of nodes. Such a process is made effective and efficient, also on cyclic structures, by feedforward neural layers with node-level weight-sharing, an approach popularized under the term graph convolutions (Kipf and Welling 2017), but previously known as contextual structure processing (Micheli 2009; Bacciu et al. 2018).
Graph pooling methods provide mechanisms for structure reduction that are intended to ease the diffusion of such a context between nodes farther in the graph. These methods developed from the original concept in image processing, realizing structure reduction layers that are interleaved between graph convolutions to provide a multi-resolution view of the input graph. This is intended to extract coarser and more abstract representations of the graph as we go deeper in the network, boosting information spreading among nodes. In this respect, graph pooling can help to contain model complexity, by reducing the number of convolutional layers needed to achieve full coverage of the graph, and counteract over-smoothing issues (i.e. nodes converging to very similar embeddings) by introducing structural diversity among convolutional layers (Li et al. 2018).
The definition of a robust, general and efficient subsampling mechanism that can scale to topology-varying graphs is made difficult by the irregular nature of the data and by the lack of a reference ordering of nodes between samples. Graph pooling methods in the literature address the problem from a community discovery perspective, more or less explicitly, by considering node connectivity patterns (Simonovsky and Komodakis 2017; Ma et al. 2019; Wang et al. 2020; Defferrard et al. 2016) or by aggregating the nodes based on their similarity in the neural embedding space (Ying et al. 2018; Lee et al. 2019; Bianchi et al. 2020).
Our work hinges on an explicit (and novel) link between pooling operators and two consolidated graph theoretical concepts in community discovery, namely, k-plexes and graph covers. The former provide a flexible formalization for a community of nodes as a densely interconnected and cohesive subgraph, which relaxes the definition of a clique by allowing each node to miss up to k links. The latter relate to a soft partition of nodes whose union covers all nodes of the original graph. We argue that both concepts are necessary to realize an effective and general graph pooling mechanism, as they make it possible to summarize the overall community structure by taking a small but meaningful set of highly connected components, represented by the k-plexes.
We introduce KPlexPool, a novel pooling method using only topological graph features, which is neither parameterized nor dependent, in its outcomes, on the specific predictive task. Hence, its structure reduction can be precomputed once and reused across multiple learning model configurations. We show how KPlexPool, despite being non-adaptive, provides a flexible and robust definition of node communities which generalizes well to graphs of different nature and topology, from molecular structures to social networks.
We remark that KPlexPool is not just a straightforward application of k-plexes to deep learning for graphs. Rather, our pooling mechanism is the result of the careful design and integration of different community discovery and graph simplification mechanisms specifically crafted to obtain a hierarchical structure coarsening algorithm well suited to promote effective context diffusion in neural message passing. This goal is quite challenging as, for instance, a previous attempt by Luzhnica et al. (2019) clearly shows that the straightforward application of clique discovery does not yield an effective graph pooling method. We hope that our work can further stimulate the community's interest in graph theory and algorithms, which have been excellent sources of inspiration for the machine learning community, the Weisfeiler-Lehman graph kernel (Shervashidze et al. 2011; Kriege et al. 2020; Vishwanathan et al. 2010) being one such achievement. The main original contributions of this paper are summarized below.

We propose KPlexPool, the first pooling mechanism based on the concepts of k-plex communities and graph covers (Sect. 2.3).

We define a scalable algorithm to compute a hierarchy of k-plex cover decompositions that optimizes context propagation and promotes diversity in convolutional layers (Sect. 2.4).

We propose a post-processing cover heuristic for sparsifying pooled graphs in scale-free structures (Sect. 2.6).

We provide a thorough and reproducible empirical assessment on 9 graph classification benchmarks, where KPlexPool is shown to be state-of-the-art (Sect. 4).
We remark that while the notion of clique covering has been studied since the 70s, we are not aware of any formal definition or algorithm for k-plex covering in the literature. Hence, our contribution is novel in: i) the use of k-plex communities to define pooling mechanisms in neural processing systems; ii) the definition of the notion of k-plex cover; iii) the definition of an efficient algorithm to implement the concepts above.
K-plex cover graph pooling
In this section we present KPlexPool, a novel method for graph coarsening that uses k-plexes as a pooling block. In Sects. 2.1 and 2.2 we introduce some useful definitions from graph theory and deep learning that will be used throughout this paper. In Sect. 2.3, we begin by defining how k-plexes can be used to coarsen connected communities in the graph and how edges can be determined in the pooled structure. In Sect. 2.4 we describe the algorithms to efficiently compute the k-plex covers. Finally, in Sect. 2.6 we introduce a useful heuristic for sparsifying scale-free structures.
Preliminaries on graph theory
Given an undirected graph \(G\), let \(V = V(G)\) be its node set and \(E = E(G)\) be its edge set, where \(v(G) = |V| = n\) and \(e(G) = |E| = m\) are, respectively, the number of nodes and edges in \(G\). Given an edge \(e = \{u, v\}\), nodes u and v are said to be adjacent or neighboring each other. The neighborhood \(N(v)\) of v is the set of nodes adjacent to it, and the degree \(d(v)\) of v is defined as the number of its neighbors, i.e. \(|N(v)|\). An attributed graph is a tuple \((G, \phi , \psi )\) where \(\phi : V \rightarrow \mathbb {R}^{h_V}\) and \(\psi : V \times V \rightarrow \mathbb {R}^{h_E}\) are functions that assign a vector of features to each node and to each edge, respectively of size \(h_V\) and \(h_E\). If \(e \notin E\), then \(\psi (e) = \varvec{\mathbf {0}}\). An attributed graph can be also represented in matrix notation \((\varvec{\mathbf {A}}, \varvec{\mathbf {X}})\) by taking an arbitrary (usually predefined) ordering of its nodes \(V = \{v_1, \dots , v_n\}\) and by having \(\varvec{\mathbf {X}}\in \mathbb {R}^{n \times h_V}\) as its node-feature matrix, whose rows are defined as \(\varvec{\mathbf {x}}_i = \phi (v_i)\). The term \(\varvec{\mathbf {A}}\in \mathbb {R}^{n \times n \times h_E}\) denotes its adjacency matrix, where \(\varvec{\mathbf {A}}_{ij} = \varvec{\mathbf {A}}_{ji} = \psi (\{v_i, v_j\})\).
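As a toy illustration of the matrix notation above, the following sketch builds \((\varvec{\mathbf {A}}, \varvec{\mathbf {X}})\) for a small attributed path graph, assuming scalar edge features (\(h_E = 1\)) and two-dimensional node features; the node names and feature values are our own invention.

```python
# Build the (A, X) matrix representation of a small attributed graph:
# a path v1 - v2 - v3 with 2-dimensional node features (h_V = 2)
# and scalar edge features (h_E = 1, so A is a plain n x n matrix).
nodes = ["v1", "v2", "v3"]                 # arbitrary but fixed node ordering
edges = {("v1", "v2"), ("v2", "v3")}
phi = {"v1": [1.0, 0.0], "v2": [0.0, 1.0], "v3": [1.0, 1.0]}

# X: n x h_V node-feature matrix, row i is phi(v_i)
X = [phi[v] for v in nodes]

# A: symmetric adjacency matrix, psi(e) = 0 when e is not in E
A = [[1.0 if (u, v) in edges or (v, u) in edges else 0.0
      for v in nodes] for u in nodes]
```

Note how symmetry (\(\varvec{\mathbf {A}}_{ij} = \varvec{\mathbf {A}}_{ji}\)) comes for free by checking the edge set in both orientations.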
A k-plex is a subset of nodes \(S \subseteq V\) such that each node in S has at least \(|S| - k\) adjacent nodes in S: for all \(v \in S\), we have \(|N(v) \cap S| \ge |S| - k\). This definition is quite flexible: for \(k=1\) we get the classical clique, while for larger values of k we obtain a relaxed and broader family of (possibly larger) subgraphs of \(G\). Some examples are shown in Fig. 1. A k-plex cover of G is a family \(\mathcal {S}\) of subsets of V such that each set \(S \in \mathcal {S}\) is a k-plex and their union is \(\bigcup _{S \in \mathcal {S}} S = V\).
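The k-plex condition can be checked directly from the definition; a minimal sketch, using an adjacency-dict encoding of our own choosing:

```python
def is_kplex(adj, S, k):
    """adj: dict mapping each node to the set of its neighbors.
    S is a k-plex iff every node in S has at least |S| - k neighbors in S."""
    S = set(S)
    return all(len(adj[v] & S) >= len(S) - k for v in S)

# Example: a 4-cycle is a 2-plex (each node misses one link to the set)
# but not a clique, i.e. not a 1-plex.
adj = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "a"},
}
```

With this encoding, `is_kplex(adj, {"a", "b", "c", "d"}, 2)` holds, while the same set fails the test for `k = 1`.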
Preliminaries on graph neural networks
Graph Neural Networks are a specialized kind of neural network that adaptively learns fixed-size representations of a set of graphs in a given distribution. GNNs are typically defined in terms of graph convolutions, both in the spectral (Bruna et al. 2014; Defferrard et al. 2016; Kipf and Welling 2017) and in the spatial domain (Hamilton et al. 2017; Monti et al. 2017; Xu et al. 2018; Veličković et al. 2018; Xu et al. 2019; Morris et al. 2019). Spectral GNNs perform graph convolution using the graph Fourier transform, or by approximating it with a (truncated) Chebyshev expansion (Hammond et al. 2011). Spatial GNNs, instead, perform graph convolution by applying a learned filter to the features of each node and its local neighborhood. Most GNNs can be described as instances of the so-called Message-Passing Neural Network model (Gilmer et al. 2017), which takes inspiration from the classical message-passing paradigm and from the very first learning approaches to graph-structured data (Micheli 2009; Scarselli et al. 2005, 2009; Gori et al. 2005). Battaglia et al. (2018) further abstract this model by proposing a more general form, GraphNet, where also edge- and graph-level attributes can be parametrized.
GNNs often leverage pooling techniques to obtain a coarsened representation of the input graphs. As in Convolutional Neural Networks (Fukushima 1980; LeCun et al. 1989), pooling serves both as dimensionality reduction and to reduce the distance between objects in the input, thus increasing the receptive field of parametric functions applied on its output. Graph pooling is typically performed using classical clustering algorithms such as Graclus (Dhillon et al. 2007; Bruna et al. 2014; Defferrard et al. 2016; Simonovsky and Komodakis 2017; Ma et al. 2019; Wang et al. 2020), or adaptively, as in Ying et al. (2018); Cangea et al. (2018); Gao and Ji (2019); Lee et al. (2019). A more in-depth comparison is discussed in Sect. 3. Pooling is also used to obtain a final, global representation of the whole input graph. This technique, usually denoted as global pooling, is typically performed by using standard aggregation functions such as sum, max or mean (Xu et al. 2019), or by exploiting techniques from the field of (multi)set representation learning, such as GlobalAttentionPool (Li et al. 2016), Set2Set (Vinyals et al. 2016), and SortPool (Zhang et al. 2018).
Graph pooling with k-plexes
KPlexPool computes a k-plex cover \(\mathcal {S}= \big \{S_1,\ \dots ,\ S_c\big \}\) of the input graph \((G, \phi , \psi )\), for a given k, and returns a coarsened graph \((G', \phi ', \psi ')\), such that
where \(E\big (G[S_i,\,S_j]\big ) = E(G) \cap (S_i \times S_j)\). Node \(v'_i\) represents the coarsened version of \(S_i\), and edge \(\big \{v'_i,\, v'_j \big \}\) exists iff there is at least one edge in G linking a node of \(S_i\) with a node of \(S_j\). The feature functions \(\phi ': V' \rightarrow \mathbb {R}^{h_V}\) and \(\psi ': V'\times V' \rightarrow \mathbb {R}^{h_E}\) aggregate, respectively, features belonging to the same k-plex \(S_i\) and features of edges linking two different k-plexes \(S_i\) and \(S_j\). In other words, they are defined in such a way as to provide a suitable relabeling for nodes and edges in the coarsened graph:
where \(\beta \) and \(\gamma \) are arbitrary aggregation functions defined over multisets of feature vectors. Typical aggregators for node attributes are element-wise max or sum (Xu et al. 2019). For edge weights, our choice is the sum reduction: if the input graph has \(\psi (e) = 1\) for all \(e \in E\), then the weight of an edge in the coarsened graph is the number of edges crossing the two linked k-plexes. Figure 2 shows an example of a graph and the hierarchical reduction produced by the application of a series of three sum-pooling layers.
Differently from other partitioning-based graph coarsening methods (Dhillon et al. 2007; Ng et al. 2002), in our approach a node may belong to multiple k-plexes. This is also a key difference between CliquePool (Luzhnica et al. 2019) and KPlexPool with \(k = 1\) (i.e., performing a clique cover): the former model forces a partition of the nodes, potentially destroying structural relationships in the communities.
From Eq. 2.2, and from the fact that every edge forms a clique, it follows that whenever two k-plexes share a node, their respective aggregated nodes in the coarsened graph are adjacent. For this reason, not only can \(G'\) be denser than \(G\), but KPlexPool may also produce \(G'\) such that \(e(G') > e(G)\) in some pathological cases (e.g., star graphs). This is why our KPlexPool algorithm incorporates a graph sparsification method, whose details are discussed in Sect. 2.6.
To provide a compact (vectorial) definition of our operator, we can represent the k-plex cover as a matrix \(\varvec{\mathbf {S}} \in \{0, 1\}^{n\times c}\) of hard assignments, i.e., \(\varvec{\mathbf {S}}_{ij} = 1\) if and only if \(v_i \in S_j\), and \(\varvec{\mathbf {S}}_{ij} = 0\) otherwise. If we use sum for both aggregation functions, it is easy to see that Eqs. 2.3, 2.4 can be rewritten in matrix form as
where \(\varvec{\mathbf {X}}' \in \mathbb {R}^{c \times h_V}\) and \(\varvec{\mathbf {A}}' \in \mathbb {R}^{c \times c \times h_E}\) are, respectively, the node feature and adjacency matrix of \(G'\).
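Under the sum aggregator, the matrix form amounts to \(\varvec{\mathbf {X}}' = \varvec{\mathbf {S}}^\top \varvec{\mathbf {X}}\) and \(\varvec{\mathbf {A}}' = \varvec{\mathbf {S}}^\top \varvec{\mathbf {A}} \varvec{\mathbf {S}}\). A minimal sketch with plain Python lists, assuming scalar edge weights (\(h_E = 1\)); the example graph is our own:

```python
def pool(S, X, A):
    """S: n x c hard-assignment matrix; X: n x h_V features; A: n x n adjacency.
    Returns X' = S^T X and A' = S^T A S (sum aggregation for both)."""
    n, c = len(S), len(S[0])
    h = len(X[0])
    Xp = [[sum(S[i][a] * X[i][f] for i in range(n)) for f in range(h)]
          for a in range(c)]
    Ap = [[sum(S[i][a] * A[i][j] * S[j][b]
               for i in range(n) for j in range(n))
           for b in range(c)] for a in range(c)]
    return Xp, Ap
```

On a path a-b-c with cover \(S_1 = \{a, b\}\), \(S_2 = \{b, c\}\), the two clusters share node b and thus end up adjacent in \(G'\); note that the diagonal of \(\varvec{\mathbf {A}}'\) accumulates intra-cluster edges (counted twice, since \(\varvec{\mathbf {A}}\) is symmetric).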
K-plex cover algorithm
We propose an algorithm, whose pseudocode is shown in Algorithms 1, 2, that finds a cover containing large k-plexes that have small intersections. The rationale for this choice is driven by the sought-after effect on graph pooling mechanisms in graph neural networks. On one hand, we seek to condense into a single community-node those neighboring nodes which are likely to share the same context and, hence, very similar embeddings. On the other hand, we would like the pooled graph to preserve diversity for nodes belonging to different communities, i.e. avoiding trivial aggregations which would induce heavy connectivity between the communities. Our algorithm is inspired by the clique covering framework in Conte et al. (2016, 2020), and leverages heuristics that specify the order in which nodes are considered for k-plex inclusion. Algorithm 1 receives this order in input by means of two priority functions \(f,\,g\) on V that are defined so as to produce large k-plexes with small pairwise intersections. In practice, we fixed \(f,\,g\) to prioritize nodes with lower degree (for \(f\)) and more neighbors in the k-plex (for \(g\)). A deeper discussion on priority functions is provided in Table 1.
Algorithm 1 uses Algorithm 2 as a subroutine, where the latter, starting from an uncovered node v (i.e., a node that is not included in the current cover), retrieves a k-plex \(S \subseteq V\) containing v. We discuss both Algorithms 1 and 2 in more detail in the following. Algorithm 1 begins by iterating over the available nodes in the candidate set \(U\), which is initialized with the whole set of nodes of the input graph. At each iteration, it selects the next candidate v by extracting the node in \(U\) with highest priority \(f(v)\). The node v is then used as a starting node for retrieving the next k-plex \(S \subseteq V\), using Algorithm 2. The elements of S are then removed from the set \(U\) of candidates, and S is included in the output cover \(\mathcal {S}\). Note that the nodes in the k-plexes are removed from \(U\) but not from the graph, hence the k-plexes found by successive executions of Algorithm 2 may contain previously removed nodes. The algorithm stops when eventually all the nodes have been removed from \(U\), and then returns the cover \(\mathcal {S}\). Algorithm 2, instead, constructs a k-plex S starting from a given node v, which is the only element available at startup. It initializes a candidate set \(C\) of nodes that could be part of the k-plex, this time relying on \(N(v)\). Again, Algorithm 2 iterates over the nodes in \(C\) following the ordering defined by the priority \(g\), and adds them to S. The main loop has the following invariants:
where Eq. 2.6 states that every node u in S needs to have at least \(|S| - k\) adjacent nodes in S, and Eq. 2.7 states that, when u has exactly \(|S| - k\) adjacent nodes in S, we cannot have nodes in C that are not adjacent to u (as they would break Eq. 2.6 if selected). As a result, any node from C can be added to S.
Both invariants are satisfied at the first iteration, as all the candidates in \(C=N(v)\) are adjacent to v, which is the only element in S. At each iteration, Algorithm 2 preserves Eq. 2.6 by selecting a node u from C according to priority g. It then needs to preserve Eq. 2.7 as S changes with the addition of u. The first two for loops remove from C the nodes that can no longer be added to S. The third loop adds to C the nodes adjacent to u which can be added to S. After that, g is updated.
The above invariants guarantee that any S retrieved by Algorithm 2 is a k-plex. As Algorithm 1 terminates only when all the nodes have been covered by at least one k-plex in the current solution \({\mathcal {S}}\), we obtain that \({\mathcal {S}}\) is a k-plex cover, hence proving the correctness of Algorithm 1.
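The control flow of Algorithms 1-2 can be sketched as follows. This is a deliberately simplified version under our own adjacency-dict encoding: instead of maintaining the paper's invariants incrementally in O(m) time, it re-checks the k-plex condition directly at every candidate, trading efficiency for clarity.

```python
def find_kplex(adj, v, k):
    """Grow a k-plex around v, greedily adding high-priority candidates."""
    S = {v}
    C = set(adj[v])
    while C:
        # simplified g-priority: prefer candidates with more neighbors in S
        u = max(C, key=lambda w: len(adj[w] & S))
        C.discard(u)
        T = S | {u}
        # add u only if S ∪ {u} still satisfies the k-plex condition
        if all(len(adj[w] & T) >= len(T) - k for w in T):
            S = T
            C |= adj[u] - S
            # keep only candidates that could still join S
            C = {w for w in C
                 if all(len(adj[x] & (S | {w})) >= len(S) + 1 - k
                        for x in S | {w})}
    return S

def kplex_cover(adj, k):
    """Cover all nodes with k-plexes, restarting from low-degree nodes."""
    uncovered = set(adj)
    cover = []
    while uncovered:
        v = min(uncovered, key=lambda w: len(adj[w]))  # f-priority: min-degree
        S = find_kplex(adj, v, k)
        cover.append(S)
        uncovered -= S
    return cover

# A 4-cycle: with k = 1 the cover is made of cliques (edges),
# while with k = 2 the whole cycle fits in a single 2-plex.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
```

As in the pseudocode, nodes already covered may still appear in later k-plexes, since they are removed from the candidate set but not from the graph.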
Priority functions
To effectively summarize the graph for pooling purposes, we need to cover all the nodes with as few k-plexes as possible: we thus want these k-plexes to be large and with small overlaps. Following the heuristics experimented with in Conte et al. (2016, 2020), we adapt the best performing priorities to the case of k-plexes. Specifically, the priorities \(f\) and \(g\) define how the nodes are extracted during the execution of Algorithms 1, 2, guiding the covering process.
Table 1 provides a selection of priority functions along with their computational cost. They span from random baseline policies to orders induced by the topology of the pooled graph, intended to yield pooled graphs with interesting structural properties. Random and max-degree priorities should be quite self-explanatory; max-uncovered returns the number of neighbors that are not yet covered by a k-plex in \(\mathcal {S}\). Every \(f\) priority is also a valid strategy for \(g\). Concerning the latter, we can also consider:

i. max-in-k-plex, which assigns to every candidate node the number of its neighbors within the current k-plex S; and

ii. max-candidates, which assigns to every candidate node the number of its neighbors within the current candidate set C.
For the experimental assessment in Sect. 4, we implement the cover priority \(f\) as the concatenation of min-degree and max-uncovered. Here the term concatenation denotes the fact that nodes are ordered following a lexicographic ordering defined first on min-degree, with ties broken on max-uncovered (note that min-degree is discrete-valued and many nodes are likely to have the same priority). The k-plex priority \(g\) is obtained as the concatenation of max-in-k-plex, max-candidates and min-uncovered, following the lexicographic ordering approach described for \(f\). Both of the concatenated priorities, respectively for \(f\) and \(g\), will be referred to as default in the following. Note that min priorities can be trivially obtained from the respective max definition. The choice of these combinations of policies is driven by the observation that they are able to generate large k-plexes while maintaining small intersections between them which, again, is a key property we would like to induce in our pooling mechanism. Figures 3, 4 show two useful statistics obtained by executing Algorithm 1 with various combinations of the proposed priority functions. Specifically:

- the average number of clusters in a cover (Fig. 3), which we aim to minimize, since a high number of clusters can be a signal of large intersections and/or small k-plexes;

- the average number of occurrences of nodes in a cover (Fig. 4), which again we want to minimize, since a node has more than one occurrence if it belongs to multiple k-plexes and thus lies in an intersection between them.
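The "concatenation" of priorities described above can be realized with tuple-valued keys, which Python compares lexicographically; the function name and graph encoding below are our own illustrative assumptions:

```python
def cover_priority(adj, covered):
    """Hypothetical helper: min-degree first, ties broken on max-uncovered."""
    def key(v):
        # tuples compare lexicographically; negating the second component
        # turns the max-uncovered tie-break into a min under sorted()
        return (len(adj[v]), -len(adj[v] - covered))
    return key

adj = {1: {2, 3}, 2: {1}, 3: {1, 4}, 4: {3}}
order = sorted(adj, key=cover_priority(adj, covered={2}))
# degree-1 nodes (2, 4) come first; between the degree-2 nodes,
# 3 precedes 1 because it has more uncovered neighbors
```

The same tuple trick extends to three components for the default \(g\) priority (max-in-k-plex, max-candidates, min-uncovered).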
Cover post-processing
On certain kinds of graphs, like star graphs or scale-free networks, a few hub nodes have a degree that greatly exceeds the graph average: Algorithm 1 may include them in many distinct (and all pairwise adjacent) k-plexes, generating dense artifacts that do not represent the graph well. To overcome this problem, we discuss a cover post-processing mechanism, referred to as hub promotion, that sparsifies the coarsened graph by changing the cover assignments, removing hub nodes from the sets of the cover and assigning each to a dedicated singleton set. Specifically, for a given cover \(\mathcal {S}\) of a graph \(G\), and a threshold value \(p\in [0, 100]\), we proceed as follows.

1. Compute the covering index \(s(v) = \big |\big \{S \in \mathcal {S}\bigm | v \in S\big \}\big |\) for every \(v \in V\).

2. Compute \(s_p\) as the \(p\)-th percentile of the above values and find the subset of hub nodes \(H_p\subseteq V\) such that \(s(v) > s_p\) for all \(v \in H_p\).

3. Generate a modified cover \(\mathcal {S}_p= \big \{S \setminus H_p\bigm | S \in \mathcal {S}\big \} \cup \big \{\{v\} \bigm | v \in H_p\big \}\). Note that, as a limit case, \(\mathcal {S}_{100} = \mathcal {S}\).

4. Coarsen \(G\) by applying Eqs. 2.1, 2.2, 2.3, 2.4 with the resulting cover \(\mathcal {S}_p\).
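The four steps above can be sketched as follows. The text does not pin down the exact percentile convention, so the nearest-rank rule used here is an assumption of ours:

```python
def hub_promotion(cover, p):
    """cover: list of node sets; p: threshold percentile in [0, 100]."""
    nodes = set().union(*cover)
    # 1. covering index s(v) = number of cover sets containing v
    s = {v: sum(v in S for S in cover) for v in nodes}
    # 2. p-th percentile (nearest-rank, an assumption) and the hub set H_p
    ranked = sorted(s.values())
    s_p = ranked[min(len(ranked) - 1, int(len(ranked) * p / 100))]
    hubs = {v for v in nodes if s[v] > s_p}
    # 3. strip hubs from every set and add them back as singletons
    new_cover = [S - hubs for S in cover if S - hubs]
    new_cover += [{v} for v in hubs]
    return new_cover
```

On a star-like cover where a center node x appears in every set, a low threshold promotes x to its own singleton, removing the dense clique of clusters it would otherwise induce; with \(p = 100\) the cover is returned unchanged, matching the limit case \(\mathcal {S}_{100} = \mathcal {S}\).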
The effect of this method is assessed on social network datasets in Sect. 4. The results of a detailed ablation study, including both molecular and social graphs, are reported in Sect. 4.1.
Computational cost
It can be easily shown that Algorithm 2 takes O(m) time, since we can efficiently store and update \(g\) with standard data structures: indeed, each node v is added to S or \(C\) and removed from \(C\) at most once, and each update costs \(O(d(v))\), so the amortized cost is \(O(\sum _{v\in V(G)}d(v)) = O(m)\) time. The time cost of Algorithm 1 is bounded by O(mn): each time \(\textsc {FindKPlex} \) is called, at least one new node (v) is covered, thus the while loop is executed O(n) times. Each iteration updates S, U, and \(f\) and calls Algorithm 2, all in O(m) time, meaning Algorithm 1 always terminates within O(mn) time. The cost of the post-processing is also dominated by the cost of Algorithm 1.
KPlexPool performs \(\ell \) pooling steps. In each step, it finds a cover via Algorithm 1, taking O(mn) time, then aggregates features within the clusters by multiplying two matrices of size at most \(n\times n\) (the cover has at most n k-plexes). The best known cost for such a multiplication is \(O(n^\omega )\), where \(\omega < 2.373\) (Le Gall 2014), giving a total cost of \(O((mn + n^\omega )\ell )\) time.
Related works
Models in the literature can be partitioned into two general classes, based on whether they tackle the structure reduction problem using a topological or an adaptive approach (Bacciu et al. 2020). The former exploits solely structural and community information conveyed by the sample topology. The latter generates the graph reduction using the information on the node embeddings, leveraging adaptive parameters that are trained together with the graph convolution parameters. Several topological pooling algorithms are based on a clustering of the adjacency matrix. METIS (Karypis and Kumar 1998) and Graclus (Dhillon et al. 2007) are multilevel clustering algorithms that partition the graph by a sequence of coarsening and refinement phases. Louvain (Blondel et al. 2008), ECG (Poulin and Théberge 2019), and Leiden (Traag et al. 2019) start with singleton clusters and then greedily merge the ones that maximize the overall modularity. NMFPool (Bacciu and Di Sotto 2019) performs a probabilistic clustering of the nodes by non-negative factorization of the adjacency matrix. Other approaches leverage node neighborhoods and local communities, such as Gama et al. (2019). ECC (Simonovsky and Komodakis 2017) considers a point cloud representation of the graph to group nearby nodes in Euclidean space. EigenPool (Ma et al. 2019) and HaarPool (Wang et al. 2020; Li et al. 2020) define novel aggregation techniques on top of existing clustering algorithms (such as METIS). CliquePool (Luzhnica et al. 2019), instead, aggregates the attribute vectors of the nodes within the same clique. Adaptive pooling is, again, largely based on clustering but, differently from topological approaches, this is performed on the neural embeddings obtained from the graph convolutional layers. These methods rely on a parameterized clustering of the neural activations, which entails that they need to preserve differentiability. DiffPool (Ying et al.
2018) has pioneered the approach by introducing a hierarchical clustering that soft-assigns each node to a fixed number of clusters c, generating \(\varvec{\mathbf {S}}\in \mathbb {R}^{n\times c}\) with another GNN. This process does not yield an actual reduction of the adjacency matrix, as this would infringe differentiability, resulting in highly parameterized models of considerable complexity. MinCutPool (Bianchi et al. 2020) further improves this technique by optimizing a relaxed version of the normalized-cut objective. StructPool (Yuan and Ji 2019), instead, computes the soft-clustering associations through a Conditional Random Field (CRF). gPool (Gao and Ji 2019; Cangea et al. 2018), also known as TopKPool (Fey and Lenssen 2019), addresses this shortcoming by learning a single vector \(\varvec{\mathbf {p}} \in \mathbb {R}^{h_\text {in}}\), used to compute projection scores that serve to retain the top \(\lceil kn\rceil \) ranked nodes. SAGPool (Lee et al. 2019; Knyazev et al. 2019) later extended TopKPool to compute the score vector by means of a GNN. GSAPool (Zhang et al. 2020) further extends this model by a convex combination of two projection vectors: one learned by a GNN, and another by a classical neural network (with no topology information). The authors of ASAPool (Ranjan et al. 2020) showed the limitations of using a standard GCN (Kipf and Welling 2017) to compute the projection scores, and defined another graph convolution (LEConv) for that specific purpose. EdgePool (Diehl 2019; Diehl et al. 2019) reduces the graph by edge contraction, based on a score computed by a neural model that takes the edge's incident nodes as input.
KPlexPool falls into the family of topological pooling, but proposes an approach radically different from adjacency clustering models, based on well-grounded concepts from graph algorithmics. CliquePool is the most closely related model, but it considers a much more restricted and less flexible form of community than ours. Also, CliquePool is limited to simple graph partitions, whereas our approach can leverage the flexibility of the assignments of graph covers. The effect of the greater generality and flexibility of KPlexPool is evident from its excellent empirical performance on a variety of graph topologies (Sect. 4). When compared to adaptive pooling methods, KPlexPool certainly has the disadvantage of relying on graph reductions that are not driven by the predictive task. Nonetheless, the experimental assessment shows that KPlexPool can achieve state-of-the-art performances on both molecular and social graphs, also outperforming adaptive approaches where pooling is learned to optimize the predictive task.
Experimental analysis
We tested KPlexPool against related methods from the literature on four molecular graph datasets, namely DD (Dobson and Doig 2003), NCI1 (Wale et al. 2008), ENZYMES (Schomburg et al. 2004), and PROTEINS (Borgwardt et al. 2005), and five social network datasets, COLLAB, IMDB-BINARY, IMDB-MULTI, REDDIT-BINARY, and REDDIT-MULTI-5K (Yanardag and Vishwanathan 2015). All datasets have been retrieved from the TU Dortmund collection (Morris et al. 2020).
For fairness, each pooling method has been tested by plugging it into the same standardized architecture (Baseline), comprising \(\ell \) convolutional blocks followed by two dense layers, the latter interleaved by dropout with probability 0.3. Every convolutional block is formed by two GNN layers (either GCN (Kipf and Welling 2017) or GraphSAGE (Hamilton et al. 2017)) with Jumping Knowledge (Xu et al. 2018), followed by a dense layer. After every convolutional block we apply a global sum-pooling, and the concatenation of the resulting vectors is batch-normalized and fed to the final dense block. Every layer, GNN or dense, has h units and a ReLU activation function (Goodfellow et al. 2016). For every other model, a pooling layer is placed after each of the first \(\ell - 1\) convolutional blocks and its output is fed to the next block. All models have been implemented using PyTorch Geometric (Fey and Lenssen 2019), which also provided implementations for Graclus, DiffPool, MinCutPool, TopKPool, and SAGPool. Leiden has instead been computed using the cuGraph library (RAPIDS Development 2018). We reimplemented CliquePool using the NetworkX library (Hagberg et al. 2008), which provided an implementation of the Bron and Kerbosch (1973) algorithm and its optimizations (Tomita et al. 2006; Cazals and Karande 2008).
Our experimental approach followed the standardized reproducible setting in Errica et al. (2020). Specifically, for model assessment we used a stratified 10-fold cross-validation, and for model selection an inner stratified shuffle split, generating a validation set of the same size as the outer fold and leaving the remaining examples as training set. We performed a grid search for each fold, using the parameter space listed in Table 2. For each parameter combination, we trained each model with early stopping after 20 epochs, monitored on the validation set. We selected the combination that obtained the highest accuracy on the validation set and evaluated it on the outer fold. We tested different configurations of KPlexPool depending on the domain. Since graphs become denser after each pooling layer, on biological datasets we tested the effect of a progressive reduction factor \(r_k\in (0, 1]\), so that pooling at layer \(\ell \) has \(k^{(\ell )} = \lceil r_k k^{(\ell - 1)}\rceil \). Here we also used the concatenation of sum and max as aggregation function on non-parametric pooling methods, i.e. Leiden, Graclus, CliquePool and KPlexPool. This could not be applied to selection-based methods (TopKPool, SAGPool), nor to the soft-clustering in DiffPool and MinCutPool.
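As a small illustration, the progressive reduction schedule \(k^{(\ell )} = \lceil r_k k^{(\ell - 1)}\rceil \) described above can be computed as follows (the function name and sample values are our own):

```python
import math

def k_schedule(k, r_k, num_layers):
    """Progressive reduction of k across pooling layers:
    k^(l) = ceil(r_k * k^(l-1)), starting from k^(0) = k."""
    ks = [k]
    for _ in range(num_layers - 1):
        ks.append(math.ceil(r_k * ks[-1]))
    return ks

# k_schedule(8, 0.5, 4) -> [8, 4, 2, 1]
```

The ceiling keeps every \(k^{(\ell )}\) at least 1, so deeper layers degrade gracefully toward clique covers.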
For the sake of conciseness, in the following, we summarize the empirical results that assess the effect of the postprocessing technique described in Sect. 2.6. Section 4.1 further provides an ablation study showing how k, the progressive reduction factor \(r_k\), and the hub promotion threshold contribute to the result.
All models were trained on an NVIDIA^{®} V100 GPU with 16GB of dedicated memory, while the coverings were precomputed on CPU, an Intel^{®} Xeon^{®} Gold 6140M with 1.23TB of RAM. KPlexPool has been implemented both in sparse coordinate form, which is slower but space-efficient, and in matrix form, which is space-inefficient but apt to GPU parallelization. In the experiments, we used the latter except for DD, REDDIT-B and REDDIT-5K, as they contain larger graphs. For the same reason, DiffPool could not be trained on these datasets, since it required too much memory unless using only six samples per batch, thus requiring longer training times (more than two weeks per experiment). For the experiments in which DiffPool hits the out-of-resource limit, we report results from Errica et al. (2020), which have been obtained under similar, although not fully equivalent, conditions. The coarsened graphs' topologies for non-parametric methods were precomputed once at the beginning of every experiment, while their node features were aggregated at every training step.
Tables 3 and 4 show the mean accuracy and standard deviation on test data (outer folds). On chemical datasets, KPlexPool yields competitive results across the board, with higher performance than the other related methods on all benchmarks but ENZYMES. The application of the incremental reduction \(r_k\) to the k value provides appreciable benefits only on ENZYMES, while on the other tasks its effects are marginal at most.
On social benchmarks, KPlexPool performs better than parametric pooling models on all datasets, when considering the same experimental conditions. DiffPool runs out of resources on the REDDIT data, but KPlexPool is still competitive with respect to the DiffPool results from Errica et al. (2020). On REDDIT-5K, only the topological pooling of Graclus yields higher accuracy. Applying hub promotion (Sect. 2.6) increases KPlexPool performance on IMDB-M and REDDIT-B, as shown in Table 4. Its community-seeking bias appears well suited to the processing of social graphs, where adaptive pooling methods do not seem capable of leveraging their parameters to produce more informative graph reductions. Perhaps surprisingly, KPlexPool also achieves excellent performance on molecular data, where we would have expected adaptive models to have an edge, confirming our initial intuition about the flexibility and generality of k-plex cover communities. These results can be further appreciated in Table 5, where we report the average rank of each model separately on chemical datasets, social datasets, and their union, with respect to the results shown in Tables 3 and 4. For KPlexPool, we report the average rank of the plain configuration (i.e. no incremental reduction nor hub promotion) as well as of the best and worst performing configurations. These results highlight how the performance of KPlexPool is remarkably stable with respect to the choice of its configuration options (cover post-processing methods).
Ablation studies and practical considerations
For completeness, we performed ablation studies aimed at analyzing how every component of KPlexPool contributes to the overall performance. Table 6 compares the results obtained on ENZYMES, NCI1, and PROTEINS using KPlexPool with different combinations of k and \(r_k\). We used the experimental approach described in Sect. 4, but restricted the hyperparameter space (Table 2) by fixing \(\ell = 3\), \(h = 64\), and \({\text {GNN}} = {\text {GCN}}\). As anticipated in Sect. 4, KPlexPool yields better accuracy by aggregating 2- to 4-plexes instead of simple cliques, and the best values (highlighted in bold) are obtained for \(k=2\) on two out of the three datasets.
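Recall the definition underlying these ablations: a set S of nodes is a k-plex when every node in S is adjacent to at least |S| - k of the set's members, so a clique is exactly a 1-plex. A minimal membership check (our own sketch, not part of the released code):

```python
def is_kplex(nodes, edges, k):
    """Check that every node in `nodes` is adjacent, within `edges`, to at
    least len(nodes) - k members of the set; a clique is then a 1-plex."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    return all(len(adj[v]) >= len(nodes) - k for v in nodes)
```

For instance, a path on three nodes fails as a 1-plex (its endpoints miss two links each) but qualifies as a 2-plex, which illustrates how growing k admits progressively sparser pseudo-cliques.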
Applying a reduction factor seems to be more effective with larger k values. This could be explained by the fact that a large k generates a cover containing few, large sets; hence, pooling layers beyond the first work on a smaller and denser coarsened graph, with denser inner communities, which is better summarized by cliques or k-plexes with small k values.
Table 7 instead compares the results obtained on COLLAB, IMDB-B, and IMDB-M using the same approach as above, but varying k and \(p\) (Sect. 2.6), and fixing \(\ell = 2\).
Considering the generation process of the networks (see Yanardag and Vishwanathan 2015), we note that COLLAB is obtained by connecting authors that co-authored a paper, while IMDB-B and IMDB-M connect actors/actresses that co-starred in a movie. Furthermore, IMDB-M includes more movies than IMDB-B. We can thus expect IMDB-M to exhibit a more prominent presence of hubs than IMDB-B due to the "rich get richer" phenomenon (i.e., as the number of movies considered grows, more famous actors have a greater probability of being featured in them and thus of receiving more connections).
This is reflected in the data in Table 7: the best value for IMDB-M is obtained with \(p = 80\) (i.e., the top \(20\%\) of nodes are considered hubs), whereas the best value for IMDB-B is obtained with \(p = 90\) (i.e., the top \(10\%\) are considered hubs). More generally, Table 7 motivates the usage of the hub promotion parameter \(p\): on all three datasets we obtain benefits by setting it to a non-trivial value (recall that \(p = 100\) corresponds to not using hub promotion).
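The threshold \(p\) can be read as a degree percentile: nodes whose degree strictly exceeds the p-th percentile are treated as hubs and kept apart from the cover sets. A minimal sketch of this thresholding step (our own illustration of the idea in Sect. 2.6, not the released implementation):

```python
import numpy as np

def hub_mask(degrees, p):
    """Boolean mask marking hub nodes: those whose degree strictly exceeds
    the p-th percentile of the degree distribution. With p = 100 the
    threshold is the maximum degree, so no node qualifies and hub
    promotion is effectively disabled."""
    threshold = np.percentile(degrees, p)
    return degrees > threshold
```

With \(p = 80\) roughly the top \(20\%\) of nodes by degree are flagged, matching the reading of Table 7 above.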
As for varying the k values, we do not observe any significant trend. This is perhaps unsurprising, considering that these datasets are obtained by turning the co-authors of a paper (resp., the co-stars of a movie) into a clique, so \(k=1\) may be general enough in some cases, although the best results are found in different rows for different values of \(p\), and in particular the best value for IMDB-M is found with \(k=4\).
For an optimal choice of parameters, the ideal strategy is to find the most effective value of k through hyperparameter tuning. However, we observe that \(k=2\) has good overall performance, often being the best result or close to it: if the approach must be applied on the fly, as a rule of thumb we suggest trying low values of k.
Conclusions
We have introduced KPlexPool, a novel graph pooling methodology leveraging k-plexes and graph covers. Starting from consolidated graph-theoretic concepts and recent results on scalable community detection (Conte et al. 2018), we have built a flexible graph reduction approach that works effectively across structures with different topological properties. We have provided a general formulation that can account for different node inspection schedules and that can, in principle, be tailored based on prior or domain knowledge. Nevertheless, in this paper we assessed the approach with a fixed node ordering heuristic and cover post-processing strategy, showing that the method is effective even in its plainest configuration.
The resulting KPlexPool algorithm attains state-of-the-art performance on 7 out of 9 graph classification benchmarks, and is the best performing method, on average, when compared with related pooling mechanisms from the literature. It does so through a fully topological approach that does not leverage task information for community building. None of the related models, including the adaptive ones, seems to cope as effectively with structures of radically different nature (molecules and social networks). Beyond predictive performance, KPlexPool has a very practical advantage in terms of computational cost when compared to adaptive models such as DiffPool: its graph reduction can be precomputed once for the whole dataset and reused throughout model selection and validation, as it does not depend on adaptive node embeddings. This aspect is clear from the empirical results, in which DiffPool fails to complete training within the two-week limit (or exceeds the available GPU memory) on datasets comprising larger graphs. Conversely, as the proposed k-plex cover algorithm is mostly sequential, computing KPlexPool on the fly during the training loop produces an overhead due to GPU-CPU synchronizations. Overcoming this limitation requires a parallel alternative of our algorithm that can also run on GPU, which we leave as future work.
Code availability
Code available at https://github.com/flandolfi/kplexpool/.
Change history
15 November 2021
Funding note has been added to this article.
References
Bacciu D, Di Sotto L (2019) A non-negative factorization approach to node pooling in graph convolutional neural networks. In: Alviano M, Greco G, Scarcello F (eds) AI*IA 2019 - Advances in artificial intelligence, Lecture notes in computer science. Springer, Cham, pp 294–306. https://doi.org/10.1007/978-3-030-35166-3_21
Bacciu D, Errica F, Micheli A (2018) Contextual graph markov model: a deep and generative approach to graph processing. In: International Conference on Machine Learning, pp 294–303, ISSN: 19387228
Bacciu D, Errica F, Micheli A, Podda M (2020) A gentle introduction to deep learning for graphs. Neural Netw 129:203–221. https://doi.org/10.1016/j.neunet.2020.06.006
Battaglia PW, Hamrick JB, Bapst V, SanchezGonzalez A, Zambaldi V, Malinowski M, Tacchetti A, Raposo D, Santoro A, Faulkner R, Gulcehre C, Song F, Ballard A, Gilmer J, Dahl G, Vaswani A, Allen K, Nash C, Langston V, Dyer C, Heess N, Wierstra D, Kohli P, Botvinick M, Vinyals O, Li Y, Pascanu R (2018) Relational inductive biases, deep learning, and graph networks. arXiv:1806.01261
Bianchi FM, Grattarola D, Alippi C (2020) Spectral clustering with graph neural networks for graph pooling. Proc Int Conf Mach Learn 1
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp. https://doi.org/10.1088/1742-5468/2008/10/P10008
Borgwardt KM, Ong CS, Schönauer S, Vishwanathan SVN, Smola AJ, Kriegel HP (2005) Protein function prediction via graph kernels. Bioinform (Oxford, Engl) 21(Suppl 1):i47–56. https://doi.org/10.1093/bioinformatics/bti1007
Bron C, Kerbosch J (1973) Algorithm 457: finding all cliques of an undirected graph. Commun ACM 16(9):575–577. https://doi.org/10.1145/362342.362367
Bruna J, Zaremba W, Szlam A, LeCun Y (2014) Spectral networks and locally connected networks on graphs. In: Bengio Y, LeCun Y (eds) Proceedings of the 2nd international conference on learning representations, ICLR 2014, Banff, AB, Canada, April 14–16, 2014, Conference Track Proceedings
Cangea C, Veličković P, Jovanović N, Kipf T, Liò P (2018) Towards sparse hierarchical graph classifiers. arXiv:1811.01287
Cazals F, Karande C (2008) A note on the problem of reporting maximal cliques. Theor Comput Sci 407(1):564–568. https://doi.org/10.1016/j.tcs.2008.05.010
Conte A, Grossi R, Marino A (2016) Clique covering of large real-world networks. In: Proceedings of the 31st annual ACM symposium on applied computing, Association for Computing Machinery, Pisa, Italy, SAC ’16, pp 1134–1139. https://doi.org/10.1145/2851613.2851816
Conte A, De Matteis T, De Sensi D, Grossi R, Marino A, Versari L (2018) D2K: scalable community detection in massive networks via small-diameter k-plexes. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining, Association for Computing Machinery, London, United Kingdom, KDD ’18, pp 1272–1281. https://doi.org/10.1145/3219819.3220093
Conte A, Grossi R, Marino A (2020) Large-scale clique cover of real-world networks. Inform Comput 270. https://doi.org/10.1016/j.ic.2019.104464
Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In: Proceedings of the 30th international conference on neural information processing systems, Curran Associates Inc., Red Hook, NY, USA, NIPS’16, pp 3844–3852
Dhillon IS, Guan Y, Kulis B (2007) Weighted graph cuts without eigenvectors a multilevel approach. IEEE Trans Pattern Anal Mach Intel 29(11):1944–1957. https://doi.org/10.1109/TPAMI.2007.1115
Diehl F (2019) Edge contraction pooling for graph neural networks. arXiv:1905.10990
Diehl F, Brunner T, Le MT, Knoll A (2019) Towards graph pooling by edge contraction. In: ICML 2019 workshop on learning and reasoning with graphstructured data
Dobson PD, Doig AJ (2003) Distinguishing enzyme structures from non-enzymes without alignments. J Mol Biol 330(4):771–783. https://doi.org/10.1016/s0022-2836(03)00628-4
Errica F, Podda M, Bacciu D, Micheli A (2020) A fair comparison of graph neural networks for graph classification. In: International conference on learning representations
Fey M, Lenssen JE (2019) Fast graph representation learning with PyTorch geometric. In: ICLR workshop on representation learning on graphs and manifolds
Fukushima K (1980) Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern 36(4):193–202. https://doi.org/10.1007/BF00344251
Gama F, Marques AG, Leus G, Ribeiro A (2019) Convolutional neural network architectures for signals supported on graphs. IEEE Trans Signal Process 67(4):1034–1049. https://doi.org/10.1109/TSP.2018.2887403
Gao H, Ji S (2019) Graph U-Nets. In: International Conference on Machine Learning, pp 2083–2092, ISSN: 19387228 Section: Machine Learning
Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for Quantum chemistry. In: Proceedings of the 34th international conference on machine learning, vol 70, JMLR.org, Sydney, NSW, Australia, ICML’17, pp 1263–1272
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge
Gori M, Monfardini G, Scarselli F (2005) A new model for learning in graph domains. In: Proceedings of the 2005 IEEE international joint conference on neural networks, vol 2, pp 729–734. https://doi.org/10.1109/IJCNN.2005.1555942, ISSN: 21614407
Hagberg AA, Schult DA, Swart PJ (2008) Exploring network structure, dynamics, and function using networkx. In: Varoquaux G, Vaught T, Millman J (eds) Proceedings of the 7th Python in Science Conference, Pasadena, CA USA, pp 11–15
Hamilton WL, Ying R, Leskovec J (2017) Inductive representation learning on large graphs. In: Proceedings of the 31st international conference on neural information processing systems, Curran Associates Inc., Long Beach, California, USA, NIPS’17, pp 1025–1035
Hammond DK, Vandergheynst P, Gribonval R (2011) Wavelets on graphs via spectral graph theory. Appl Comput Harmonic Anal 30(2):129–150. https://doi.org/10.1016/j.acha.2010.04.005
Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392. https://doi.org/10.1137/S1064827595287997
Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: Proceedings of the 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings, OpenReview.net
Knyazev B, Taylor GW, Amer M (2019) Understanding attention and generalization in graph neural networks. In: Wallach H, Larochelle H, Beygelzimer A, AlchéBuc F, Fox E, Garnett R (eds) Advances in neural information processing systems 32, Curran Associates, Inc., pp 4202–4212
Kriege NM, Johansson FD, Morris C (2020) A survey on graph kernels. Appl Netw Sci 5(1):6. https://doi.org/10.1007/s4110901901953
Le Gall F (2014) Powers of tensors and fast matrix multiplication. In: Proceedings of the 39th international symposium on symbolic and algebraic computation, pp 296–303
LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551. https://doi.org/10.1162/neco.1989.1.4.541
Lee J, Lee I, Kang J (2019) Self-attention graph pooling. In: International conference on machine learning, pp 3734–3743, ISSN: 19387228 Section: Machine Learning
Li M, Ma Z, Wang YG, Zhuang X (2020) Fast Haar transforms for graph neural networks. Neural Netw 128:188–198. https://doi.org/10.1016/j.neunet.2020.04.028
Li Q, Han Z, Wu XM (2018) Deeper insights into graph convolutional networks for semi-supervised learning. In: McIlraith SA, Weinberger KQ (eds) Proceedings of the thirty-second AAAI conference on artificial intelligence (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2–7, 2018, AAAI Press, pp 3538–3545
Li Y, Tarlow D, Brockschmidt M, Zemel RS (2016) Gated graph sequence neural networks. In: Proceedings of the 4th international conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, Conference Track Proceedings
Luzhnica E, Day B, Liò P (2019) Clique pooling for graph classification. arXiv:1904.00374
Ma Y, Wang S, Aggarwal CC, Tang J (2019) Graph convolutional networks with EigenPooling. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, Association for Computing Machinery, Anchorage, AK, USA, KDD ’19, pp 723–731, https://doi.org/10.1145/3292500.3330982
Micheli A (2009) Neural network for graphs: a contextual constructive approach. IEEE Trans Neural Netw 20(3):498–511. https://doi.org/10.1109/TNN.2008.2010350
Monti F, Boscaini D, Masci J, Rodola E, Svoboda J, Bronstein MM (2017) Geometric deep learning on graphs and manifolds using mixture model CNNs. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5115–5124
Morris C, Ritzert M, Fey M, Hamilton WL, Lenssen JE, Rattan G, Grohe M (2019) Weisfeiler and Leman go neural: higher-order graph neural networks. Proc AAAI Conf Artif Intel 33(01):4602–4609. https://doi.org/10.1609/aaai.v33i01.33014602
Morris C, Kriege NM, Bause F, Kersting K, Mutzel P, Neumann M (2020) TUDataset: a collection of benchmark datasets for learning with graphs. In: ICML 2020 Workshop on Graph Representation Learning and Beyond (GRL+ 2020), arXiv:2007.08663
Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Dietterich TG, Becker S, Ghahramani Z (eds) Advances in neural information processing systems, vol 14. MIT Press, Cambridge, pp 849–856
Poulin V, Théberge F (2019) Ensemble clustering for graphs. In: Aiello LM, Cherifi C, Cherifi H, Lambiotte R, Lió P, Rocha LM (eds) Complex networks and their applications VII, Studies in Computational Intelligence. Springer, Cham, pp 231–243, https://doi.org/10.1007/9783030054113_19
Ranjan E, Sanyal S, Talukdar P (2020) ASAP: adaptive structure aware pooling for learning hierarchical graph representations. Proc AAAI Conf Artif Intel 34(04):5470–5477. https://doi.org/10.1609/aaai.v34i04.5997
RAPIDS Development Team (2018) RAPIDS: collection of libraries for end to end GPU data science
Scarselli F, Yong SL, Gori M, Hagenbuchner M, Tsoi AC, Maggini M (2005) Graph neural networks for ranking Web pages. In: The 2005 IEEE/WIC/ACM international conference on web intelligence (WI’05), pp 666–672, https://doi.org/10.1109/WI.2005.67
Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2009) The graph neural network model. IEEE Trans Neural Netw 20(1):61–80. https://doi.org/10.1109/TNN.2008.2005605
Schomburg I, Chang A, Ebeling C, Gremse M, Heldt C, Huhn G, Schomburg D (2004) BRENDA, the enzyme database: updates and major new developments. Nucl Acids Res. https://doi.org/10.1093/nar/gkh081
Shervashidze N, Schweitzer P, Leeuwen EJV, Mehlhorn K, Borgwardt KM (2011) Weisfeiler-Lehman graph kernels. J Mach Learn Res 12:2539–2561
Simonovsky M, Komodakis N (2017) Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3693–3702
Tomita E, Tanaka A, Takahashi H (2006) The worstcase time complexity for generating all maximal cliques and computational experiments. Theor Comput Sci 363(1):28–42. https://doi.org/10.1016/j.tcs.2006.06.015
Traag VA, Waltman L, van Eck NJ (2019) From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9(1):5233. https://doi.org/10.1038/s41598-019-41695-z
Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y (2018) Graph attention networks. In: International conference on learning representations
Vinyals O, Bengio S, Kudlur M (2016) Order matters: sequence to sequence for sets. In: Proceedings of the 4th international conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, Conference Track Proceedings
Vishwanathan SVN, Schraudolph NN, Kondor R, Borgwardt KM (2010) Graph Kernels. J Mach Learn Res 11(40):1201–1242
Wale N, Watson IA, Karypis G (2008) Comparison of descriptor spaces for chemical compound retrieval and classification. Knowl Inform Syst 14(3):347–375. https://doi.org/10.1007/s1011500701035
Wang Y, Li M, Ma Z, Montufar G, Zhuang X, Fan Y (2020) Haar Graph Pooling. Proc Int Conf Mach Learn 1
Xu K, Li C, Tian Y, Sonobe T, Kawarabayashi K, Jegelka S (2018) Representation learning on graphs with jumping knowledge networks. In: International conference on machine learning, PMLR, pp 5453–5462, ISSN: 26403498
Xu K, Hu W, Leskovec J, Jegelka S (2019) How powerful are graph neural networks? In: International conference on learning representations
Yanardag P, Vishwanathan S (2015) Deep graph kernels. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, association for computing machinery, New York, NY, USA, KDD ’15, pp 1365–1374, https://doi.org/10.1145/2783258.2783417
Ying R, You J, Morris C, Ren X, Hamilton WL, Leskovec J (2018) Hierarchical graph representation learning with differentiable pooling. In: Proceedings of the 32nd international conference on neural information processing systems, Curran Associates Inc., Montréal, Canada, NIPS’18, pp 4805–4815
Yuan H, Ji S (2019) StructPool: structured graph pooling via conditional random fields
Zhang L, Wang X, Li H, Zhu G, Shen P, Li P, Lu X, Shah SAA, Bennamoun M (2020) Structurefeature based graph selfadaptive pooling. In: Proceedings of the web conference 2020, Association for Computing Machinery, New York, NY, USA, WWW ’20, pp 3098–3104, https://doi.org/10.1145/3366423.3380083
Zhang M, Cui Z, Neumann M, Chen Y (2018) An end-to-end deep learning architecture for graph classification. In: Proceedings of the thirty-second AAAI conference on artificial intelligence (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2–7, 2018, pp 4438–4445
Acknowledgements
This work was partially supported by the EU H2020 project TAILOR (n. 952215) and by MIUR under the PRIN project AHeAD (n. 20174LF3T8).
Funding
Open access funding provided by Università di Pisa within the CRUICARE Agreement.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Responsible editor: Annalisa Appice, Sergio Escalera, Jose A. Gamez, Heike Trautmann.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Bacciu, D., Conte, A., Grossi, R. et al. K-plex cover pooling for graph neural networks. Data Min Knowl Disc 35, 2200–2220 (2021). https://doi.org/10.1007/s10618-021-00779-z
Keywords
 Graph pooling
 Graph neural network
 K-plex
 Graph covering
 Pseudo-clique