The Law of Attraction: Affinity-Aware Placement Optimization using Graph Neural Networks
Yi-Chen Lu ([email protected])
Georgia Institute of Technology, Atlanta, Georgia, USA
Sai Pentapati
Georgia Institute of Technology, Atlanta, Georgia, USA
Sung Kyu Lim ([email protected])
Georgia Institute of Technology, Atlanta, Georgia, USA
ABSTRACT
Placement is one of the most crucial problems in modern Electronic Design Automation (EDA) flows, where the solution quality is mainly dominated by on-chip interconnects. To achieve target closures, designers often perform multiple placement iterations to optimize key metrics such as wirelength and timing, which is highly time-consuming and computationally inefficient. To overcome this issue, in this paper, we present a graph learning-based framework named PL-GNN that provides placement guidance for commercial placers by generating cell clusters based on logical affinity and manually defined attributes of design instances. With the clustering information as a soft placement constraint, commercial tools will strive to place design instances in a common group together during global and detailed placements. Experimental results on commercial multi-core CPU designs demonstrate that our framework improves the default placement flow of Synopsys IC Compiler II (ICC2) by 3.9% in wirelength, 2.8% in power, and 85.7% in performance.
CCS CONCEPTS
• Hardware → Electronic design automation; Methodologies for EDA; Placement.
ACM Reference Format:
Yi-Chen Lu, Sai Pentapati, and Sung Kyu Lim. 2021. The Law of Attraction: Affinity-Aware Placement Optimization using Graph Neural Networks. In Proceedings of the 2021 International Symposium on Physical Design (ISPD '21), March 22–24, 2021, Virtual Event, USA. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3439706.3447045
1 INTRODUCTION
Placements of Application-Specific Integrated Circuits (ASICs) require designers to place millions or even billions of gate-level instances on constrained physical layouts, which is performed by sophisticated commercial tools in modern physical design (PD) flows. However, with the ever increasing design complexity driven by Moore's Law, commercial EDA tools struggle to achieve high-quality placements without spending a significant amount of time performing placement iterations to reach target closures.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ISPD '21, March 22–24, 2021, Virtual Event, USA
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8300-4/21/03...$15.00
https://doi.org/10.1145/3439706.3447045
[Figure 1 here: the PD flow (RTL netlist → synthesis → gate-level netlist → floorplanning → placement), where the proposed framework PL-GNN supplies cell clusters to the placement stage to produce an optimized placement.]
Figure 1: PL-GNN powered design flow. The cell clusters determined by our framework PL-GNN are taken as placement guidance. During placement, Synopsys ICC2 will spend effort in grouping the cells within a common group together.
It is well known that placement directly impacts the final quality of a full-chip design, and the logical affinity among design instances dominates the quality of the placement. To achieve a high-quality placement in terms of key quality of result (QoR) metrics, designers have to understand the underlying data flows in order to set instructions for commercial tools to place the design instances accordingly. In modern PD flows, this process is called "placement guidance", which heavily relies on the knowledge of experienced designers.
In recent years, placement guidance has become a must-use step to achieve high-quality placements in the semiconductor industry. It optimizes default placement flows in commercial tools by informing placers about the design instances that should better be placed nearby in actual physical layouts in order to optimize key design metrics. With the given grouping information, commercial placers will spend effort in grouping the cells in a common cluster together during the placement process. However, as mentioned, performing placement guidance requires in-depth design-specific knowledge, which is only achievable by experienced designers who know the underlying data flows at the Register-Transfer Level (RTL) well.
To overcome the above issue, in this paper, we present a universal placement optimization framework named PL-GNN that provides automated and accurate placement guidance for any design without requiring users to have profound design knowledge. Figure 1 shows the PL-GNN powered design flow, where our framework determines the cell clusters in an unsupervised manner, which serve as placement guidance to guide commercial placers to optimize key metrics such as wirelength, power, and timing by placing cells within a common cluster together. PL-GNN consists of two stages. First, given a netlist, we perform unsupervised node
Session 2: Machine Learning for Physical Design (1/2) ISPD '21, March 22โ24, 2021, Virtual Event, USA
[Figure 2 here: the netlist graph and initial node features feed unsupervised node representation learning; K-means clustering then creates placement groups for the commercial placer (Synopsys ICC2), yielding an optimized placement.]
Figure 2: Overview of PL-GNN framework. Given a netlist graph and the initial node features, we first perform node representation learning to transform the initial features into better representations that accurately characterize the underlying design knowledge. Then, with the learned node embeddings, we perform weighted K-means clustering to determine the placement groups as placement guidance for a commercial placer. Based on the provided grouping information, the placer will spend effort in placing the instances in a common group together during global and detailed placements.
representation learning using graph neural networks (GNNs) based on the initial features manually defined for each design instance. The goal of node representation learning is to learn accurate node representations that are related to the underlying logical affinity and attributes of a given netlist. In the second stage, based on the learned representations, we leverage the weighted K-means clustering algorithm [3] to group instances into different clusters. To find the optimal number of groups for clustering, we introduce the Silhouette score [19] and perform sweeping experiments to find the sweet spot. As aforementioned, the final clustering results are utilized as placement guidance for commercial placers. In this work, we target the renowned commercial physical design tool Synopsys IC Compiler II (ICC2) [20] as our baseline, and demonstrate that the proposed framework significantly improves the default placement flow of ICC2 on commercial multi-core CPU designs.
The goal of this work is to provide designers a placement optimization framework that achieves high-quality placements for general designs by distilling underlying design knowledge. Note that since our framework learns the node representation for every design instance by optimizing an unsupervised loss function, it is generalizable to any design. In addition, PL-GNN does not assume any pre-defined netlist structure. Instead, it adapts to different netlists through novel graph embedding techniques. Finally, although we take Synopsys ICC2 as the reference tool in this work, our framework can easily be integrated with other physical design tools to significantly improve the placement quality.
2 RELATED WORKS AND MOTIVATIONS
2.1 Learning-Based Placement Optimization
Recently, the authors of [15] propose DREAMPlace, which utilizes GPUs with deep learning toolkits to significantly accelerate the runtime of analytical placers. However, the proposed method does not improve the placement solution quality because the underlying placement algorithms remain the same. To optimize placement
quality, the authors of [7] map the traditional placement problem into a reinforcement learning (RL) problem and present the usage of applying GNNs to encode netlist features. In [17], a complete RL framework is proposed to perform floorplanning for memory macros of Google TPU designs, where a force-directed method is introduced to place standard cells. It is shown that the final designs achieved through RL agents outperform the ones built by designers in a much shorter turnaround time. Nonetheless, the proposed RL framework [17] only focuses on optimizing the locations of memory macros, whereas the logical affinity among standard cells is the most dominant factor in achieving high-quality placements. Another work [25] proposes a detailed placement optimization technique based on the prediction of pin assignment, where the goal is to fine-tune the placement with a pre-trained model to minimize design rule violations (DRVs) after routing. However, since the pre-trained model is obtained in an online and supervised manner, the proposed method is subject to the underlying design flow. Furthermore, the improvement in placement quality with the incremental update of the pre-trained model is minor.
2.2 Placement Prediction
As for placement prediction, previous work [6] proposes a new hypergraph to clique-based graph transformation model and leverages the Louvain modularity-based clustering method [1] to predict placement-relevant cell clusters, where the goal is to predict the design instances that will be placed nearby in the actual physical layouts. They demonstrate that the adopted clustering method better predicts the final placement results than the renowned k-way partitioning algorithm [12] (hMETIS) under evaluations of the Davies–Bouldin index (DBI) [2]. However, the applications of such prediction are limited, because it is subject to a fixed placement flow, which means that when the flow is changed, the prediction will be inaccurate. Another work [16] develops a method to encode placement features using transfer learning with layout images. However,
[Figure 3 here: (a) a netlist hypergraph; (b) its clique-based graph showing the K=1 and K=2 neighborhoods of a target node; (c) the aggregated features h1, h2, h3.]
Figure 3: Illustration of the GNN aggregation process on a VLSI netlist. Given a netlist as shown in (a), we first transform the directed hypergraph (original representation of the netlist) into an undirected clique-based graph as shown in (b). Then, based on the clique-based graph, for each node, we perform feature aggregation on its neighborhood at each hop k ∈ {1, 2, ..., K}, and finally (c) obtain the final representations.
the presented encoding method is at the graph level, where a single design is encoded into one single vector. Therefore, it cannot be utilized to cluster instances within a design.
2.3 Motivations
In this paper, we aim to overcome all the drawbacks presented above. We develop a graph learning-based placement optimization framework that significantly improves the standard industrial placement flow. Unlike [17], which uses RL to optimize macro placement, our framework focuses on optimizing the standard cell placement (global and detailed placements) by considering the netlist affinity and node-level hierarchy features. In addition, we present a detailed comparison with previous work [6] and demonstrate that the proposed graph learning-based technique better guides the commercial placer (ICC2) to optimize placement quality than the modularity-based clustering method [1].
3 PL-GNN FRAMEWORK
3.1 Overview
GNNs are powerful tools for encoding underlying graph knowledge into representative embeddings. They perform effective graph representation learning by aggregating the features of each node with those of its neighbors (not limited to direct 1-hop neighbors) in a message passing scheme. The initial features of a node are thus iteratively transformed into better representations that are related to the objective of feature aggregation. These learned representations (transformed features) can be further utilized in downstream tasks such as node classification, clustering, or link prediction. In this
paper, we devise an unsupervised loss function that serves as an objective for the feature aggregation process. With the aggregated features, we leverage the weighted K-means clustering algorithm [3] to determine the standard cell clusters that should better be placed nearby.
Figure 2 presents a high-level overview of our PL-GNN framework. Since VLSI netlists are originally represented as hypergraphs, given a netlist, we first transform the directed hypergraph into an undirected clique-based graph, where a net that originally contains n cells forms an n-clique. Then, based on the transformed clique-based graph and the initial node features we define for each instance, we leverage GraphSAGE [9], a variant of GNNs, to perform unsupervised node representation learning. A GNN can be considered a "graph filter" that iterates through every design instance to transform its initial features into better representations by aggregating its neighboring information. Figure 3 illustrates the graph learning process on a VLSI netlist. First, as aforementioned, we transform the hypergraph (Figure 3 (a)) into a clique-based graph (Figure 3 (b)). Then, we leverage GNNs to perform node representation learning as shown in Figure 3 (b). In this work, our GNN has two layers, where each of them is dedicated to aggregating the neighboring features at a specific hop of the neighborhood. Finally, after the graph learning is complete, we leverage the weighted K-means clustering algorithm [3] to determine the placement groups based on the learned representations, where the cell area is taken as the weight.
The placement groups determined by the clustering algorithm are taken as the "placement guidance" for the commercial placer, where in this work we take Synopsys ICC2 as the baseline flow. This clustering information is expected to help the commercial tool optimize placement quality by placing instances in a common cluster together in the actual physical layout. The key idea is that instances in a common cluster have stronger affinity than those in different clusters, since they are grouped by performing graph representation learning, which transforms the initial features by distilling the underlying design knowledge in terms of logical affinity. Therefore, by knowing which instances should better be placed together based on their affinity, the commercial tool will be able to insert fewer buffers to meet timing constraints, because cells with stronger affinity usually have more connections. Note that in this work, the placement of memory macros is performed manually based on design manuals provided by the design house. This work focuses on improving global and detailed placements of standard cells. In the following sections, we illustrate the underlying algorithms in detail.
3.2 Graph Model for Netlist Transformation
As mentioned above, a VLSI netlist is originally represented as a directed hypergraph, which is not applicable to many graph optimization techniques. Therefore, throughout the years, extensive research has been conducted to find appropriate graph models to transform a netlist from a directed hypergraph into a "normal" graph representation where one edge only contains two vertices. The clique-based model is one of the most popular graph transformation models [22], where the edge weight $w_e$ in the
[Figure 4 here: a trie with hierarchies as keys (edge attributes) under a virtual root, with example paths such as a/d/g and b/e/h across levels one to three; the hierarchy encodings are combined with the logic levels to memory macros (|V| x |M|) to form the initial node features, shown alongside a t-SNE visualization colored by level-one hierarchy.]
Figure 4: Construction and visualization of initial node features (colored in red), which are obtained from design hierarchy and logical affinity of memory macros. Letters on the edges of the trie structure denote hierarchies at different levels, where each node has a unique encoding obtained by concatenating the edge attributes on the path from the root to itself. Note that the initial features are further transformed into better representations through graph learning.
transformed undirected graph is defined as

$$w_e = \frac{1}{|n_e| - 1}, \qquad (1)$$

where $|n_e|$ denotes the number of gates the edge (net) is connected to in the original hypergraph. The key rationale of this transformation is to keep the total weight of a net consistent between the two graphs. Many renowned algorithms [8, 13] adopt this clique-based model to solve the graph partitioning problem. Still, other improved transformation techniques [10, 21] have been developed to tackle the ever-evolving crucial EDA problems. However, although the above approaches aim to find the best transformation model for general EDA problems, it is proven in [11] that no transformation preserves the original information intact. In this paper, we adopt the transformation model based on Equation 1 to transform the original netlist into an undirected clique-based graph prior to the graph learning process.
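The transformation in Equation 1 can be sketched in a few lines of Python. The netlist format below (each net as a list of cell names) is our own illustrative structure, not a specific tool's API.

```python
from collections import defaultdict
from itertools import combinations

def nets_to_clique_graph(nets):
    """Expand each net into a clique with per-edge weight w_e = 1 / (|net| - 1).

    Weights of parallel edges contributed by different nets are accumulated,
    so the total weight each net contributes stays consistent (Equation 1)."""
    weights = defaultdict(float)
    for net in nets:
        if len(net) < 2:
            continue  # a single-pin net forms no clique edge
        w = 1.0 / (len(net) - 1)
        for u, v in combinations(sorted(net), 2):
            weights[(u, v)] += w
    return dict(weights)

# a 3-pin net forms a triangle whose edges each carry weight 1/2
graph = nets_to_clique_graph([["a", "b", "c"], ["a", "d"]])
```

Accumulating the weights of parallel edges keeps a multi-fanout net's total contribution equal to 1 per sink pair, matching the rationale stated above.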
3.3 Initial Node Features
Prior to the graph learning process, given an undirected graph $G = (V, E)$, as shown in Figure 4, we determine an initial feature vector for each instance $v \in V$ based on its hierarchy information and its logical affinity with the memory macros $M$ in the design. To encode the hierarchy information, we implement a trie [4] (prefix tree) data structure, where the keys are the hierarchies at different levels. Since in a gate-level netlist the name of a design instance takes a combination of multiple hierarchies as its prefix, there is a unique mapping from an instance in the design to a node in the trie. For example, a cell may have a name such as "a/d/g", where "a" is the first-level hierarchy, "d" is the sub-hierarchy of "a", and "g" is the instance name defined in "d". The combination of "a", "d", and "g" ("a/d/g") is unique, but each of them may not be (e.g., there might be two or more edges in the trie with the attribute "g"). Note that since the length of the node attributes varies in the trie, we perform zero-padding to ensure every instance has features of a common length. The reason we take hierarchy information as features is that instances within a common hierarchy tend to have more connections than those in different hierarchies, and these interconnects dominate the placement quality. Apart from the hierarchy information, for each design instance $v$, we also take its logical levels to the memory macros $M$ as features, which results in a vector in $\mathbb{Z}^{|M|}$. The reason we introduce the memory-related features is that the logic-to-memory paths are often the critical timing paths. Finally, we concatenate the hierarchy features with the memory features to form the initial node representations.
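A hedged sketch of the hierarchy part of the initial features: each instance name such as "a/d/g" is split into hierarchy tokens, tokens are mapped to integer ids, and the vectors are zero-padded to a common length. The integer-id encoding and padding scheme here are our own illustrative choices, not the paper's exact feature encoding.

```python
def hierarchy_features(instance_names):
    """Map hierarchy tokens of '/'-separated instance names to integer ids
    and zero-pad every vector to the maximum hierarchy depth."""
    token_ids = {}  # one id per distinct hierarchy token (ids start at 1)
    vectors = []
    for name in instance_names:
        vec = [token_ids.setdefault(tok, len(token_ids) + 1)
               for tok in name.split("/")]
        vectors.append(vec)
    depth = max(len(v) for v in vectors)
    return [v + [0] * (depth - len(v)) for v in vectors]  # zero-padding

feats = hierarchy_features(["a/d/g", "a/d", "b/e/h"])
```

Instances sharing a hierarchy prefix ("a/d/g" and "a/d") share leading feature entries, which is exactly the affinity signal the trie encoding is meant to expose.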
3.4 Node Representation Learning
With the initial node features presented, we now illustrate the process of performing node representation learning using GraphSAGE [9]. The goal of graph learning is to obtain node representations $h_v^K$ ($K$ denotes the aggregation level) that better characterize the underlying design knowledge than the initial features $h_v^0$ for each node $v \in V$, where $G = (V, E)$ represents the transformed clique-based graph. Figure 3 illustrates the graph learning process on a target node colored in purple, where the key idea is to leverage GNNs to aggregate the information from its neighboring nodes based on the underlying logical affinity with consideration of the node attributes (features). This aggregation process is performed iteratively, where for each level (hop) $k$, there is a GNN layer (a one-layer neural network) dedicated to aggregating the features at that specific hop of the neighborhood of the target node. These $K$ layers form the overall GNN module. Note that since the number of neighbors grows exponentially as the hop count increases, to stabilize the training process and to prevent overfitting, we limit the number of neighbors $S_k$ to be aggregated at each level $k$.
For a node $v \in V$, the representation at level $k$ is obtained as:

$$h^{k-1}_{N_k(v)} = \mathrm{reduce\_mean}\big(\{W^{agg}_k h^{k-1}_u,\ \forall u \in N_k(v)\}\big),$$
$$h^k_v = \sigma\big(W^{proj}_k \cdot \mathrm{concat}[h^{k-1}_v,\ h^{k-1}_{N_k(v)}]\big), \qquad (2)$$

where $\sigma$ is the sigmoid function, $h^k_v$ denotes the representation vector of node $v$ at level $k$, $N_k(v)$ denotes the neighbors sampled at $k$-hop, which is subject to the sampling size $S_k$, and $W^{agg}_k$ and $W^{proj}_k$ denote the aggregation and projection matrices respectively, which together form the neural layer at level $k$. Note that the concept of "level" corresponds to the concept of "hop", where $h^0_v$ is the
[Figure 5 here: a two-layer feature aggregation diagram in which AGG/PROJ operations combine the level-0 features h0 of nodes a, b, d, e into level-1 features h1 (layer one), and finally into the level-2 representation h2 of node a (layer two).]
Figure 5: Illustration of the feature aggregation process in node representation learning. We leverage a two-layer GNN to determine the final representation of node "a" in the input graph by considering information within its 2-hop neighborhood. AGG denotes the aggregation matrix and PROJ denotes the projection matrix in Equation 2. Gradient descent is utilized to update the parameters of these two matrices by minimizing Equation 3.
initial features of node $v$, and $h^{k=K}_v$ is the final representation after aggregating the information within the $K$-hop neighborhood of $v$. Figure 5 illustrates the feature aggregation process of the target node "a" (colored in red) in the input graph, which shows that a GNN is not a traditional fully-connected network, since all the node features of the previous level have to be processed through Equation 2 before performing the aggregation of the current level. In our implementation, the GNN module has two layers, which means that for each design instance we perform feature aggregation within its two-hop neighborhood to obtain better representations. These transformed features are further utilized to cluster design instances into placement groups through the weighted K-means clustering algorithm [3].
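One aggregation level of Equation 2 can be sketched in numpy as below. This is a minimal illustration, not the paper's implementation: the weight matrices are random stand-ins for the learned parameters, and the neighbor lists stand in for the sampled neighborhoods $N_k(v)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sage_level(h_prev, neighbors, W_agg, W_proj):
    """One GraphSAGE level: mean-aggregate projected neighbor features,
    concatenate with the node's own features, apply sigmoid, L2-normalize.

    h_prev: (n, d) features at level k-1; neighbors: list of index lists."""
    h_next = np.empty((h_prev.shape[0], W_proj.shape[0]))
    for v, nbrs in enumerate(neighbors):
        # reduce_mean over the sampled neighborhood N_k(v)
        h_nbr = (W_agg @ h_prev[nbrs].T).mean(axis=1)
        h_next[v] = sigmoid(W_proj @ np.concatenate([h_prev[v], h_nbr]))
    # L2 normalization, as in Algorithm 1
    return h_next / np.linalg.norm(h_next, axis=1, keepdims=True)

d, d_out = 4, 8
h0 = rng.normal(size=(3, d))                    # initial features of 3 nodes
neighbors = [[1, 2], [0, 2], [0, 1]]            # a small triangle graph
W_agg = rng.normal(size=(d, d))                 # stand-in for W_k^{agg}
W_proj = rng.normal(size=(d_out, 2 * d))        # stand-in for W_k^{proj}
h1 = sage_level(h0, neighbors, W_agg, W_proj)
```

Stacking two such levels (with fresh weight matrices) reproduces the two-hop aggregation described above.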
3.5 Unsupervised Loss Function
The previous subsection introduced the forward process of graph learning. To update the parameters $\{W_k\}$ during back-propagation, we introduce an unsupervised loss function $\mathcal{L}$ as the objective, where $\mathcal{L}$ takes the form of

$$\mathcal{L}(h_v) = -\sum_{u \in N(v)} \log\big(\sigma(h_v^\top h_u)\big) - \sum_{m=1}^{Q} \mathbb{E}_{n_m \sim Neg(v)} \log\big(\sigma(-h_v^\top h_{n_m})\big), \qquad (3)$$

where $Neg(v)$ represents the negative sampled nodes from the perspective of node $v$, and $Q$ represents the negative sampling size. The negative sampled nodes are the nodes that are distant (not within the 2-hop neighborhood) from the target node $v$ in the clique-based graph, and at each iteration these nodes are re-sampled. The reason we introduce this negative sampling technique is that we not only want to enhance the similarity between the target node and its neighbors, but also want to maximize the dissimilarity of the target node with the nodes that are distant from it. This negative sampling technique is known to improve the efficiency of graph learning by providing faster loss convergence. Essentially, Equation 3 encourages nodes that share common neighborhoods to have similar representations, and penalizes similarity to the ones that are distant. By minimizing Equation 3 using gradient descent,
we can update the parameters in the GNN module. Note that since our objective function is defined in an unsupervised manner, our framework has the ability to adapt to various netlists, as it does not require any pre-training process.
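The loss in Equation 3 for a single target node can be evaluated as below. This is a hedged sketch: the embeddings are random stand-ins, and the expectation over negative samples is replaced by a plain sum over the sampled nodes.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unsupervised_loss(h_v, h_neighbors, h_negatives):
    """Equation 3 for one node: pull the target embedding toward its neighbors
    and push it away from the negatively sampled (distant) nodes."""
    pos = sum(np.log(sigmoid(h_v @ h_u)) for h_u in h_neighbors)
    neg = sum(np.log(sigmoid(-h_v @ h_n)) for h_n in h_negatives)
    return -(pos + neg)  # minimizing this maximizes pos/neg similarity terms

h = rng.normal(size=(6, 16))  # stand-in embeddings for 6 nodes
loss = unsupervised_loss(h[0], h_neighbors=h[1:3], h_negatives=h[3:6])
```

Since $\log \sigma(\cdot) < 0$, the loss is always positive, and it shrinks as neighbor dot-products grow and negative-sample dot-products become more negative.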
3.6 Training Methodology
Algorithm 1 summarizes the graph learning process of our PL-GNN framework. Lines 3–10 illustrate the forward process of graph learning (feature aggregation), where for each node $v \in V$, we transform its initial features $h^0_v$ to $h^K_v$ by aggregating its neighboring features at each level (hop) through Equation 2. Note that prior to the aggregation at each level, we normalize the node representations of the previous level in Line 2 and Line 9. This normalization helps improve the convergence of the overall training process by reducing the oscillation of gradient descent. Then, based on the learned representation vectors, in Lines 12–19 we calculate the unsupervised loss on the aggregated features through Equation 3, where a negative sampling technique (Lines 14–15) is leveraged to improve the overall training process. Finally, to update the parameters in the framework, we leverage Adam [14], a renowned gradient descent optimizer, to minimize the loss function. The overall training process takes about an hour on each CPU design utilized in this work, on a machine with a 2.40 GHz CPU and an NVIDIA RTX 2070 graphics card with 16 GB memory.
3.7 Complexity Analysis of Graph Learning
Time Complexity. The time complexity of the proposed framework is linear with respect to the netlist size. Since the sampling size $S_k$ at each aggregation level is fixed, our GNN module, which acts as a "graph filter", only spends a constant amount of time visiting every design instance and collecting features from its neighbors.
Space Complexity. Instead of storing the graph connectivity in a $|V| \times |V|$ matrix, which requires $O(n^2)$ space complexity, we store the connectivity information in the compressed sparse row (CSR) format [23], thanks to the high sparsity of the netlist adjacency matrix. Therefore, the space complexity is far less than $O(n^2)$ and can be considered pseudo-linear.
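The CSR idea can be illustrated on a tiny adjacency matrix: instead of a dense $|V| \times |V|$ array, only the nonzero entries are kept in three flat arrays (row pointers, column indices, values). The hand-rolled converter below is for illustration only; a production implementation would use an existing sparse-matrix library.

```python
def to_csr(dense):
    """Convert a dense adjacency matrix (list of rows) to CSR arrays."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                indices.append(j)  # column index of a nonzero entry
                data.append(x)     # the entry's value (edge weight)
        indptr.append(len(indices))  # row i spans indices[indptr[i]:indptr[i+1]]
    return indptr, indices, data

# 3-node path graph a-b-c: 4 nonzeros stored instead of 9 dense entries
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
indptr, indices, data = to_csr(adj)
```

Storage is $O(|V| + \text{nnz})$, which for sparse netlist graphs is far below the dense $O(|V|^2)$, matching the pseudo-linear claim above.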
Algorithm 1 Graph learning in PL-GNN.
We use default values of α = 0.001, K = 2, Q = 30, S_1 = 10, S_2 = 5, β_1 = 0.9, β_2 = 0.999.
Input: G(V, E): clique-based graph. {h^0}: initial node features. α: learning rate, K: maximum aggregation level, Q: negative sampling size, {S_k, ∀k ∈ {1, ..., K}}: k-hop neighborhood sampling size, σ: sigmoid function, {W_k, ∀k ∈ {1, ..., K}}: parameters of NN at hop (level) k, {β_1, β_2}: Adam parameters.
Output: {y}: learned node representations.
1:  while {W_k} do not converge do
2:    h^0_v ← h^0_v / ‖h^0_v‖_2, ∀v ∈ V
3:    for k ← 1 to K do                  ▷ feature aggregation
4:      for v ∈ V do
5:        N_k(v) ← sample S_k neighbors at k-hop
6:        h^k_{N_k(v)} = reduce_mean({W^{agg}_k h^{k-1}_u, ∀u ∈ N_k(v)})
7:        h^k_v = σ(W^{proj}_k · concat[h^{k-1}_v, h^k_{N_k(v)}])
8:      h^k_v ← h^k_v / ‖h^k_v‖_2, ∀v ∈ V
9:    y_v ← h^K_v, ∀v ∈ V
10:   for v ∈ V do                       ▷ minimize unsupervised loss
11:     for u ∈ N(v) do
12:       Neg(v) ← sample Q nodes from {V − N(v)} \ v
13:       neg_loss ← −∑_{n_m ∈ Neg(v)} log(σ(−y_v^⊤ y_{n_m}))
14:       L_v ← −log(σ(y_v^⊤ y_u)) + neg_loss
15:       {W_k} ← Adam(α, {W_k}, L_v, β_1, β_2)
Algorithm 2 Weighted K-means Clustering.
Learned representations {h_v} from Algorithm 1 are taken as {y}. Cell areas are taken as node weights {w}.
Input: G(V, E): clique-based graph, {w}: node weights, {y}: node representations, k: number of clusters.
Output: {C_1, ..., C_k}: k clusters.
1: Select k initial centroids {c_1, ..., c_k} randomly
2: repeat
3:   {C_1, ..., C_k} = argmin_C ∑_{i=1}^{k} ∑_{v ∈ C_i} w(v) ‖y_v − c_i‖^2
4:   c_i = ∑_{v ∈ C_i} y_v w(v) / ∑_{v ∈ C_i} w(v), ∀i = 1, ..., k
5: until {C_1, ..., C_k} no longer change
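A hedged numpy sketch of Algorithm 2 (Lloyd-style weighted K-means with cell areas as node weights). The embeddings, weights, and the deterministic centroid initialization below are synthetic illustrative choices; in practice the learned representations from Algorithm 1 and random initial centroids would be used.

```python
import numpy as np

def weighted_kmeans(y, w, init_idx, iters=100):
    """y: (n, d) node representations; w: (n,) node weights (cell areas);
    init_idx: indices of the initial centroids (fixed here for reproducibility)."""
    centroids = y[init_idx].astype(float)
    k = len(init_idx)
    labels = np.zeros(len(y), dtype=int)
    for _ in range(iters):
        # assignment step (Line 3): nearest centroid in Euclidean distance
        dist = np.linalg.norm(y[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # update step (Line 4): area-weighted centroid of each cluster
        new = np.array([np.average(y[labels == i], axis=0, weights=w[labels == i])
                        for i in range(k)])
        if np.allclose(new, centroids):
            break  # clusters no longer change (Line 5)
        centroids = new
    return labels, centroids

# two well-separated groups of cells; larger-area cells pull the centroids
y = np.array([[0.0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
w = np.array([1.0, 1, 2, 1, 1, 2])
labels, centroids = weighted_kmeans(y, w, init_idx=[0, 3])
```

Weighting by cell area biases each centroid toward large cells, which encourages the resulting placement groups to be balanced in area rather than in instance count.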
3.8 Clustering for Placement Guidance
After obtaining the learned representations, we leverage the weighted K-means clustering algorithm [3] to cluster design instances into placement groups to perform placement guidance in Synopsys ICC2. Algorithm 2 summarizes our clustering algorithm, where the cell areas of design instances are taken as the node weights. Given the learned node representations $\{y\} = \{h^{k=K}\}$ from Algorithm 1 and the weights $\{w\}$ as cell areas, the algorithm strives to cluster all the nodes $V$ into $k$ weight-balanced groups by minimizing the Euclidean distance of every node to its assigned centroid. The objective function of clustering is derived as
$$\mathcal{L}_{cluster} = \sum_{i=1}^{k} \sum_{v \in C_i} w(v) \cdot \|y_v - c_i\|^2, \qquad (4)$$

where $c_i = \frac{\sum_{v \in C_i} y_v\, w(v)}{\sum_{v \in C_i} w(v)}$ denotes the weighted centroid of cluster $C_i$. Equation 4 is minimized in an iterative manner. The key idea is that in each iteration, we assign each node to the cluster whose centroid it is closest to, where the procedure works as follows:
• Starting from the initial centroids {c_1, ..., c_k}, for each iteration, we determine the clusters {C_1, ..., C_k} by assigning each node to the centroid with the minimum weighted distance (Line 3).
• After the assignments, we update the centroids based on the newly obtained clusters (Line 4).
• Repeat the previous two steps until the locations of the centroids no longer change.
To determine the optimal number of clusters (the optimal $k$), we perform sweeping experiments from $k = 8$ to $k = 32$ based on the Silhouette score [19]. Here, we explain the calculation of the Silhouette score given a clustering result, which consists of two parts. In the first part, for a node $v$, we calculate the average distance $a(v)$ between it and the other nodes in a common cluster, which is derived as:
$$a(v) = \frac{1}{|C_v| - 1} \sum_{i \in C_v,\, v \neq i} \|y_v - y_i\|_2, \qquad (5)$$

where $C_v$ represents the cluster that node $v$ belongs to. The average distance $a(v)$ (Equation 5) essentially represents how well node $v$ is assigned to its current cluster. Note that the sum of distances is weighted by $\frac{1}{|C_v| - 1}$ because the distance from $v$ to itself is not included in the summation. Besides $a(v)$, we calculate another metric $b(v)$, which represents the smallest average distance of $v$ to the nodes of a different cluster $C_i$, derived as:

$$b(v) = \min_{C_i \neq C_v} \frac{1}{|C_i|} \sum_{j \in C_i} \|y_v - y_j\|_2. \qquad (6)$$
In Equation 6, the cluster $C_i$ that results in the minimum $b(v)$ represents the second-best cluster for the target node $v$. Therefore, the greater $b(v)$ is, the better the target node $v$ is assigned to its current cluster. Finally, the Silhouette score for a node $v$ is defined as

$$s(v) = \frac{b(v) - a(v)}{\max\{a(v), b(v)\}}. \qquad (7)$$

Note that Equation 7 is defined at the node level. We take the average of $s(v)$ over every node (design instance) in the clique-based graph to evaluate how good the current clustering result is (the higher the average of $s(v)$, the better the result). To find the best number of clusters (the optimal $k$ of K-means), we perform sweeping experiments over $k$ and take the one that produces the highest Silhouette score as the final clustering solution for placement guidance.
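Equations 5–7 can be checked on a toy example as below; the embeddings and cluster labels are synthetic stand-ins for the learned representations and the K-means output.

```python
import numpy as np

def silhouette(y, labels, v):
    """Node-level Silhouette score s(v) from Equations 5-7."""
    d = np.linalg.norm(y - y[v], axis=1)      # distances from v to all nodes
    same = labels == labels[v]
    a = d[same].sum() / (same.sum() - 1)      # Eq. 5 (d[v] = 0 drops out)
    b = min(d[labels == c].mean()             # Eq. 6: closest other cluster
            for c in set(labels.tolist()) - {labels[v]})
    return (b - a) / max(a, b)                # Eq. 7

# two tight, well-separated clusters -> average score close to 1
y = np.array([[0.0, 0], [0, 1], [10, 10], [10, 11]])
labels = np.array([0, 0, 1, 1])
avg_score = np.mean([silhouette(y, labels, v) for v in range(len(y))])
```

Sweeping $k$ then amounts to re-clustering for each candidate $k$ and keeping the $k$ with the highest average score.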
4 MODULARITY-BASED CLUSTERING
In this work, besides leveraging the presented framework PL-GNN to determine cell clusters for placement guidance, we also implement the Louvain modularity-based clustering method [1] adopted by previous work [6] to perform placement guidance, in order to compare it with our framework. The modularity metric is first
Table 1: Our commercial benchmarks and their attributes in TSMC 28nm.

Design Name   | # Cells | # Flip-Flops | # Nets | # Macros
CPU-Design-A  | 202791  | 22366        | 206224 | 21
CPU-Design-B  | 537085  | 47552        | 542391 | 29
proposed in [18], and is expressed as

$$Q = \frac{1}{2m} \sum_{i,j} \left[ E_{ij} - \frac{k_i k_j}{2m} \right] \delta(C_i, C_j), \qquad (8)$$
where๐ denotes the total edge weights in the clique-based graph,๐ธ๐ ๐ denotes the edge weight between node ๐ and node ๐ , ๐๐ and ๐ ๐denote the sum of edge weights for nodes ๐ and ๐ ,๐ถ๐ and๐ถ ๐ denotethe clusters nodes ๐ and ๐ belong to, and finally ๐ฟ (ยท) denotes the in-dicator cluster which returns 1 when the two clusters are connected,and 0 otherwise. Based on the modularity metric defined above, theLouvain clustering method [1] can efficiently group millions of in-stances into clusters with๐ (๐ ยท๐๐๐2๐) time complexity. The numberof clusters are determined automatically by maximizing Equation 8.Therefore, no sweeping experiment is needed as in the clusteringapproach (Section 3.8) above. However, Louvain performs the opti-mization in a greedy manner, where small clusters are first found byperforming optimization on local subgraphs, and then beingmergedgreedily to determine the final solutions. Furthermore, this clus-tering method does not take cell attributes into account, which ispurely achieved by minimizing connectives among clusters. Hence,it cannot comprehend the underlying design knowledge. In theexperiments, we perform head-to-head comparison between theproposed framework PL-GNN and the modularity-based cluster-ing algorithm Louvain on optimizing placement quality throughplacement guidance in Synopsys ICC2.
5 EXPERIMENTAL RESULTS
In this work, we validate the proposed framework PL-GNN on two commercial multi-core CPU designs in the TSMC 28nm technology node. Their attributes after synthesis with Synopsys Design Compiler are shown in Table 1. Due to confidentiality, in this paper, we name the two designs "CPU-Design-A" and "CPU-Design-B", respectively. As aforementioned, the memory macros of the two designs are placed by experienced designers based on design manuals from the design house. In the experiments, we take the default placement flow in Synopsys IC Compiler II (ICC2) as our baseline, and demonstrate the placement optimization results achieved by our framework. Note that in ICC2, the placement groups are created using the command "create_placement_attraction {instance_list}", and are taken as soft placement constraints by the commercial placer. With this clustering information, ICC2 will spend effort in grouping the cells of a common group together during global and detailed placement. Finally, the PL-GNN framework and the modularity-based method [1] are implemented in Python3 with the PyTorch Geometric [5] library.
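As an illustration of how a clustering result can be handed to ICC2, the Python sketch below groups a cluster-label mapping into per-cluster instance lists and emits one create_placement_attraction command per group (the command form quoted above); the instance names and the clusters dictionary are hypothetical stand-ins for the framework's actual output.

```python
from collections import defaultdict

# Hypothetical output of the clustering step: instance name -> cluster id.
clusters = {
    "core0/alu/U1": 0,
    "core0/alu/U2": 0,
    "core1/lsu/U7": 1,
    "core1/lsu/U9": 1,
}

# Group instances by cluster id.
groups = defaultdict(list)
for inst, cid in clusters.items():
    groups[cid].append(inst)

# One soft placement constraint per cluster, written as a Tcl script for ICC2.
commands = [
    "create_placement_attraction {%s}" % " ".join(sorted(insts))
    for _, insts in sorted(groups.items())
]
script = "\n".join(commands)
```

The resulting script can then be sourced in the ICC2 shell before global placement, so each cluster becomes one attraction group.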
5.1 Optimization Results on CPU Designs
The detailed optimization results are shown in Table 2. Compared with the default placement flow in ICC2, the proposed framework PL-GNN achieves up to 3.9% wirelength, 2.8% power, and 85.7% performance improvements. Furthermore, the proposed method outperforms the modularity-based method Louvain [1] adopted by the previous work [6]. The main reason is that Louvain simply determines the cluster groups based on graph connectivity in a greedy manner, as mentioned in Section 4, whereas our framework not only considers the underlying logical affinity when determining the cell clusters for placement guidance, but also takes into account the node attributes (presented in Figure 4) that are crucial to the final placement quality.
5.2 Discussions
Timing Perspective. Timing is a highly critical objective during the placement stage in modern PD flows, as important as the minimization of wirelength, and the timing quality of a placement directly impacts the final quality of a full-chip design. In the PD flow of Synopsys ICC2, prior to the placement stage, all the buffers inserted in the synthesis stage are removed from the gate-level netlist in order to give the placement engine more freedom to achieve timing closure (the same situation also occurs in Cadence Innovus). The main rationale is that with the actual physical information of design instances, the commercial tool has more accurate information than at the synthesis stage regarding the strength and number of buffers needed to meet timing closure. In Table 2, we observe that with the placement guidance provided by our framework, the placement engine in ICC2 inserts significantly fewer buffers compared with the default placement flow. The reasons are two-fold. First, the features (defined in Figure 4) take the impact of memory macros into account when determining the placement groups, and the logic-to-memory paths are usually the critical timing paths. Therefore, with this information, the commercial placer is able to place the cells that could potentially lead to bad timing results together. Second, the hierarchy features also impact the timing results, because instances within a common hierarchy tend to have more connections than those in different hierarchies. Therefore, if instances in a common hierarchy are placed physically distant from one another, more buffers need to be inserted to meet timing.
Wirelength Perspective. In the table, we also observe that the wirelength improves as well. One reason is that, because fewer buffers are inserted as discussed above, the net count of the optimized placement achieved by our framework is also smaller, which yields smaller wirelength as a by-product of the buffer reduction. Another reason is that, because the placement guidance provided by our framework PL-GNN is derived from the underlying logical affinity of a given netlist, cells with more connections tend to be placed together, which reduces routing detours.
6 WHY PL-GNN WORKS?
The superior results of our framework can mainly be attributed to two factors:
Well-Defined Initial Features. First, the initial node features presented in Figure 4 accurately capture the underlying characteristics of each design instance that are related to achieving high-quality placements. Although, as aforementioned, these features alone are not good enough to perform placement guidance, they provide precious
Table 2: Placement optimization impact on commercial CPU designs, where "ICC2 default" represents the tool's default placement flow, and "Louvain" [1] denotes the modularity-based clustering method.
Design Name   | Method       | # of clusters | Wirelength (m) | WNS (ns) | TNS (ns) | Total Power (mW) | # inserted buffers
CPU-Design-A  | ICC2 default | -             | 4.37           | -0.07    | -0.22    | 142              | 5942
CPU-Design-A  | Louvain [1]  | 82            | 4.34           | -0.10    | -0.62    | 141              | 5826
CPU-Design-A  | ours         | 22            | 4.20           | -0.01    | -0.03    | 138              | 5371
CPU-Design-B  | ICC2 default | -             | 11.66          | -0.24    | -240.39  | 582              | 2728
CPU-Design-B  | Louvain [1]  | 58            | 11.65          | -0.38    | -296.54  | 578              | 2689
CPU-Design-B  | ours         | 32            | 11.55          | -0.18    | -62.21   | 574              | 2274
information for the graph learning process. Based on these features, GNNs know which nodes (instances) are inherently similar to each other (e.g., through the hierarchy features), and further transform them into better representations. Furthermore, by taking the logic levels to memory macros as features, GNNs learn to balance the critical paths when performing feature aggregation, which results in fewer inserted buffers during placement.
Superiority of Graph Learning. Second, GNNs are highly effective at encoding graph structures with consideration of node attributes, particularly in highly sparse graphs [24]. They capture invaluable latent knowledge about netlists that is hard to observe even for experienced designers, yet crucial to achieving high-quality placements. Since the final physical location of a design instance highly depends on its local neighborhood structure and its design attributes, GNNs are particularly suitable for encoding such information through representation learning. Unlike the modularity-based clustering algorithm [1], which groups design instances into clusters simply based on connectivity, our framework PL-GNN leverages GNNs to carefully distill the underlying design knowledge in order to accurately determine the cell clusters that achieve high-quality placements. In summary, the logical affinity among design instances directly dominates the quality of placement, and one had better encode, rather than fight, "the law of attraction" when performing placement.
7 CONCLUSION AND FUTURE WORK
In this paper, we have proposed PL-GNN, a graph learning-based framework that performs placement guidance for commercial tools. We demonstrate that the proposed framework significantly improves the default placement flow in Synopsys ICC2 on commercial CPU designs. We believe this work demonstrates promising directions for leveraging ML algorithms to solve crucial EDA problems. In the future, we plan to study the impact of leveraging different node features, and to apply our framework in other advanced technologies.
8 ACKNOWLEDGMENT
This work is supported by the National Science Foundation under Grant No. CNS 16-24731 and the industry members of the Center for Advanced Electronics in Machine Learning.
REFERENCES
[1] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
[2] D. L. Davies and D. W. Bouldin. A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, (2):224-227, 1979.
[3] R. C. De Amorim and B. Mirkin. Minkowski metric, feature weighting and anomalous cluster initializing in k-means clustering. Pattern Recognition, 2012.
[4] R. De La Briandais. File searching using variable length keys. In Papers Presented at the March 3-5, 1959, Western Joint Computer Conference, 1959.
[5] M. Fey and J. E. Lenssen. Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428, 2019.
[6] M. Fogaça, A. B. Kahng, R. Reis, and L. Wang. Finding placement-relevant clusters with fast modularity-based clustering. In Proceedings of the 24th Asia and South Pacific Design Automation Conference, pages 569-576, 2019.
[7] A. Goldie and A. Mirhoseini. Placement optimization with deep reinforcement learning. In Proceedings of the 2020 International Symposium on Physical Design, pages 3-7, 2020.
[8] L. Hagen and A. B. Kahng. A new approach to effective circuit clustering. In ICCAD, volume 92, pages 422-427, 1992.
[9] W. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, 2017.
[10] D.-H. Huang and A. B. Kahng. When clusters meet partitions: new density-based methods for circuit decomposition. In Proceedings of the European Design and Test Conference (ED&TC 1995), pages 60-64. IEEE, 1995.
[11] E. Ihler, D. Wagner, and F. Wagner. Modeling hypergraphs by graphs with the same mincut properties. Information Processing Letters, 45(4):171-175, 1993.
[12] G. Karypis and V. Kumar. Multilevel k-way hypergraph partitioning. VLSI Design, 11(3):285-300, 2000.
[13] B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 49(2):291-307, 1970.
[14] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[15] Y. Lin, Z. Jiang, J. Gu, W. Li, S. Dhar, H. Ren, B. Khailany, and D. Z. Pan. DREAMPlace: Deep learning toolkit-enabled GPU acceleration for modern VLSI placement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020.
[16] Y.-C. Lu, J. Lee, A. Agnesina, K. Samadi, and S. K. Lim. GAN-CTS: A generative adversarial framework for clock tree prediction and optimization. In 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2019.
[17] A. Mirhoseini, A. Goldie, M. Yazgan, J. Jiang, E. Songhori, S. Wang, Y.-J. Lee, E. Johnson, O. Pathak, S. Bae, et al. Chip placement with deep reinforcement learning. arXiv preprint arXiv:2004.10746, 2020.
[18] M. E. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
[19] P. J. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 1987.
[20] I. Synopsys. Compiler user guide, 2019.
[21] R.-S. Tsay and E. Kuh. A unified approach to partitioning and placement (VLSI layout). IEEE Transactions on Circuits and Systems, 38(5):521-533, 1991.
[22] L.-T. Wang, Y.-W. Chang, and K.-T. T. Cheng. Electronic Design Automation: Synthesis, Verification, and Test. Morgan Kaufmann, 2009.
[23] J. B. White and P. Sadayappan. On improving the performance of sparse matrix-vector multiplication. In Proceedings of the Fourth International Conference on High-Performance Computing, pages 66-71. IEEE, 1997.
[24] K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2018.
[25] T.-C. Yu, S.-Y. Fang, H.-S. Chiu, K.-S. Hu, P. H.-Y. Tai, C. C.-F. Shen, and H. Sheng. Lookahead placement optimization with cell library-based pin accessibility prediction via active learning. In Proceedings of the 2020 International Symposium on Physical Design, pages 65-72, 2020.