INTERCONNECT TREE OPTIMIZATION ALGORITHM IN NANOMETER VERY LARGE SCALE INTEGRATION DESIGNS

CHESSDA UTTRAPHAN EH KAN

UNIVERSITI TEKNOLOGI MALAYSIA

INTERCONNECT TREE OPTIMIZATION ALGORITHM IN NANOMETER VERY LARGE SCALE INTEGRATION DESIGNS

CHESSDA UTTRAPHAN EH KAN

A thesis submitted in fulfilment of the requirements for the award of the degree of Doctor of Philosophy (Electrical Engineering)

Faculty of Electrical Engineering
Universiti Teknologi Malaysia

MARCH 2016

To Nukkul, Sukkritth, Su Naas, Father and Mother

ACKNOWLEDGEMENT

I would like to express my deepest appreciation to all those who made it possible for me to complete this thesis. I give special gratitude to my supervisor, Dr. Nasir Shaikh Husin, whose stimulating suggestions and encouragement helped me to coordinate my research, especially in writing this thesis. I am also very thankful to Prof. Dr. Mohamed Khalil Mohd. Hani, who gave countless ideas and support in completing this thesis.

Furthermore, I would also like to acknowledge with much appreciation the crucial role of the staff of the VLSI research lab, especially En. Zulkifli Md. Yusof, who contributed many ideas during the discussions of this work. Many thanks to Dr. Usman Ullah Sheikh for helping me debug the code. I am also indebted to Universiti Tun Hussein Onn Malaysia (UTHM) and the Ministry of Education, Malaysia for funding my Ph.D. study.

My thanks, love and appreciation also go to my family. My wife, Su Naas, always understood me and provided care and support throughout this challenging work. To my sons, Nukkul and Sukkrith, who suffer from autism, I sincerely apologize if during this work I sometimes spent less time with both of you.

ABSTRACT

This thesis proposes a graph-based maze routing and buffer insertion algorithm for nanometer Very Large Scale Integration (VLSI) layout designs. The algorithm is called Hybrid Routing Tree and Buffer insertion with Look-Ahead (HRTB-LA). In recent VLSI designs, interconnect delay has become a dominant factor compared to gate delay. A well-known technique to minimize interconnect delay is to insert buffers along the interconnect wires. In conventional buffer insertion algorithms, the buffers are inserted on fixed routing paths. However, a modern design contains macro blocks that prohibit buffer insertion in their respective areas, and most conventional buffer insertion algorithms do not consider these obstacles. In the presence of buffer obstacles, post-routing algorithms may produce poor solutions. Simultaneous routing and buffer insertion, on the other hand, offers better solutions, but the problem has been proven to be NP-complete. Besides timing performance, the power dissipation of the inserted buffers is another metric that needs to be optimized. Research has shown that the power dissipation overhead due to buffer insertion is significantly high; in other words, reducing interconnect delay by inserting buffers increases power dissipation, so the two objectives conflict. Although many methodologies to optimize timing performance under a power constraint have been proposed, none of them is based on a grid graph technique. Hence, the main contribution of this thesis is an efficient algorithm that uses a hybrid approach for multi-constraint optimization in multi-terminal nets. The algorithm uses dynamic programming to compute the interconnect delay and power dissipation of the inserted buffers incrementally, while an effective runtime is achieved with the aid of novel look-ahead and graph pruning schemes. Experimental results show that HRTB-LA is able to handle multi-constraint optimization and produces solutions up to 47% better than a post-routing buffer insertion algorithm with comparable runtime.
ABSTRAK

# TABLE OF CONTENTS

DECLARATION
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
LIST OF APPENDICES

1 INTRODUCTION
1.1 Overview
1.2 Problem statement
1.3 Research objectives
1.4 Problem formulation
1.5 Scope of works
1.6 Research contributions
1.7 Thesis outline

2 LITERATURE REVIEW
2.1 Interconnect routing
    2.1.1 Two-terminal nets routing
    2.1.2 Multi-terminal nets routing and topology
2.2 Post routing optimization with buffer insertion
    2.2.1 Closed-form solution
    2.2.2 Dynamic programming
2.3 Simultaneous routing and buffer insertion
    2.3.1 Two-terminal net
    2.3.2 Multi-terminal net
2.4 Tree adjustment technique
2.5 Multi-constraint optimization techniques
2.6 Other delay models
2.7 Summary

3 FUNDAMENTAL THEORY AND MODELLING
3.1 Algorithm and complexity analysis
3.2 Dijkstra’s shortest path algorithm
    3.2.1 Implementation of Dijkstra’s algorithm using priority queue
    3.2.2 Time complexity of Dijkstra’s algorithm
3.3 Interconnect delay model
3.4 Power dissipation in buffered path interconnect
3.5 van Ginneken algorithm
3.6 Simultaneous routing and buffer insertion
    3.6.1 Modelling VLSI routing with buffer insertion as a shortest-path problem
    3.6.2 Simultaneous routing and buffer insertion for two-terminal nets
    3.6.3 Simultaneous routing and buffer insertion for multi-terminal nets
3.7 Delay and power model formulation for HRTB-LA
    3.7.1 Delay and power computation for upstream path expansion
    3.7.2 Delay and power computation for downstream path expansion
3.8 Multi-constraint routing
3.9 Multi-constraint routing with look-ahead scheme
3.10 Summary

4 DESIGN AND DESCRIPTION OF HRTB-LA ALGORITHM
4.1 HRTB-LA overview
4.2 Tree adjustment
4.3 Graph pruning
4.4 Path expansion and look-ahead scheme
    4.4.1 Path expansion in HRTB
    4.4.2 Path expansion in HRTB-LA
4.5 Time complexity of HRTB-LA
4.6 Numerical illustration of HRTB and HRTB-LA
    4.6.1 Numerical illustration of HRTB
    4.6.2 Numerical illustration of HRTB-LA
4.7 Summary

5 SOFTWARE DESIGN OF HRTB-LA
5.1 HRTB-LA data structure
    5.1.1 Data structure of the pre-processing data
    5.1.2 Data structure of the candidate solutions
5.2 Linked list functions
    5.2.1 Inserting data into a linked list
    5.2.2 Retrieving data from a linked list
5.3 Priority queue in HRTB-LA
5.4 Pseudo-code of HRTB-LA’s main function
5.5 Summary

6 VERIFICATION AND PERFORMANCE TEST OF HRTB AND HRTB-LA
6.1 Overview
6.2 Wire and buffer parameters
6.3 Verification of the proposed algorithm
    6.3.1 Verification for timing optimization
    6.3.2 Verification of the iterative power computation scheme
6.4 Performance test 1 – solution quality
6.5 Performance test 2 – solution quality, runtime and the number of candidate solutions
6.6 Performance test 3 – Delay-power optimization
6.7 Summary

7 CONCLUSIONS AND FUTURE WORKS
7.1 Conclusions
7.2 Future works

REFERENCES
APPENDICES A – C

# LIST OF TABLES

<table>
<thead>
<tr>
<th>TABLE NO.</th>
<th>TITLE</th>
</tr>
</thead>
<tbody>
<tr>
<td>3.1</td>
<td>Time complexity of heap operations for binary, binomial and Fibonacci heaps (Cormen et al. 2009)</td>
</tr>
<tr>
<td>4.1</td>
<td>$Prev_i$ and $Prev_j$ for all vertices of the graph in Figure 4.9</td>
</tr>
<tr>
<td>4.2</td>
<td>List of the predicted end-to-end delay, end-to-end power and predicted end-to-end path length $l_p(P)$ at vertex 8 in Figure 4.10</td>
</tr>
<tr>
<td>4.3</td>
<td>The values in the list and priority queue for path expansion from $sink_1$ to the Steiner node in HRTB for the graph in Figure 4.12. The key in the grey box indicates the lowest key value, which will be extracted in the next EXTRACT_MIN(Q) (a) initial values of the list and priority queue (b) to (l) the values in the list and priority queue after the 1st to 11th extractions, respectively (m) the values in the list and priority queue after the path expansions were completed (text in red indicates that the candidate solution is dominated)</td>
</tr>
<tr>
<td>4.4</td>
<td>The values in the list and priority queue for path expansions from $sink_2$ to the Steiner node in HRTB for the graph in Figure 4.12. The key in the grey box indicates the lowest key value, which will be extracted in the next EXTRACT_MIN(Q) (a) initial values of the list and priority queue (b) and (c) the values after the 1st and 2nd extractions, respectively (d) the final values after the queue is empty</td>
</tr>
<tr>
<td>4.5</td>
<td>The values in the list and priority queue for path expansions from the Steiner node to the source in HRTB for the graph in Figure 4.12. The key in the grey box indicates the lowest key value, which will be extracted in the next EXTRACT_MIN(Q) (a) initial values of the list and priority queue (b) to (e) the values after the 1st to 4th extractions, respectively (f) the final values after the queue is empty</td>
</tr>
<tr>
<td>4.6</td>
<td>The values in the list and priority queue for path expansions from the Steiner node to the source in HRTB for the graph in Figure 4.12. The key in the grey box indicates the lowest key value, which will be extracted in the next EXTRACT_MIN(Q) (a) initial values of the list and priority queue (b) to (e) the values after the 1st to 4th extractions, respectively (f) the final values after the queue is empty</td>
</tr>
<tr>
<td>4.7</td>
<td>The values in the list and priority queue for the path expansions from $sink_2$ to the Steiner node in HRTB-LA for the graph in Figure 4.12. The key in the grey box indicates the lowest key value, which will be extracted in the next EXTRACT_MIN(Q) (a) initial values of the list and priority queue (b) and (c) the values after the 1st and 2nd extractions, respectively</td>
</tr>
<tr>
<td>4.8</td>
<td>The values in the list and priority queue for path expansions from the Steiner node to the source in HRTB-LA for the graph in Figure 4.12. The key in the grey box indicates the lowest key value, which will be extracted in the next EXTRACT_MIN(Q) (a) initial values of the list and priority queue (b) to (d) the values after the 1st to 3rd extractions, respectively</td>
</tr>
<tr>
<td>5.1</td>
<td>Attributes of a candidate solution</td>
</tr>
<tr>
<td>6.1</td>
<td>Wire dimensions and parameters</td>
</tr>
<tr>
<td>6.2</td>
<td>Buffer library</td>
</tr>
<tr>
<td>6.3</td>
<td>Characteristics of the test nets and graphs</td>
</tr>
<tr>
<td>6.4</td>
<td>Solutions from HRTB and FBI on the test nets</td>
</tr>
<tr>
<td>6.5</td>
<td>Delay at source: comparison between FBI, RIATA, HRTB and HRTB-LA</td>
</tr>
<tr>
<td>6.6</td>
<td>Solution quality, runtime and number of candidate solutions for net N5</td>
</tr>
<tr>
<td>6.7</td>
<td>Solution quality, runtime and number of candidate solutions for net N10</td>
</tr>
<tr>
<td>6.8</td>
<td>Solution quality, runtime and number of candidate solutions for net N25</td>
</tr>
<tr>
<td>6.9</td>
<td>Performance comparison between HRTB and HRTB-LA for net N5</td>
</tr>
<tr>
<td>6.10</td>
<td>Performance comparison between HRTB and HRTB-LA for net N25</td>
</tr>
</tbody>
</table>

# LIST OF FIGURES

<table>
<thead>
<tr>
<th>FIGURE NO.</th>
<th>TITLE</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.1</td>
<td>(a) Buffer insertion on fixed routing tree that ignores buffer obstacles (b) buffer insertion on fixed routing tree that avoids obstacles (c) buffer insertion on the fixed routing tree with tree adjustment (RIATA) and (d) simultaneous routing tree and buffer insertion on the adjusted tree</td>
</tr>
<tr>
<td>1.2</td>
<td>A tree on a uniform grid graph $G = (V, E)$</td>
</tr>
<tr>
<td>2.1</td>
<td>(a) Rectilinear minimum spanning tree (R-MST) (b) rectilinear Steiner minimal tree (R-SMT). Hollow dots indicate net terminals while solid dots are the Steiner nodes</td>
</tr>
<tr>
<td>2.2</td>
<td>(a) A wire of length $L$ and (b) the corresponding π-model RC circuit</td>
</tr>
<tr>
<td>2.3</td>
<td>A wire with $N-1$ inserted buffers</td>
</tr>
<tr>
<td>3.1</td>
<td>$O$-notation gives an upper bound for a function to within a constant factor (Cormen et al. 2009)</td>
</tr>
<tr>
<td>3.2</td>
<td>Runtime trend of algorithms</td>
</tr>
<tr>
<td>3.3</td>
<td>Relaxation on edge $(u, v)$ with weight $w(u, v) = 3$. (a) The relaxation step when $v.d &gt; u.d + w(u, v)$ and (b) the relaxation step when $v.d &lt; u.d + w(u, v)$</td>
</tr>
<tr>
<td>3.4</td>
<td>A $5 \times 4$ grid graph</td>
</tr>
<tr>
<td>3.5</td>
<td>Illustration of Dijkstra’s algorithm on a general graph, with $a$ as the source (a) initialization (b) path expansion from $a \rightarrow b$, $a \rightarrow c$ (c) path expansion from $c \rightarrow d$, $c \rightarrow e$ (d) path expansion from $b \rightarrow d$ (e) path expansion from $d \rightarrow e$, $d \rightarrow f$ (f) final expansion from $e \rightarrow f$ gives the shortest path $a \rightarrow c \rightarrow d \rightarrow e \rightarrow f$ with cost = 7</td>
</tr>
<tr>
<td>3.6</td>
<td>Illustration of Dijkstra’s algorithm on a uniform grid graph (a) initial graph with $v.d = \infty$ (b) expansion 1 $\rightarrow$ 2 and 4 (c) expansion 2 $\rightarrow$ 3 and 5 (d) expansion 4 $\rightarrow$ 5 and 7 (e) expansion 3 $\rightarrow$ 6 (f) expansion 5 $\rightarrow$ 6 and 8 (g) expansion 7 $\rightarrow$ 8 (h) expansion 6 $\rightarrow$ 9 (i) expansion 8 $\rightarrow$ 9</td>
</tr>
<tr>
<td>3.7</td>
<td>A simple RC tree illustrating the calculation of Elmore delay</td>
</tr>
<tr>
<td>3.8</td>
<td>Buffered path interconnect</td>
</tr>
<tr>
<td>3.9</td>
<td>Fixed routing tree connecting the source node to the Steiner node and all sink nodes</td>
</tr>
<tr>
<td>3.10</td>
<td>Candidate solutions at each node. The red colour is the best solution for the given path</td>
</tr>
<tr>
<td>3.11</td>
<td>Path expansion in 2-terminal net simultaneous routing and buffer insertion (a) expansion from the sink node to vertices 5, 9 and 15 (b) expansion from vertex 5 to vertex 4</td>
</tr>
<tr>
<td>3.12</td>
<td>The 2D grid graph representing the interconnect tree in Figure 3.9</td>
</tr>
<tr>
<td>3.13</td>
<td>Simultaneous routing and buffer insertion in a multi-terminal net. The arrows show the direction of the path expansions</td>
</tr>
<tr>
<td>3.14</td>
<td>Wire expansion from vertex $v$ to vertex $u$ for upstream path expansion</td>
</tr>
<tr>
<td>3.15</td>
<td>Wire expansion from vertex $v$ to vertex $u$ and insert buffer at $v$</td>
</tr>
<tr>
<td>3.16</td>
<td>Wire expansion from vertex $u$ to vertex $v$ for downstream path expansion</td>
</tr>
<tr>
<td>3.17</td>
<td>Wire expansion from vertex $u$ to vertex $v$ and insert buffer at $v$</td>
</tr>
<tr>
<td>3.18</td>
<td>Path expansion in multi-constraint graph (a) initialization (b) first path expansion (c) expansion for path $c\rightarrow d$, $c\rightarrow e$ (d) expansion for path $b\rightarrow d$ (e) expansion for path $e\rightarrow f$ (f) expansion for path $d\rightarrow e$, $d\rightarrow f$ extracted from 0.6 (g) expansion for path $d\rightarrow e$, $d\rightarrow f$ extracted from 0.7 (h) expansion for path $e\rightarrow f$</td>
</tr>
<tr>
<td>3.19</td>
<td>Path expansion in multi-constraint graph with look-ahead scheme (a) initialization (b) first path expansion (c) expansion for path $c\rightarrow d$ (d) expansion for path $d\rightarrow f$ (e) expansion for path $b\rightarrow d$</td>
</tr>
<tr>
<td>4.1</td>
<td>Main stages in HRTB-LA</td>
</tr>
<tr>
<td>4.2</td>
<td>Tree adjustment technique (a) a Steiner node $m$ is inside the obstacle (b) an alternative Steiner node $m'$ is generated (Hu et al. 2003)</td>
</tr>
<tr>
<td>4.3</td>
<td>Flowchart of the tree adjustment</td>
</tr>
<tr>
<td>4.4</td>
<td>Sample tree for illustrating tree adjustment</td>
</tr>
<tr>
<td>4.5</td>
<td>Flowchart of the graph pruning</td>
</tr>
<tr>
<td>4.6</td>
<td>Vertex $v$ is pruned as $L_{ToEnd}[v] + L_{ToStart}[v] &gt; L_{StartEnd}$</td>
</tr>
<tr>
<td>4.7</td>
<td>Graph pruning in HRTB-LA (a) initial graph (b) graph pruning for $sink_1 \rightarrow$ Steiner node expansions (c) graph pruning for $sink_2 \rightarrow$ Steiner node expansions (d) graph pruning for Steiner node $\rightarrow$ source expansions</td>
</tr>
<tr>
<td>4.8</td>
<td>Flowchart of the path expansion in the proposed algorithm</td>
</tr>
<tr>
<td>4.9</td>
<td>Example of candidate solutions at each vertex</td>
</tr>
<tr>
<td>4.10</td>
<td>Sample routing graph and path expansion of the proposed algorithm</td>
</tr>
<tr>
<td>4.11</td>
<td>The look-ahead weight vectors for the graph in Figure 4.10</td>
</tr>
<tr>
<td>4.12</td>
<td>Sample grid graph</td>
</tr>
<tr>
<td>4.13</td>
<td>Routing solution</td>
</tr>
<tr>
<td>4.14</td>
<td>Look-ahead weight vectors for (a) first set (b) second set</td>
</tr>
<tr>
<td>5.1</td>
<td>Node labelling for a net with two sinks in HRTB-LA</td>
</tr>
<tr>
<td>5.2</td>
<td>Illustration of $Prev_i$ and $Prev_j$ attributes</td>
</tr>
<tr>
<td>5.3</td>
<td>Linked list for vertex $v$ with three candidate solutions</td>
</tr>
<tr>
<td>6.1</td>
<td>Sample tree with 5 sinks</td>
</tr>
<tr>
<td>6.2</td>
<td>Solution from FBI algorithm</td>
</tr>
<tr>
<td>6.3</td>
<td>Solution from HRTB algorithm</td>
</tr>
<tr>
<td>6.4</td>
<td>Illustration of the iterative power computation (a) sample net (b) upstream computation (c) downstream computation</td>
</tr>
<tr>
<td>6.5</td>
<td>Illustration of the iterative power computation for multiple buffer types (a) sample net (b) upstream computation (c) downstream computation</td>
</tr>
<tr>
<td>6.6</td>
<td>Plot of net N5 test results (a) slack at source (b) runtime (c) number of candidate solutions</td>
</tr>
<tr>
<td>6.7</td>
<td>Plot of net N10 test results (a) slack at source (b) runtime (c) number of candidate solutions</td>
</tr>
<tr>
<td>6.8</td>
<td>Plot of net N25 test results (a) slack at source (b) runtime (c) number of candidate solutions</td>
</tr>
<tr>
<td>6.9</td>
<td>Plot of net N5 test results for delay-power constraint optimization (a) runtime (b) number of candidate solutions</td>
</tr>
<tr>
<td>6.10</td>
<td>Plot of net N25 test results for delay-power constraint optimization (a) runtime (b) number of candidate solutions</td>
</tr>
<tr>
<td>6.11</td>
<td>Routing solutions for a 4-sink net for different power constraints (a) routing solution for maximum slack with no power constraint (b) routing solution when power was constrained at 30 mW (c) routing solution when power was constrained at 20 mW</td>
</tr>
<tr>
<td>7.1</td>
<td>Sample net where a pair of nodes lie on the same horizontal line (a) before pruning (b) effective search space for path expansions between $sink_1$ and the Steiner node</td>
</tr>
<tr>
<td>A1</td>
<td>Fibonacci heap structure</td>
</tr>
<tr>
<td>A2</td>
<td>Inserting a node into a Fibonacci heap (a) a Fibonacci heap $H$ (b) Fibonacci heap $H$ after inserting the node with key 21</td>
</tr>
<tr>
<td>A3</td>
<td>The process of EXTRACT_MIN($H$) (a) meld the children into the root list (b) label the rank (c) to (e) mark the current node and update the rank list from left to right (f) link 23 into 17 (g) to (h) link 24 into 7 (i) to (k) link 41 into 18 (l) final heap</td>
</tr>
<tr>
<td>A4</td>
<td>DECREASE-KEY($H$, $x$, $k$) when the heap order is not violated (a) initial heap structure (b) the key of $x$ is decreased from 46 to 29</td>
</tr>
<tr>
<td>A5</td>
<td>DECREASE-KEY($H$, $x$, $k$) when the heap order is violated (a) decrease the key (b) cut the tree rooted at $x$, meld it into the root list and mark the former parent node</td>
</tr>
</tbody>
</table>

# LIST OF ABBREVIATIONS

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>A-Tree</td>
<td>Arborescence-Tree</td>
</tr>
<tr>
<td>BPRIM</td>
<td>Bounded Prim</td>
</tr>
<tr>
<td>BRBC</td>
<td>Bounded Radius Bounded Cost</td>
</tr>
<tr>
<td>BR-MRT</td>
<td>Bounded Radius - Minimum Routing Tree</td>
</tr>
<tr>
<td>CMOS</td>
<td>Complementary Metal Oxide Semiconductor</td>
</tr>
<tr>
<td>C-Tree</td>
<td>Clustered-Tree</td>
</tr>
<tr>
<td>DP</td>
<td>Dynamic Programming</td>
</tr>
<tr>
<td>ED</td>
<td>Elmore Delay</td>
</tr>
<tr>
<td>FBI</td>
<td>Fast Buffer Insertion</td>
</tr>
<tr>
<td>HRTB</td>
<td>Hybrid Routing Tree and Buffer insertion</td>
</tr>
<tr>
<td>HRTB-LA</td>
<td>Hybrid Routing Tree and Buffer insertion with Look-Ahead</td>
</tr>
<tr>
<td>ITRS</td>
<td>International Technology Roadmap for Semiconductors</td>
</tr>
<tr>
<td>L-RST</td>
<td>L-shaped Rectilinear Steiner Tree</td>
</tr>
<tr>
<td>MCOP</td>
<td>Multi-Constraint Optimal Path</td>
</tr>
<tr>
<td>MCP</td>
<td>Multi-Constraint Path</td>
</tr>
<tr>
<td>MOS</td>
<td>Metal Oxide Semiconductor</td>
</tr>
<tr>
<td>MRSA</td>
<td>Minimum Rectilinear Steiner Arborescence</td>
</tr>
<tr>
<td>MST</td>
<td>Minimum Spanning Tree</td>
</tr>
<tr>
<td>NP</td>
<td>Non-deterministic Polynomial time</td>
</tr>
<tr>
<td>PDF</td>
<td>Probability Density Function</td>
</tr>
<tr>
<td>QoS</td>
<td>Quality-of-Service</td>
</tr>
<tr>
<td>RAT</td>
<td>Required Arrival Time</td>
</tr>
<tr>
<td>RC</td>
<td>Resistor-Capacitor</td>
</tr>
<tr>
<td>RIATA</td>
<td>Repeater Insertion with Adaptive Tree Adjustment</td>
</tr>
<tr>
<td>RLC</td>
<td>Resistor-Inductor-Capacitor</td>
</tr>
<tr>
<td>RMP</td>
<td>Recursive Merging and Pruning</td>
</tr>
<tr>
<td>R-MST</td>
<td>Rectilinear Minimum Spanning Tree</td>
</tr>
<tr>
<td>R-SMT</td>
<td>Rectilinear Steiner Minimal Tree</td>
</tr>
<tr>
<td>RTBW</td>
<td>Routing Tree with Buffer insertion and Wire sizing</td>
</tr>
<tr>
<td>SAMCRA</td>
<td>Self-Adaptive Multi-Constrained Routing Algorithm</td>
</tr>
<tr>
<td>SMT</td>
<td>Steiner Minimal Tree</td>
</tr>
<tr>
<td>S-RABILA</td>
<td>Simultaneous Routing and Buffer Insertion with Look-Ahead</td>
</tr>
<tr>
<td>VLSI</td>
<td>Very Large Scale Integration</td>
</tr>
</tbody>
</table>

# LIST OF APPENDICES

<table>
<thead>
<tr>
<th>APPENDIX</th>
<th>TITLE</th>
</tr>
</thead>
<tbody>
<tr>
<td>A</td>
<td>Fibonacci heap operations</td>
</tr>
<tr>
<td>B</td>
<td>C Code for HRTB-LA algorithm</td>
</tr>
<tr>
<td>C</td>
<td>Output sample</td>
</tr>
</tbody>
</table>

CHAPTER 1

INTRODUCTION

1.1 Overview

The demand for high speed and low power consumption in today’s applications has forced dramatic changes in the design and manufacturing methodologies for very large scale integration (VLSI) circuits (Celik et al. 2002; Ekekwe 2010; ITRS 2012; 2013). To meet this demand, the number of devices (i.e. transistors) on a single chip must increase, which requires smaller device sizes as well as a larger layout area to accommodate the huge number of devices.

As devices shrink and operate at higher speeds, the interconnect delay becomes much more significant than the device delay. Most of the delay in integrated circuits is due to the time it takes to charge and discharge the capacitance of the wires and the gates of the transistors. The resistance \( R = rl \) of a wire increases linearly with its length \( l \), and so does its capacitance \( C = cl \), where \( r \) and \( c \) are the resistance and capacitance per unit length, respectively. Hence, the RC delay of the wire is \( D = \frac{1}{2}RC = \frac{1}{2}rcl^2 \) (van Ginneken 1990); that is, the delay increases quadratically with the length of the wire (Saxena et al. 2004; ITRS 2012).
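To make the quadratic dependence concrete, the following Python sketch evaluates \( D = \frac{1}{2}rcl^2 \) for a few wire lengths. The values of \( r \) and \( c \) are hypothetical placeholders, not the wire parameters used in this thesis; whatever values are chosen, doubling the length quadruples the delay.

```python
# Hypothetical per-unit-length wire parameters (illustrative, not from this thesis).
R_PER_UM = 0.1   # r, wire resistance in ohm/um
C_PER_UM = 0.2   # c, wire capacitance in fF/um

def wire_delay_ps(length_um, r=R_PER_UM, c=C_PER_UM):
    """Unbuffered distributed RC wire delay D = 0.5 * r * c * l^2.

    ohm * fF = fs, so the result is divided by 1000 to give picoseconds.
    """
    return 0.5 * r * c * length_um ** 2 / 1000.0

for length in (500, 1000, 2000):
    print(f"{length:5d} um -> {wire_delay_ps(length):6.2f} ps")
```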

One effective technique to reduce interconnect delay is to insert buffers that restore the signal strength along the interconnect tree. As design dimensions continue to shrink, more and more buffers are needed to improve performance. However, buffers themselves consume power, and the power dissipation overhead due to optimal buffer insertion has been shown to be significantly high (Ekekwe 2010).

According to Saxena et al. (2004), the critical inter-buffer length (the minimum wire segment length at which a buffer is required) decreases by 68% when VLSI technology migrates from 90 nm to 45 nm. This inter-buffer length scaling significantly outpaces VLSI feature-size scaling, which is roughly 0.5 times every two generations. Buffers are projected to make up 35% of the total cell count of a block at the 45-nm technology node and 70% at the 32-nm node.
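The effect of buffering can be sketched with a simple Elmore model. Assume, hypothetically, that a wire of length \( l \) is split into \( N \) equal segments, each driven by an identical buffer with intrinsic delay \( t_b \), output resistance \( R_b \) and input capacitance \( C_b \), so that \( N-1 \) repeaters follow the driver. The total delay is then \( N t_b + R_b c l + N R_b C_b + r C_b l + \frac{rcl^2}{2N} \), which is minimized near \( N^* = l\sqrt{rc / (2(t_b + R_b C_b))} \). The optimal segment length \( l/N^* \) does not depend on \( l \), which is why an optimally buffered wire has a delay that grows only linearly with length. All parameter values below are illustrative assumptions.

```python
import math

# Hypothetical technology parameters (assumed for illustration only).
R_PER_UM, C_PER_UM = 0.1, 0.2               # wire r (ohm/um) and c (fF/um)
T_BUF_PS, R_BUF, C_BUF = 20.0, 200.0, 5.0   # buffer delay (ps), output R (ohm), input C (fF)

def buffered_wire_delay_ps(length_um, n_stages):
    """Elmore delay of a wire split into n_stages equal buffer-driven segments.

    Every segment is loaded by the next buffer's input capacitance (the sink is
    modelled as one more buffer input), i.e. n_stages - 1 repeaters are inserted.
    """
    s = length_um / n_stages
    stage_fs = (R_BUF * (C_PER_UM * s + C_BUF)                 # buffer R charges segment + next input
                + R_PER_UM * s * (C_PER_UM * s / 2 + C_BUF))   # distributed wire RC (pi-model)
    return n_stages * (T_BUF_PS + stage_fs / 1000.0)           # ohm * fF = fs -> ps

def optimal_stages(length_um):
    """Continuous optimum N* = l * sqrt(r*c / (2*(t_b + R_b*C_b)))."""
    tb_fs = T_BUF_PS * 1000.0
    return length_um * math.sqrt(R_PER_UM * C_PER_UM / (2 * (tb_fs + R_BUF * C_BUF)))

length = 5000.0
for n in range(1, 7):
    print(n, round(buffered_wire_delay_ps(length, n), 2))
print("continuous optimum N* =", round(optimal_stages(length), 2))
```

With these assumed values the integer optimum is \( N = 3 \) (two repeaters), close to the continuous estimate \( N^* \approx 3.45 \).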

This dramatic buffer scaling has a profound impact on VLSI circuit design. With millions of buffers required per chip, the importance of optimal buffer insertion can no longer be neglected, whereas a decade ago only a few thousand buffers were needed per chip (Cong 1997). Because of this importance, buffer insertion algorithms and methodologies need to be studied in depth from several aspects. First, a buffer insertion algorithm should deliver high-quality solutions, because interconnect and circuit performance largely depend on how the buffers are placed. Second, a buffer insertion algorithm needs to be sufficiently fast so that millions of nets can be optimized in reasonable time. Third, accurate delay models are necessary to ensure that buffer insertion solutions are reliable. Fourth, buffer insertion techniques are expected to handle multiple objectives simultaneously, such as timing, power and signal integrity (Alpert et al. 2009).

1.2 Problem statement

Interconnect is the wiring system that distributes clock and other signals to the various functional blocks of a CMOS integrated circuit. When VLSI technology is scaled down, gate delay and interconnect delay change in opposite directions: smaller devices switch faster, whereas thinner wires have higher resistance and therefore greater signal propagation delay. As a result, interconnect delay has become the dominant factor in VLSI circuit performance (ITRS 2012; 2013).

Among the available techniques, buffer insertion has proven to be one of the most effective ways to reduce the delay of a long wire. The main challenge in interconnect buffer insertion is to determine the optimal number of buffers and their locations in a given interconnect tree. The most influential and systematic technique was proposed by van Ginneken (1990). Given the possible buffer locations, this algorithm finds the optimal buffering solution for a fixed signal routing tree, maximizing the timing slack at the source under the Elmore delay model (Elmore 1948). As the number of buffers inserted in circuits increases dramatically, a fast and efficient algorithm is essential for design automation tools. van Ginneken’s algorithm uses dynamic programming, which finds an optimal solution to a problem by first finding optimal solutions to its subproblems and then merging them into an optimal solution to the larger problem.
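The dynamic programming idea can be illustrated on the simplest case, a two-pin net laid out as a chain of candidate buffer positions. The Python sketch below follows the usual description of van Ginneken's algorithm: each candidate is a pair \( (C, q) \) of downstream load capacitance and required arrival time, candidates are propagated from the sink towards the source across wire segments, a buffered candidate is added at every legal position, and dominated candidates (larger \( C \) without a larger \( q \)) are pruned. Merging of candidate lists at Steiner nodes is omitted, a single buffer type is assumed, and all parameter values and helper names (`add_wire`, `add_buffer`, `prune`, `van_ginneken_path`) are hypothetical. The `blocked` set mimics buffer obstacles, where the wire may pass but no buffer may be placed.

```python
# Hypothetical parameters (illustrative only): ohm, fF and ps; ohm * fF = 1e-3 ps.
R_PER_UM, C_PER_UM = 0.1, 0.2              # wire r (ohm/um), c (fF/um)
T_BUF, R_BUF, C_BUF = 20.0, 200.0, 5.0     # buffer delay (ps), output R (ohm), input C (fF)
R_DRIVER = 200.0                           # source driver resistance (ohm)

def add_wire(cands, seg_um):
    """Propagate each (C, q) candidate upstream across one wire segment (Elmore)."""
    rw, cw = R_PER_UM * seg_um, C_PER_UM * seg_um
    return [(c + cw, q - rw * (cw / 2 + c) / 1000.0) for c, q in cands]

def add_buffer(cands):
    """Add the single best candidate obtained by placing a buffer at this position."""
    best_q = max(q - T_BUF - R_BUF * c / 1000.0 for c, q in cands)
    return cands + [(C_BUF, best_q)]

def prune(cands):
    """Keep only non-dominated candidates: q must strictly increase with C."""
    kept = []
    for c, q in sorted(cands, key=lambda cand: (cand[0], -cand[1])):
        if not kept or q > kept[-1][1]:
            kept.append((c, q))
    return kept

def van_ginneken_path(n_segments, seg_um, sink_cap, rat, blocked=frozenset()):
    """Bottom-up DP on a 2-pin net with positions 0 (source) .. n_segments (sink).

    Positions 1 .. n_segments-1 may hold a buffer unless listed in `blocked`
    (e.g. inside a buffer obstacle). Returns the best slack at the source.
    """
    cands = [(sink_cap, rat)]
    for pos in range(n_segments - 1, -1, -1):      # walk from the sink to the source
        cands = prune(add_wire(cands, seg_um))
        if pos > 0 and pos not in blocked:
            cands = prune(add_buffer(cands))
    return max(q - R_DRIVER * c / 1000.0 for c, q in cands)

print(van_ginneken_path(10, 500.0, 5.0, 1000.0, blocked=frozenset(range(1, 10))))  # no buffers allowed
print(van_ginneken_path(10, 500.0, 5.0, 1000.0))                                   # all 9 positions allowed
```

Keeping the candidate list sorted by capacitance makes the dominance check a single linear scan, which is the property that keeps this kind of dynamic programming fast.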

Recently, many techniques to speed up van Ginneken’s algorithm and its extensions have been proposed, such as those in (Lillis et al. 1996), (Shi and Li 2003), (Shi and Li 2005), (Li and Shi 2006) and (Li et al. 2012). However, van Ginneken’s algorithm and its extensions can only operate on a fixed routing tree. They give an optimal solution when a good routing tree is given but may produce a poor solution when a poor routing tree is provided, especially when there are obstacles in the design. In today’s VLSI designs, some regions may be occupied by predesigned library components such as IP blocks and memory arrays. Some of these regions allow neither buffers nor wires to pass through, while others allow wires to pass through but prohibit buffer insertion. Therefore, buffer insertion has to be performed with consideration of these buffer and wire obstacles (Alpert et al. 2009; Khalil-Hani and Shaikh-Husin 2009). The best way to handle the obstacles is to perform routing and buffer insertion simultaneously using a grid graph technique. However, research has shown that simultaneous routing and buffer insertion is NP-complete (Hu et al. 2009). Currently known techniques either use dynamic programming to compute an optimal solution in worst-case exponential time or employ efficient heuristics without performance guarantees.

Dynamic programming algorithms such as the RMP (recursive merging and pruning) algorithm can find an optimal buffering solution for multi-terminal nets (Cong and Yuan 2000), but they are not efficient when the number of sinks and the number of possible buffer locations are large, because the search space becomes very large. Indeed, Hu et al. (2003) show that the search in RMP is NP-complete. They also propose a heuristic algorithm for the multi-pin net buffer insertion problem that constructs a performance-driven Steiner tree and creates an alternative Steiner node if the original Steiner node lies inside an obstacle. The algorithm is called RIATA (Repeater Insertion with Adaptive Tree Adjustment). RIATA is very fast because it operates on a fixed tree. However, its solution quality may not be good enough if many paths of the adjusted tree still overlap with buffer obstacles, as illustrated in Figure 1.1.

Figure 1.1 shows examples of possible solutions for a multi-terminal net with a tree structure, where the grey areas represent buffer obstacles. The net has three sinks \( s_1, s_2 \) and \( s_3 \), with \( S_0 \) as the source. In this illustration, appropriate wire and buffer parameters are applied (discussed in detail in Chapter 2). Figure 1.1(a) shows the solution from van Ginneken’s algorithm, where the slack at the source is -899.74 ps (the slack is the required arrival time at a sink minus the accumulated delay to that sink). The negative slack means that the timing requirement is not met, because most of the routing paths lie inside the buffer obstacles, where buffer insertion is not allowed. One can reroute the tree so that all the paths avoid the buffer obstacles, as shown in Figure 1.1(b). The slack improves to -44.39 ps but still violates the timing requirement because of the increased wire length. The tree adjustment technique of RIATA produces the solution shown in Figure 1.1(c). The timing requirement is now met, with a slack at the source of 11.64 ps. RIATA is efficient in terms of runtime, but its solution quality still depends on the newly generated tree: if most of the paths remain inside the buffer obstacles, the room for timing improvement is limited.
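The slack values quoted above follow the definition in the text: for a multi-terminal net, the slack at the source is the minimum over all sinks of \( \text{RAT}(s_i) \) minus the delay from the source to \( s_i \). The sketch below evaluates this under the Elmore delay model on a small hypothetical tree with one Steiner node \( m \) and two sinks; the topology, wire parameters, driver resistance, loads and RATs are illustrative assumptions and do not reproduce the net of Figure 1.1.

```python
# Hypothetical tree and parameters (illustrative; not the net of Figure 1.1).
R_PER_UM, C_PER_UM = 0.1, 0.2      # wire r (ohm/um), c (fF/um)
R_DRIVER = 200.0                   # driver resistance at the source S0 (ohm)
TREE = {"m": ("S0", 1000.0), "s1": ("m", 2000.0), "s2": ("m", 1000.0)}  # node: (parent, um)
SINKS = {"s1": (5.0, 600.0), "s2": (5.0, 400.0)}                         # sink: (load fF, RAT ps)

def downstream_cap(node):
    """Total capacitance below `node`: child wires plus sink loads (fF)."""
    cap = SINKS.get(node, (0.0, 0.0))[0]
    for child, (parent, length) in TREE.items():
        if parent == node:
            cap += C_PER_UM * length + downstream_cap(child)
    return cap

def elmore_delay_ps(node):
    """Elmore delay from the source driver to `node`, using pi-model wire segments."""
    if node == "S0":
        return R_DRIVER * downstream_cap("S0") / 1000.0   # ohm * fF = 1e-3 ps
    parent, length = TREE[node]
    r, c = R_PER_UM * length, C_PER_UM * length
    return elmore_delay_ps(parent) + r * (c / 2 + downstream_cap(node)) / 1000.0

def slack_at_source():
    """Slack = min over sinks of RAT(s_i) - delay(S0 -> s_i); negative means a violation."""
    return min(rat - elmore_delay_ps(s) for s, (_, rat) in SINKS.items())

for s in SINKS:
    print(s, round(elmore_delay_ps(s), 2))
print("slack at source:", round(slack_at_source(), 2))
```

Here the critical sink is the one with the tighter RAT rather than the longer path, which is why the slack must be taken as a minimum over all sinks.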

Instead of fully constructing the routing path simultaneously with buffer insertion, as in the RMP algorithm, one can apply the simultaneous approach to the adjusted tree. Figure 1.1(d) illustrates the routing path generated by this approach; the slack at the source improves to 217.65 ps. In this example, the hybrid technique produces the best result, outperforming the techniques that perform buffer insertion on a fixed routing path, namely van Ginneken’s algorithm (and its extensions) and RIATA. The runtime of this hybrid technique can be improved by adopting the look-ahead technique proposed by Shaikh-Husin (2008) and Khalil-Hani and Shaikh-Husin (2009) for simultaneous routing and buffer insertion in two-terminal (single-sink) nets.

**Figure 1.1** (a) Buffer insertion on fixed routing tree that ignores buffer obstacles (b) buffer insertion on fixed routing tree that avoids obstacles (c) buffer insertion on the fixed routing tree with tree adjustment (RIATA) and (d) simultaneous routing tree and buffer insertion on the adjusted tree

Another issue that previous dynamic programming algorithms did not take into consideration is the power consumed by the buffers inserted along the interconnect tree. The power dissipation overhead due to optimal buffer insertion has been found to be significant and can be as high as 20% of the total chip power dissipation (Nalamalpu and Burleson 2001). Hence, in addition to timing performance, a power dissipation constraint should also be integrated into buffer insertion algorithms (Nalamalpu and Burleson 2001; Ekekwe 2010). Many methodologies to optimize propagation delay under a power constraint have been proposed (Nalamalpu and Burleson 2001; Banerjee and Mehrotra 2002; Wason and Banerjee 2005; Li et al. 2005; Narasimhan and Sridhar 2010), but none of them can be integrated into a buffer insertion algorithm based on dynamic programming on a grid graph. The grid graph is used because simultaneous routing and buffer insertion relies on maze search, which is naturally implemented as a graph search (Cormen et al. 2009). Furthermore, a uniform grid graph allows buffers to be inserted anywhere except in buffer obstacle areas, which improves the solution quality. The advantage of dynamic programming, in turn, is that it allows the use of multiple buffer types.
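The advantage of dynamic programming mentioned above can be illustrated with a van Ginneken-style candidate list that carries power as a third metric alongside load capacitance and required arrival time. This is a minimal sketch, not the HRTB-LA scheme. It assumes that each buffer type contributes a fixed, hypothetical power cost. The names `Cand`, `Buffer`, `add_buffers` and `prune`, and the buffer types B1 to B3, are illustrative only.

```python
from typing import List, NamedTuple


class Cand(NamedTuple):
    cap: float    # load capacitance seen at this point (fF)
    q: float      # required arrival time at this point (ps)
    power: float  # accumulated power of downstream buffers (hypothetical units)


class Buffer(NamedTuple):
    name: str
    r_out: float  # output resistance (kOhm)
    c_in: float   # input capacitance (fF)
    t_int: float  # intrinsic delay (ps)
    power: float  # power cost per inserted buffer (hypothetical units)


def add_buffers(cands: List[Cand], library: List[Buffer]) -> List[Cand]:
    """Keep each unbuffered candidate and add one buffered option per buffer type."""
    out = list(cands)
    for c in cands:
        for b in library:
            # Buffer delay T_b + R_b * C_load; upstream now sees only the buffer's C_in.
            out.append(Cand(b.c_in, c.q - b.t_int - b.r_out * c.cap, c.power + b.power))
    return out


def prune(cands: List[Cand]) -> List[Cand]:
    """Remove candidates dominated in all three metrics (lower cap, higher q, lower power)."""
    unique = set(cands)
    keep = [
        a for a in unique
        if not any(b != a and b.cap <= a.cap and b.q >= a.q and b.power <= a.power
                   for b in unique)
    ]
    return sorted(keep)


library = [Buffer("B1", 2, 1, 30, 1), Buffer("B2", 1, 2, 35, 2), Buffer("B3", 3, 1, 30, 1)]
sink = Cand(cap=10, q=500, power=0)
print(prune(add_buffers([sink], library)))
```

In this example the B3 option is discarded because B1 gives the same capacitance and power with a later required arrival time. Every surviving candidate represents a different trade-off between load, timing and power. Pruning by pairwise comparison is quadratic in the number of candidates, which is acceptable for a sketch.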

From the discussion above, the problem can be summarized as follows. Buffering in a multi-terminal net is known to be NP-complete. Existing algorithms that give an optimum solution are too slow, while heuristic algorithms are fast but produce poor solutions. Even though buffer insertion is one of the most studied problems in VLSI physical design, finding an efficient algorithm with provably good performance remains an active research area. Moreover, as design dimensions continue to shrink, more and more buffers are needed to improve the performance (i.e. speed and signal integrity) of a design, but the buffers themselves consume power. Therefore, a new algorithm is needed that can handle these constraints efficiently.

1.3 Research objectives

The objectives of this research are as follows:

1) To propose an efficient graph-based maze routing and buffer insertion algorithm for nanometer VLSI layout designs. The algorithm is designed for multi-terminal nets and multi-constraint optimization. The constraints are routing obstacles, timing performance and power dissipation.

2) To propose a power computation scheme for the proposed algorithm that can be evaluated iteratively within the dynamic programming framework.
1.4 Problem formulation

The simultaneous routing and buffer insertion problem in VLSI layout design is essentially a buffered routing path search problem. In this work, it is formulated as a shortest-path problem in a weighted graph, specified as follows. We are given a routing grid graph $G = (V, E)$ corresponding to the VLSI layout, where $V$ is the set of vertices and $E$ is the set of edges. We are also given a source vertex $S_0 \in V$, $n$ sink vertices $s_1, s_2, \ldots, s_n \in V$, $n - 1$ Steiner vertices $m_1, m_2, \ldots, m_{n-1} \in V$, required arrival times $\text{RAT}(s_1), \text{RAT}(s_2), \ldots, \text{RAT}(s_n)$, a power constraint $P_c$, a buffer library $B$ and wire parameters $W$. The goal is to find a routing tree simultaneously with buffer insertion such that the slack at the source and the power dissipation of the buffers satisfy the given constraints. A vertex $v_i \in V$ may belong to the set of buffer obstacle vertices, denoted $V_{OB}$, or to the set of wire obstacle vertices, denoted $V_{OW}$. The buffer library $B$ contains different types of buffers. For each edge $e = u \rightarrow v$, the signal travels from $u$ to $v$, where $u$ is the upstream vertex, $v$ is the downstream vertex and $u, v \notin V_{OW}$. A uniform grid graph illustrating some of the parameters of the problem formulation is shown in Figure 1.2.
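The roles of $V_{OW}$ and $V_{OB}$ can be made concrete with a deliberately reduced sketch. It handles one source and one sink and uses unit edge weights, so the "shortest path" is simply the shortest route that avoids wire obstacles, found with Dijkstra's algorithm. The function name `maze_route` and the sets `wire_obs` and `buf_obs` (standing for $V_{OW}$ and $V_{OB}$) are hypothetical. In HRTB-LA the search also carries timing and power information and handles multi-terminal nets, none of which is modelled here.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Vertex = Tuple[int, int]


def maze_route(width: int, height: int, src: Vertex, dst: Vertex,
               wire_obs: Set[Vertex], buf_obs: Set[Vertex]
               ) -> Tuple[Optional[List[Vertex]], List[Vertex]]:
    """Dijkstra search on a uniform grid with unit edge weights.

    Vertices in wire_obs (V_OW) cannot be routed through; vertices in
    buf_obs (V_OB) may carry wire but are excluded from the buffer sites.
    """
    dist: Dict[Vertex, float] = {src: 0}
    prev: Dict[Vertex, Vertex] = {}
    heap = [(0, src)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == dst:
            break
        if d > dist[v]:
            continue  # stale heap entry
        x, y = v
        for u in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= u[0] < width and 0 <= u[1] < height) or u in wire_obs:
                continue
            if d + 1 < dist.get(u, float("inf")):
                dist[u] = d + 1
                prev[u] = v
                heapq.heappush(heap, (d + 1, u))
    if dst not in dist:
        return None, []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    # Candidate buffer locations: path vertices outside buffer obstacles.
    return path, [v for v in path if v not in buf_obs]


path, sites = maze_route(4, 3, (0, 0), (3, 0),
                         wire_obs={(1, 0), (1, 1)}, buf_obs={(0, 2), (1, 2)})
print(len(path) - 1, len(sites))  # 7 6
```

The route must detour around the wire obstacles. Two of its vertices lie in the buffer obstacle, so they can carry wire but cannot host buffers.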

The proposed algorithm is called HRTB-LA, which stands for Hybrid Routing Tree and Buffer insertion with Look-Ahead. Instead of using a fixed routing tree as in van Ginneken’s algorithm and RIATA, we use maze routing to find the solution. However, HRTB-LA does not explore the entire 2D graph as RMP does, because it uses an initial tree as a reference for determining the Steiner nodes, as in RIATA. Graph pruning and look-ahead techniques are also incorporated to reduce the runtime of the algorithm.
1.5 Scope of works

The scope of this research is as follows:

(a) The Elmore delay metric is used to calculate the interconnect delay because of its high fidelity and speed (Alpert et al. 2007; Li et al. 2012; ITRS 2012; 2013).

(b) A uniform grid graph is used to represent the VLSI layout, and maze routing (Zhou et al. 2000; Khalil-Hani and Shaikh-Husin 2009) is used for path search.

(c) There are many algorithms for tree construction in VLSI routing (e.g. Steiner minimal tree), and Steiner tree construction is itself a hard problem. Therefore, we assume that the initial (pre-processing) tree is available.

(d) The performance of the proposed algorithm is benchmarked against similar available algorithms.
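For reference, the Elmore delay model named in item (a) is commonly applied to buffered interconnect in the van Ginneken-style form below. The symbols are introduced here for illustration; the formulation actually used by HRTB-LA is given in Chapter 3. Consider an edge $e = u \rightarrow v$ with wire resistance $R_e$ and capacitance $C_e$, where $C(v)$ is the capacitance downstream of $v$ and $q(v)$ is the required arrival time at $v$:

$$
d(e) = R_e\left(\frac{C_e}{2} + C(v)\right), \qquad C(u) = C(v) + C_e, \qquad q(u) = q(v) - d(e)
$$

Inserting a buffer $b \in B$ with intrinsic delay $T_b$, output resistance $R_b$ and input capacitance $C_b$ at vertex $v$ transforms the pair $(C(v), q(v))$ into

$$
C'(v) = C_b, \qquad q'(v) = q(v) - T_b - R_b\, C(v)
$$

Where two branches meet at a Steiner vertex $m$, with $(C_1, q_1)$ and $(C_2, q_2)$ the pairs computed at $m$ along each branch, the pairs combine as $C(m) = C_1 + C_2$ and $q(m) = \min(q_1, q_2)$. A buffer therefore adds delay but isolates the downstream load from the upstream wire, which is the trade-off that buffer insertion exploits.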
1.6 Research contributions

We propose a new algorithm for simultaneous tree construction and buffer insertion with multi-constraint optimization. The contributions of this research can be listed as follows:

(a) The look-ahead scheme (Khalil-Hani and Shaikh-Husin 2009), which has been shown to be efficient for two-terminal (single-sink) nets, is extended in this work to handle multi-terminal nets.

(b) The algorithm is designed to optimize multiple constraints, namely obstacles, timing and power dissipation of the buffered interconnect tree.

(c) An iterative computation of power dissipation within the dynamic programming framework is proposed.

(d) In algorithm design, time complexity is used to measure the efficiency of an algorithm. Therefore, the time complexity of HRTB-LA is analysed and presented.

1.7 Thesis outline

This thesis consists of seven chapters. Chapter 1 presents the research background, problem formulation and objectives of the research. The literature review is presented in Chapter 2, which discusses the evolution of interconnect optimization techniques ranging from two-terminal to multi-terminal nets. Next, post-routing optimization (focusing on buffer insertion) is discussed. In this section, we review buffer insertion algorithms on a fixed tree, followed by simultaneous routing and buffer insertion algorithms. Lastly, buffer insertion algorithms with multi-constraint optimization and other delay models are discussed.

Chapter 3 presents the theoretical background associated with this research. First, the concept of an algorithm and its complexity analysis are presented, followed by Dijkstra’s shortest path algorithm. Next, the Elmore delay and power dissipation in buffered interconnects are discussed. The details of buffering algorithms are also discussed in this chapter. Lastly, the delay and power formulation for the proposed algorithm is presented, followed by the fundamental concepts of multi-constraint routing and the look-ahead scheme.

Chapter 4 presents the design of the proposed algorithm, HRTB-LA. The main stages of HRTB-LA are discussed in detail. The path expansion process, which is the core of the algorithm, is presented with the aid of numerical examples. We present two types of path expansion: (1) normal path expansion without the look-ahead scheme and (2) path expansion with the look-ahead scheme. The numerical examples demonstrate the advantages of the novel look-ahead scheme in HRTB-LA.

Chapter 5 gives a detailed description of the software design of HRTB-LA. It focuses on the data structures used by the algorithm, namely arrays, linked lists and a priority queue implemented using a heap. The pseudo-code of HRTB-LA’s main functions is also presented in this chapter.

Chapter 6 presents the verification and performance tests of the proposed algorithm. HRTB-LA is benchmarked against other similar algorithms, and the results are presented. Finally, Chapter 7 concludes the research and gives recommendations for future work.
REFERENCES


