# HIGH-LEVEL DESIGN AND SYNTHESIS OF VLSI CELL PLACEMENT ALGORITHM

OTHMAN HANAFI BIN YUSOFF

UNIVERSITI TEKNOLOGI MALAYSIA

A project report submitted in partial fulfilment of the requirements for the award of the degree of Master of Engineering (Computer and Microelectronic Systems)

School of Electrical Engineering, Faculty of Engineering, Universiti Teknologi Malaysia

JULY 2022

## DEDICATION

I dedicate this thesis to my family, who have always supported me throughout my journey. To my teachers and lecturers: the outcome of this report represents the continuation of your efforts on the road of knowledge contribution.

There is no victory without sacrifice.

#### ACKNOWLEDGEMENT

In the name of Allah, the Most Gracious, and the Most Merciful.

In the journey of finishing this thesis, I would like to express my gratitude to my supervisor, Dr. Ab Al-Hadi Ab Rahman for his guidance and advice. Without his continued support and interest, this thesis would not have been the same as presented here.

I would also like to extend my appreciation to Universiti Teknologi Malaysia (UTM) for its facilities and assistance in making this journey possible, especially in supplying relevant resources and literature.

Not to be forgotten are my fellow classmates and friends, who have provided assistance on various occasions with their knowledge and tips. I am also grateful to all my family members for their unconditional love and support, which have given me the courage to finish this thesis.

#### ABSTRACT

Chip manufacturers are concerned with fast time-to-market for integrated circuits (ICs), so a short cycle time from design to manufacturing is essential. In the physical design of Very Large-Scale Integration (VLSI) circuits, placement is the process of determining the position of each cell on the die surface such that no two cells overlap. The cell locations chosen in this step affect the timing, routability, power consumption, and performance of the chip. VLSI cell placement is the most time-consuming task in the IC physical design flow, as it involves finding the optimum placement of millions of standard cells and macros in a chip floorplan. The purpose of this study is to improve modern VLSI placement algorithms by using Hardware (HW)/Software (SW) co-design and High-Level Synthesis (HLS) methodology. Placement methods can generally be divided into three categories: partition-based methods, simulated-annealing-based methods, and analytical approaches. In this research, the placement algorithm is based on simulated annealing; it is developed in C/C++ and validated using standard academic benchmarks from the International Symposium on Physical Design (ISPD) design competition. Critical functions such as the wirelength calculation are synthesized from C to Register-Transfer Level (RTL) using Vivado HLS for a custom HW implementation, while the rest of the algorithm, such as data parsing and memory accesses, remains in C and is co-simulated with the custom hardware block. The project therefore demonstrates the feasibility of using HLS for the VLSI cell placement process: the results show that the synthesized RTL design improves the execution time of certain functions such as the wirelength calculation. Moreover, HLS offers more options for design space exploration than the traditional RTL methodology.

#### ABSTRAK

Nowadays, chip manufacturers prioritise fast time-to-market for integrated circuits (ICs), so a fast cycle from design through manufacturing and production is important to achieve this goal. The physical design of Very Large-Scale Integration (VLSI) placement is the process of determining the position of each cell on the die surface so that no cell overlaps another. In addition, this process identifies factors that affect the timing, routability, power consumption, and performance of a chip. VLSI cell placement is the most time-consuming task in the IC physical design flow because it involves searching for the optimum location of millions of standard cells and macros in the chip layout. The purpose of this study is to improve modern VLSI placement algorithms by using hardware (HW)/software (SW) co-design and the High-Level Synthesis (HLS) methodology. Chip layout placement methods can be divided into three categories: partition-based placement methods, simulated annealing methods, and analytical approaches. In this study, the placement algorithm uses simulated annealing, and the C/C++ program is developed and validated using standard academic benchmarks from the International Symposium on Physical Design (ISPD) design contest. Several critical functions such as the wirelength calculation are synthesized from C to Register-Transfer Level (RTL) using the Vivado HLS software for a custom HW implementation, while the rest of the algorithm, such as data parsing and memory accesses, remains in C and is co-simulated with the custom hardware block. Therefore, this project offers the possibility of using HLS design for the VLSI cell placement process, where it is shown that the RTL design improves the execution time of certain functions such as the wirelength calculation. In addition, HLS offers more options in terms of design space exploration compared with the traditional RTL methodology.

# TABLE OF CONTENTS

| SECTION          | TITLE                            |
|------------------|----------------------------------|
|                  | DECLARATION                      |
|                  | DEDICATION                       |
|                  | ACKNOWLEDGEMENT                  |
|                  | ABSTRACT                         |
|                  | ABSTRAK                          |
|                  | TABLE OF CONTENTS                |
|                  | LIST OF TABLES                   |
|                  | LIST OF FIGURES                  |
|                  | LIST OF ABBREVIATIONS            |
|                  | LIST OF SYMBOLS                  |
|                  | LIST OF APPENDICES               |
| CHAPTER 1        | INTRODUCTION                     |
| 1.1              | Problem Background               |
| 1.2              | Problem Statement                |
| 1.3              | Research Objectives              |
| 1.4              | Thesis Outline                   |
| CHAPTER 2        | LITERATURE REVIEW                |
| 2.1              | Introduction                     |
| 2.1.1            | Research on Cell Placement       |
| 2.1.2            | Research on Simulated Annealing  |
| 2.1.3            | Research on High-Level Synthesis |
| 2.1.4            | Research on HW/SW Co-Design      |
| 2.2              | Research Gap                     |
| CHAPTER 3        | RESEARCH METHODOLOGY             |
| 3.1              | Introduction                     |
| 3.2              | Proposed Method                  |
| 3.2.1            | Programming and Algorithm        |
| 3.2.2            | Tools and Platforms              |
| 3.2.3            | Benchmark Format and Files       |
| 3.3              | Gantt Chart                      |
| 3.4              | Chapter Summary                  |
| CHAPTER 4        | RESULT AND DISCUSSION            |
| 4.1              | Introduction                     |
| 4.1.1            | Benchmark Files                  |
| 4.1.2            | Simulated Annealing              |
| 4.1.3            | C/RTL Synthesis                  |
| 4.1.4            | Resource Consumption             |
| 4.1.5            | Execution Time                   |
| 4.1.6            | Power Consumption                |
| CHAPTER 5        | CONCLUSION AND RECOMMENDATIONS   |
| 5.1              | Research Outcomes                |
|                  | REFERENCES                       |
|                  | Appendices A – B                 |

# LIST OF TABLES

| TABLE NO. | TITLE                                             |
|-----------|---------------------------------------------------|
| Table 3.1 | Hardware specification.                           |
| Table 3.2 | ISPD benchmark description.                       |
| Table 4.1 | Benchmark files.                                  |
| Table 4.2 | Simulated annealing of benchmark files.           |
| Table 4.3 | Resource consumed.                                |
| Table 4.4 | C-simulation execution time for HPWL calculation. |
| Table 4.5 | RTL execution time for HPWL calculation.          |
| Table 4.6 | Power report summaries.                           |

# LIST OF FIGURES

| FIGURE NO. | TITLE                                                        |
|------------|--------------------------------------------------------------|
| Figure 1.1 | IC design flow.                                              |
| Figure 3.1 | Project flow.                                                |
| Figure 3.2 | Example of simulated annealing with a local minimum.         |
| Figure 3.3 | Flowchart of the simulated-annealing algorithm.              |
| Figure 3.4 | C/RTL verification flow.                                     |
| Figure 3.5 | Flow chart of implementation steps using Vitis HLS.          |
| Figure 3.6 | Gantt chart for this project.                                |
| Figure 4.1 | Result of initial and final solution of simulated annealing. |
| Figure 4.2 | BM3 simulated annealing iteration result.                    |
| Figure 4.3 | Schedule viewer for standard design.                         |
| Figure 4.4 | Schedule viewer for dataflow design.                         |
| Figure 4.5 | Dataflow viewer of dataflow design.                          |
| Figure 4.6 | Differences of resource consumed.                            |
| Figure 4.7 | RTL execution time.                                          |
| Figure 4.8 | Execution time of C-simulation and RTL.                      |
| Figure 4.9 | On-chip power consumed.                                      |

# LIST OF ABBREVIATIONS

| ABBREVIATION | MEANING                                    |
|--------------|--------------------------------------------|
| UTM          | Universiti Teknologi Malaysia              |
| IC           | Integrated Circuit                         |
| HW           | Hardware                                   |
| SW           | Software                                   |
| SA           | Simulated Annealing                        |
| VLSI         | Very Large-Scale Integration               |
| HLS          | High-Level Synthesis                       |
| RTL          | Register-Transfer Level                    |
| ISPD         | International Symposium on Physical Design |
| DSE          | Design Space Exploration                   |

# LIST OF APPENDICES

| APPENDIX   | TITLE                                           |
|------------|-------------------------------------------------|
| Appendix A | main.cpp – C to RTL synthesis for HPWL function |
| Appendix B | Simulated Annealing Algorithm                   |

### **CHAPTER 1**

### **INTRODUCTION**

### 1.1 Problem Background

An integrated circuit (IC) is a microscopic array of electronic circuits and components that has been implanted or diffused onto the surface of a semiconducting material such as silicon. It is called an IC because the components, circuits, and base material are all made together or integrated onto a single piece of silicon wafer, as opposed to a discrete circuit in which the components are made separately from different materials and assembled later. ICs range in complexity from simple logic modules and amplifiers to complete microcomputers containing millions of elements.



Figure 1.1 IC design flow.

The ongoing global chip shortage, which began in 2020 and in which demand for ICs (commonly known as semiconductor chips) exceeds supply, has affected more than 169 industries [2] and has led to major price increases, supply shortages and queues among consumers for products that contain semiconductors [3][4][5].

Cell placement is the stage in the Very Large-Scale Integration (VLSI) design flow where cell locations are determined, and these locations affect the timing, routability, power consumption and performance of a chip [6]. An industry-strength placer should not only solve the placement problem in a reasonable time but also produce a high-quality solution [6]. In physical design, VLSI cell placement is the process of determining the position of each cell on the die surface such that no two cells overlap [7]. Debugging a faulty VLSI chip is both difficult and time-consuming, since the turnaround time for design changes ranges from several weeks to months. Long design times may cause a company to miss the opportunity to bring the chip to market ahead of its competitors and to lose its investment. This leads to the batching of design changes and to the use of design methods that enforce perfect designs.
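The no-overlap condition in the placement definition can be made concrete with a small check. The following is a minimal C++ sketch, not the implementation used in this thesis: the `Cell` struct, the `overlaps` and `is_legal` functions, and the lower-left-corner coordinate convention are all assumptions introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cell record: lower-left corner (x, y) plus width and height.
struct Cell {
    double x, y, w, h;
};

// True when the interiors of two axis-aligned rectangles intersect.
// Cells that merely share an edge are treated as non-overlapping.
bool overlaps(const Cell& a, const Cell& b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

// A placement is legal, in the sense used above, if no pair of cells overlaps.
// The O(n^2) pairwise scan is for illustration only.
bool is_legal(const std::vector<Cell>& cells) {
    for (std::size_t i = 0; i < cells.size(); ++i) {
        assert(cells[i].w > 0 && cells[i].h > 0);  // cell sizes must be positive
        for (std::size_t j = i + 1; j < cells.size(); ++j) {
            if (overlaps(cells[i], cells[j])) return false;
        }
    }
    return true;
}
```

For example, two 2 x 1 cells at x = 0 and x = 2 on the same row share only an edge and form a legal placement, while moving the second cell to x = 1 makes `is_legal` return false. A placer handling millions of cells would replace the pairwise scan with a row-based or bin-based check.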

One of the most popular tasks in Computer-Aided Design (CAD) is VLSI floorplanning. The main purpose of cell placement is to provide the best conditions for subsequent routing. Criteria and estimates are introduced and optimized to achieve the best placement conditions [8]. High-Level Synthesis (HLS), sometimes referred to as C synthesis, algorithmic synthesis or behavioral synthesis, is an automated design process that takes an algorithmic description of a behavior and creates digital hardware that implements that behavior efficiently. Owing to the need for higher design productivity, HLS has become mainstream in most VLSI design companies [9].
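To illustrate what an algorithmic description of a behavior can look like, here is a hedged C++ sketch of a half-perimeter wirelength (HPWL) kernel written in a style that HLS tools typically accept: fixed-size arrays, a bounded loop, and no dynamic memory. The function name `hpwl`, the bound `MAX_PINS`, and the integer coordinate type are assumptions and are not taken from the source code of this thesis. `#pragma HLS PIPELINE` is a Vivado/Vitis HLS directive; an ordinary C++ compiler ignores it.

```cpp
#include <cassert>

// Assumed upper bound on pins per net; a static bound lets the tool size the hardware.
constexpr int MAX_PINS = 64;

// Half-perimeter wirelength of one net: the width plus the height of the
// bounding box that encloses all of its pins.
int hpwl(const int px[MAX_PINS], const int py[MAX_PINS], int num_pins) {
    // Precondition check, useful during C simulation.
    assert(num_pins >= 1 && num_pins <= MAX_PINS);

    int min_x = px[0], max_x = px[0];
    int min_y = py[0], max_y = py[0];

    for (int i = 1; i < num_pins; ++i) {
#pragma HLS PIPELINE II=1
        // Grow the bounding box to include pin i.
        if (px[i] < min_x) min_x = px[i];
        if (px[i] > max_x) max_x = px[i];
        if (py[i] < min_y) min_y = py[i];
        if (py[i] > max_y) max_y = py[i];
    }
    return (max_x - min_x) + (max_y - min_y);
}
```

For a net with pins at (1, 5), (4, 1) and (2, 3), the bounding box is 3 wide and 4 high, so `hpwl` returns 7. The same source can be compiled as ordinary software for C simulation and passed to an HLS tool for synthesis, which is what makes this style convenient for HW/SW co-design.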

### 1.2 Problem Statement

VLSI cell placement is the most time-consuming task in the IC physical design flow, as it involves finding the optimum placement of millions of standard cells and macros in a chip floorplan. Chip manufacturers are also concerned with fast time-to-market for their ICs, so a short cycle time from design through manufacturing to market is essential.

With classical RTL methods, design space exploration (DSE) is time-consuming and the range of possibilities that can be explored is limited compared with the HLS method. The cost of the high level of flexibility that RTL provides is that RTL design is difficult and time-consuming to develop and debug, and therefore expensive. This is why HLS tools are more attractive than RTL in terms of development time. Lastly, different designs may require different cell sizes, and different design styles may introduce different constraints, so there are many variations of the VLSI placement problem [10].

### 1.3 Research Objectives

To overcome the problem stated, the objectives of the research are:

- (a) To design and implement a VLSI cell placement algorithm using HW/SW co-design and HLS methodology.
- (b) To perform design space exploration of HW/SW configurations for the placement algorithm.
- (c) To analyse and compare the execution time, power consumption and wirelength of the design floorplans using HLS and HW/SW co-design.

### 1.4 Thesis Outline

This thesis consists of five chapters. Chapter 1 introduces the project and covers the problem background, problem statement, research objectives, scope and thesis outline. Chapter 2 reviews the literature and previous work in four main areas: cell placement, simulated annealing, High-Level Synthesis and HW/SW co-design. Chapter 3 explains in detail the research methodology used to implement, validate and simulate the HLS of VLSI cell placement, together with the project plan; the main tools used to carry out the simulation are also presented in this chapter. Chapter 4 presents and discusses the results obtained from the simulation and synthesis work. Lastly, Chapter 5 concludes the project and lists recommendations for future work.

#### REFERENCES

[1] A. B. Kahng, J. Lienig, I. L. Markov and J. Hu, VLSI Physical Design: From Graph Partitioning to Timing Closure. New York, NY: Springer, 2011.

[2] Howley, Daniel, "These 169 industries are being hit by the global chip shortage". Yahoo Finance. Retrieved 2021-10-17.

[3] "Global shortage in computer chips 'reaches crisis point'". The Guardian. 2021-03-21. Retrieved 2021-10-17.

[4] Shead, Sam, "The global chip shortage is starting to have major real-world consequences". CNBC. Retrieved 2021-10-17.

[5] Leprince-Ringuet, Daphne, "The global chip shortage is a much bigger problem than everyone realised. And it will go on for longer, too". ZDNet. Retrieved 2021-10-17.

[6] S. Pawanekar, G. Trivedi and K. Kapoor, "A Nonlinear Analytical Optimization Method for Standard Cell Placement of VLSI Circuits," 2015 28th International Conference on VLSI Design, 2015, pp. 423-428.

[7] J. Chen, Z. Peng and W. Zhu, "A VLSI global placement solver based on proximal alternating direction method," 2015 IEEE 11th International Conference on ASIC (ASICON), 2015, pp. 1-4.

[8] E. V. Kuliev, V. V. Kureichik and I. O. Kursitys, "Decision making in VLSI components placement problem based on grey wolf optimization," 2019 IEEE East-West Design & Test Symposium (EWDTS), 2019, pp. 1-4.

[9] S. Xu, J. Chen and B. C. Schafer, "HW/SW co-design experimental framework using configurable SoCs," 2017 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2017, pp. 1-6.

[10] J. Chen and W. Zhu, "An Analytical Placer for VLSI Standard Cell Placement," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 8, pp. 1208-1221, Aug. 2012.

[11] Y. Lin, W. Li, J. Gu, H. Ren, B. Khailany and D. Z. Pan, "ABCDPlace: Accelerated Batch-Based Concurrent Detailed Placement on Multithreaded CPUs and GPUs," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 12, pp. 5083-5096, Dec. 2020.

[12] S. Dhar, L. Singhal, M. Iyer and D. Pan, "FPGA Accelerated FPGA Placement," 2019 29th International Conference on Field Programmable Logic and Applications (FPL), 2019, pp. 404-410.

[13] M. Sanjabi, A. Jahanian, S. Amanollahi and N. Miralaei, "ParSA: Parallel simulated annealing placement algorithm for multi-core systems," The 16th CSI International Symposium on Computer Architecture and Digital Systems (CADS 2012), 2012, pp. 19-24.

[14] A. Choong, R. Beidas and J. Zhu, "Parallelizing Simulated Annealing-Based Placement Using GPGPU," 2010 International Conference on Field Programmable Logic and Applications, 2010, pp. 31-34.

[15] Y. Choi and J. Cong, "HLS-Based Optimization and Design Space Exploration for Applications with Variable Loop Bounds," 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2018, pp. 1-8.

[16] I. Shin, S. Paik, D. Shin and Y. Shin, "HLS-dv: A High-Level Synthesis Framework for Dual-Vdd Architectures," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 4, pp. 593-604, April 2012.

[17] S. Yousuf and A. Gordon-Ross, "An Automated Hardware/Software Co-Design Flow for Partially Reconfigurable FPGAs," 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2016, pp. 30-35.

[18] "No Title," [Online]. Available: http://www.ispd.cc/contests/11/ispd2011_contest.html.