# AN EMBEDDED SYSTEM FOR NETWORKING SECURITY APPLYING CRYPTOGRAPHIC ACCELERATION IN FIELD PROGRAMMABLE GATE ARRAY HARDWARE

VISHNU A/L PARAMASIVAM

UNIVERSITI TEKNOLOGI MALAYSIA

# AN EMBEDDED SYSTEM FOR NETWORKING SECURITY APPLYING CRYPTOGRAPHIC ACCELERATION IN FIELD PROGRAMMABLE GATE ARRAY HARDWARE

### VISHNU A/L PARAMASIVAM

A thesis submitted in fulfilment of the requirements for the award of the degree of Master of Engineering (Electrical)

> Faculty of Electrical Engineering Universiti Teknologi Malaysia

> > NOVEMBER 2009

Dedicated to my beloved family

#### ACKNOWLEDGEMENT

First and foremost, I would like to extend my deepest gratitude to my project supervisor, Prof. Dr. Mohamed Khalil bin Mohd Hani, for giving me the opportunity to work the area of work that interests me the most. His constant encouragement, critics and guidance were key to bringing this project to a fruitful completion. I have learned and gained much, not only in the field of research, but also in the lessons of life.

My sincerest appreciation goes to my co-supervisors, Prof. Madya Munim Zabidi and Dr. Nadzir Marsono, and my seniors Jasmine Hau Yuan Wen and Ilyasak for their support, help, and technical advices. I have learned much from them, as well as receiving plenty of guidance and motivation.

I would also like to thank all those who have contributed directly and indirectly to the completion of this research and thesis. This includes my fellow postgraduate students, Mohd Nazrin and Arif Irwansyah, who provided me with help and company during my study here.

Finally, I would like to thank my family for always being there for me, through thick and thin. Their role in my life is something I will always need.

#### ABSTRACT

The Internet is an insecure medium. The Secure Socket Layer (SSL) protocol and its successor Transport Layer Security (TLS) can be used to secure applications that communicate over a network. The most widely deployed, freely available implementation of the SSL/TLS protocol is the OpenSSL library. When using the SSL/TLS protocol, the computational power required is typically too much for most embedded systems, because cryptographic functions are computationally extensive. The solution to this problem would be to perform hardware acceleration of computationally intensive cryptographic functions. This thesis proposes an embedded cryptosystem with Field Programmable Gate Array based hardware acceleration for networking security, applying the OpenSSL cryptographic protocol. The key cryptographic functions used in SSL/TLS-driven connections are Advanced Encryption Standard (AES), Secure Hash Algorithm (SHA), Rivest-Shamir-Adleman (RSA), and Random Number Generation (RNG). The AES hardware symmetric cryptographic hardware core is newly designed, the SHA-1, SHA-2, RNG, and RSA cores are improved from previous work, and the system bus interface of these hardware cores are upgraded. All of these hardware cores are integrated into an embedded system implemented as a System-on-Chip. Finally, the OpenSSL cryptographic library is accelerated using this cryptosystem to improve the performance of networking security. Nios2-Linux Real Time Operating System is used within the embedded system. It provides native support for Ethernet, Universal Serial Bus, multitasking, standard Linux functions, and has a large collections of ready-to-use libraries, which includes the OpenSSL library. Applications are written to test, verify, and benchmark the embedded cryptosystem. Results show an improvement in performance by 9 to 278 times of the OpenSSL crypto library, depending on the algorithm accelerated. The performance for networking security using the SSL/TLS protocol through the OpenSSL library is also improved.

#### ABSTRAK

Internet adalah suatu bahantara yang tidak selamat. Protokol Secure Socket Layer (SSL) dan penggantinya Transport Layer Security (TLS) biasanya digunakan untuk aplikasi yang perlu berkomunikasi secara selamat dalam rangkaian Internet. Sistem pelaksanaan protokol SSL/TLS yang digunakan secara meluas dan percuma adalah melalui rutin perpustakaan OpenSSL. Bila menggunakan protokol SSL/TLS, kuasa pengiraan yang dikehendaki lazimnya terlalu tinggi untuk sistem komputer terbenam disebabkan fungsi-fungsi kriptografi yang kompleks. Penyelesaian yang sesuai untuk masalah ini adalah dengan menggunakan pecutan litar logik untuk fungsifungsi kriptografi yang intensif. Tesis ini mencadangkan satu sistem kripto terbenam dengan mengunakan Field Programmable Gate Array yang berasaskan pecutan litar logik untuk keselamatan perangkaian perhubungan dengan mengaplikasikan protokol kriptografi OpenSSL. Fungsi-fungsi kriptografi yang penting dalam sambungan SSL/TLS adalah Advanced Encryption Standard (AES), Secure Hash Algorithm (SHA), Rivest-Shamir-Adleman (RSA), dan Random Number Generation (RNG). Pecutan perkakasan untuk fungsi kriptografi AES direka, pecutan litar logik SHA-1, SHA-2, RNG, dan RSA dinaik taraf daripada projek-projek sebelumnya, dan sistem antaramuka untuk litar-litar logik tersebut juga dinaik taraf. Kesemua litar logik ini disepadukan dalam sebuah sistem terbenam yang dikenali sebagai System-on-Chip. Akhirnya, rutin perpustakaan kriptografi OpenSSL dipecut dengan menggunakan sistem kripto ini untuk meningkatkan kelajuan keselamatan perangkaian. Sistem pengendalian masa sebenar Nios2-Linux digunakan dalam sistem terbenam ini. Ia menyediakan infrastruktur untuk Ethernet, Universal Serial Bus, pengendalian berbilang tugas, fungsi-fungsi lazim Linux, dan koleksi besar rutin perpustakaan yang sedia digunakan, termasuk OpenSSL. Beberapa aplikasi ditulis untuk menguji, mengesah, dan mengukur kelajuan sistem kripto terbenam ini. Berdasarkan keputusan yang diberi oleh applikasi ini, peningkatan sebanyak 9 hingga 278 kali ganda diperhatikan dalam pengedalian rutin perpustakaan OpenSSL. Secara tidak langsung, kelajuan keselamatan perangkaian SSL/TLS melalui rutin perpustakaan OpenSSL juga dipertingkatkan.

# TABLE OF CONTENTS

| CHAPTER | TITLE       |                                              | PAGE         |
|---------|-------------|----------------------------------------------|--------------|
|         | DECLARATION |                                              |              |
|         | DEDIC       | CATION                                       | iii          |
|         | ACKN        | OWLEDGEMENT                                  | iv           |
|         | ABST        | RACT                                         | v<br>vi<br>x |
|         | ABST        | RAK                                          |              |
|         | TABL        | E OF CONTENTS                                |              |
|         | LIST (      | OF TABLES                                    | xii          |
|         | LIST (      | OF FIGURES                                   | XV           |
|         | LIST (      | OF ABBREVIATIONS                             | xvi          |
|         | LIST (      | OF APPENDICES                                | xviii        |
|         |             |                                              |              |
| 1       | INTRO       | DUCTION                                      | 1            |
|         | 1.1         | Background                                   | 1            |
|         | 1.2         | Problem Statement                            | 3            |
|         | 1.3         | Objectives                                   | 4            |
|         | 1.4         | Scope of Work                                | 5            |
|         | 1.5         | Methodology                                  | 6            |
|         | 1.6         | Research Contribution                        | 8            |
|         | 1.7         | Thesis Organization                          | 8            |
| 2       | LITEF       | RATURE REVIEW AND BACKGROUND                 | 10           |
|         | 2.1         | Previous Work                                | 10           |
|         | 2.2         | Nios2-Linux                                  | 12           |
|         | 2.3         | The SSL/TLS Protocol                         | 14           |
|         |             | 2.3.1 OpenSSL                                | 17           |
|         | 2.4         | Certificates                                 | 17           |
|         | 2.5         | Cryptography: The AES Symmetric Block Cipher | 19           |
|         | 2.6         | Cryptography: Hashing                        | 21           |
|         | 2.7         | Public Key Cryptography                      | 22           |

23

viii

3

4

| THE  | EMBE    | DDED CRYPTOSYSTEM - SYSTE                 | $\mathbf{M}$ |
|------|---------|-------------------------------------------|--------------|
| OVER | VIEW AN | ND NIOS2-LINUX                            | 25           |
| 3.1  | Introdu | ction                                     | 25           |
| 3.2  | System  | Layers                                    | 26           |
|      | 3.2.1   | The Application Layer                     | 27           |
|      | 3.2.2   | The Operating System Layer                | 28           |
|      | 3.2.3   | The Hardware Layer                        | 30           |
| 3.3  | Nios2-I | Linux: Setup and Implementation           | 32           |
|      | 3.3.1   | Compiling the Nios2-Linux Kernel          | 34           |
|      | 3.3.2   | Nios2-Linux Root Filesystem               | 35           |
|      | 3.3.3   | Nios2-Linux Boot Sequence                 | 35           |
|      | 3.3.4   | Cross-Compiling Programs for Nios2-Linux  | 36           |
|      | 3.3.5   | IO Programming in User Space              | 37           |
|      | 3.3.6   | Customizing the Kernel and Bundled Appli- |              |
|      |         | cations                                   | 38           |

HARDWARE DESIGN OF THE EMBEDDED CRYPTOSYS-TEM

| TEM |         |                                          | 39 |
|-----|---------|------------------------------------------|----|
| 4.1 | Overvi  | ew                                       | 39 |
| 4.2 | Nios II | Processor and the Avalon Interface       | 39 |
|     | 4.2.1   | Avalon Interface Specifications          | 41 |
|     | 4.2.2   | Designing the Avalon Interface Unit      | 43 |
|     | 4.2.3   | Device Driver Design                     | 48 |
| 4.3 | Design  | of the AES Cryptographic Core            | 50 |
|     | 4.3.1   | The Key Expansion Block                  | 53 |
|     | 4.3.2   | The AES Core Avalon Interface and Device |    |
|     |         | Driver                                   | 58 |
| 4.4 | SHA-1   | Cryptographic Core                       | 60 |
|     | 4.4.1   | Device Driver Design for the SHA-1 Core  | 62 |
| 4.5 | SHA-2   | Cryptographic Core                       | 65 |
|     | 4.5.1   | Device Driver Design for the SHA-2 Core  | 65 |
| 4.6 | RSA C   | ryptographic Core                        | 67 |
|     | 4.6.1   | Device Driver Design for the RSA Core    | 69 |
| 4.7 | RNG H   | Iardware Core                            | 71 |
|     | 4.7.1   | Device Driver Design for the RNG Core    | 72 |
| 4.8 | USB C   | hip Controller                           | 73 |

| 4.9  | Custom           | Clock Counter Unit                           | 76         |
|------|------------------|----------------------------------------------|------------|
| 4.10 | Etherne          | et and Memory Device Controllers             | 77         |
| SOFT | WARE SI          | UBSYSTEM DESIGN AND OPENSSL INTE             | -          |
| GRAT |                  |                                              | 78         |
| 5.1  | Introdu          | ction                                        | 78         |
| 5.2  | Configu          | uring the OpenSSL Library for the Target     |            |
|      | Enviror          |                                              | 80         |
| 5.3  | Modifie          | cation of the Cryptographic Functions in the |            |
|      |                  | SL Library                                   | 81         |
|      | 5.3.1            | Modification of the AES Function             | 86         |
|      | 5.3.2            | Modification of the SHA-1 Function           | 88         |
|      | 5.3.3            | Modification of the SHA-2 Function           | 89         |
|      | 5.3.4            | Modification of the RSA Function             | 91         |
|      | 5.3.5            | Calculating the Montgomery Value             | 96         |
| 5.4  | Certific         | tate Generation                              | 98         |
| 6.1  | Verifica         | ation of the Hardware Cryptographic Cores    | 101        |
|      |                  | FICATION AND SYSTEM VALIDATION DISCUSSION    | 101        |
| 0.1  |                  | Verification of the AES Core                 | 101        |
|      | 6.1.1<br>6.1.2   |                                              | 102<br>104 |
|      |                  |                                              |            |
|      | 6.1.3            |                                              | 105        |
|      | 6.1.4            | Verification of the RSA Core                 | 107        |
| ()   | 6.1.5<br>Derferr | Verification of the RNG Core                 | 109        |
| 6.2  |                  | nance Test and Results                       | 110        |
|      | 6.2.1            | Performance of the AES Core                  | 110        |
|      | 6.2.2            | Performance of the Hardware Accelerated      | 110        |
| 6.2  | Crustan          | OpenSSL Cryptographic Functions              | 112        |
| 6.3  | -                | Validation with Application Prototypes       | 114        |
|      | 6.3.1            | Complete Cryptographic Test Program          | 114        |
|      | 6.3.2            | USB File Encryption and Decryption Program   | 116        |
|      | 6.3.3            | Secure Bank Check Transfer Program           | 117        |
| CONC | CLUSION          | S                                            | 122        |
| 7.1  | Conclu           | ding Remarks                                 | 122        |
|      |                  | 8                                            |            |

7.2Future Work124

ix

| REFERENCES       | 125       |
|------------------|-----------|
| Appendices A – G | 128 – 165 |

## LIST OF TABLES

| TABLE NO. |  |
|-----------|--|
|           |  |

# TITLE

### PAGE

| 2.1  | Time taken for several network functions on the RCM2100            |      |
|------|--------------------------------------------------------------------|------|
|      | embedded system [2].                                               | 11   |
| 4.1  | Avalon interface register table for the AES core.                  | 45   |
| 4.2  | Example of cached and uncached addresses for hardware cores.       | 50   |
| 4.3  | Avalon interface register table for the SHA-1 core.                | 63   |
| 4.4  | Avalon interface register table for the SHA-2 core.                | 66   |
| 4.5  | Avalon interface register table for the RSA core.                  | 70   |
| 4.6  | Avalon interface register table for the clock counter unit.        | 76   |
| 5.1  | Example 8-bit memory address pointers and its equivalent 32-bit    |      |
|      | memory address pointers when converted.                            | 85   |
| 6.1  | AES key expansion execution time comparison in clock cycles.       | 111  |
| 6.2  | AES core statistics and performance.                               | 111  |
| 6.3  | Comparison of AES core latency with other designs in clock cycles  | .111 |
| 6.4  | AES core performance in the Nios2-Linux environment.               | 112  |
| 6.5  | Execution time in clock cycles for the AES algorithm.              | 112  |
| 6.6  | Performance results for the Rijndael algorithm on the              |      |
|      | TMS320C6201 [4].                                                   | 113  |
| 6.7  | Execution time in clock cycles for the SHA-1/SHA-2 hash            |      |
|      | algorithm.                                                         | 113  |
| 6.8  | Execution time in clock cycles for the RSA private key encryption. | 113  |
| 6.9  | Execution time in clock cycles for the RSA public key encryption.  | 114  |
| 6.10 | Execution time in seconds for the complete cryptographic test      |      |
|      | program.                                                           | 115  |
| 6.11 | File encryption and decryption in AES for a large file.            | 117  |
| 6.12 | File encryption and decryption in AES for a small file.            | 117  |
| 6.13 | Execution time in seconds for the secure bank check transfer       |      |
|      | application.                                                       | 121  |

# LIST OF FIGURES

### FIGURE NO.

# TITLE PAGE

| 1.1 | Illustration of a man in the middle attack.                     | 2  |
|-----|-----------------------------------------------------------------|----|
| 1.2 | Layers of the proposed embedded cryptosystem.                   | 5  |
| 1.3 | Secure communication between the embedded system and a PC.      | 6  |
| 1.4 | Methodology used to achieve research objectives.                | 7  |
| 2.1 | Diagram of the connection between Nios2-Linux, FPGA, Nios       |    |
|     | II CPU and the external hardware.                               | 13 |
| 2.2 | The SSL protocol [16].                                          | 15 |
| 2.3 | The AES encryption and decryption process [20].                 | 20 |
| 2.4 | One iteration in a SHA-2 family compression function [21].      | 21 |
| 2.5 | Illustration of how communication with public key cryptography  |    |
|     | works.                                                          | 23 |
| 3.1 | Secure communication between the embedded system and a PC.      | 26 |
| 3.2 | Layers of the proposed embedded cryptosystem and its relation   |    |
|     | with the prototyping hardware.                                  | 27 |
| 3.3 | Nios2-Linux RTOS functional block diagram.                      | 28 |
| 3.4 | Functional block diagram of the hardware layer.                 | 30 |
| 3.5 | Nios2-Linux hello world program.                                | 36 |
| 3.6 | Nios2-Linux memory pointers to hardware.                        | 37 |
| 3.7 | Nios2-Linux version of the IORD and IOWR macros.                | 37 |
| 4.1 | Avalon slave interface to a PIO with only output signals [27].  | 42 |
| 4.2 | Slave read and write transfers with fixed wait-states [27].     | 42 |
| 4.3 | The AES core connected to an Avalon interface.                  | 44 |
| 4.4 | Verilog design of the Avalon Interface (main part).             | 46 |
| 4.5 | Sending the input data to the AES core using C-language macros. | 49 |
| 4.6 | Reading the output data from the AES core using C-language      |    |
|     | macros.                                                         | 49 |
| 4.7 | The AES encryption and decryption process.                      | 51 |

| 4.8  | AES top level FSM flowchart.                                    | 52  |
|------|-----------------------------------------------------------------|-----|
| 4.9  | AES cryptographic core functional block diagram.                | 53  |
| 4.10 | The Rijndael key schedule.                                      | 53  |
| 4.11 | <i>Rcon</i> lookup table in Verilog.                            | 55  |
| 4.12 | Multiplexed inputs for <i>S</i> -box with rotated output.       | 57  |
| 4.13 | Multiplexing the first <i>S</i> -box with rotation in Verilog.  | 57  |
| 4.14 | Key expansion functional block diagram.                         | 58  |
| 4.15 | Key expansion IO block diagram.                                 | 58  |
| 4.16 | AES core block diagram.                                         | 59  |
| 4.17 | The AES core device driver.                                     | 60  |
| 4.18 | The SHA-1 cryptographic core with Avalon interfacing.           | 61  |
| 4.19 | The modified SHA-1 core.                                        | 62  |
| 4.20 | The SHA-1 core driver.                                          | 64  |
| 4.21 | The SHA-2 cryptographic core with Avalon interfacing.           | 65  |
| 4.22 | The SHA-2 core driver.                                          | 67  |
| 4.23 | The RSA cryptographic core with Avalon interfacing.             | 68  |
| 4.24 | The RNG cryptographic core with Avalon interfacing.             | 71  |
| 4.25 | The RNG core driver.                                            | 72  |
| 4.26 | Microtronix USB-Ethernet connections to Santa Cruz Header.      | 73  |
| 4.27 | IO for Avalon interface and the physical connections.           | 74  |
| 4.28 | Declaring registers and assigning the Avalon signals to the     |     |
|      | corresponding ISP1161A pins.                                    | 74  |
| 4.29 | Pin assignments in respect to the status of the reset signal.   | 75  |
| 4.30 | Functional block diagram of the clock counter unit.             | 76  |
| 4.31 | The custom clock counter device driver.                         | 77  |
| 5.1  | The OpenSSL library and its connections to the other layers.    | 79  |
| 5.2  | The <i>linuxsystem.h</i> common header file.                    | 83  |
| 5.3  | Illustration on byte endianness.                                | 84  |
| 5.4  | Memory copying function to convert an 8-bit pointer into as 32- |     |
|      | bit pointer to avoid memory address misalignment.               | 86  |
| 5.5  | Using OpenSSL to do a AES encryption operation.                 | 87  |
| 5.6  | Using OpenSSL to do a SHA-1 hashing operation.                  | 88  |
| 5.7  | Using OpenSSL to do a SHA512 hashing operation.                 | 90  |
| 5.8  | Using the OpenSSL command line tool to generate RSA key pairs.  | 93  |
| 5.9  | Using OpenSSL to do RSA2048 encryption and decryption.          | 94  |
| 5.10 | The Montgomery value calculation pseudocode.                    | 97  |
| 5.11 | The Montgomery value calculation function.                      | 98  |
|      |                                                                 |     |
| 6.1  | Print screen of the AES core test program showing the results.  | 103 |
|      |                                                                 |     |

| 6.2  | Print screen of the SHA-1 core test program showing the results. | 105 |
|------|------------------------------------------------------------------|-----|
| 6.3  | Print screen of the SHA-2 core test program showing the results. | 106 |
| 6.4  | Print screen of the RSA core test program showing the results.   | 108 |
| 6.5  | Print screen of the RSA core test program showing the results.   | 110 |
| 6.6  | The USB file encryption and decryption program.                  | 116 |
| 6.7  | The secure bank check transfer program.                          | 118 |
| 6.8  | Generating a plaintext electronic check.                         | 118 |
| 6.9  | Hashing and signing an electronic check.                         | 119 |
| 6.10 | Signature verification of the electronic check.                  | 119 |
| 6.11 | Signature verification successful notification on bank server.   | 120 |
| 6.12 | Signature verification failure notification on bank server.      | 120 |
| A.1  | Slave read and write transfers with waitrequest.                 | 129 |
| G.1  | Lookup table for the <i>Rcon</i> function.                       | 165 |

## LIST OF ABBREVIATIONS

| 3DES | _ | Triple Data Encryption Algorithm        |
|------|---|-----------------------------------------|
| AES  | _ | Advanced Encryption Standard            |
| API  | _ | Application Programming Interface       |
| ASIC | _ | Application Specific Integrated Circuit |
| BN   | _ | Big Number                              |
| CA   | _ | Certificate Authority                   |
| CPU  | _ | Central Processing Unit                 |
| DES  | _ | Data Encryption Standard                |
| ECC  | _ | Elliptic Curve Cryptography             |
| FIFO | _ | First In First Out                      |
| FIPS | _ | Federal Information Processing Standard |
| FPGA | _ | Field Programmable Gate Array           |
| FSM  | _ | Finite State Machine                    |
| GPU  | _ | Graphic Processing Unit                 |
| GUI  | _ | Graphical User Interface                |
| HAL  | _ | Hardware Abstraction Layer              |
| HDL  | _ | Hardware Development Language           |
| IC   | _ | Integrated Circuit                      |
| I/O  | _ | Input/Output                            |
| IP   | _ | Intellectual Property                   |
| LE   | _ | Logic Elements                          |
| Mbit | _ | Mega Bits                               |
| MD5  | _ | Message Digest Algorithm (Fifth Series) |
|      |   |                                         |

| MHz   | _ | Mega Hertz                                                            |
|-------|---|-----------------------------------------------------------------------|
| ms    | _ | millisecond                                                           |
| PC    | _ | Personal Computer                                                     |
| PCI   | _ | Peripheral Component Interconnect                                     |
| PIO   | _ | Parellel Input Output                                                 |
| PKI   | _ | Public Key Infrastructure                                             |
| RISC  | _ | Reduced Instruction Set Computer                                      |
| RNG   | _ | Random Number Generator                                               |
| ROM   | _ | Read Only Memory                                                      |
| RSA   | _ | Rivest-Shamir-Adleman                                                 |
| RTL   | _ | Register Transfer Level                                               |
| SDRAM | _ | Synchronous Dynamic Random Access Memory                              |
| SHA-1 | _ | Secure Hash Algorithm - 1                                             |
| SHA-2 | _ | Secure Hash Algorithm - 2                                             |
| SoC   | _ | System-on-Chip                                                        |
| SOPC  | _ | System-on-Programmable-Chip                                           |
| SSL   | _ | Secure Sockets Layer                                                  |
| TLS   | _ | Transport Layer Security                                              |
| UART  | _ | Universal Asynchronous Receiver Transmitter                           |
| UNIX  | - | Uniplexed Information and Computing System (originally spelled UNICS) |
| USB   | _ | Universal Serial Bus                                                  |
| VHDL  | _ | Very High Speed Integrated Circuit Hardware Description Language      |
| VLSI  | _ | Very Large Scale Integration                                          |
| VoIP  | _ | Voice over Internet Protocol                                          |

### LIST OF APPENDICES

| APPENDIX | TITLE                                       | PAGE |
|----------|---------------------------------------------|------|
| A        | AVALON-MM INTERFACE SPECIFICATION           | 128  |
| В        | AES CORE SOURCE CODE                        | 131  |
| С        | AVALON INTERFACE SOURCE CODES               | 143  |
| D        | PORTING THE OPENSSL LIBRARY                 | 149  |
| E        | OPENSSL CRYPTOGRAPHIC LIBRARY MODIFICATIONS | 152  |
| F        | TEST APPLICATION SOURCE CODES AND RSA TEST  | ,    |
|          | VECTORS                                     | 158  |
| G        | MISCELLANEOUS HARDWARE SOURCE CODES         | 165  |

### **CHAPTER 1**

#### **INTRODUCTION**

This chapter provides background information, the problem statement, objectives, work scope, methodology, and research contribution of this thesis, which is about building an embedded cryptosystem for network security applying hardware acceleration in Field Programmable Gate Array (FPGA).

### 1.1 Background

The Internet is an insecure medium [1]. It is possible to intercept and modify data on the Ethernet wire, and the process of doing so is not complicated. An average hacker will have an easy time eavesdropping if the data transferred is sent in plaintext or other standard formats. This can be done using *man in the middle attacks*, as depicted in Figure 1.1. This is because data on a network is broadcast, hence even amateur hackers can listen in. Hence, applications that do not properly protect data when using an untrusted medium are very vulnerable.

Network security consists of the provisions made in an underlying computer network infrastructure, policies adopted by the network administrator to protect the network and the network-accessible resources from unauthorized access and the effectiveness (or lack) of these measures combined together. Currently, the most widely deployed method for Internet security is the Secure Sockets Layer (SSL) cryptographic protocol. The most popular implementations are built using the OpenSSL library.

SSL is an excellent protocol, but it is a lot slower than a traditional unsecured TCP/IP connection [1]. Cryptographic functions on embedded systems



Figure 1.1: Illustration of a man in the middle attack.

are computationally extensive, especially asymmetric cryptography such as Rivest-Shamir-Adleman (RSA) for key generation. Symmetric cryptography like Advanced Encryption Standard (AES) and Triple Data Encryption Algorithm (3DES) works on a per block basis. Every data transferred must be encrypted/decrypted. Therefore, when using the SSL cryptographic protocol, the computational power required is typically too much for most embedded systems [2]. This includes the Altera Nios II which only runs on a 50MHz clock.

This is important, because the CPU has to spend its resources on doing other tasks. One particularly good example would be Voice over Internet Protocol (VoIP) and credit card transactions. Voice data has to be transferred as fast to deliver seamlessly, the data must be encrypted in real time. This is not possible without cryptographic acceleration. On a different take, credit card transactions are very slow due to the use of multiple cryptographic protocols (especially RSA or ECC), typically taking 20-40 seconds for a 50MHz embedded system [2]. A cash transaction would definitely be faster. An embedded system with cryptographic acceleration would definitely be useful in these situations, because a single transaction would take less than a second, assuming there is no network congestion.

Another advantage of an embedded system is that hardware computations are definitely much safer than that done in software. This is due to attacks that use timing analysis, known as the timing attack [3]. In cryptography, a timing attack is a side channel attack in which the attacker attempts to compromise a cryptosystem by analyzing the time taken to execute cryptographic algorithms. Every logical operation in a computer takes time to execute, and the time can differ based on the input; with precise measurements of the time for each operation, an attacker can work backwards to the input. Information can leak from a system through measurement of the time it takes to respond to certain queries. How much such information can help an attacker depends on many variables: cryptosystem design, the CPU running the system, the algorithms used, assorted implementation details, timing attack countermeasures, the accuracy of the timing measurements, etc.

Timing attacks are often overlooked in the design phase because it is very dependent on the implementation [3]. Software implementations use several methods to prevent this type of attack. For example, RSA uses a method known as blinding. RSA blinding uses extra processing time, but effectively 'blinds' a timing attack by changing the amount of time taken for the RSA operation. However, with accelerated hardware cores, the amount of time taken for by hardware accelerated cores are the same regardless of the data processed. In other words, the amount of clock cycles taken for any cryptographic operation done in hardware is set in stone. So technically, a timing attack is useless to make an accurate prediction.

#### **1.2** Problem Statement

Firstly, data security comes at a high cost for embedded systems. The performance of most embedded CPUs (such as ARM and Nios II) are just not adequate enough to perform cryptography as well as do other tasks such as image or signal processing.

Secondly, the most computer intensive aspect of networking security is cryptography. This becomes very apparent when high strength cryptography is applied in the security protocol. High strength cryptography is typically required in banking and military based applications.

A possible solution to this problem would be to perform hardware acceleration of the computer intensive cryptographic functions. The advantage of this approach is, not only performance is enhanced, but also the level of security is increased, since the cryptographic functions are computed in hardware. In computing, hardware acceleration is the use of hardware to perform some function faster than is possible in software running on a general purpose CPU. Examples of hardware acceleration include blitting acceleration functionality in graphics processing units (GPUs) and instructions for complex operations in CPUs. A good cryptographic protocol is vital for ensuring security to provide confidentiality, authenticity, integrity, and non-repudiation. The Secure Socket Layer (SSL) protocol and its successor Transport Layer Security (TLS) can be used to secure web based applications that need to communicate over a network [1].

The key cryptographic functions used in SSL-driven connections are AES, SHA-1, SHA-2, RSA, and random number generation. Theoretically, if these functions are accelerated, it will greatly increase the speed of an SSL connection. All these functions can be accelerated, by designing hardware cryptographic accelerator cores. These hardware cores are designed using HDL and can be deployed in Field Programmable Gate Array (FPGA) based systems.

To work with an embedded system supported with the OpenSSL library, Ethernet, and Universal Serial Bus (USB) a real time operating system is needed. There are many freely available embedded operating systems available in the market, but Nios2-Linux (based on  $\mu$ Clinux) is the most compatible one to be applied in this work. Nios2-Linux provides native support for Ethernet, TCP/IP stack, USB, multitasking, standard UNIX functions, and has a large collections of ready-to-use libraries, which includes the OpenSSL library.

### 1.3 Objectives

- 1. To design an embedded cryptosystem with FPGA-based hardware acceleration for networking security, applying the OpenSSL cryptographic protocol. This involves the following sub-objectives:
  - (a) Customize the design of the AES hardware symmetric cryptographic hardware core to be dynamic and able to cater for every type of situation.
  - (b) Upgrade and modify the SHA-1, SHA-2, RNG, and RSA cores so that they are faster and better suited to work with the OpenSSL library.
  - (c) Improve the system bus interface of these hardware cores.
  - (d) Integrate all the above cores into an embedded system implemented as a System-on-Chip (SoC), as depicted in Figure 1.2.
  - (e) Accelerate the OpenSSL cryptographic library using this cryptosystem to improve the performance of networking security.

- 2. To design the following applications using the embedded cryptosystem proposed in the first objective:
  - (a) A complete cryptographic function test program.
  - (b) Secure bank check transfer system using networking security.
  - (c) File encryption and decryption program using the AES algorithm.



Figure 1.2: Layers of the proposed embedded cryptosystem.

### 1.4 Scope of Work

- Verilog HDL is applied in the design of cryptographic hardware cores and the interface to the Nios II processor.
- The OpenSSL library is used for key exchange, symmetric block cipher, asymmetric block cipher, and hashing algorithms.
- Nios2-Linux operating system is deployed in the embedded system to complement the OpenSSL library with Ethernet, USB, and filesystem functionality.
- Test applications are limited to C/C++ programs linked to the OpenSSL library.
- Cryptographic acceleration is limited to the following algorithms:
  - RSA 512/1024/2048 bits.
  - SHA 160 bits.
  - SHA 224/256/38/512 bits

- AES 128/192/256 bits
- Random Number Generation.
- The prototype is designed and implemented on the Altera Stratix II FPGA development board, with the Nios II embedded processor running at 50 MHz.

#### 1.5 Methodology

This project integrates cryptographic (crypto) acceleration into embedded systems for use in networking security. The embedded system environment proposed in this work consists of a FPGA-based system on chip running on the Nios II CPU, coupled with hardware crypto accelerator cores for AES, SHA-1, SHA-2, RSA, and a random number generator (RNG). The hardware runs on Nios2-Linux (real-time operating system), which is configured with Ethernet connectivity, USB functionality, and a working build of the OpenSSL library. The hardware cores are directly linked to the OpenSSL cryptographic functions, and effectively accelerate all incoming and outgoing SSL secured connections.

Figure 1.3 illustrates the FPGA-based embedded system connected to a PC through an untrusted Ethernet medium. It uses SSL connections for data communication with the PC. The OpenSSL library is deployed in this project because it is open-source, free, and works on any Linux based operating system.



Figure 1.3: Secure communication between the embedded system and a PC.

Figure 1.4 is a diagram depicting the approach to design the embedded cryptosystem. To summarize the illustration, the approach taken comes down to preparing all the cryptographic cores and integrating them into an SoC. At the same time the Nios2-Linux operating system and the OpenSSL library is prepared. Finally the system is setup and a security application for it is designed. Finally, the speed improvement is measured and the results are reported.



Figure 1.4: Methodology used to achieve research objectives.

### **1.6** Research Contribution

- 1. The major contribution of this research is that it is the only work done to accelerate the OpenSSL cryptographic library in embedded systems. Therefore the speed improvements can only be measured against itself (running software routines) and other similar works such as [4].
- Laying the basic foundation for secure embedded systems through Nios2-Linux and the OpenSSL library. This provides a TCP/IP stack, networking security, USB host functionality, filesystem stack, as well as open up the possibilities of porting and using Linux based open source libraries.
- 3. Improving available cryptographic hardware cores, as well as pioneering the integration of hardware coprocessors and the OpenSSL software library. This provides accelerated networking security.
- 4. Detailing methods of how to optimize data transfer from CPU to hardware coprocessor. Methods used are endian switching, designing a robust Avalon interface, and eliminating unnecessary communication.

### **1.7** Thesis Organization

This thesis is organized into seven chapters. The first chapter is the introductory chapter. It presents background information, the problem statement, objectives, work scope, methodology, and research contribution of this thesis.

Chapter 2 is the literature review and background chapter. It presents the basic theory of the building blocks used in this project such as embedded Linux, cryptographic algorithms, and communication mediums such as Ethernet and USB.

Chapter 3 gives an overview of the embedded cryptosystem proposed in this thesis. This includes an introduction to the system as well as the applications targeted for it, the layers of the system based on the TCP/IP model, and the Nios2-Linux RTOS implementation in the Altera FPGA-based development system.

Chapter 4 is the hardware design chapter. It explains the design of the components in the hardware layer of the proposed embedded cryptosystem. This

includes all the cryptographic hardware cores, the bus interface, and all the hardware modules connected to it.

Chapter 5 is the OpenSSL and software subsystem chapter. It describes the integration the OpenSSL library in the proposed embedded cryptosystem. This includes the porting of the OpenSSL library to the Nios2-Linux platform, the modifications performed on the cryptographic library source code, certificate generation, and the integration of the library with the hardware cryptographic cores.

Chapter 6 is the results chapter. It describes the verification and the speed improvement of all the hardware cryptographic cores. Implementation of several real applications that use cryptography are also described here.

The final chapter is the conclusion chapter, which concludes the findings of this project and discusses the potential future work to further enhance the overall embedded system.