Universiti Teknologi Malaysia Institutional Repository

An enhanced LZ77 algorithm with hash table to compress large scale DNA sequence

Ahmad, Nor Azhar (2010) An enhanced LZ77 algorithm with hash table to compress large scale DNA sequence. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information Systems.



Compression techniques have recently been adopted widely across many fields of data management. DNA data sets are growing large, which creates problems for both storage and data transfer. The common approach is to keep the data on a server, which adds to the cost of data management. Furthermore, online transfer is no longer the best solution: for a research center with a slow Internet connection, transferring the data is almost impossible. This study proposes an enhancement of the LZ77 algorithm, a common non-greedy, dictionary-based method that uses a sliding-window concept for compressing alphabetical data. By introducing sectioned sliding windows with a hash table approach, the proposed compression algorithm addresses the storage problem of large DNA sequences. The implementation reduces processing time and improves the compression rate. Two formats of DNA data (binary and FASTA) were tested and analysed. Simulation showed promising compression rates as the size of the DNA data increased, with compression reaching a rate of 56% per bit. Compared with BioCompress, an LZ77-based DNA compression algorithm with a compression rate of 44%, the proposed algorithm performs better by 12 percentage points. The findings of this study can help reduce the cost of handling large-scale DNA data.
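
The general idea of pairing an LZ77 sliding window with a hash table can be sketched as follows. This is a minimal illustration, not the algorithm from the thesis: the window size, the k-mer length used as the hash key, the candidate limit, the token format, and the names compress and decompress are all assumptions made for the example, and the sectioning of the sliding window described in the abstract is not reproduced.

```python
from collections import defaultdict

MIN_MATCH = 4        # assumed k-mer length used as the hash key
WINDOW = 4096        # assumed sliding-window size, in characters
MAX_CANDIDATES = 16  # assumed cap on earlier positions tried per lookup


def compress(seq, window=WINDOW, min_match=MIN_MATCH):
    """Encode seq as ('L', char) literals and ('M', distance, length) matches."""
    # Maps each k-mer to the positions where it starts. For simplicity the
    # table is never pruned, so memory grows with the input length.
    table = defaultdict(list)
    tokens = []
    n = len(seq)
    i = 0
    while i < n:
        best_len, best_dist = 0, 0
        key = seq[i:i + min_match]
        if len(key) == min_match:
            # Try the most recent occurrences of this k-mer first.
            for pos in reversed(table[key][-MAX_CANDIDATES:]):
                if i - pos > window:
                    break  # older positions are even further away
                length = 0
                while i + length < n and seq[pos + length] == seq[i + length]:
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - pos
        if best_len >= min_match:
            tokens.append(('M', best_dist, best_len))
            step = best_len
        else:
            tokens.append(('L', seq[i]))
            step = 1
        # Index every position that the encoder has just consumed.
        for j in range(i, i + step):
            if j + min_match <= n:
                table[seq[j:j + min_match]].append(j)
        i += step
    return tokens


def decompress(tokens):
    """Rebuild the original string from the tokens produced by compress."""
    out = []
    for token in tokens:
        if token[0] == 'L':
            out.append(token[1])
        else:
            _, dist, length = token
            start = len(out) - dist
            # Copy one character at a time so overlapping matches work.
            for k in range(length):
                out.append(out[start + k])
    return ''.join(out)
```

The hash table replaces the linear scan of the window used by plain LZ77: instead of comparing the current position against every earlier offset, the encoder only visits positions that share the same leading k-mer. A round trip such as decompress(compress("ACGTACGTACGTTTGACGTACGT")) should return the input unchanged.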

Item Type: Thesis (Masters)
Additional Information: Thesis (Sarjana Sains (Sains Komputer)) - Universiti Teknologi Malaysia, 2010; Supervisor: Assoc. Prof. Abd. Manan Ahmad
Uncontrolled Keywords: data compression (Computer science), DNA data
Subjects: Q Science > Q Science (General); Q Science > QA Mathematics > QA76 Computer software
Divisions: Computer Science and Information System (Formerly known)
ID Code: 21289
Deposited On: 25 Jan 2012 00:37
Last Modified: 18 Sep 2017 05:09
