Universiti Teknologi Malaysia Institutional Repository

Generic code clone detection model for Java applications

Mubarak-Ali, Al-Fahim and Sulaiman, Shahida (2020) Generic code clone detection model for Java applications. In: 6th International Conference on Software Engineering and Computer Systems, ICSECS 2019, 25-27 September 2019, Kuantan, Pahang.

Official URL: http://dx.doi.org/10.1088/1757-899X/769/1/012023

Abstract

A code clone is a fragment of code that is repeated multiple times in a program. Clones are commonly classified as Type 1, Type 2, Type 3 and Type 4. Various code clone detection approaches and models have been used to detect code clones. However, a major challenge with these models is their lack of generality in detecting all clone types. To address this problem, the Generic Code Clone Detection (GCCD) model is proposed. It consists of five processes: Preprocessing, Transformation, Parameterization, Categorization and Match Detection. First, the preprocessing process produces source units through the application of five combinatorial rules. The transformation process then produces transformed source units based on a letter-to-number substitution concept. Next, the parameterization process produces the parameters used in the categorization and match detection processes. The categorization process then groups the source units into pools. Finally, the match detection process uses hybrid exact matching with Euclidean distance to detect the clones. Based on these processes, a prototype of the GCCD was developed using NetBeans 8.0. The model was compared with the Generic Pipeline Model (GPM). The comparison showed that the GCCD was able to detect clone pairs of Type 1 through Type 4, while the GPM was able to detect clone pairs of Type 1 only. Furthermore, the GCCD prototype was empirically tested with Bellon's benchmark data and was able to detect clones in Java applications of up to 203,000 lines of code. In conclusion, the GCCD model overcomes the lack of generality in detecting all code clone types by detecting Type 1, Type 2, Type 3 and Type 4 clones.
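
The abstract does not spell out the encoding, the parameters, or the distance threshold, so the following Java sketch is only one possible reading of the transformation and match detection steps, not the authors' implementation. It assumes that letters map to 1..26 and all other characters to 0, that the "parameters" are a simple histogram of those codes, and that a pair counts as a clone when the transformed units match exactly or their histograms lie within a caller-chosen Euclidean distance. The class name `GccdSketch`, its method names, and the `threshold` argument are all hypothetical.

```java
// Illustrative sketch only: the encoding, the histogram parameters and the
// threshold are assumptions, not definitions taken from the GCCD paper.
final class GccdSketch {

    // Transformation (assumed form): letters a-z / A-Z become 1..26,
    // every other character (digits, spaces, punctuation) becomes 0.
    static int[] transform(String sourceUnit) {
        int[] codes = new int[sourceUnit.length()];
        for (int i = 0; i < sourceUnit.length(); i++) {
            char c = Character.toLowerCase(sourceUnit.charAt(i));
            codes[i] = (c >= 'a' && c <= 'z') ? (c - 'a' + 1) : 0;
        }
        return codes;
    }

    // Parameterization (assumed form): count how often each code 0..26 occurs.
    static int[] parameters(int[] codes) {
        int[] histogram = new int[27];
        for (int code : codes) {
            histogram[code]++;
        }
        return histogram;
    }

    // Euclidean distance between two equally sized parameter vectors.
    static double euclidean(int[] a, int[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Hybrid match detection (assumed form): exact match on the transformed
    // units first, otherwise fall back to a distance check on the parameters.
    static boolean isClonePair(String first, String second, double threshold) {
        int[] x = transform(first);
        int[] y = transform(second);
        if (java.util.Arrays.equals(x, y)) {
            return true;
        }
        return euclidean(parameters(x), parameters(y)) <= threshold;
    }
}
```

Under these assumptions, `GccdSketch.isClonePair("int a=1;", "int b=1;", 2.0)` treats a renamed variable as a clone because the two histograms differ in only two cells, while a structurally different unit such as `while(x){y();}` falls outside the same threshold. A histogram ignores token order, so a real detector would need richer parameters than this sketch uses.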

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: code clone, Java applications
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 92334
Deposited By: Widya Wahid
Deposited On: 28 Sep 2021 07:36
Last Modified: 28 Sep 2021 07:36
