Universiti Teknologi Malaysia Institutional Repository

Classification and regression tree in classifying and predicting students' academic performance

Ho, Su Juih (2013) Classification and regression tree in classifying and predicting students' academic performance. Masters thesis, Universiti Teknologi Malaysia, Faculty of Science.


Official URL: http://dms.library.utm.my:8080/vital/access/manage...


In this study, the Classification and Regression Tree (CART) algorithm is used to classify students and predict who is likely to pass or fail the final exam of the Engineering Statistics course. However, two problems typically surface when the CART algorithm is applied to high-dimensional data: misclassification error and overfitting. This research therefore aims to reduce both problems in order to improve the accuracy of prediction and classification. Different data partitioning approaches, such as the re-substitution method, the hold-out method and the 10-fold cross-validation method, are used for building and evaluating the decision tree. The results are compared in terms of prediction accuracy, sensitivity and specificity, as well as tree structure. Of these methods, 10-fold cross-validation achieves the highest prediction accuracy (lowest misclassification error) of 85.11%. Hence, it is selected for further overfitting analysis, using an error rate plot and cost-complexity pruning to reduce the misclassification error. The final pruned tree improves the prediction accuracy to 87.23%. Three rules generated from the final tree are identified to describe the relationships among the attributes. This study therefore indicates that applying the CART algorithm with the 10-fold cross-validation method can produce better accuracy in classifying and predicting students' academic performance. In addition, lecturers can use this method to identify students who are performing poorly in the course, so that action can be taken to prevent further failures.
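The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `k_fold_indices` and `binary_metrics`, the choice of "fail" as the positive class, and the toy labels in the usage note are all assumptions made for illustration. The sketch shows how a 10-fold split partitions the sample indices and how accuracy, misclassification error, sensitivity and specificity follow from the predicted and true labels.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering 0..n_samples-1.

    Folds are contiguous and differ in size by at most one sample.
    No shuffling is done here; in practice the data would usually be
    shuffled (or stratified by class) before splitting.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "fail"
) -> Dict[str, float]:
    """Compute accuracy, misclassification error, sensitivity and specificity.

    Assumption: `positive` marks the class of interest (here, failing students).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_error": (fp + fn) / total,
        # Sensitivity: share of actual positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }
```

For example, `binary_metrics(["fail", "fail", "pass", "pass", "pass"], ["fail", "pass", "pass", "pass", "fail"])` evaluates five hypothetical students. In a cross-validation run, a tree would be fitted on each `train` index list and scored on the matching `test` list, with the per-fold results then pooled or averaged.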

Item Type: Thesis (Masters)
Additional Information: Thesis (Sarjana Sains (Matematik)) - Universiti Teknologi Malaysia, 2013; Supervisor: Dr. Norhaiza Ahmad
Uncontrolled Keywords: regression analysis, discriminant analysis
Subjects: Q Science > QA Mathematics
ID Code: 33100
Deposited By: Kamariah Mohamed Jong
Deposited On: 23 Feb 2014 10:13
Last Modified: 11 Sep 2017 14:25
