Universiti Teknologi Malaysia Institutional Repository

Intelligent feature engineered-machine learning based electricity theft detection framework for labelled and unlabelled datasets

Hussain, Saddam (2022) Intelligent feature engineered-machine learning based electricity theft detection framework for labelled and unlabelled datasets. PhD thesis, Universiti Teknologi Malaysia.

Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

Non-Technical Losses (NTLs) in electrical utilities, primarily caused by electricity theft, significantly impact energy supply companies and the nation’s overall economy. Power distribution companies worldwide rely on time-consuming, laborious, and inefficient random onsite inspections to catch and penalise fraudulent consumers. To address the NTL problem, artificial intelligence-based data mining methods have been extensively researched. However, most of the theft detection methods explored in the literature have yielded poor accuracy and detection rates. This thesis therefore presents a novel, sequentially executed theft detection framework for both labelled and unlabelled dataset scenarios that takes a realistic approach and achieves comparatively higher accuracy and detection rate. For the labelled scenario, a supervised Machine Learning (ML) approach is adopted in which the Categorical Boosting (CatBoost) algorithm classifies consumers as “suspicious” or “non-suspicious”. For the unlabelled scenario, an unsupervised ML method accomplishes the same task using the Robust Principal Component Analysis (ROBPCA) algorithm in conjunction with the Outlier Removal Clustering (ORC) algorithm. In the labelled scenario, the Synthetic Minority Oversampling Technique combined with Tomek links (SMOTETomek) is first used to balance the class distribution. Next, the FeatuRe Extraction based on Scalable Hypothesis tests (FRESH) algorithm is applied to extract and select the most relevant features, helping the classifier to learn complex and overlapping data patterns. Finally, the CatBoost algorithm is used to build an ML model on the resulting feature-engineered labelled dataset to distinguish the two classes of consumers.
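The class-balancing step described above can be illustrated with a minimal, pure-Python sketch. It is not the thesis implementation: the feature vectors, the labels (0 for honest, 1 for theft), and the function names are hypothetical, the oversampler is a simplified SMOTE that interpolates only towards the single nearest minority neighbour, and dropping both members of every Tomek link is an assumed cleaning policy. It also assumes at least two minority samples.

```python
import math
import random


def nearest(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


def smote_like(minority, n_new, rng):
    """Simplified SMOTE: interpolate a random minority point towards
    its nearest minority neighbour (k = 1, an assumption of this sketch)."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = nearest(i, minority)
        t = rng.random()  # position along the segment between the two points
        synthetic.append(tuple(a + t * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic


def tomek_links(X, y):
    """Pairs (i, j) of opposite-class points that are mutual nearest neighbours."""
    nn = [nearest(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def smote_tomek(X, y, minority_label, rng):
    """Oversample the minority class to parity, then drop Tomek-link members."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    X_bal = list(X) + smote_like(minority, n_new, rng)
    y_bal = list(y) + [minority_label] * n_new
    drop = {k for pair in tomek_links(X_bal, y_bal) for k in pair}
    keep = [k for k in range(len(X_bal)) if k not in drop]
    return [X_bal[k] for k in keep], [y_bal[k] for k in keep]


# Hypothetical two-feature consumption profiles: five honest, two theft.
X = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.3), (2.9, 3.0),
     (3.0, 3.0), (3.2, 2.9)]
y = [0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = smote_tomek(X, y, minority_label=1, rng=random.Random(0))
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In practice a library implementation that uses several nearest neighbours would normally replace this sketch; the point here is only the order of operations, oversampling first and boundary cleaning second.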
In the case of an unlabelled dataset, consumers with the most similar features are first grouped into two categories using the ROBPCA algorithm. The boundary between the two newly formed groups is then reinforced with the ORC algorithm to achieve a clear distinction between healthy and fraudulent consumers. The effectiveness of the proposed theft detection methods is validated by comparing their performance with a few of the most widely used outlier detection methods on seven of the most prominent performance evaluation metrics. The accuracy of the proposed unsupervised and supervised classifiers is 91% and 93%, respectively, while their detection rates are 91% and 92%, respectively.
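As a worked illustration of the two headline metrics, the sketch below computes accuracy and detection rate from predicted and true labels. It assumes that "detection rate" means the true positive rate (recall on the theft class), which the abstract does not define explicitly. The label vectors are hypothetical and are not the thesis data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all consumers that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def detection_rate(tp, fn):
    """Share of actual theft cases that were flagged (assumed to be recall)."""
    return tp / (tp + fn)


# Hypothetical labels: 1 = theft, 0 = honest.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"accuracy={accuracy(tp, tn, fp, fn):.2f}, "
      f"detection rate={detection_rate(tp, fn):.2f}")
```

Reporting both figures matters on imbalanced theft data, because a classifier that labels every consumer as honest can reach high accuracy while its detection rate is zero.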

Item Type: Thesis (PhD)
Uncontrolled Keywords: time-consuming, theft detection framework, Machine Learning (ML)
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: Electrical Engineering
ID Code: 102153
Deposited By: Narimah Nawil
Deposited On: 07 Aug 2023 08:15
Last Modified: 07 Aug 2023 08:15
