Universiti Teknologi Malaysia Institutional Repository

Malware behavior profiling from unstructured data

Yoong, Jien Chiam and Maarof, Mohd. Aizaini and Kassim, Mohamad Nizam and Zainal, Anazida (2020) Malware behavior profiling from unstructured data. In: 11th International Conference on Soft Computing & Pattern Recognition (SOCPAR 2019), 13 – 15 December 2019, Hyderabad, India.

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1007/978-3-030-49345-5_14

Abstract

Recently, the emergence of new malware has posed a major threat, especially in the finance sector, where adversaries have stolen much online banking data. Information about a malware threat needs to be collected immediately after an outbreak, since early detection can keep others from becoming victims. Unfortunately, there is a time delay before information on new malware reaches a malware database such as ExploitDB. A pre-emptive approach is therefore needed to gather first-hand information about new malware as a preventive measure. One such method is to extract information from open source data, such as online news, using Named Entity Recognition (NER). However, existing NER systems are unable to extract domain-specific entities from online news accurately. The aim of this paper is to extract malware entities and their behaviour attributes using an extended version of NER with a Hidden Markov Model (HMM) and a Conditional Random Field (CRF). A malware-annotated corpus is produced to support supervised learning of the named entity tagger. The results show that CRF performs slightly better than HMM. A few experiments are performed to optimize the performance of CRF in terms of feature extraction. Finally, the malware behaviour information is visualized on a dashboard that combines a few statistical graphs built with matplotlib. The purpose of visualizing the malware behaviour profile extracted from online news is to help cyber security experts better understand malware behaviour.
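
The abstract mentions tuning the CRF through feature extraction but does not list the features used. As a rough illustration only, the sketch below shows the kind of per-token feature dictionary commonly fed to a CRF sequence tagger. The feature names, the helper functions `token_features` and `sentence_features`, and the example sentence are all assumptions made for this sketch and are not taken from the paper.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for the token at position i.

    The feature set is hypothetical; the paper's actual features are not
    stated in the abstract.
    """
    word = tokens[i]
    features: Dict[str, object] = {
        "lower": word.lower(),            # normalized surface form
        "suffix3": word[-3:].lower(),     # last three characters
        "is_title": word.istitle(),       # capitalized, e.g. a proper noun
        "is_upper": word.isupper(),       # all caps, e.g. an acronym
        "has_digit": any(ch.isdigit() for ch in word),
    }
    # Context features: neighbouring tokens, or sentence-boundary flags.
    if i > 0:
        features["prev_lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True
    if i < len(tokens) - 1:
        features["next_lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True
    return features


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Made-up example sentence, not drawn from the paper's corpus.
sentence = ["Emotet", "steals", "online", "banking", "credentials"]
feats = sentence_features(sentence)
```

A list of such dictionaries per sentence, paired with a list of entity labels, is the usual input shape for CRF toolkits; which toolkit the authors used is not stated in this record.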

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Cyber threat intelligent, Natural language processing
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 92351
Deposited By: Widya Wahid
Deposited On: 28 Sep 2021 07:38
Last Modified: 28 Sep 2021 07:38
