Universiti Teknologi Malaysia Institutional Repository

Bi-view semi-supervised active learning for cross-lingual sentiment classification

Hajmohammadi, Mohammad Sadegh and Ibrahim, Roliana and Selamat, Ali (2014) Bi-view semi-supervised active learning for cross-lingual sentiment classification. Information Processing and Management, 50 (5). pp. 718-732. ISSN 0306-4573

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1016/j.ipm.2014.03.005

Abstract

Recently, sentiment classification has received considerable attention within the natural language processing research community. However, because most work on sentiment classification has been done in English, sentiment resources in other languages are scarce. Manually constructing reliable sentiment resources is a difficult and time-consuming task. Cross-lingual sentiment classification aims to use annotated sentiment resources in one language (typically English) to classify the sentiment of text documents in another language. Most existing work relies on automatic machine translation services to project information directly from one language to another. However, relying on machine translation alone raises two main problems: the term distributions of original and translated documents differ, and the translations contain errors. To overcome these problems, we propose a novel learning model that combines active learning and semi-supervised co-training to incorporate unlabelled data from the target language into the learning process within a bi-view framework. In an iterative process, the model enriches the training data with the most confident automatically labelled examples, together with a few of the most informative manually labelled examples, drawn from the unlabelled data. In addition, the model takes the density of the unlabelled data into account so that active learning selects more representative examples and avoids selecting outliers. The proposed model was applied to book review datasets in three different languages. Experiments showed that our model effectively improves cross-lingual sentiment classification performance and reduces labelling effort compared with several baseline methods.
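The per-iteration selection step described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the confidence, informativeness or density formulas. The sketch assumes that each unlabelled review has two views (hypothetically, the target-language text and its machine translation into the source language), each scored by a classifier that outputs P(positive). It also assumes that a confident example is one on which both views agree, ranked by the weaker view's margin; that informativeness is the mean uncertainty of the two views; and that density is the average cosine similarity to the rest of the pool. All function names (`cosine`, `density`, `uncertainty`, `select`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Dict[str, float]  # sparse bag-of-words vector: term -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def density(docs: Sequence[Vector]) -> List[float]:
    """Assumed density measure: mean similarity of each document to all
    other unlabelled documents. Low values flag likely outliers."""
    n = len(docs)
    if n < 2:
        return [0.0] * n
    return [sum(cosine(docs[i], docs[j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


def uncertainty(p: float) -> float:
    """1.0 when P(positive) = 0.5, 0.0 when the classifier is certain."""
    return 1.0 - abs(2.0 * p - 1.0)


def select(p_view1: Sequence[float], p_view2: Sequence[float],
           docs: Sequence[Vector], n_auto: int,
           n_query: int) -> Tuple[List[Tuple[int, int]], List[int]]:
    """One hypothetical selection round over the unlabelled pool.

    Returns (auto, query): auto holds (index, label) pairs to add with
    automatic labels; query holds indices to send to a human annotator.
    """
    dens = density(docs)
    idx = range(len(docs))
    # Confident candidates: both views predict the same label.
    agree = [i for i in idx if (p_view1[i] >= 0.5) == (p_view2[i] >= 0.5)]
    # Rank by the weaker view's margin, most confident first.
    agree.sort(key=lambda i: min(abs(2 * p_view1[i] - 1),
                                 abs(2 * p_view2[i] - 1)), reverse=True)
    auto = [(i, int(p_view1[i] >= 0.5)) for i in agree[:n_auto]]
    taken = {i for i, _ in auto}
    # Informative candidates: mean uncertainty weighted by density,
    # so an isolated (outlier) document scores low even if uncertain.
    rest = [i for i in idx if i not in taken]
    rest.sort(key=lambda i: dens[i] * (uncertainty(p_view1[i])
                                       + uncertainty(p_view2[i])) / 2,
              reverse=True)
    return auto, rest[:n_query]


# Toy pool of four unlabelled reviews (illustrative data only).
docs = [{"good": 1.0, "book": 1.0}, {"bad": 1.0, "book": 1.0},
        {"good": 1.0, "story": 1.0}, {"zzz": 1.0}]
p_view1 = [0.95, 0.10, 0.55, 0.50]
p_view2 = [0.90, 0.20, 0.45, 0.50]
auto, query = select(p_view1, p_view2, docs, n_auto=2, n_query=1)
# auto -> [(0, 1), (1, 0)], query -> [2]
```

In the toy pool, document 3 is maximally uncertain in both views but shares no terms with the other documents, so its density of zero keeps it out of the query set; document 2, on which the two views disagree, is sent to the annotator instead. In a full co-training loop, both sets would be added to the training data and the two view classifiers retrained before the next round.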

Item Type: Article
Uncontrolled Keywords: active learning, co-training, cross-lingual, density measure, sentiment classification
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 52022
Deposited By: Siti Nor Hashidah Zakaria
Deposited On: 01 Feb 2016 03:53
Last Modified: 30 Nov 2018 07:00
