Universiti Teknologi Malaysia Institutional Repository

A conceptual framework for Malay-English mixed-language question answering system

Lim, H. T. and Huspi, S. H. and Ibrahim, R. (2021) A conceptual framework for Malay-English mixed-language question answering system. In: 2021 International Congress of Advanced Technology and Engineering, ICOTEN 2021, 4 July 2021 - 5 July 2021, Virtual, Online.

Official URL: http://dx.doi.org/10.1016/j.memsci.2021.119707

Abstract

Mixed language, the combination of two or more languages in spoken or written form, has become a common trend in language use. It is widely used in social media forums to improve communication and to widen the range of expression. Current question answering (QA) systems support only monolingual queries, which restricts the ability of multilingual users to interact naturally with them. In recent years, interest in multilingual QA systems has grown, and the common solution combines translation models with machine learning algorithms for question classification. However, mixing words from different languages in a single sentence makes it difficult to distinguish code-switched sentences from monolingual ones. It also limits the language context that can be captured from questions that machine translation has mistranslated. Informal mixed-language text, which does not follow the linguistic rules of any one language, poses a challenge for automated QA systems, which must translate the given questions and extract answers for them. Additionally, the lack of public resources for mixed language, such as chunkers, POS taggers, and WordNet, especially for Malay-English, leads to poor performance of translation and classification models. Furthermore, the use of machine learning algorithms in question classification requires a large amount of structured training data to ensure performance. This is impractical in the Malay-English mixed-language domain, where the availability of mixed-language datasets is still an issue. To address these problems, we propose a framework that combines enhanced translation models with deep learning, using Convolutional Neural Networks (CNN) for Malay-English mixed-language question classification in order to generate the best answer.
The first part studies the machine translation model, for which word-level language identification and text normalization of Malay-English mixed-language questions will be developed. The second part focuses on the deep learning algorithm, where we explore CNN as the classification model applied to the translated questions to provide the best answer. Thus, this paper presents a framework consisting of an enhanced translation model for Malay-English and an end-to-end Malay-English mixed-language QA system. This research is expected to make a significant contribution to multilingual forum platforms and to intelligent QA systems (chatbots).
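
The first stage described in the abstract, word-level language identification with text normalization, can be illustrated with a minimal sketch. This is not the authors' method: it assumes a simple dictionary lookup, and the word lists, the normalization table, the labels ("ms", "en", "unk"), and the function names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy lexicons (assumption): a real system would rely on much
# larger word lists or a trained word-level classifier.
MALAY_WORDS = {"saya", "nak", "mahu", "tahu", "macam", "mana", "boleh", "apa"}
ENGLISH_WORDS = {"i", "you", "please", "how", "to", "install", "this", "app"}

# Hypothetical normalization table (assumption): informal spelling -> standard form.
NORMALIZATION = {"mcm": "macam", "nk": "nak", "blh": "boleh", "u": "you", "pls": "please"}


def normalize(token: str) -> str:
    """Lowercase a token and expand it if it is a known informal spelling."""
    lowered = token.lower()
    return NORMALIZATION.get(lowered, lowered)


def tag_languages(question: str) -> list[tuple[str, str]]:
    """Return (normalized word, language label) pairs for each word in the question."""
    tagged = []
    for token in re.findall(r"[A-Za-z']+", question):
        word = normalize(token)
        if word in MALAY_WORDS:
            label = "ms"
        elif word in ENGLISH_WORDS:
            label = "en"
        else:
            label = "unk"  # not found in either toy lexicon
        tagged.append((word, label))
    return tagged


def is_code_switched(tagged: list[tuple[str, str]]) -> bool:
    """A question counts as code-switched if it contains both Malay and English words."""
    labels = {label for _, label in tagged}
    return {"ms", "en"} <= labels


if __name__ == "__main__":
    tags = tag_languages("Mcm mana nk install this app?")
    print(tags)
    print(is_code_switched(tags))
```

In a pipeline of the kind the abstract outlines, the per-word labels would tell the translation step which words need translating, and the translated question would then be passed to the classifier. A lookup table like this cannot handle words shared by both languages or unseen slang, which is one reason a learned model would be needed in practice.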

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: code-switching, deep learning, malay-english translation
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 95670
Deposited By: Narimah Nawil
Deposited On: 31 May 2022 13:04
Last Modified: 31 May 2022 13:04
