Universiti Teknologi Malaysia Institutional Repository

A crowdsourcing-based framework for the development and validation of machine readable parallel corpus for sign languages

Farooq, U. and Mohd. Rahim, M. S. and Khan, N. S. and Rasheed, S. and Abid, A. (2021) A crowdsourcing-based framework for the development and validation of machine readable parallel corpus for sign languages. IEEE Access, 9. ISSN 2169-3536

Official URL: http://dx.doi.org/10.1109/ACCESS.2021.3091433

Abstract

Sign languages are used by the deaf and mute community of the world. They are gesture-based languages in which the subjects use hands and facial expressions to perform different gestures. There are hundreds of different sign languages in the world. Furthermore, like natural languages, many sign languages have different dialects. To serve the deaf community, several repositories of video gestures are available for many sign languages of the world. These video-based repositories do not support the development of automated language translation systems. This research investigates the idea of engaging the deaf community in the development and validation of a parallel corpus for a sign language and its dialects. As its principal contribution, this research presents a framework for building a parallel corpus for sign languages by harnessing the power of crowdsourcing with an editorial manager, thereby engaging a diverse set of stakeholders in building and validating a repository in a quality-controlled manner. It further presents processes to develop a word-level parallel corpus for different dialects of a sign language, and a process to develop a sentence-level translation corpus comprising source and translated sentences. The proposed framework has been successfully implemented and has involved different stakeholders in building the corpus. As a result, a word-level parallel corpus comprising the gestures of almost 700 words of Pakistan Sign Language (PSL) has been developed. A sentence-level translation corpus comprising more than 8000 sentences in different tenses has also been developed for PSL. This sentence-level corpus can be used in developing and evaluating machine translation models for natural to sign language translation and vice versa, while the machine-readable word-level parallel corpus will help in generating avatar-based videos for the translated sentences in different dialects of a sign language.

Item Type: Article
Uncontrolled Keywords: crowdsourcing, HamNoSys, parallel corpus
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 94868
Deposited By: Narimah Nawil
Deposited On: 29 Apr 2022 21:54
Last Modified: 29 Apr 2022 21:54
