Universiti Teknologi Malaysia Institutional Repository

Extending our sense of cyberspace language plurality: the value of the Language Observatory (LO) project

Abd. Rozan, Mohd. Zaidi and Mikami, Yoshiki (2005) Extending our sense of cyberspace language plurality: the value of the Language Observatory (LO) project. In: The 10th International Conference on Translation, 2-4 August 2005, Kota Kinabalu, Sabah.

Official URL: http://umconference.um.edu.my/PPA16

Abstract

As the World Wide Web (WWW) grows exponentially, multilingual web pages are flooding cyberspace at a tremendous rate. Many of us would probably guess that English is the main language of the Web. On the contrary, according to glreach.com [1], of the 801.4 million people online, at least 510 million are non-English speakers and the remainder are English speakers. As a major step towards understanding the language dimensions of web pages in cyberspace, we officially launched a project called “Language Observatory (LO)” in February 2004. We have made several experimental runs using UbiCrawler, some of which were dedicated to the 57 country code Top Level Domains (ccTLDs) of the Organization of the Islamic Conference member countries. Notably, we covered at least 42 million web pages, compared to almost 17 million indexed by two well-known search engines; our coverage is thus nearly triple theirs and spans multiple dimensions such as language, script and character set encoding. Furthermore, data mining by LO yields significant findings that provide a snapshot of cyberspace. These findings reveal the content that is often created in particular domains and hence provide practical information on language preferences and source documents in cyberspace. The potential of LO to produce indispensable information must be taken into account, because such information should not be absent from the value chain of translation activities.

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: language observatory, web pages, language scenes, web intensity, translation, language, script, character set, crawler, language digital divide
Subjects: H Social Sciences > H Social Sciences (General); Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System
ID Code: 3403
Deposited By: Mrs Rozilawati Dollah @ Md Zain
Deposited On: 24 May 2007 00:00
Last Modified: 29 Aug 2017 08:28
