Universiti Teknologi Malaysia Institutional Repository

A data structure for representing multi-version texts online

Schmidt, Desmond and Colomb, Robert (2009) A data structure for representing multi-version texts online. International Journal of Human Computer Studies, 67 (6). pp. 497-514. ISSN 1071-5819

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1016/j.ijhcs.2009.02.001


The digitisation of cultural heritage and linguistics texts has long been troubled by the problem of how to represent overlapping structures arising from different markup perspectives ('overlapping hierarchies') or from different versions of the same work ('textual variation'). These two problems can be reduced to one by observing that every case of overlapping hierarchies is also a case of textual variation. Overlapping textual structures can be accurately modelled either as a minimally redundant directed graph, or, more practically, as an ordered list of pairs, each containing a set of versions and a fragment of text or data. This 'pairs-list' representation is provably equivalent to the graph representation. It can record texts consisting of thousands of versions or perspectives without becoming overloaded with data, and the most common operations on variant text, e.g. comparison between two versions, can be performed in linear time. This representation also separates variation or other overlapping structures from the document content, leading to a simplification of markup suitable for wiki-like web applications.
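
The 'pairs-list' described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names Pair, read_version and compare_versions, the use of integer version identifiers, and the sample text are all assumptions made for illustration. It shows only the core idea, an ordered list of (set of versions, text fragment) pairs, and how reading or comparing versions takes a single pass over that list.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Pair:
    """One entry of the list: the versions sharing a fragment, and the fragment."""
    versions: FrozenSet[int]  # hypothetical integer version identifiers
    text: str


def read_version(pairs: List[Pair], version: int) -> str:
    """Rebuild one version by concatenating, in order, the fragments it belongs to."""
    return "".join(p.text for p in pairs if version in p.versions)


def compare_versions(
    pairs: List[Pair], a: int, b: int
) -> Tuple[List[str], List[str], List[str]]:
    """Split fragments into (shared, only in a, only in b) in one pass over the list."""
    shared, only_a, only_b = [], [], []
    for p in pairs:
        in_a, in_b = a in p.versions, b in p.versions
        if in_a and in_b:
            shared.append(p.text)
        elif in_a:
            only_a.append(p.text)
        elif in_b:
            only_b.append(p.text)
    return shared, only_a, only_b


# Invented sample: three versions of one short sentence.
PAIRS = [
    Pair(frozenset({1, 2, 3}), "The quick "),
    Pair(frozenset({1}), "brown"),
    Pair(frozenset({2, 3}), "red"),
    Pair(frozenset({1, 2, 3}), " fox"),
    Pair(frozenset({3}), " jumps"),
]

if __name__ == "__main__":
    for v in (1, 2, 3):
        print(v, repr(read_version(PAIRS, v)))
    print(compare_versions(PAIRS, 1, 3))
```

In this sketch, text shared by several versions is stored once, and the variation is carried entirely by the version sets rather than by markup embedded in the text. Each membership test on a frozenset takes constant time on average, so both functions run in time proportional to the number of pairs.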

Item Type: Article
Uncontrolled Keywords: overlapping hierarchies, textual variation, data structure
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 11771
Deposited By: Nor Asmida Abdullah
Deposited On: 17 Jan 2011 12:34
Last Modified: 17 Jan 2011 12:34
