Abstract
Multilingual text processing is required for digital libraries, language education, and natural language processing, but it has been difficult to realize because glyphs are confused with characters. This paper defines the relation between a glyph and a character and describes the processing of Mongolian scripts, an extended case of Perso-Arabic scripts, which our system handles in ways that generalize to many complicated scripts. Mongolian scripts have particularly complicated orthographies and are almost impossible to encode. However, separating glyph-defining information from encoding position solves some important problems arising from these and other scripts (including mixed-language text), which may require rendering in multiple directions. Our study of many scripts led us to attach attributes to the wide characters (WCs) of POSIX. These attributes hold not only the information needed for text manipulation, which applies to a character, but also the glyph information needed for display, such as variant and position. Moreover, information that is not available in a character code is supplied from our system's database and embedded in a WC's attribute.
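The idea of keeping glyph information in an attribute beside the character code, rather than in the code itself, can be illustrated with a minimal sketch. It is not the authors' implementation: the names `AttributedChar` and `attach_variants`, the four variant labels, the `"ttb"` direction tag, and the use of Unicode code points are all assumptions made for illustration. The joining rule is also deliberately simplified: every letter is assumed to join its neighbouring letters, which real Mongolian orthography does not always do.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical variant labels; the paper's actual attribute set is not given here.
ISOLATED, INITIAL, MEDIAL, FINAL = "isolated", "initial", "medial", "final"


@dataclass
class AttributedChar:
    """A character code plus display attributes kept outside the code itself."""
    code: str                 # a single code point; never changed by shaping
    variant: str = ISOLATED   # positional glyph variant used for display
    direction: str = "ttb"    # assumed tag: top-to-bottom writing direction


def attach_variants(text: str, direction: str = "ttb") -> List[AttributedChar]:
    """Attach a positional variant to each character of `text`.

    Simplifying assumption: a letter joins any adjacent letter, and
    non-letters (spaces, punctuation) break the joining.
    """
    cells = []
    for i, ch in enumerate(text):
        prev_joins = i > 0 and text[i - 1].isalpha()
        next_joins = i + 1 < len(text) and text[i + 1].isalpha()
        if not ch.isalpha():
            variant = ISOLATED
        elif prev_joins and next_joins:
            variant = MEDIAL
        elif next_joins:
            variant = INITIAL
        elif prev_joins:
            variant = FINAL
        else:
            variant = ISOLATED
        cells.append(AttributedChar(ch, variant, direction))
    return cells


# Three Mongolian letters forming one word, a space, then a lone letter.
cells = attach_variants("\u182e\u1820\u1828 \u1820")
```

In this sketch the stored code for a letter is the same wherever it occurs; only the `variant` attribute differs between its word-internal and stand-alone uses. Text manipulation such as searching can therefore work on `code` alone, while display consults the attributes.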
The arrow added to each script name shows the direction in which the script is written. Paspa script was invented by Paspa, a Tibetan, during the Yuan Dynasty and was intended to serve as the international phonetic alphabet of its time. This script is not discussed here; we note only that it belongs to the Devanagari script group. Manchu, the official written language of the Ching Dynasty, and its descendant Sibo are based on the Mongolian script; together they form the Mongolian script family.
When introduced to the Mongolian people in the 13th century, the classic Uigur script, which had itself been borrowed from the Sogdians in the 8th century, was turned 90 degrees to the left and written vertically from the top. For the present Uigur script, see below.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kataoka, T.I., Kataoka, Y., Uezono, K., Ohara, H. (1998). Internationalized Text Manipulation Covering Perso-Arabic Enhanced for Mongolian Scripts. In: Hersch, R.D., André, J., Brown, H. (eds) Electronic Publishing, Artistic Imaging, and Digital Typography. RIDT 1998. Lecture Notes in Computer Science, vol 1375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053279
DOI: https://doi.org/10.1007/BFb0053279
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64298-5
Online ISBN: 978-3-540-69718-3