This document, the "Digital Technical Journal: Product Internationalization" (Summer 1993, Volume 5, Number 3), examines the challenges of adapting software for global markets and the solutions proposed for them.
Core Theme: The central theme is the necessity and complexity of internationalization: designing software from the outset to be culturally neutral and easy to adapt to different languages and conventions, rather than relying on costly localization that re-engineers existing, typically English-centric, products after the fact. This shift aims to reduce development costs, shorten time-to-market, and improve product quality for a global user base.
Key Areas Covered:
International Cultural Differences in Software: This article surveys the cultural variations that affect software design, covering:
- Written Languages: Exploring ideographic (Chinese, Japanese, Korean), syllabic (Japanese Kana), and alphabetic (Latin) systems, emphasizing non-linear character placement (e.g., Korean Hangul blocks, Thai multi-level characters) and presentation variants (e.g., Arabic character forms).
- Text Input: Discussing specialized keyboard layouts and multi-stroke input methods for ideographic languages.
- National Conventions: Illustrating diverse formats for dates, times, numbers, and currencies (e.g., separators, symbol placement).
- User Interface Design: Examining how cultural contexts influence the interpretation of visual elements (geometry, images, symbols, colors) and auditory cues (sounds).
- Functional Differences: Showing how basic operations like "delete" or "case change" can vary significantly across languages.
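To make the point about national conventions concrete, here is a minimal sketch of table-driven number and currency formatting. It assumes a hypothetical `NumericConvention` record and a small illustrative `CONVENTIONS` table; a real system would read these values from a locale database instead of hard-coding them. The formatting functions themselves contain no culture-specific literals.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_EVEN


@dataclass(frozen=True)
class NumericConvention:
    """Hypothetical per-locale formatting data (normally read from a locale database)."""
    decimal_sep: str
    group_sep: str
    currency_symbol: str
    symbol_first: bool   # True: "$1.00"; False: "1,00 €"
    symbol_space: bool   # put a space between the amount and the symbol


# Illustrative values only, not copied from any real locale database.
CONVENTIONS = {
    "en_US": NumericConvention(".", ",", "$", True, False),
    "de_DE": NumericConvention(",", ".", "€", False, True),
    "fr_FR": NumericConvention(",", " ", "€", False, True),
}


def format_number(value: Decimal, conv: NumericConvention, places: int = 2) -> str:
    """Round to `places` decimals, then apply the convention's separators."""
    q = value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_EVEN)
    sign = "-" if q < 0 else ""
    # Format locale-neutrally first ('.' as decimal point, no grouping).
    int_part, _, frac_part = f"{abs(q):.{places}f}".partition(".")
    # Group the integer digits in threes from the right.
    groups = []
    while len(int_part) > 3:
        groups.insert(0, int_part[-3:])
        int_part = int_part[:-3]
    groups.insert(0, int_part)
    text = conv.group_sep.join(groups)
    if frac_part:
        text += conv.decimal_sep + frac_part
    return sign + text


def format_currency(value: Decimal, conv: NumericConvention) -> str:
    """Place the currency symbol per the convention (negative amounts not special-cased)."""
    amount = format_number(value, conv)
    space = " " if conv.symbol_space else ""
    if conv.symbol_first:
        return conv.currency_symbol + space + amount
    return amount + space + conv.currency_symbol
```

For the value 1234567.891, the three entries yield 1,234,567.89, 1.234.567,89, and 1 234 567,89. Supporting another convention means adding a table entry, not changing the formatting code.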
Unicode: A Universal Character Code: This article introduces Unicode (aligned with ISO/IEC 10646) as a strategic solution for universal character encoding.
- Principles: As originally designed, Unicode is a 16-bit, fixed-width encoding intended to cover all major written languages, assigning a unique code to each abstract character rather than to each visual glyph.
- Features: It defines explicit semantics for characters, rules for displaying bidirectional text such as Hebrew and Arabic (including nested directional runs and special formatting codes), and mechanisms for combining characters, such as accents, with a base character.
- Implementation: It discusses how systems such as Microsoft Windows NT adopt Unicode as their native text encoding while using "dual-path" programming techniques to stay compatible with existing 8-bit encodings.
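The distinction between abstract characters and combining sequences can be observed with Python's `unicodedata` module. This sketch assumes a modern Python, whose Unicode database is much newer than the 1993 standard: the NFC normalization used here was standardized later, and the hypothetical `utf16_units` helper shows that today's Unicode has outgrown a strictly 16-bit, fixed-width form. `describe` is likewise a hypothetical helper.

```python
import unicodedata
from typing import List, Tuple


def describe(text: str) -> List[Tuple[str, str, int]]:
    """Return (code point, character name, canonical combining class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch))
        for ch in text
    ]


def utf16_units(text: str) -> int:
    """Count 16-bit code units; characters beyond U+FFFF need two (a surrogate pair)."""
    return len(text.encode("utf-16-le")) // 2


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT, two code points

# Comparing raw code-point sequences treats the two spellings as different...
naive_equal = precomposed == decomposed
# ...but they are canonically equivalent, so normalizing both to NFC makes them equal.
nfc_equal = unicodedata.normalize("NFC", precomposed) == unicodedata.normalize("NFC", decomposed)
```

`describe(decomposed)` lists the base letter with combining class 0 followed by the accent with class 230, and `utf16_units("\U00020000")` returns 2 for a CJK ideograph that lies outside the original 16-bit range.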
The X/Open Internationalization Model: This article describes a standard model for creating internationalized applications, particularly in UNIX environments.
- Components: It outlines mechanisms for locale announcement (specifying language, territory, and code set), locale databases that hold cultural data, and internationalized library routines for culturally sensitive formatting and string manipulation.
- Strengths & Limitations: The model provides portable interfaces and supports multibyte character sets, but it is poorly suited to distributed, multithreaded, and truly multilingual applications, because its locale is essentially global to the process and because it makes certain assumptions about character types. Proposed changes address these limitations with unique locale names, locale registries, and more flexible manipulation of "text objects".
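Locale announcement can be illustrated in two parts: parsing an X/Open-style locale name of the form language[_territory][.codeset][@modifier], and calling the process-wide `setlocale` interface, which Python's `locale` module wraps. `LocaleName`, `parse_locale_name`, and `announce_c_locale` are hypothetical names for this sketch, and only the portable "C" locale is assumed to be installed on the host.

```python
import locale
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# language[_territory][.codeset][@modifier], e.g. "ja_JP.eucJP" or "de_DE.ISO8859-1@euro"
_LOCALE_RE = re.compile(
    r"(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9_-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9_-]+))?"
)


def parse_locale_name(name: str) -> LocaleName:
    """Split an announced locale name into its components (hypothetical helper)."""
    m = _LOCALE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not an X/Open-style locale name: {name!r}")
    return LocaleName(m["language"], m["territory"], m["codeset"], m["modifier"])


def announce_c_locale() -> str:
    """Select the portable "C" locale and return its decimal point.

    setlocale() changes the locale for the whole process, including every
    thread, which is the global-locale limitation described above.
    """
    locale.setlocale(locale.LC_ALL, "C")
    return locale.localeconv()["decimal_point"]
```

Because `setlocale` mutates shared state, two threads that need different locales at the same time cannot both be served by it; this is the motivation for passing explicit locale or text objects instead.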
The Ordering of Universal Character Strings: This article examines how to sort words and names according to differing cultural expectations.
- Complexity: People often order strings using knowledge that goes beyond the characters themselves, such as pronunciation and meaning, so simple character-by-character comparison falls short. The resulting rules are culture-specific and can differ even for the same script (e.g., Japanese kanji ordering).
- Multilevel Method: The article explains the table-driven multilevel method, the state of the art at the time, in which strings are compared in successive passes, each considering a different character attribute: an early pass ignores case and accents, and later passes take them into account only to break ties.
- Unicode Impact: The size of the Unicode repertoire, and the mixing of scripts it makes possible, complicate ordering algorithms; they require formal definitions and possibly preprocessing steps that bring strings into a standard form before comparison.
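The multilevel idea can be sketched as a composite sort key: a primary level that ignores case and accents, a secondary level for accents, and a tertiary level for case. This is a minimal sketch, not the article's algorithm: instead of hand-built weight tables, the hypothetical `multilevel_key` derives the levels from Unicode canonical decomposition, and it ignores language-specific rules such as contractions or French backward accent ordering.

```python
import unicodedata
from typing import Tuple


def multilevel_key(word: str) -> Tuple[str, Tuple[str, ...], Tuple[int, ...]]:
    """Build a three-level sort key: (base letters, accents, case)."""
    primary, secondary, tertiary = [], [], []
    for ch in word:
        # Canonical decomposition splits e.g. 'ô' into 'o' + COMBINING CIRCUMFLEX ACCENT.
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        primary.append(base.lower())                  # level 1: ignore accents and case
        secondary.append(marks)                       # level 2: accents ("" sorts first)
        tertiary.append(0 if base.islower() else 1)   # level 3: lowercase before uppercase
    # Tuples compare element by element, so a later level only matters
    # when all earlier levels are equal.
    return ("".join(primary), tuple(secondary), tuple(tertiary))
```

Sorting "rôle", "Role", "roles", "Rôle", and "role" by raw code points puts both capitalized words first and places "roles" between "role" and "rôle"; with `multilevel_key` the order becomes role, Role, rôle, Rôle, roles, keeping all variants of "role" together.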
International Distributed Systems—Architectural and Practical Issues: This article discusses the architectural considerations for building multilingual distributed systems.
- Safe Software Practices: It emphasizes avoiding "hardwired" literals in code, using opaque data types for culturally sensitive information (like money or dates), and parameterizing user preferences.
- Modular RTLs: A key technique is to build modular run-time libraries (RTLs) with universal application programming interfaces (APIs) that abstract language-specific complexities, allowing for incremental improvements and greater component reuse.
- System Services: It describes the need for multilingual system services like font servers, directory services, and diagnostic tools, and the importance of registering user preferences and locales in a global name service.
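Two of these practices, opaque data types and a universal API in front of modular, locale-specific components, can be sketched together. `Money`, `register_formatter`, `format_money`, and the two formatter functions are hypothetical; the formatters assume a two-decimal currency and are not derived from any real locale database.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Money:
    """Opaque amount: callers never touch the raw fields or build display strings."""
    _minor_units: int   # e.g. cents; avoids binary floating point
    _currency: str      # ISO 4217 code such as "USD" or "EUR"

    def __add__(self, other: "Money") -> "Money":
        if self._currency != other._currency:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self._minor_units + other._minor_units, self._currency)


# Hypothetical registry: each locale-specific module plugs in behind one API.
Formatter = Callable[[int, str], str]
_FORMATTERS: Dict[str, Formatter] = {}


def register_formatter(locale_name: str, fn: Formatter) -> None:
    _FORMATTERS[locale_name] = fn


def format_money(amount: Money, locale_name: str) -> str:
    """Universal API: the caller names a locale; the registered module does the rest."""
    try:
        fn = _FORMATTERS[locale_name]
    except KeyError:
        raise LookupError(f"no formatter registered for {locale_name!r}") from None
    return fn(amount._minor_units, amount._currency)


def _en_us(minor: int, currency: str) -> str:
    # Assumes two decimal places; a real module would consult currency data.
    sign = "-" if minor < 0 else ""
    major, cents = divmod(abs(minor), 100)
    return f"{sign}{currency} {major:,}.{cents:02d}"


def _de_de(minor: int, currency: str) -> str:
    sign = "-" if minor < 0 else ""
    major, cents = divmod(abs(minor), 100)
    grouped = f"{major:,}".replace(",", ".")
    return f"{sign}{grouped},{cents:02d} {currency}"


register_formatter("en_US", _en_us)
register_formatter("de_DE", _de_de)
```

Adding support for another locale means registering one more formatter; callers of `format_money` do not change, which is the kind of incremental improvement the modular approach aims for.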
"Supporting Chinese, Japanese, and Korean Languages in OpenVMS" and "Character Internationalization in Databases": These two articles are case studies of Digital's own internationalization work.
- OpenVMS: It details the adaptation of the OpenVMS operating system for Asian languages, including handling multibyte character sets, specialized input methods (e.g., kana-to-kanji conversion), high-resolution fonts, and changes to core system components such as DCL and the SORT/MERGE utilities. It highlights the difficulty of re-engineering a legacy system and the benefits of a "co-engineering" approach that integrates Asian-specific code into the main product.
- Databases: It focuses on the unique challenges of character internationalization in database management systems (like DEC Rdb), which must manage character sets in user data, metadata, and source code. It discusses SQL-92's role in providing a standard foundation and how Digital's co-engineering strategy successfully reduced development time and costs for its internationalized Rdb product.
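The character/byte distinction that runs through both case studies appears as soon as text in a multibyte code set must fit a fixed-size byte field. This sketch uses Python's `euc_jp` codec as a representative multibyte Japanese code set (an assumption; the articles cover several code sets) and a hypothetical `truncate_to_bytes` helper that never splits a character.

```python
def truncate_to_bytes(text: str, max_bytes: int, encoding: str = "euc_jp") -> bytes:
    """Encode `text`, keeping as many whole characters as fit in `max_bytes`.

    Encoding one character at a time is safe for a stateless code set such as
    EUC-JP; it would not be safe for a stateful one such as ISO-2022-JP.
    """
    out = bytearray()
    for ch in text:
        encoded = ch.encode(encoding)
        if len(out) + len(encoded) > max_bytes:
            break  # the next character would not fit; never emit part of it
        out += encoded
    return bytes(out)


text = "日本語"                      # 3 characters
encoded = text.encode("euc_jp")      # 6 bytes: each of these kanji takes 2 bytes
naive = encoded[:5]                  # a byte-oriented cut splits the last kanji
safe = truncate_to_bytes(text, 5)    # keeps only whole characters
```

Decoding `naive` with the default strict error handler fails because it ends in half a character, whereas `safe` decodes cleanly to "日本".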
In summary, the document traces the shift from ad hoc localization to internationalization built in from the start, detailing the cultural differences involved, the technical standards and architectural solutions (such as Unicode and the X/Open model), and the practical engineering effort required to build computing systems ready for global markets.