Common Desktop Environment: Internationalization Programmer's Guide
Contents of Chapter:
- Overview of Internationalization
- Current State of Internationalization
- Internationalization Standards
- Common Internationalization System
- Fonts, Font Sets, and Font Lists
- Font Specification
- Font Set Specification
- Font List Specification
- Base Font Name List Specification
- Text Drawing
- Input Methods
- Preedit Area
- Status Area
- Auxiliary Area
- MainWindow Area
- Focus Area
- Interclient Communications Conventions (ICCC)
Overview of Internationalization
Internationalization is the designing of computer systems and applications for users around the world. Such users have different languages and may have different requirements for the functionality and user interface of the systems they operate. In spite of these differences, users want to be able to implement enterprise-wide applications that run at their sites worldwide. These applications must be able to interoperate across country boundaries, run on a variety of hardware configurations from multiple vendors, and be localized to meet local users' needs. This open, distributed computing environment is the reasoning behind common open software environments. The internationalization technology identified within this specification provides these benefits to a global market.
Multiple environments may exist within a common open system for support of different national languages. Each of these national environments is called a locale, which considers the language, its characters, fonts, and the customs used to input and format data. The Common Desktop Environment is fully internationalized such that any application can run using any locale installed in the system.
A locale defines the behavior of a program at run time according to the language and cultural conventions of a user's geographical area. Throughout the system, locales affect the following:
- Encoding and processing of text data
- Identifying the language and encoding of resource files and their text values
- Rendering and layout of text strings
- Interchanging text that is used for interclient text communication
- Selecting the input method (which code set will be generated) and the processing of text data
- Encoding and decoding for interclient text communication
- Bitmap/icon files
- Actions and file types
- User Interface Definition (UID) files
An internationalized application contains no code that is dependent on the user's locale, the characters needed to represent that locale, or any formats (such as date and currency) that the user expects to see and interact with. The desktop accomplishes this by separating language- and culture-dependent information from the application and saving it outside the application.
Figure 1-1 shows the kinds of information that should be external to an application to simplify internationalization.
Figure 1-1 Information external to the application
By keeping the language- and culture-dependent information separate from the application source code, the application does not need to be rewritten or recompiled to be marketed in different countries. Instead, the only requirement is for the external information to be localized to accommodate local language and customs.
An internationalized application is also adaptable to the requirements of different native languages, local customs, and character-string encodings. The process of adapting the operation to a particular native language, local custom, or string encoding is called localization. A goal of internationalization is to permit localization without program source modifications or recompilation.
For a quick overview of internationalization, refer to X/Open CAE Specification System Interface Definition, Issue 4, X/Open Company Ltd., 1992, ISBN: 1-872630-46-4.
Current State of Internationalization
Previously, the industry supplied many variants of internationalization from proprietary functions to the new set of standard functions published by X/Open. Also, there have been different levels of enabling, such as simple ASCII support, Latin/European support, Asian multibyte support, and Arabic/Hebrew bidirectional support.
The interfaces defined within the X/Open specification are capable of supporting a large set of languages and territories, including:
- Latin languages: Americas, Eastern/Western European
- East Asia: Japanese, Korean, and Chinese
- Bidirectional languages: Arabic and Hebrew
Furthermore, the goal of the Common Desktop Environment is that localization of these technologies (translation of messages and documentation and other adaptation for local needs) be done in a consistent way, so that a supported user anywhere in the world will find the same common localized environment from vendor to vendor. End users and administrators can expect a consistent set of localization features that provide a complete application environment for support of global software.
Within this environment, software developers can expect to develop worldwide applications that are portable, can interoperate across distributed systems (even from different vendors), and can meet the diverse language and cultural requirements of multinational users supported by the desktop standard locales.
Internationalization Standards
Through the work of many companies, the functionality of the internationalization application program interface has been standardized over time to include additional requirements and languages, particularly those of East Asia. This work has been centered primarily in the Portable Operating System Interface for Computer Environments (POSIX) and X/Open specifications. The original X/Open specification was published in the second edition of the X/Open Portability Guide (XPG2) and was based on the Native Language Support product released by Hewlett-Packard. The latest published X/Open internationalization standard is referred to as XPG4.
It is important that each layer within the desktop use the proper set of standards interfaces defined for internationalization to ensure end users get a consistent, localized interface. The definition of a locale and the common open set of locale-dependent functions are based on the following specifications:
- X Window System, The Complete Reference to Xlib, Xprotocol, ICCCM, XLFD - X Version 11, Release 5, Digital Press, 1992, ISBN 1-55558-088-2.
- ANSI/IEEE Standard Portable Operating System Interface for Computer Environments, IEEE.
- OSF Motif 1.2 Programmer's Reference, Revision 1.2, Open Software Foundation, Prentice Hall, 1992, ISBN 0-13-643115-1.
- X/Open CAE Specification Commands and Utilities, Issue 4, X/Open Company Ltd., 1992, ISBN 1-872630-48-0.
Common Internationalization System
Figure 1-2 shows how internationalization pervades a single-host system. The goal is that the applications (clients) are built to be shipped worldwide for the set of locales supported in the underlying system. Using standard interfaces improves access to global markets and minimizes the amount of localization work needed by application developers. In addition, country representatives can be assured of consistent localization within systems adhering to the principles of the desktop.
Figure 1-2 Common internationalized system
Most single-display clients operate in a single locale that is determined at run time from the setting of an environment variable, usually $LANG, or from the xnlLanguage resource. Locale environment variables, such as LC_ALL and LANG, can be used to control the environment. See "Xt Locale Management" for more information.
The LC_CTYPE category of the locale is used by the environment to identify the locale-specific features used at run time. The fonts and input method loaded by the toolkit are determined by the LC_CTYPE category.
Programs that are enabled for internationalization are expected to call the XtSetLanguageProc() function (which calls setlocale() by default) to set the locale desired by the user. None of the libraries call the setlocale() function to set the locale, so it is the responsibility of the application to call XtSetLanguageProc() with either a specific locale or some value loaded at run time. If applications are internationalized and do not use XtSetLanguageProc(), obtain the locale name from one of the following prioritized sources to pass it to the setlocale() function:
- A command-line option
- A resource
- The empty string ("")
The empty string makes the setlocale() function use the $LC_* and $LANG environment variables to determine locale settings. Specifically, setlocale(LC_ALL, "") specifies that the locale should be checked and taken from environment variables in the order shown in Table 1-1 for the various locale categories.
Table 1-1 Locale Categories
The toolkit already defines a standard command-line option (-lang) and a resource (xnlLanguage). Also, the resource value can be set in the server RESOURCE_MANAGER, which may affect all clients that connect to that server.
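The prioritized lookup described above can be sketched in plain C for an application that calls setlocale() itself. This is only an illustrative sketch: choose_locale_name(), env_locale_name(), and apply_locale() are hypothetical helpers, not part of Xt or libXm, and the fallback to the "C" locale is an assumption about what an application might reasonably do. The environment lookup follows the usual POSIX precedence (LC_ALL, then the category variable, then LANG) and is shown only to make the effect of the empty string concrete.

```c
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: pick the name to pass to setlocale().
 * Priority follows the text: command-line option, then resource,
 * then the empty string, which defers to the environment. */
static const char *choose_locale_name(const char *cmdline_lang,
                                      const char *resource_lang)
{
    if (cmdline_lang != NULL && cmdline_lang[0] != '\0')
        return cmdline_lang;
    if (resource_lang != NULL && resource_lang[0] != '\0')
        return resource_lang;
    return "";
}

/* Hypothetical helper: report which environment value setlocale(cat, "")
 * would normally consult for one category, using POSIX precedence.
 * Returning "C" when nothing is set is an assumption of this sketch. */
static const char *env_locale_name(const char *category_var)
{
    const char *vars[3];
    size_t i;

    vars[0] = "LC_ALL";
    vars[1] = category_var;   /* for example "LC_CTYPE" */
    vars[2] = "LANG";
    for (i = 0; i < 3; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return "C";
}

/* Hypothetical helper: set the locale, falling back to "C" if the
 * requested locale is not installed. Returns setlocale()'s result. */
static const char *apply_locale(const char *cmdline_lang,
                                const char *resource_lang)
{
    const char *result =
        setlocale(LC_ALL, choose_locale_name(cmdline_lang, resource_lang));

    if (result == NULL)
        result = setlocale(LC_ALL, "C");
    return result;
}
```

An application that uses XtSetLanguageProc() does not need any of this, because the default language procedure performs the equivalent lookup through the -lang option and the xnlLanguage resource.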
Fonts, Font Sets, and Font Lists
All X clients use fonts for drawing text. The basic object used in drawing text is XFontStruct, which identifies the font that contains the images to be drawn.
The desktop already supports fonts by way of the XFontStruct data structure defined by Xlib; however, the encoding of the characters within the font must be known to an internationalized application. To communicate this information, the program expects that all fonts at the server are identified by an X Logical Font Description (XLFD) name. The XLFD name enables users to describe both the base characteristics and the charset (encoding of font glyphs). The term charset is used to denote the encoding of glyphs within the font, while the term code set means the encoding of characters within the locale. The charset for a given font is determined by the CharSetRegistry and CharSetEncoding fields of the XLFD name. Text and symbols are drawn as defined by the codes in the fonts.
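To make the role of the CharSetRegistry and CharSetEncoding fields concrete, the following C sketch extracts them from a fully specified XLFD name. It assumes the standard XLFD layout of 14 fields, each introduced by a hyphen, with the registry and encoding as the last two; xlfd_charset() is a hypothetical helper written for this illustration and is not an Xlib function. Names containing wildcards or aliases are simply rejected.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the charset ("registry-encoding") of a fully
 * specified XLFD name into out. Returns 0 on success, or -1 if the name
 * does not have exactly 14 hyphen-introduced fields or out is too small. */
static int xlfd_charset(const char *xlfd, char *out, size_t outlen)
{
    size_t dashes = 0;
    const char *p;
    const char *registry = NULL;

    for (p = xlfd; *p != '\0'; p++) {
        if (*p == '-') {
            dashes++;
            if (dashes == 13)
                registry = p + 1;   /* CharSetRegistry starts here */
        }
    }
    if (dashes != 14 || registry == NULL)
        return -1;
    if (strlen(registry) + 1 > outlen)
        return -1;
    strcpy(out, registry);
    return 0;
}
```

For a name such as -adobe-courier-medium-r-normal--14-140-75-75-m-90-iso8859-1, the helper yields iso8859-1, which is the charset an internationalized application would compare against the charsets required by its locale.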
A font set (for example, an XFontSet data structure defined by Xlib) is a collection of one or more fonts that enables all characters defined for a given locale to be drawn. Internationalized applications may be required to draw text encoded in the code sets of the locale where the value of an encoded character is not identical to the glyph index. Additionally, rendering all characters of the locale may require multiple fonts whose encodings differ from the code set of the locale. Since both code sets and charsets may vary from locale to locale, the concept of a font set is introduced through XFontSet.
While fonts are identified by their XLFD name, font sets are identified by a list of XLFD names. The list can consist of one or more XLFD names with the exception that only the base characteristics are significant; the encoding of the desired fonts is determined from the locale. Any charsets specified in the XLFD base name list are ignored and users need only concentrate on specifying the base characteristics, such as point size, style, and weight. A font set is said to be locale-sensitive and is used to draw text that is encoded in the code set of the locale. Internationalized applications should use font sets instead of font structs to render text data.
A font list is a libXm Toolkit object that is a collection of one or more font list entries. Font sets can be specified within a font list. Each font list entry designates either a font or a font set and is tagged with a name. If there is no tag in a font list entry, a default tag (XmFONTLIST_DEFAULT_TAG) is used. The font list can be used with the XmString functions found in the libXm Toolkit library. A font list enables drawing of compound strings that consist of one or more segments, each identified by a tag. This allows the drawing of strings with different base characteristics (for example, drawing a bold and italic string within one operation). Some non-XmString-based widgets, such as XmText of the libXm library, use only one font list entry in the font list. Motif font lists use the suffix : (colon) to identify a font set within a font list.
The user is generally asked to specify either a font list (which may contain either a font or font set) or a font set. In an internationalized environment, the user must be able to specify fonts that are independent of the code set because the specification can be used under various locales with different code sets than the character set (charset) of the font. Therefore, it is recommended that all font lists be specified with a font set.
Font Specification
The font specification can be either an X Logical Font Description (XLFD) name or an alias for the XLFD name. For example, the following are valid font specifications for a 14-point font:
Font Set Specification
The font set specification is a list of names (XLFD names or their aliases) and is sometimes called a base name list. All names are separated by commas, with any blank spaces before or after the comma being ignored. Pattern-matching (wildcard) characters can be specified to help shorten XLFD names.
Remember that a font set specification is determined by the locale that is running. For example, the ja_JP Japanese locale defines three fonts (character sets) necessary to display all of its characters; the following identifies the set of Gothic fonts needed.
The preceding two cases can be used with a Japanese locale as long as fonts exist that match the base name list.
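The comma-separated syntax of a base name list can be illustrated with a small C sketch that splits a list into its individual names and discards blanks around the commas. The function split_base_names() and its fixed-size limits are hypothetical and exist only for this illustration; in practice the whole list string is passed unchanged to Xlib's XCreateFontSet(), which does its own parsing.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES    16    /* arbitrary limits chosen for this sketch */
#define MAX_NAME_LEN 256

/* Hypothetical helper: split a base name list at commas, ignoring blank
 * space on either side of each comma. Returns the number of names stored,
 * or -1 if a name or the list exceeds the sketch's limits. */
static int split_base_names(const char *list,
                            char names[][MAX_NAME_LEN], int max_names)
{
    int count = 0;
    const char *start = list;

    for (;;) {
        const char *comma = strchr(start, ',');
        const char *first = start;
        const char *last = comma ? comma : start + strlen(start);
        size_t len;

        while (first < last && isspace((unsigned char)*first))
            first++;
        while (last > first && isspace((unsigned char)last[-1]))
            last--;
        len = (size_t)(last - first);
        if (len > 0) {
            if (count >= max_names || len >= MAX_NAME_LEN)
                return -1;
            memcpy(names[count], first, len);
            names[count][len] = '\0';
            count++;
        }
        if (comma == NULL)
            break;
        start = comma + 1;
    }
    return count;
}
```

Given the list "-*-r-*-14-* , -*-b-*-18-*", the sketch stores two names with the surrounding blanks removed. Wildcard characters are left untouched, since matching them against the server's fonts is the job of Xlib and the X server.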
Font List Specification
A font list specification can consist of one or more entries, each of which can be either a font specification or a font set specification.
Each entry can be tagged with a name that is used when drawing a compound string. The tags are application-defined and are usually names representing the expected style of font; for example, bigbold. A null tag is used to denote the default entry and is associated with the XmFONTLIST_DEFAULT_TAG identifier used in XmString functions.
A font tag is identified when it is prefixed with an = (equal sign); for example, =bigbold (this matches the first font defined at the server). If an = is specified but there is no name following it, the specification is considered the default font list entry.
A font set tag is identified when it is prefixed with a : (colon); for example, :bigbold (this matches the first server set of fonts that satisfy the locale). If a : is specified but no name is given, the specification is considered the default font list entry. Within a font list entry specification, the names in a base name list are separated by ; (semicolons) rather than by , (commas).
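The tagging rules above can be sketched as a small C parser for a single font list entry. The struct and function names are hypothetical, and the sketch makes two assumptions that the text does not state: the entry has already been separated from its neighbors and trimmed, and an entry with neither = nor : is treated as a font with the default tag. Motif performs this parsing itself when it converts a font list resource; the code only illustrates the syntax.

```c
#include <stddef.h>
#include <string.h>

enum entry_kind { ENTRY_FONT, ENTRY_FONT_SET };

/* Hypothetical representation of one parsed font list entry. */
struct font_list_entry {
    enum entry_kind kind;
    char names[256];   /* font name, or base name list separated by ';' */
    char tag[64];      /* empty string stands for the default tag */
};

/* Hypothetical helper: split "name=tag" (font) or "names:tag" (font set).
 * Assumes the font names themselves contain neither '=' nor ':'.
 * Returns 0 on success, -1 if the entry is empty or too long. */
static int parse_font_list_entry(const char *text,
                                 struct font_list_entry *out)
{
    const char *sep = strpbrk(text, "=:");
    size_t name_len = sep ? (size_t)(sep - text) : strlen(text);
    const char *tag = sep ? sep + 1 : "";

    if (name_len == 0 || name_len >= sizeof out->names ||
        strlen(tag) >= sizeof out->tag)
        return -1;

    out->kind = (sep != NULL && *sep == ':') ? ENTRY_FONT_SET : ENTRY_FONT;
    memcpy(out->names, text, name_len);
    out->names[name_len] = '\0';
    strcpy(out->tag, tag);
    return 0;
}
```

With this sketch, the entry -*-b-*-18-*:bigbold is reported as a font set tagged bigbold, while -*-r-*-14-*: is reported as a font set with an empty tag, that is, the default font list entry.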
Example Font List Specification
For the Latin 1 locales, enter:
-*-r-*-14-*: ,\ # default font list entry
-*-b-*-18-*:bigbold # Large Bold fonts
Base Font Name List Specification
The base font name list is a list of base font names associated with a font set as defined by the locale. The base font names are in a comma-separated list and are assumed to be characters from the portable character set; otherwise, the result is undefined. Blank space immediately on either side of a separating comma is ignored.
Use of XLFD font names permits international applications to obtain the fonts needed for a variety of locales from a single locale-independent base font name. The single base font name specifies a family of fonts whose members are encoded in the various charsets needed by the locales of interest.
An XLFD base font name can explicitly name the font's charset needed for the locale. This enables the user to specify an exact font for use with a charset required by a locale, fully controlling the font selection.
If a base font name is not an XLFD name, an attempt is made to obtain an XLFD name from the font properties for the font.
The following algorithm is used to select the fonts that are used to display text with font sets.
For each charset required by the locale, the base font name list is searched for the first of the following cases that names a set of fonts that exist at the server:
- The first XLFD-conforming base font name that specifies the required charset or a superset of the required charset in its CharSetRegistry and CharSetEncoding fields.
- The first set of one or more XLFD-conforming base font names that specify one or more charsets that can be remapped to support the required charset. The Xlib implementation can recognize various mappings from a required charset to one or more other charsets and use the fonts for those charsets. For example, JIS Roman is ASCII with the \ (backslash) character replaced by the yen sign and the ~ (tilde) character replaced by the overbar; Xlib can load an ISO8859-1 font to support this character set if a JIS Roman font is not available.
- The first XLFD-conforming font name, or the first non-XLFD font name for which an XLFD font name can be obtained, combined with the required charset (replacing the CharSetRegistry and CharSetEncoding fields in the XLFD font name). In the first instance, the implementation can use a charset that is a superset of the required charset.
- The first font name that can be mapped in some locale-dependent manner to one or more fonts that support imaging text in the charset.
For example, assume a locale requires the following charsets:
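As a rough illustration of the first case in the search described above, the following C sketch looks through a list of fully specified XLFD base font names for the first one whose CharSetRegistry and CharSetEncoding fields name a required charset. It is deliberately simplified: it handles only exact (case-insensitive) matches, and it does not model supersets, remapping, or the later cases, all of which are left to the Xlib implementation. The helper names are hypothetical, and the font names used to exercise it would be made-up examples rather than fonts known to exist on any server.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two ASCII strings; returns 1 if equal. */
static int equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Return a pointer to the "registry-encoding" part of a fully specified
 * XLFD name (14 hyphen-introduced fields), or NULL for any other name. */
static const char *charset_of(const char *xlfd)
{
    int dashes = 0;
    const char *p;
    const char *start = NULL;

    for (p = xlfd; *p != '\0'; p++) {
        if (*p == '-' && ++dashes == 13)
            start = p + 1;
    }
    return (dashes == 14) ? start : NULL;
}

/* Hypothetical helper: index of the first base font name that explicitly
 * specifies the required charset, or -1 if none does. */
static int first_font_for_charset(const char *const names[], int count,
                                  const char *required)
{
    int i;

    for (i = 0; i < count; i++) {
        const char *cs = charset_of(names[i]);
        if (cs != NULL && equal_nocase(cs, required))
            return i;
    }
    return -1;
}
```

An application would call first_font_for_charset() once per charset required by the locale; a result of -1 corresponds to falling through to the later cases in the list, which this sketch does not attempt.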
You can supply a base font name list that explicitly specifies the charsets, ensuring that specific fonts are used if they exist, as shown in the following example:
You can supply a base font name list that omits the charsets, which selects fonts for each required code set, as shown in the following example:
Alternatively, the user can supply a single base font name that selects from all available fonts that meet certain minimum XLFD property requirements, as shown in the following example:
Text Drawing
The desktop provides various functions for rendering localized text, including simple text, compound strings, and some widgets. These include functions within the Xlib and Motif libraries.
Input Methods
The Common Desktop Environment provides the ability to enter localized input for an internationalized application that is using the Xm Toolkit. Specifically, the XmText[Field] widgets are enabled to interface with input methods provided by each locale. In addition, the dtterm client is enabled to use input methods.
By default, each internationalized client that uses the libXm Toolkit uses the input method associated with a locale specified by the user. The XmNinputMethod resource is provided as a modifier on the locale name to allow a user to specify any alternative input method.
The user interface of the input method consists of several elements. The need for these areas is dependent on the input method being used. They are usually needed by input methods that require complex input processing and dialogs. See Figure 1-3 for an illustration of these areas.
Figure 1-3 Example of VendorShell widget with auxiliary (Japanese)
Preedit Area
A preedit area is used to display the string being preedited. The input method supports four modes of preediting: OffTheSpot, OverTheSpot (default), Root, and None.
Note: A string that has been committed cannot be reconverted. The status of the string is moved from the preedit area to the location where the user is entering characters.
In OffTheSpot mode preediting using an input method, the location of preediting is fixed at just below the MainWindow area and on the right side of the status area as shown in Figure 1-4. A Japanese input method is used for the example.
Figure 1-4 Example of OffTheSpot preediting with the VendorShell widget (Japanese)
In the system environment, when preediting using an input method, the string being preedited may be highlighted in some form depending on the input method.
To use OffTheSpot mode, set the XmNpreeditType resource of the VendorShell widget either with the XtSetValues() function or with a resource file. The XmNpreeditType resource can also be set as the resource of a DialogShell widget, which is a subclass of the VendorShell widget class.
In OverTheSpot mode, the location of the preedit area is set to where the user is trying to enter characters (for example, the insert cursor position of the Text widget that has the current focus). The characters in a preedit area are displayed at the cursor position as an overlay window, and they can be highlighted depending on the input method.
Although a preedit area may consist of multiple lines in OverTheSpot mode, it is always within the MainWindow area and cannot cross its edges in any direction.
Keep in mind that although the preedit string under construction may be displayed as though it were part of the Text widget's text, it is not passed to the client and displayed in the underlying edit screen until preedit ends. See Figure 1-5 for an illustration.
To use OverTheSpot mode explicitly, set the XmNpreeditType resource of the VendorShell widget either with the XtSetValues() function or with a resource file. The XmNpreeditType resource can be set as the resource of a DialogShell widget because it is a subclass of the VendorShell widget class.
Figure 1-5 Example of OverTheSpot preediting with the VendorShell widget (Japanese)
In Root mode, the preedit and status areas are located separately from the client's window. The Root mode behavior is similar to OffTheSpot. See Figure 1-6 for an illustration.
Figure 1-6 Example of Root preediting with the VendorShell widget (Japanese)
Status Area
A status area reports the input or keyboard status of the input method to the user. For OverTheSpot and OffTheSpot styles, the status area is located at the lower left corner of the VendorShell window.
- If the preedit style is Root, the status area is placed outside the client window.
- If the preedit style is OffTheSpot, the preedit area is displayed to the right of the status area.
The VendorShell widget provides geometry management so that the status area is repositioned at the bottom corner of the VendorShell window if the VendorShell window is resized.
Auxiliary Area
An auxiliary area helps the user with preediting. Depending on the particular input method, an auxiliary area can be created. The Japanese input method in Figure 1-3 creates the following types of auxiliary areas:
- JIS NUMBER
- Switching conversion method
MainWindow Area
A MainWindow area is the widget used as the working area of the input method. In the system environment, the sole child of the VendorShell widget is the MainWindow widget. It can be any container widget, such as a RowColumn widget. The user creates the container widget as the child of the VendorShell widget.
Focus Area
A focus area is any descendant widget under the MainWindow widget subtree that currently has focus. The Motif application programmer using existing widgets does not need to worry about the focus area. The important information to remember is that only one widget can have input method processing at a time. The input method processing moves to the window (widget) that currently has the focus.
Interclient Communications Conventions (ICCC)
The Interclient Communications Conventions (ICCC) define the mechanism used to pass text between clients. Because the system is capable of supporting multiple code sets, two applications that are communicating with each other may be using different code sets. ICCC defines how these two clients agree on how the data is passed between them. If two clients have incompatible character sets (for example, Latin1 and Japanese (JIS)), some data may be lost when characters are transported.
However, if two clients have different code sets but compatible character sets, ICCC enables these clients to pass information with no data lost. If the code sets of the two clients are not identical, CompoundText encoding is used as the interchange format, with the COMPOUND_TEXT atom. If the data being communicated involves only portable characters (7-bit ASCII and others) or the ISO8859-1 code set, the data is communicated as is, with no conversion, by way of the XA_STRING atom.
Titles and icon names need to be communicated to the Window Manager using the COMPOUND_TEXT atom if nonportable characters are used; otherwise, the XA_STRING atom can be used. Any other encoding is limited to the ability to convert to the locale of the Window Manager. The Window Manager runs in a single locale and supports only titles and icon names that are convertible to the code set of the locale under which it is running.
The libXm library and all desktop clients should follow these conventions.
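The choice between the two atoms can be sketched as a simple decision function in C. This is a simplification made for illustration only: it treats any text consisting solely of 7-bit bytes as portable, takes the "locale uses ISO8859-1" fact as a caller-supplied flag, and returns the atom name as a string ("STRING" is the name of the XA_STRING atom). The function names are hypothetical. In real clients, Xlib's XmbTextListToTextProperty() with the XStdICCTextStyle style makes an equivalent decision and also performs the conversion.

```c
#include <stddef.h>

/* Returns 1 if every byte of text is 7-bit, 0 otherwise. */
static int is_seven_bit(const char *text)
{
    const unsigned char *p = (const unsigned char *)text;

    for (; *p != '\0'; p++) {
        if (*p >= 0x80)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: name of the atom to use for interclient text.
 * locale_is_iso8859_1 is nonzero when the client's code set is ISO8859-1,
 * in which case the text can be sent as is. */
static const char *choose_text_target(const char *text,
                                      int locale_is_iso8859_1)
{
    if (locale_is_iso8859_1 || is_seven_bit(text))
        return "STRING";
    return "COMPOUND_TEXT";
}
```

A window title containing only portable characters would therefore be sent as STRING from any locale, while a title containing Japanese characters would be converted and sent as COMPOUND_TEXT.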