Internationalization and Localization with PHP
While everyone who programs in PHP has to learn some English eventually to get a handle on its function names and language constructs, PHP can create applications in just about any human language. Some applications need to be used by speakers of many different languages. PHP’s internationalization and localization support makes it easier to make an application written for French speakers useful for German speakers.
Internationalization (often abbreviated I18N, because there are 18 letters between the first “i” and the last “n”) is the process of taking an application designed for just one locale and restructuring it so that it can be used in many different locales. Localization (often abbreviated L10N, because there are 10 letters between the first “l” and the last “n”) is the process of adding support for a new locale to an internationalized application.
Localizing different kinds of content requires different techniques. This article covers an object-oriented method for localizing plain text messages and images. The PHP Cookbook contains additional recipes for dates, times, and currency. There are also recipes on using GNU gettext and other I18N and L10N topics.
Locales
A locale is a group of settings that describe text formatting and language customs in a particular area of the world. A locale name generally has three components. The first, an abbreviation that indicates a language, is mandatory. For example, “en” stands for English and “pt” for Portuguese. An optional country specifier comes next, after an underscore, to distinguish between versions of the same language spoken in different countries. For example, “en_US” and “en_GB” specify U.S. and British English respectively, while “pt_BR” and “pt_PT” identify Brazilian and European Portuguese. Finally, after a period, comes an optional character-set specifier. Chinese as used in Taiwan with the Big5 character set is written as “zh_TW.Big5”. Note that while most locale names follow these conventions, some don’t.
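The three-part naming convention can be illustrated with a small parser. This is only a sketch: the function name parse_locale_name() is hypothetical (it is not part of PHP), the regular expression covers just the common language_COUNTRY.charset form described above, and the nullable return type assumes PHP 7.1 or later. Locale names that do not follow the convention are rejected and yield null.

```php
<?php
// Hypothetical helper (not a PHP built-in): split a locale name such as
// "zh_TW.Big5" into language, country, and character-set parts.
function parse_locale_name(string $locale): ?array
{
    // language (mandatory), then optional "_COUNTRY", then optional ".charset"
    $pattern = '/^([a-z]{2,3})(?:_([A-Z]{2}))?(?:\.([A-Za-z0-9-]+))?$/';
    if (!preg_match($pattern, $locale, $m)) {
        return null; // name does not follow the usual convention
    }
    return array(
        'language' => $m[1],
        // An unmatched middle group comes back as an empty string.
        'country'  => (isset($m[2]) && $m[2] !== '') ? $m[2] : null,
        // An unmatched trailing group is absent from $m altogether.
        'charset'  => isset($m[3]) ? $m[3] : null,
    );
}

print_r(parse_locale_name('zh_TW.Big5'));
print_r(parse_locale_name('pt_BR'));
```

A name with no country or character set, such as “en”, produces null for both optional parts, which mirrors the rule that only the language abbreviation is mandatory.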
Message Catalog
To incorporate I18N support into your program, maintain a message catalog of words and phrases and retrieve the appropriate string from the message catalog before printing it. Here’s a simple message catalog with foods in American and British English and a function to retrieve words from the catalog:
<?php
// Message catalog: locale name => (lookup key => localized string).
$messages = array(
    'en_US' => array(
        'My favorite foods are' => 'My favorite foods are',
        'french fries' => 'french fries',
        'biscuit' => 'biscuit',
        'candy' => 'candy',
        'potato chips' => 'potato chips',
        'cookie' => 'cookie',
        'corn' => 'corn',
        'eggplant' => 'eggplant'
    ),
    'en_GB' => array(
        'My favorite foods are' => 'My favourite foods are',
        'french fries' => 'chips',
        'biscuit' => 'scone',
        'candy' => 'sweets',
        'potato chips' => 'crisps',
        'cookie' => 'biscuit',
        'corn' => 'maize',
        'eggplant' => 'aubergine'
    )
);

// Return the message for key $s in the current locale, held in $LANG.
function msg($s) {
    global $LANG;
    global $messages;
    if (isset($messages[$LANG][$s])) {
        return $messages[$LANG][$s];
    } else {
        // Log the missing entry; the function then returns null.
        error_log("l10n error: LANG: $LANG, message: '$s'");
    }
}
?>
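To show how the catalog might be used, here is a minimal sketch that prints the same sentence for both locales. It repeats a trimmed copy of the catalog and of msg() so that it runs on its own. The helper favorite_foods_line() is an assumption introduced for illustration only; it simply sets the global $LANG and assembles a sentence from catalog entries.

```php
<?php
// Trimmed copy of the catalog, for illustration only.
$messages = array(
    'en_US' => array(
        'My favorite foods are' => 'My favorite foods are',
        'french fries' => 'french fries',
        'cookie' => 'cookie'
    ),
    'en_GB' => array(
        'My favorite foods are' => 'My favourite foods are',
        'french fries' => 'chips',
        'cookie' => 'biscuit'
    )
);

// Look up $s in the catalog for the current locale ($LANG).
function msg($s) {
    global $LANG, $messages;
    if (isset($messages[$LANG][$s])) {
        return $messages[$LANG][$s];
    }
    error_log("l10n error: LANG: $LANG, message: '$s'");
    return null;
}

// Hypothetical helper: switch to $locale and build one localized sentence.
function favorite_foods_line($locale) {
    global $LANG;
    $LANG = $locale;
    return msg('My favorite foods are') . ': '
         . msg('french fries') . ' and ' . msg('cookie');
}

echo favorite_foods_line('en_US'), "\n"; // My favorite foods are: french fries and cookie
echo favorite_foods_line('en_GB'), "\n"; // My favourite foods are: chips and biscuit
```

Note that the calling code always passes the American English phrase as the lookup key; only the value of $LANG decides which translation comes back.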
