Character encoding (Generation III): Difference between revisions

The [[Generation III]] games use a proprietary '''[[character encoding]]''' to store text data. This encoding differs substantially from the encodings used in previous generations, assigning characters to different byte values. The encoding also varies between language versions of the games, to differing degrees.


Some text strings are stored in fixed-length structures, while others are stored in a block of text in which each string is simply terminated by 0xFF. For the large, variable-length blocks, another structure usually holds pointers to the appropriate string(s) within that block of text. In the fixed-length structures, strings are still terminated by 0xFF, and any remainder of the allotted space is padded with 0x00.
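
The two storage layouts described above can be sketched in Python as follows. This is only an illustration, not code from the games: the function names are hypothetical, the fallback for a field with no terminator is an assumption, and a real reader would normally follow the pointer structure rather than scan a whole block.

```python
TERMINATOR = 0xFF  # end-of-string marker described above


def read_fixed_field(field: bytes) -> bytes:
    """Return the encoded string stored in a fixed-length field.

    Everything from the 0xFF terminator onward (including the 0x00
    padding) is dropped. 0x00 is not stripped on its own, because it
    is also the code for a space.
    """
    end = field.find(TERMINATOR)
    # Assumption of this sketch: a field without a terminator holds a
    # string that fills the entire field.
    return field if end == -1 else field[:end]


def split_text_block(block: bytes) -> list[bytes]:
    """Split a variable-length block into its 0xFF-terminated strings.

    Simplification: control sequences are not parsed, so a 0xFF that
    appears as a control-code argument would be misread as a terminator.
    """
    strings = []
    start = 0
    while (end := block.find(TERMINATOR, start)) != -1:
        strings.append(block[start:end])
        start = end + 1
    return strings
```

Cutting at the terminator rather than stripping trailing 0x00 bytes keeps strings that legitimately end in a space intact.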


==Character sets==
Every Western game in Generation III (the English, French, Italian, German, and Spanish games) contains two character sets: its native set and the Japanese set. The different Western character sets are mostly identical, with only a few [[#Regional differences|regional differences]].


For most text, the game's native character set is used, but if a Pokémon's origin {{DL|Pokémon data structure (Generation III)|language}} is Japanese, its nickname and its [[Original Trainer]]'s name use the Japanese character set. The Japanese games only have the Japanese character set, but almost all user-enterable characters from the Western versions are encoded to roughly equivalent characters in the Japanese encoding. The key differences are <code>0xB8</code> (a comma in the Western versions but a period in Japanese), <code>0xAE</code> (a hyphen-minus in the Western versions but a {{wp|chōonpu}} in Japanese, which is visually similar), and <code>0xAD</code> and <code>0xB0</code>-<code>0xB4</code> (which display as the Japanese equivalents of the Western characters).


Note that 0x00 in the following tables is a space (" "), not empty.


===Western===
The table below shows the English character set in Pokémon Emerald. Some differences do exist between different [[#Differences between games and revisions|revisions and games]] and between different [[#Regional differences|languages]], detailed afterward.


Characters on a white background are the only characters that can be input in names; <code>0xF1</code>-<code>0xF6</code> are only available for input in German games. Those on a light gray background may be used in other text strings (such as dialogue), depending on the language of the game. Characters on a dark gray background are unused values that mostly display as spaces in Pokémon FireRed, LeafGreen, and Emerald; in Pokémon Ruby and Sapphire, they are holdovers from the Japanese encoding. Characters with a dotted underline differ between regions.


{| class="wikitable" style="text-align: center; border-collapse:collapse" cellpadding="2px" width="375px"
|- style="white-space: nowrap"
! || -0 || -1 || -2 || -3 || -4 || -5 || -6 || -7 || -8 || -9 || -A || -B || -C || -D || -E || -F
|-
! 0-
| &nbsp;
|style="background: #ddd"| À
|style="background: #ddd"| Á
|style="background: #ddd"| Â
|style="background: #ddd"| Ç
|style="background: #ddd"| È
|style="background: #ddd"| É
|style="background: #ddd"| Ê
|style="background: #ddd"| Ë
|style="background: #ddd"| Ì
|style="background: #bbb"|
|style="background: #ddd"| Î
|style="background: #ddd"| Ï
|style="background: #ddd"| Ò
|style="background: #ddd"| Ó
|style="background: #ddd"| Ô


|-
! 1-
|style="background: #ddd"| Œ
|style="background: #ddd"| Ù
|style="background: #ddd"| Ú
|style="background: #ddd"| Û
|style="background: #ddd"| Ñ
|style="background: #ddd"| ß
|style="background: #ddd"| à
|style="background: #ddd"| á
|style="background: #bbb"|
|style="background: #ddd"| ç
|style="background: #ddd"| è
|style="background: #ddd"| é
|style="background: #ddd"| ê
|style="background: #ddd"| ë
|style="background: #ddd"| ì
|style="background: #bbb"|


|-
! 2-
|style="background: #ddd"| î
|style="background: #ddd"| ï
|style="background: #ddd"| ò
|style="background: #ddd"| ó
|style="background: #ddd"| ô
|style="background: #ddd"| œ
|style="background: #ddd"| ù
|style="background: #ddd"| ú
|style="background: #ddd"| û
|style="background: #ddd"| ñ
|style="background: #ddd"| º
|style="background: #ddd"| ª
|style="background: #ddd"| ᵉʳ
|style="background: #ddd"| &
|style="background: #ddd"| +
|style="background: #bbb"|


|-
! 3-
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #ddd"| <div style="border-bottom:1px dotted" title="Differs between regions"><small>Lv</small></div>
|style="background: #ddd"| =
|style="background: #ddd"| ;
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|


|-
! 4-
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|


|-
! 5-
| style="background:#ddd" | ▯
| style="background:#ddd" | ¿
| style="background:#ddd" | ¡
| style="background:#ddd" | {{PK}}
| style="background:#ddd" | {{MN}}
| style="background:#ddd" | <sup>P</sup><sub>O</sub>
| style="background:#ddd" | <sup>K</sup><sub>é</sub>
| style="background:#ddd" | <div style="border-bottom:1px dotted" title="Differs between regions">[[File:Character 0x57 iii.png]]</div>
| style="background:#ddd" | <div style="border-bottom:1px dotted" title="Differs between regions">[[File:Character 0x58 iii.png]]</div>
| style="background:#ddd" | <div style="border-bottom:1px dotted" title="Differs between regions">[[File:Character 0x59 iii.png]]</div>
| style="background:#ddd" | Í
| style="background:#ddd" | %
| style="background:#ddd" | (
| style="background:#ddd" | )
| style="background:#bbb" | <div style="border-bottom:1px dotted" title="Differs between regions">&nbsp;</div>
| style="background:#bbb" | <div style="border-bottom:1px dotted" title="Differs between regions">&nbsp;</div>


|-
! 6-
|style="background: #bbb"| <div style="border-bottom:1px dotted" title="Differs between regions">&nbsp;</div>
|style="background: #bbb"| <div style="border-bottom:1px dotted" title="Differs between regions">&nbsp;</div>
|style="background: #bbb"| <div style="border-bottom:1px dotted" title="Differs between regions">&nbsp;</div>
|style="background: #bbb"| <div style="border-bottom:1px dotted" title="Differs between regions">&nbsp;</div>
|style="background: #bbb"| <div style="border-bottom:1px dotted" title="Differs between regions">&nbsp;</div>
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #ddd"| â
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #ddd"| í


|-
! 7-
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
| style="background:#ddd" |
| style="background:#ddd" |
| style="background:#ddd" |
| style="background:#ddd" |
| style="background:#ddd" | *
| style="background:#ddd" | *
| style="background:#ddd" | *


|-
! 8-
| style="background:#ddd" | *
| style="background:#ddd" | *
| style="background:#ddd" | &lt;
| style="background:#ddd" | &gt;
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|


|-
! 9-
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|
|style="background: #bbb"|


|-
! A-
| style="background:#ddd" | ʳᵉ
| 0 || 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || ! || ? || . || - || style="background:#bbb" |
|-
! B-
| …
| <div style="border-bottom:1px dotted" title="Differs between regions">“</div>
| <div style="border-bottom:1px dotted" title="Differs between regions">”</div>
|-
! E-
| l || m || n || o || p || q || r || s || t || u || v || w || x || y || z || style="background:#ddd" |

|- style="background:#fff"
| colspan=6 style="background:#ddd" | ''Control characters''
|}
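
As a rough illustration of how the table is used, the following Python sketch decodes the basic Latin subset of the encoding. The digit, punctuation, and space codes come from the tables on this page; the letter ranges are an assumption that uppercase letters run contiguously from 0xBB and lowercase letters from 0xD5, which agrees with A at 0xBB in the Japanese table and l at 0xE0 in row E-. The names <code>BASIC_TABLE</code> and <code>decode</code> are hypothetical, and accented letters, symbols, and control characters are deliberately left out.

```python
def build_basic_table() -> dict[int, str]:
    """Map the basic Latin subset of the encoding to Unicode characters."""
    table = {0x00: " ", 0xAB: "!", 0xAC: "?", 0xAD: ".", 0xAE: "-"}
    for i in range(10):
        table[0xA1 + i] = str(i)              # 0xA1-0xAA: digits 0-9
    for i in range(26):
        # Assumed contiguous runs: A-Z at 0xBB-0xD4, a-z at 0xD5-0xEE.
        table[0xBB + i] = chr(ord("A") + i)
        table[0xD5 + i] = chr(ord("a") + i)
    return table


BASIC_TABLE = build_basic_table()


def decode(data: bytes, unknown: str = "\ufffd") -> str:
    """Decode bytes up to the 0xFF terminator, marking unmapped values."""
    out = []
    for value in data:
        if value == 0xFF:                     # string terminator
            break
        out.append(BASIC_TABLE.get(value, unknown))
    return "".join(out)
```

For example, <code>decode(bytes([0xC2, 0xC3, 0x00, 0xAB, 0xFF]))</code> gives <code>"HI !"</code>. Bytes outside the subset are shown as U+FFFD rather than guessed.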


====Differences between games and revisions====
Codepoint <code>0xB0</code> represents an {{wp|ellipsis}}. In Pokémon Ruby, Sapphire, Colosseum, and XD, it renders as a two-dot ellipsis (<code>‥</code>). In Pokémon FireRed, LeafGreen, and Emerald, it renders as a three-dot ellipsis (<code>…</code>) in the main font, but remains a two-dot ellipsis in the small font used on the party screen and the narrow font used in the Pokédex, Bag, and shops. In subsequent generations, this character renders consistently as a three-dot ellipsis.


Codepoints <code>0x7D</code>-<code>0x83</code>, marked by asterisks (*) above, print spaces 1 to 7 pixels wide (in ascending order of the hex value). In FireRed and LeafGreen, <code>0x50</code> and <code>0x7D</code>-<code>0x83</code> are not used and print as regular spaces like other unused characters.
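
The relationship between these codepoints and their widths is a simple offset, shown in the Python sketch below. The function name is hypothetical, and it only applies to games where these codepoints act as variable-width spaces.

```python
def space_width_pixels(code: int) -> int:
    """Width in pixels of a variable-width space codepoint (0x7D-0x83)."""
    if not 0x7D <= code <= 0x83:
        raise ValueError(f"{code:#04x} is not a variable-width space")
    # 0x7D is 1 pixel wide and each following value adds one pixel.
    return code - 0x7C
```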


In certain languages, codepoints <code>0x34</code>, <code>0x57</code>-<code>0x59</code>, and <code>0x64</code> differ between games, [[#Regional differences|as detailed below]].


In Pokémon Ruby and Sapphire, many values print Japanese characters, which are holdovers from the original Japanese encoding. These include:
* All unused characters (on a dark gray background above)
* <code>0x50</code> and <code>0x7D</code>-<code>0x83</code>

In the table below, the underscores (<code>_</code>) stand for spaces.


{| class="wikitable" style="text-align: center; background-color: #fff;"
!
! English


===Japanese===
Only the characters on a white background below can be input in names. The characters on a dark gray background are printed as spaces in Pokémon FireRed, LeafGreen, and Emerald. Otherwise, the Japanese character set has no differences between games or revisions. Codepoint <code>0xB0</code> represents an {{wp|ellipsis}}, which renders as a two-dot ellipsis (<code lang="ja">‥</code>) in-game. In subsequent generations, this character renders consistently as a three-dot ellipsis.


{| class="wikitable" style="text-align: center; border-collapse:collapse" cellpadding="2px" width="375px"
|- style="white-space: nowrap"
! || -0 || -1 || -2 || -3 || -4 || -5 || -6 || -7 || -8 || -9 || -A || -B || -C || -D || -E || -F
|-
! B-
| lang="ja" | ‥ || 『 || 』 || 「 || 」 || ♂ || ♀ || style="background:#ddd" | 円 || style="background:#ddd" | . || style="background:#ddd" | × || / || A || B || C || D || E
|-
! C-
|-
! E-
| l || m || n || o || p || q || r || s || t || u || v || w || x || y || z || style="background:#ddd" |
|-
! F-
|style="background: #ddd"| :
|style="background: #ddd"| Ä
|style="background: #ddd"| Ö
|style="background: #ddd"| Ü
|style="background: #ddd"| ä
|style="background: #ddd"| ö
|style="background: #ddd"| ü
|style="background: #bbb"| ⬆
|style="background: #bbb"| ⬇
|style="background: #bbb"| ⬅
|style="background: #ddd" colspan=6 | ''Control characters''
|}


*0x09, the game will pause text display, and resume upon pressing a button.
*0x0C, it will escape the byte that follows 0x0C (the third byte) if that byte is a control character, printing a new character instead. If the third byte is not a control character, it prints normally.
**When the third byte is 0xFA, "" is produced.
**When the third byte is 0xFB, "+" is produced (though in the Japanese games, within the [[Options]] screen, it produces "=").
**The other control characters do not produce any characters. In the English games, nothing is printed, while in the Japanese games, miscellaneous data appears to be printed.


===0xFD variables===
When 0xFD is followed by one of the following bytes, it prints a text variable or version-dependent text. Version-dependent text is only used in Pokémon Ruby, Sapphire, and Emerald; in Pokémon Emerald, all of these values match those of Pokémon Sapphire except for the version name. The text printed by a version-dependent variable is constant within a single game, but varies between versions and languages.

;Text variables
*0x01: the player's name
*0x02, 0x03, or 0x04: whatever text has been assigned to one of three buffers using a variety of script commands
*0x06: the rival's name

;Version-dependent text
{| class="wikitable"
! rowspan=2 | Variable ID
! rowspan=2 | Description
! colspan=3 | English content
|-
! {{GameIcon|Ru}}
! {{GameIcon|Sa}}
! {{GameIcon|Em}}
|-
| 0x07 || the game's name || RUBY || SAPPHIRE || EMERALD
|-
| 0x08 || the name of the villainous team || MAGMA ||colspan=2| AQUA
|-
| 0x09 || the name of the non-villainous team || AQUA ||colspan=2| MAGMA
|-
| 0x0A || the name of the villainous team's leader || MAXIE ||colspan=2| ARCHIE
|-
| 0x0B || the name of the non-villainous team's leader || ARCHIE ||colspan=2| MAXIE
|-
| 0x0C || the name of the villainous team's Legendary Pokémon || GROUDON ||colspan=2| KYOGRE
|-
| 0x0D || the name of the opposing Legendary Pokémon || KYOGRE ||colspan=2| GROUDON
|}
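
The version-dependent lookup above can be sketched in Python as follows. The table contents are the English values listed above; the names <code>VERSION_TEXT</code> and <code>tokenize</code> are hypothetical. The sketch leaves runtime variables (player name, buffers, rival name) as placeholders and does not handle 0xFC control sequences, so it is a simplified model rather than a description of how the games implement it.

```python
RUBY = {0x07: "RUBY", 0x08: "MAGMA", 0x09: "AQUA", 0x0A: "MAXIE",
        0x0B: "ARCHIE", 0x0C: "GROUDON", 0x0D: "KYOGRE"}
SAPPHIRE = {0x07: "SAPPHIRE", 0x08: "AQUA", 0x09: "MAGMA", 0x0A: "ARCHIE",
            0x0B: "MAXIE", 0x0C: "KYOGRE", 0x0D: "GROUDON"}

VERSION_TEXT = {
    "ruby": RUBY,
    "sapphire": SAPPHIRE,
    # Emerald matches Sapphire except for the version name.
    "emerald": {**SAPPHIRE, 0x07: "EMERALD"},
}


def tokenize(data: bytes, version: str) -> list:
    """Split encoded text into raw byte values and expanded 0xFD text.

    Version-dependent variables become strings; other 0xFD variables,
    which depend on save data, become "<var 0x..>" placeholders.
    """
    lookup = VERSION_TEXT[version]
    tokens = []
    i = 0
    while i < len(data) and data[i] != 0xFF:   # stop at the terminator
        if data[i] == 0xFD and i + 1 < len(data):
            var_id = data[i + 1]
            tokens.append(lookup.get(var_id, f"<var {var_id:#04x}>"))
            i += 2                             # skip 0xFD and its argument
        else:
            tokens.append(data[i])
            i += 1
    return tokens
```

For instance, <code>tokenize(bytes([0xFD, 0x08, 0x00, 0xFD, 0x01, 0xFF]), "emerald")</code> returns <code>["AQUA", 0, "<var 0x01>"]</code>.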


==Trivia==