root/public/ibx/trunk/doc/readme.charactersets.xhtml

Comparing ibx/trunk/doc/readme.charactersets.xhtml (file contents):
Revision 40 by tony, Tue May 17 08:14:52 2016 UTC vs.
Revision 41 by tony, Sat Jul 16 12:25:48 2016 UTC

# Line 16 | Line 16
16          .P11 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
17          .P12 { font-size:12pt; line-height:120%; margin-bottom:0.1in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
18          .P13 { font-size:12pt; line-height:120%; margin-bottom:0.1in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
19 <        .P14 { font-size:130%; font-weight:bold; margin-bottom:0.0835in; margin-top:0.1665in; font-family:Liberation Sans; writing-mode:page; }
19 >        .P14 { font-size:115%; font-weight:bold; margin-bottom:0.0835in; margin-top:0.139in; font-family:Liberation Sans; writing-mode:page; }
20          .P15 { font-size:115%; font-weight:bold; margin-bottom:0.0835in; margin-top:0.139in; font-family:Liberation Sans; writing-mode:page; }
21 <        .P16 { font-size:115%; font-weight:bold; margin-bottom:0.0835in; margin-top:0.139in; font-family:Liberation Sans; writing-mode:page; }
21 >        .P16 { font-size:130%; font-weight:bold; margin-bottom:0.0835in; margin-top:0.1665in; font-family:Liberation Sans; writing-mode:page; }
22          .P2 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
23          .P3 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
24          .P4 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
# Line 34 | Line 34
34          .T6 { font-weight:normal; }
35          <!-- ODF styles with no properties representable as CSS -->
36          .T1 .T2 .T3  { }
37 <        </style></head><body dir="ltr" style="max-width:8.2681in;margin-top:0.7874in; margin-bottom:0.7874in; margin-left:0.7874in; margin-right:0.7874in; writing-mode:lr-tb; "><h1 class="P14"><a id="a__IBX_Strings_and_Character_Set_Handling"><span/></a>IBX Strings and Character Set Handling</h1><p class="P1">A Firebird Database can specify a wide range of character sets for character and text mode blob columns. A client application can choose to read each column in its native character set or to have the Firebird Client library transliterate on its behalf. For example, adding:</p><p class="P1">lc_ctype=UTF8</p><p class="P1">to a TIBDatabase's params property will cause all character data (other than character set 'none' or 'octetstring') to be transliterated into UTF8, <span class="T3">regardless of the column's defined character set.</span></p><p class="P1">For most applications, this is probably the easiest way to handle multiple character sets. However, some more specialist applications may need to have access to native character set encoding.</p><p class="P2">From release 1.4.0, IBX provides the encoded character set width and character set name of each character mode column. <span class="T3">It also uses this to compute the correct byte size for the field buffer without having to oversize the field's display width.</span></p><p class="P6">FPC 3.0.0 onwards.<span class="T4"> FPC 3.0.0 introduces AnsiStrings with the codepage as a property of the string </span><span class="T5">(see </span><a href="http://wiki.freepascal.org/FPC_Unicode_support#DefaultSystemCodePage" class="Internet_20_link"><span class="T5">http://wiki.freepascal.org/FPC_Unicode_support#DefaultSystemCodePage</span></a><span class="T5">)</span><span class="T4">. 
From release 1.4.1 onwards:</span></p><ul><li><p class="P8" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>TIBStringField.AsString and TIBMemoField.AsString now return a string type with the code page set to reflect the returned field encoding after Firebird driver transliteration, if any.<span class="odfLiEnd"/> </p></li><li><p class="P8" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>Assigning to TIBStringField.AsString and TIBMemoField.AsString will now result in transliteration to the code page specified for the Firebird driver if the assigned string has a different code page.<span class="odfLiEnd"/> </p></li></ul><p class="P9">There is also a new TIBDatabase property UseDefaultSystemCodePage. When this is set to true, any hardcoded lc_ctype is ignored and IBX will choose the lc_ctype that best matches the DefaultSystemCodePage. This should give platform independence and ensure that there is minimal transliteration overhead. Note that if the lc_ctype is set explicitly and this is not a good match from the Default System Code Page then a second transliteration could take place due to FPC automatically translating from the string returned by Firebird and the code page used for most strings.</p><p class="P9">Unless you have specialist string handling requirements, set UseDefaultSystemCodePage to true.</p><h2 class="P15"><a id="a__The_Theory"><span/></a>The Theory</h2><p class="P2">The TDataSet model demands that a dataset provides information about its column types in a TFieldDefs collection, comprising a TFieldDef for each column identifying its type, column name and other useful information. This is used, for example, by the Lazarus IDE's Fields Editor to create a list of dataset fields. 
Selecting a field name in the Object Inspector also uses the field defs.</p><p class="P2">IBX subclasses the TFieldDef to create its own extended field def (TIBFieldDef). This contains the Character Set Name and Character Set Size for string and memo type fields, where the Character Set Size is a number in the range 1..4 that gives the maximum number of bytes in which a character can be encoded. <span class="T2">For example, </span>UTF8 has a character set size of 4.</p><p class="P4">The TFieldDef is also used when dataset is opened  <span class="T2">and </span>the Field objects are created, or associated with fields added to a form by the IDE's Fields Editor. In IBX, instead of the standard field types for text fields, subclassed versions are used to provide the extended information. These are:</p><ul><li><p class="P12" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>TIBStringField<span class="odfLiEnd"/> </p></li><li><p class="P12" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>TIBWideStringField<span class="odfLiEnd"/> </p></li><li><p class="P12" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>TIBMemoField<span class="odfLiEnd"/> </p></li><li><p class="P12" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>TIBWideMemoField<span class="odfLiEnd"/> </p></li></ul><p class="P3">These all have the additional properties:</p><ul><li><p class="P13" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>CharacterSetName<span class="odfLiEnd"/> </p></li><li><p class="P13" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>CharacterSetSize<span class="odfLiEnd"/> 
</p></li></ul><p class="P3">which, respectively, provide the Character Set Name and Size for the text string. The application may then process them accordingly. Note that the Wide String versions are used when the character set size is 2 (e.g. Unicode).</p><p class="P3">These field types also adjust the field size so that it returns the character width in the column <span class="T2">multiplied</span> by the character set size, while the DisplayWidth property should always return the character width of the column. This should ensure that the appropriate buffer size is allocated for the column whilst avoiding oversizing the DisplayWidth (and thus making automatic TDBGrid columns too wide).</p><p class="P3">The Memo Field types also have a new published boolean property:</p><ul><li><p class="P10" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>DisplayTextAsClassName<span class="odfLiEnd"/> </p></li></ul><p class="P4">This defaults to false. </p><p class="P4">When true, the field's DisplayText property returns the text content of the field truncated to the DisplayWidth (number of characters to be displayed).</p><p class="P3">For IBX Memo Fields, the DisplayWidth defaults to 128 characters. If this is overridden and set to zero, then the DisplayText is not truncated.</p><p class="P3">On the other hand, if the DisplayTextAsClassName is false, then the inherited behaviour is used and the DisplayText returns the classname in square brackets. The DisplayWidth is the inherited default (currently 10).</p><p class="P7">FPC 3.0.0 onwards<span class="T4">. </span><span class="T6">TIBStringField </span><span class="T4">and </span><span class="T6">TIBMemoField </span><span class="T4">also have a CodePage” property. 
This gives the code page used for the string returned by the “AsString” property.</span></p><h2 class="P16"><a id="a__In_Practice"><span/></a>In Practice</h2><p class="P5">New applications using IBX should just work “out of the box” as regards character set handling and, in most cases, you will have little need to understand the above unless you are having to process the character set type information for each column.</p><p class="P5">Existing applications should continue to work without a problem. However:</p><ul><li><p class="P11" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>If you want to make use of the correct DisplayWidth, or<span class="odfLiEnd"/> </p></li><li><p class="P11" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>If you want the original behaviour of <span class="T1">DisplayTextAsClassName, </span>or<span class="odfLiEnd"/> </p></li><li><p class="P11" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>You want to  process the character set type information for each column<span class="odfLiEnd"/> </p></li><li><p class="P11" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>And you have used the IDE Field's Edit<span class="T3">or</span> to create field objects as part of a form<span class="odfLiEnd"/> </p></li></ul><p class="P5">then you will need use the Fields Editor to remove the existing field object for each affected text field and to add it back again. This will create a field object using the new IBX subclassed field type. You may then make use of the extended properties.</p></body></html>
37 >        </style></head><body dir="ltr" style="max-width:8.2681in;margin-top:0.7874in; margin-bottom:0.7874in; margin-left:0.7874in; margin-right:0.7874in; writing-mode:lr-tb; "><h1 class="P16"><a id="a__IBX_Strings_and_Character_Set_Handling"><span/></a>IBX Strings and Character Set Handling</h1><p class="P1">A Firebird Database can specify a wide range of character sets for character and text mode blob columns. A client application can choose to read each column in its native character set or to have the Firebird Client library transliterate on its behalf. For example, adding:</p><p class="P1">lc_ctype=UTF8</p><p class="P1">to a TIBDatabase's params property will cause all character data (other than character set 'none' or 'octetstring') to be transliterated into UTF8, <span class="T3">regardless of the column's defined character set.</span></p><p class="P1">For most applications, this is probably the easiest way to handle multiple character sets. However, some more specialist applications may need to have access to native character set encoding.</p><p class="P2">From release 1.4.0, IBX provides the encoded character set width and character set name of each character mode column. <span class="T3">It also uses this to compute the correct byte size for the field buffer without having to oversize the field's display width.</span></p><p class="P6">FPC 3.0.0 onwards.<span class="T4"> FPC 3.0.0 introduces AnsiStrings with the codepage as a property of the string </span><span class="T5">(see </span><a href="http://wiki.freepascal.org/FPC_Unicode_support#DefaultSystemCodePage" class="Internet_20_link"><span class="T5">http://wiki.freepascal.org/FPC_Unicode_support#DefaultSystemCodePage</span></a><span class="T5">)</span><span class="T4">. 
From release 1.4.1 onwards:</span></p><ul><li><p class="P8" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>TIBStringField.AsString and TIBMemoField.AsString now return a string type with the code page set to reflect the returned field encoding after Firebird driver transliteration, if any.<span class="odfLiEnd"/> </p></li><li><p class="P8" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>Assigning to TIBStringField.AsString and TIBMemoField.AsString will now result in transliteration to the code page specified for the Firebird driver if the assigned string has a different code page.<span class="odfLiEnd"/> </p></li></ul><p class="P9">There is also a new TIBDatabase property UseDefaultSystemCodePage. When this is set to true, any hardcoded lc_ctype is ignored and IBX will choose the lc_ctype that best matches the DefaultSystemCodePage. This should give platform independence and ensure that there is minimal transliteration overhead. Note that if the lc_ctype is set explicitly and this is not a good match from the Default System Code Page then a second transliteration could take place due to FPC automatically translating from the string returned by Firebird and the code page used for most strings.</p><p class="P9">Unless you have specialist string handling requirements, set UseDefaultSystemCodePage to true.</p><h2 class="P14"><a id="a__The_Theory"><span/></a>The Theory</h2><p class="P2">The TDataSet model demands that a dataset provides information about its column types in a TFieldDefs collection, comprising a TFieldDef for each column identifying its type, column name and other useful information. This is used, for example, by the Lazarus IDE's Fields Editor to create a list of dataset fields. 
Selecting a field name in the Object Inspector also uses the field defs.</p><p class="P2">IBX subclasses the TFieldDef to create its own extended field def (TIBFieldDef). This contains the Character Set Name and Character Set Size for string and memo type fields, where the Character Set Size is a number in the range 1..4 that gives the maximum number of bytes in which a character can be encoded. <span class="T2">For example, </span>UTF8 has a character set size of 4.</p><p class="P4">The TFieldDef is also used when dataset is opened  <span class="T2">and </span>the Field objects are created, or associated with fields added to a form by the IDE's Fields Editor. In IBX, instead of the standard field types for text fields, subclassed versions are used to provide the extended information. These are:</p><ul><li><p class="P12" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>TIBStringField<span class="odfLiEnd"/> </p></li><li><p class="P12" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>TIBWideStringField<span class="odfLiEnd"/> </p></li><li><p class="P12" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>TIBMemoField<span class="odfLiEnd"/> </p></li><li><p class="P12" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>TIBWideMemoField<span class="odfLiEnd"/> </p></li></ul><p class="P3">These all have the additional properties:</p><ul><li><p class="P13" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>CharacterSetName<span class="odfLiEnd"/> </p></li><li><p class="P13" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>CharacterSetSize<span class="odfLiEnd"/> 
</p></li></ul><p class="P3">which, respectively, provide the Character Set Name and Size for the text string. The application may then process them accordingly. Note that the Wide String versions are used when the character set size is 2 (e.g. Unicode).</p><p class="P3">These field types also adjust the field size so that it returns the character width in the column <span class="T2">multiplied</span> by the character set size, while the DisplayWidth property should always return the character width of the column. This should ensure that the appropriate buffer size is allocated for the column whilst avoiding oversizing the DisplayWidth (and thus making automatic TDBGrid columns too wide).</p><p class="P3">The Memo Field types also have a new published boolean property:</p><ul><li><p class="P10" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>DisplayTextAsClassName<span class="odfLiEnd"/> </p></li></ul><p class="P4">This defaults to false. </p><p class="P4">When true, the field's DisplayText property returns the text content of the field truncated to the DisplayWidth (number of characters to be displayed).</p><p class="P3">For IBX Memo Fields, the DisplayWidth defaults to 128 characters. If this is overridden and set to zero, then the DisplayText is not truncated.</p><p class="P3">On the other hand, if the DisplayTextAsClassName is false, then the inherited behaviour is used and the DisplayText returns the classname in square brackets. The DisplayWidth is the inherited default (currently 10).</p><p class="P7">FPC 3.0.0 onwards<span class="T4">. </span><span class="T6">TIBStringField </span><span class="T4">and </span><span class="T6">TIBMemoField </span><span class="T4">also have a CodePage” property. 
This gives the code page used for the string returned by the “AsString” property.</span></p><h2 class="P15"><a id="a__In_Practice"><span/></a>In Practice</h2><p class="P5">New applications using IBX should just work “out of the box” as regards character set handling and, in most cases, you will have little need to understand the above unless you are having to process the character set type information for each column.</p><p class="P5">Existing applications should continue to work without a problem. However:</p><ul><li><p class="P11" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>If you want to make use of the correct DisplayWidth, or<span class="odfLiEnd"/> </p></li><li><p class="P11" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>If you want the original behaviour of <span class="T1">DisplayTextAsClassName, </span>or<span class="odfLiEnd"/> </p></li><li><p class="P11" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>You want to  process the character set type information for each column<span class="odfLiEnd"/> </p></li><li><p class="P11" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;">•</span>And you have used the IDE Field's Edit<span class="T3">or</span> to create field objects as part of a form<span class="odfLiEnd"/> </p></li></ul><p class="P5">then you will need use the Fields Editor to remove the existing field object for each affected text field and to add it back again. This will create a field object using the new IBX subclassed field type. You may then make use of the extended properties.</p></body></html>
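The readme text in the diff above makes three claims about sizing and display:

- An IBX text field's buffer size is the column's character width multiplied by the character set size, which is 1..4 bytes and is 4 for UTF8.
- DisplayWidth stays at the character width.
- A memo field shows either its text truncated to DisplayWidth (default 128, with no truncation when DisplayWidth is 0) or its class name in square brackets.

The Python sketch below models that arithmetic. It is not IBX code and it does not reproduce IBX's real classes. Several pieces are hypothetical names for illustration only: the character set table, `TextFieldInfo`, `describe_column` and the two `memo_display_*` helpers. Only the UTF8 entry in the table comes from the readme.

```python
from dataclasses import dataclass

# Hypothetical table of maximum bytes per character for a few Firebird
# character sets. The readme only states UTF8 = 4; the other entries are
# illustrative values for this sketch.
MAX_BYTES_PER_CHAR = {
    "NONE": 1,
    "OCTETS": 1,
    "ISO8859_1": 1,
    "WIN1252": 1,
    "SJIS_0208": 2,
    "UNICODE_FSS": 3,
    "UTF8": 4,
}


@dataclass
class TextFieldInfo:
    """Hypothetical stand-in for the extra metadata on an IBX text field."""
    character_set_name: str
    character_set_size: int  # maximum bytes per character (1..4)
    display_width: int       # column width in characters
    size: int                # buffer size in bytes
    wide: bool               # readme: Wide field classes are used for size 2


def describe_column(char_length: int, charset: str) -> TextFieldInfo:
    """Size a text column as the readme describes: bytes = chars * charset size."""
    name = charset.upper()
    charset_size = MAX_BYTES_PER_CHAR[name]
    return TextFieldInfo(
        character_set_name=name,
        character_set_size=charset_size,
        display_width=char_length,        # DisplayWidth stays in characters
        size=char_length * charset_size,  # buffer is allocated in bytes
        wide=(charset_size == 2),
    )


def memo_display_text_truncated(text: str, display_width: int = 128) -> str:
    """Memo text truncated to DisplayWidth characters; 0 disables truncation."""
    if display_width == 0:
        return text
    return text[:display_width]


def memo_display_text_class_name(class_name: str = "TIBMemoField") -> str:
    """The inherited behaviour as described: the class name in square brackets."""
    return f"[{class_name}]"


if __name__ == "__main__":
    info = describe_column(20, "utf8")
    print(info.size, info.display_width)  # 80 20
```

A VARCHAR(20) column in UTF8 thus gets an 80-byte buffer while a grid column sized from DisplayWidth still allows for only 20 characters. This is the oversizing that the readme says the subclassed fields avoid.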

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines
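The readme in the diff also warns about explicitly setting an lc_ctype that does not match FPC's DefaultSystemCodePage. In that case a string can be transliterated twice:

1. The Firebird client converts from the column's character set to lc_ctype.
2. FPC converts from lc_ctype to the default system code page.

The Python sketch below uses the standard codecs in place of Firebird and FPC. It counts those conversions and shows how choosing lc_ctype from the system code page, as UseDefaultSystemCodePage is described as doing, removes the second step.

The sketch rests on assumptions that do not come from the readme. The code page table, `choose_lc_ctype`, `transliterate` and `read_column` are hypothetical names, and IBX's actual matching rules may differ. NONE and OCTETS columns are simply passed through untouched here.

```python
from typing import Tuple

# Hypothetical mapping from code page numbers (as held in FPC's
# DefaultSystemCodePage) to Firebird character set names.
CODEPAGE_TO_FB_CHARSET = {
    65001: "UTF8",
    1252: "WIN1252",
    1251: "WIN1251",
    28591: "ISO8859_1",
}

# Python codec used for each Firebird character set in this sketch.
FB_CHARSET_TO_CODEC = {
    "UTF8": "utf-8",
    "WIN1252": "cp1252",
    "WIN1251": "cp1251",
    "ISO8859_1": "latin-1",
}


def choose_lc_ctype(default_system_code_page: int, fallback: str = "UTF8") -> str:
    """Pick the lc_ctype matching the system code page (hypothetical rule)."""
    return CODEPAGE_TO_FB_CHARSET.get(default_system_code_page, fallback)


def transliterate(raw: bytes, src_charset: str, dst_charset: str) -> bytes:
    """Re-encode bytes from one character set to another via Unicode text."""
    text = raw.decode(FB_CHARSET_TO_CODEC[src_charset])
    return text.encode(FB_CHARSET_TO_CODEC[dst_charset])


def read_column(raw: bytes, column_charset: str, lc_ctype: str,
                system_charset: str) -> Tuple[bytes, int]:
    """Return the bytes the application ends up with and the conversion count."""
    if column_charset in ("NONE", "OCTETS"):
        # Not transliterated by Firebird; this sketch leaves them as-is.
        return raw, 0
    data, steps = raw, 0
    if column_charset != lc_ctype:
        # Step 1: Firebird client, column character set -> lc_ctype.
        data = transliterate(data, column_charset, lc_ctype)
        steps += 1
    if lc_ctype != system_charset:
        # Step 2: FPC, lc_ctype code page -> DefaultSystemCodePage.
        data = transliterate(data, lc_ctype, system_charset)
        steps += 1
    return data, steps


if __name__ == "__main__":
    raw = "café".encode("cp1252")  # stored in a WIN1252 column
    print(read_column(raw, "WIN1252", choose_lc_ctype(65001), "UTF8"))
    # (b'caf\xc3\xa9', 1)
    print(read_column(raw, "WIN1252", "ISO8859_1", "UTF8"))
    # (b'caf\xc3\xa9', 2)
```

Both calls end with the same UTF-8 bytes. With lc_ctype hardcoded to ISO8859_1 on a UTF-8 system, the second call pays for an extra conversion, which is the overhead the readme's advice to set UseDefaultSystemCodePage is meant to avoid.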