root/public/ibx/trunk/doc/readme.charactersets.xhtml
Revision: 43
Committed: Thu Sep 22 17:10:15 2016 UTC (8 years, 2 months ago) by tony
Content type: application/xhtml+xml
File size: 13654 byte(s)
Log Message:
Committing updates for Release R1-4-3

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" "http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><!--This file was converted to xhtml by LibreOffice - see http://cgit.freedesktop.org/libreoffice/core/tree/filter/source/xslt for the code.--><head profile="http://dublincore.org/documents/dcmi-terms/"><meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/><title xml:lang="en-US">IBX Strings and Character Set Handling</title><meta name="DCTERMS.title" content="" xml:lang="en-US"/><meta name="DCTERMS.language" content="en-US" scheme="DCTERMS.RFC4646"/><meta name="DCTERMS.source" content="http://xml.openoffice.org/odf2xhtml"/><meta name="DCTERMS.creator" content="Tony Whyman"/><meta name="DCTERMS.issued" content="2016-02-12T12:10:01.845226813" scheme="DCTERMS.W3CDTF"/><meta name="DCTERMS.contributor" content="Tony Whyman"/><meta name="DCTERMS.modified" content="2016-09-22T17:51:28.067084251" scheme="DCTERMS.W3CDTF"/><meta name="DCTERMS.provenance" content="" xml:lang="en-US"/><meta name="DCTERMS.subject" content="," xml:lang="en-US"/><link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" hreflang="en"/><link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" hreflang="en"/><link rel="schema.DCTYPE" href="http://purl.org/dc/dcmitype/" hreflang="en"/><link rel="schema.DCAM" href="http://purl.org/dc/dcam/" hreflang="en"/><style type="text/css">
@page { }
table { border-collapse:collapse; border-spacing:0; empty-cells:show }
td, th { vertical-align:top; font-size:12pt;}
h1, h2, h3, h4, h5, h6 { clear:both }
ol, ul { margin:0; padding:0;}
li { list-style: none; margin:0; padding:0;}
<!-- "li span.odfLiEnd" - IE 7 issue-->
li span.odfLiEnd { clear: both; line-height:0; width:0; height:0; margin:0; padding:0; }
span.footnodeNumber { padding-right:1em; }
span.annotation_style_by_filter { font-size:95%; font-family:Arial; background-color:#fff000; margin:0; border:0; padding:0; }
* { margin:0;}
.P1 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
.P10 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; font-weight:normal; }
.P11 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
.P12 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
.P13 { font-size:12pt; line-height:120%; margin-bottom:0.1in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
.P14 { font-size:12pt; line-height:120%; margin-bottom:0.1in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
.P15 { font-size:115%; font-weight:bold; margin-bottom:0.0835in; margin-top:0.139in; font-family:Liberation Sans; writing-mode:page; }
.P16 { font-size:115%; font-weight:bold; margin-bottom:0.0835in; margin-top:0.139in; font-family:Liberation Sans; writing-mode:page; }
.P17 { font-size:130%; font-weight:bold; margin-bottom:0.0835in; margin-top:0.1665in; font-family:Liberation Sans; writing-mode:page; }
.P2 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
.P3 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
.P4 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
.P5 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; }
.P6 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; font-weight:bold; }
.P7 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; font-weight:bold; }
.P8 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; font-weight:normal; }
.P9 { font-size:12pt; line-height:120%; margin-bottom:0.0972in; margin-top:0in; font-family:Liberation Serif; writing-mode:page; font-weight:normal; }
.Bullet_20_Symbols { font-family:OpenSymbol; }
.Internet_20_link { color:#000080; text-decoration:underline; }
.T4 { font-weight:normal; }
.T5 { font-weight:normal; }
.T6 { font-weight:normal; }
<!-- ODF styles with no properties representable as CSS -->
.T1 .T2 .T3 .T7 { }
</style></head><body dir="ltr" style="max-width:8.2681in;margin-top:0.7874in; margin-bottom:0.7874in; margin-left:0.7874in; margin-right:0.7874in; writing-mode:lr-tb; "><h1 class="P17"><a id="a__IBX_Strings_and_Character_Set_Handling"><span/></a>IBX Strings and Character Set Handling</h1><p class="P1">A Firebird database can specify a wide range of character sets for character and text mode blob columns. A client application can choose to read each column in its native character set or to have the Firebird client library transliterate on its behalf. For example, adding:</p><p class="P1">lc_ctype=UTF8</p><p class="P1">to a TIBDatabase's Params property will cause all character data (other than character set 'none' or 'octetstring') to be transliterated into UTF8, <span class="T3">regardless of the column's defined character set.</span></p><p class="P1">For most applications, this is probably the easiest way to handle multiple character sets. However, some more specialist applications may need access to the native character set encoding.</p><p class="P2">From release 1.4.0, IBX provides the encoded character set width and character set name of each character mode column. <span class="T3">It also uses this to compute the correct byte size for the field buffer without having to oversize the field's display width.</span></p><p class="P6">FPC 3.0.0 onwards.<span class="T4"> FPC 3.0.0 introduces AnsiStrings with the code page as a property of the string </span><span class="T5">(see </span><a href="http://wiki.freepascal.org/FPC_Unicode_support#DefaultSystemCodePage" class="Internet_20_link"><span class="T5">http://wiki.freepascal.org/FPC_Unicode_support#DefaultSystemCodePage</span></a><span class="T5">)</span><span class="T4">. 
From release 1.4.1 onwards:</span></p><ul><li><p class="P9" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;"></span>TIBStringField.AsString and TIBMemoField.AsString <span class="T7">will always transliterate characters to UTF8, regardless of the lc_ctype setting, unless the field's character set is “NONE” or “OCTETSTRING”. This is because the LCL can only correctly handle UTF8.</span><span class="odfLiEnd"/> </p></li><li><p class="P9" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;"></span>Assigning to TIBStringField.AsString or TIBMemoField.AsString will now result in transliteration to the code page specified for the Firebird driver if the assigned string has a different code page.<span class="odfLiEnd"/> </p></li><li><p class="P10" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;"></span>The “AsString” property of TIBSQL Fields and Params will always return a string in the character set used by Firebird for the field. When assigned to, the string will be transliterated to the Firebird field's character set if necessary.<span class="odfLiEnd"/> </p></li></ul><p class="P8">There is also a new TIBDatabase property, UseDefaultSystemCodePage. When this is set to true, any hardcoded lc_ctype is ignored and IBX will choose the lc_ctype that best matches the DefaultSystemCodePage. </p><p class="P8">Unless you have specialist string handling requirements, <span class="T7">an lc_ctype of UTF8 is probably the best choice for Lazarus</span>.</p><h2 class="P15"><a id="a__The_Theory"><span/></a>The Theory</h2><p class="P2">The TDataSet model demands that a dataset provides information about its column types in a TFieldDefs collection, comprising a TFieldDef for each column identifying its type, column name and other useful information. 
This is used, for example, by the Lazarus IDE's Fields Editor to create a list of dataset fields. Selecting a field name in the Object Inspector also uses the field defs.</p><p class="P2">IBX subclasses the TFieldDef to create its own extended field def (TIBFieldDef). This contains the Character Set Name and Character Set Size for string and memo type fields, where the Character Set Size is a number in the range 1..4 that gives the maximum number of bytes in which a character can be encoded. <span class="T2">For example, </span>UTF8 has a character set size of 4.</p><p class="P4">The TFieldDef is also used when a dataset is opened <span class="T2">and </span>the Field objects are created, or associated with fields added to a form by the IDE's Fields Editor. In IBX, instead of the standard field types for text fields, subclassed versions are used to provide the extended information. These are:</p><ul><li><p class="P13" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;"></span>TIBStringField<span class="odfLiEnd"/> </p></li><li><p class="P13" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;"></span>TIBMemoField<span class="odfLiEnd"/> </p></li></ul><p class="P3">Both have the additional properties:</p><ul><li><p class="P14" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;"></span>CharacterSetName<span class="odfLiEnd"/> </p></li><li><p class="P14" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;"></span>CharacterSetSize<span class="odfLiEnd"/> </p></li></ul><p class="P3">which, respectively, provide the Character Set Name and Size for the text string. The application may then process them accordingly. 
</p><p class="P3">These field types also adjust the field size so that it returns the character width of the column <span class="T2">multiplied</span> by the character set size, while the DisplayWidth property should always return the character width of the column. This should ensure that the appropriate buffer size is allocated for the column whilst avoiding oversizing the DisplayWidth (and thus making automatic TDBGrid columns too wide).</p><p class="P3">The Memo Field types also have a new published boolean property:</p><ul><li><p class="P11" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;"></span>DisplayTextAsClassName<span class="odfLiEnd"/> </p></li></ul><p class="P4">This defaults to false. </p><p class="P4">When false, the field's DisplayText property returns the text content of the field truncated to the DisplayWidth (number of characters to be displayed).</p><p class="P3">For IBX Memo Fields, the DisplayWidth defaults to 128 characters. If this is overridden and set to zero, then the DisplayText is not truncated.</p><p class="P3">On the other hand, if DisplayTextAsClassName is true, then the inherited behaviour is used and the DisplayText returns the class name in square brackets. The DisplayWidth is the inherited default (currently 10).</p><p class="P7">FPC 3.0.0 onwards<span class="T4">. </span><span class="T6">TIBStringField </span><span class="T4">and </span><span class="T6">TIBMemoField </span><span class="T4">also have a “CodePage” property. 
This gives the code page used for the string returned by the “AsString” property.</span></p><h2 class="P16"><a id="a__In_Practice"><span/></a>In Practice</h2><p class="P5">New applications using IBX should just work “out of the box” as regards character set handling and, in most cases, you will have little need to understand the above unless you have to process the character set type information for each column.</p><p class="P5">Existing applications should continue to work without a problem. However:</p><ul><li><p class="P12" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;"></span>if you want to make use of the correct DisplayWidth, or<span class="odfLiEnd"/> </p></li><li><p class="P12" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;"></span>if you want the original behaviour of <span class="T1">DisplayTextAsClassName, </span>or<span class="odfLiEnd"/> </p></li><li><p class="P12" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;"></span>if you want to process the character set type information for each column,<span class="odfLiEnd"/> </p></li><li><p class="P12" style="margin-left:0cm;"><span class="Bullet_20_Symbols" style="display:block;float:left;min-width:0.635cm;"></span>and you have used the IDE's Fields Edit<span class="T3">or</span> to create field objects as part of a form,<span class="odfLiEnd"/> </p></li></ul><p class="P5">then you will need to use the Fields Editor to remove the existing field object for each affected text field and add it back again. This will create a field object using the new IBX subclassed field type. You may then make use of the extended properties.</p></body></html>