A Firebird Database can specify a wide range of character sets for character and text mode blob columns. A client application can choose to read each column in its native character set or to have the Firebird Client library transliterate on its behalf. For example, adding:
lc_ctype=UTF8
to a TIBDatabase's params property will cause all character data (other than character set 'none' or 'octetstring') to be transliterated into UTF8, regardless of the column's defined character set.
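The same parameter can also be set in code before connecting. The following is a minimal sketch, not a definitive recipe: the unit and procedure names are hypothetical, and it assumes that TIBDatabase is declared in the IBDatabase unit and that Params is an ordinary string list.

```pascal
unit CharsetSetup;  // hypothetical unit name, for illustration only
{$mode objfpc}{$H+}

interface

uses
  IBDatabase;  // assumed to be the unit that declares TIBDatabase

procedure UseUTF8Connection(DB: TIBDatabase);

implementation

// Ask the Firebird client library to transliterate character data to UTF8.
procedure UseUTF8Connection(DB: TIBDatabase);
begin
  DB.Connected := false;                   // change parameters while disconnected
  DB.Params.Values['lc_ctype'] := 'UTF8';  // adds the entry, or replaces an existing one
end;

end.
```

Using Params.Values rather than Params.Add avoids ending up with two lc_ctype entries if one was already set at design time.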
For most applications, this is probably the easiest way to handle multiple character sets. However, some more specialist applications may need access to the native character set encoding of each column.
From release 1.4.0, IBX provides the encoded character set width and character set name of each character mode column. It also uses this to compute the correct byte size for the field buffer without having to oversize the field's display width.
FPC 3.0.0 onwards. FPC 3.0.0 introduces AnsiStrings with the codepage as a property of the string (see http://wiki.freepascal.org/FPC_Unicode_support#DefaultSystemCodePage). From release 1.4.1 onwards:
•TIBStringField.AsString and TIBMemoField.AsString now return a string type with the code page set to reflect the returned field encoding after Firebird driver transliteration, if any.
•Assigning to TIBStringField.AsString and TIBMemoField.AsString will now result in transliteration to the code page specified for the Firebird driver if the assigned string has a different code page.
There is also a new TIBDatabase property UseDefaultSystemCodePage. When this is set to true, any hardcoded lc_ctype is ignored and IBX will choose the lc_ctype that best matches the DefaultSystemCodePage. This should give platform independence and ensure that there is minimal transliteration overhead. Note that if the lc_ctype is set explicitly and is not a good match for the DefaultSystemCodePage, then a second transliteration could take place, because FPC automatically converts between the code page of the string returned by Firebird and the code page used for most strings.
Unless you have specialist string handling requirements, set UseDefaultSystemCodePage to true.
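As a sketch, the property can be set in code as well as in the Object Inspector. The unit and procedure names below are hypothetical, and setting the property while disconnected is an assumption (made on the basis that, like lc_ctype, it is read when the connection is made).

```pascal
unit CodePageSetup;  // hypothetical unit name, for illustration only
{$mode objfpc}{$H+}

interface

uses
  IBDatabase;  // assumed to be the unit that declares TIBDatabase

procedure UsePlatformCodePage(DB: TIBDatabase);

implementation

// Let IBX pick the lc_ctype that best matches DefaultSystemCodePage.
procedure UsePlatformCodePage(DB: TIBDatabase);
begin
  DB.Connected := false;                // assumption: change the setting while disconnected
  DB.UseDefaultSystemCodePage := true;  // any hardcoded lc_ctype is then ignored
end;

end.
```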
The TDataSet model demands that a dataset provides information about its column types in a TFieldDefs collection, comprising a TFieldDef for each column identifying its type, column name and other useful information. This is used, for example, by the Lazarus IDE's Fields Editor to create a list of dataset fields. Selecting a field name in the Object Inspector also uses the field defs.
IBX subclasses the TFieldDef to create its own extended field def (TIBFieldDef). This contains the Character Set Name and Character Set Size for string and memo type fields, where the Character Set Size is a number in the range 1..4 that gives the maximum number of bytes in which a character can be encoded. For example, UTF8 has a character set size of 4.
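The extended field defs can be inspected at run time. The sketch below is illustrative only: it assumes that TIBFieldDef is declared in the IBCustomDataSet unit and that its properties are named CharacterSetName and CharacterSetSize, matching the field properties of the same names. The unit and procedure names are hypothetical.

```pascal
unit CharsetReport;  // hypothetical unit name, for illustration only
{$mode objfpc}{$H+}

interface

uses
  DB, IBCustomDataSet;  // IBCustomDataSet is assumed to declare TIBFieldDef

procedure ListCharacterSets(DataSet: TDataSet);

implementation

// Write the character set name and size of every text column to standard output.
procedure ListCharacterSets(DataSet: TDataSet);
var
  i: Integer;
  Def: TFieldDef;
begin
  DataSet.FieldDefs.Update;  // make sure the field defs are populated
  for i := 0 to DataSet.FieldDefs.Count - 1 do
  begin
    Def := DataSet.FieldDefs[i];
    // Only string and memo columns carry meaningful character set information
    if (Def is TIBFieldDef) and (Def.DataType in [ftString, ftWideString, ftMemo, ftWideMemo]) then
      WriteLn(Def.Name, ': ', TIBFieldDef(Def).CharacterSetName,
              ' (up to ', TIBFieldDef(Def).CharacterSetSize, ' byte(s) per character)');
  end;
end;

end.
```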
The TFieldDef is also used when the dataset is opened and the Field objects are created, or associated with fields added to a form by the IDE's Fields Editor. In IBX, instead of the standard field types for text fields, subclassed versions are used to provide the extended information. These are:
•TIBStringField
•TIBWideStringField
•TIBMemoField
•TIBWideMemoField
These all have the additional properties:
•CharacterSetName
•CharacterSetSize
which, respectively, provide the Character Set Name and Size for the text string. The application may then process them accordingly. Note that the Wide String versions are used when the character set size is 2 (e.g. Unicode).
These field types also adjust the field size so that it returns the character width in the column multiplied by the character set size, while the DisplayWidth property should always return the character width of the column. This should ensure that the appropriate buffer size is allocated for the column whilst avoiding oversizing the DisplayWidth (and thus making automatic TDBGrid columns too wide).
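The arithmetic can be illustrated with a small stand-alone program. This is only a sketch of the calculation described above, not IBX's own code, and FieldByteSize is a hypothetical helper name.

```pascal
program FieldSizeDemo;
{$mode objfpc}{$H+}

// Buffer size in bytes for a text column: the character width of the column
// multiplied by the maximum number of bytes per character (1..4).
function FieldByteSize(CharWidth, CharSetSize: Integer): Integer;
begin
  Result := CharWidth * CharSetSize;
end;

begin
  // VARCHAR(20) CHARACTER SET UTF8: 20 characters, up to 4 bytes each
  WriteLn('Size = ', FieldByteSize(20, 4), ', DisplayWidth = ', 20);
  // prints: Size = 80, DisplayWidth = 20
end.
```

So a 20 character UTF8 column gets an 80 byte buffer, while a grid column sized from DisplayWidth is still only 20 characters wide.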
The Memo Field types also have a new published boolean property:
•DisplayTextAsClassName
This defaults to false.
When false, the field's DisplayText property returns the text content of the field truncated to the DisplayWidth (number of characters to be displayed).
For IBX Memo Fields, the DisplayWidth defaults to 128 characters. If this is overridden and set to zero, then the DisplayText is not truncated.
On the other hand, if DisplayTextAsClassName is true, then the inherited behaviour is used and the DisplayText returns the class name in square brackets. The DisplayWidth is the inherited default (currently 10).
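A quick way to see the effect of the property on a particular field is to print DisplayText under both settings. The sketch below assumes that TIBMemoField is declared in the IBCustomDataSet unit and that the dataset is open and positioned on a record; the unit and procedure names are hypothetical.

```pascal
unit MemoDisplayDemo;  // hypothetical unit name, for illustration only
{$mode objfpc}{$H+}

interface

uses
  DB, IBCustomDataSet;  // IBCustomDataSet is assumed to declare TIBMemoField

procedure CompareDisplayText(Memo: TIBMemoField);

implementation

// Print the field's DisplayText with DisplayTextAsClassName off and on,
// then restore the original setting.
procedure CompareDisplayText(Memo: TIBMemoField);
var
  Saved: Boolean;
begin
  Saved := Memo.DisplayTextAsClassName;
  try
    Memo.DisplayTextAsClassName := false;
    WriteLn('DisplayTextAsClassName = false: ', Memo.DisplayText);
    Memo.DisplayTextAsClassName := true;
    WriteLn('DisplayTextAsClassName = true:  ', Memo.DisplayText);
  finally
    Memo.DisplayTextAsClassName := Saved;
  end;
end;

end.
```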
FPC 3.0.0 onwards. TIBStringField and TIBMemoField also have a “CodePage” property. This gives the code page used for the string returned by the “AsString” property.
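As an illustration, the field's CodePage can be compared with the code page that FPC records on the returned string. This sketch requires FPC 3.0.0 or later and assumes that TIBStringField is declared in the IBCustomDataSet unit; the unit and procedure names are hypothetical. StringCodePage is the standard FPC system function.

```pascal
unit CodePageReport;  // hypothetical unit name, for illustration only
{$mode objfpc}{$H+}

interface

uses
  DB, IBCustomDataSet;  // IBCustomDataSet is assumed to declare TIBStringField

procedure ShowCodePages(Field: TField);

implementation

// Print the code page reported by the field and the code page carried by
// the string that AsString returns.
procedure ShowCodePages(Field: TField);
var
  S: string;
begin
  if Field is TIBStringField then
  begin
    S := Field.AsString;
    WriteLn(Field.FieldName, ': field CodePage = ', TIBStringField(Field).CodePage,
            ', string code page = ', StringCodePage(S));
  end;
end;

end.
```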
New applications using IBX should just work “out of the box” as regards character set handling and, in most cases, you will have little need to understand the above unless you are having to process the character set type information for each column.
Existing applications should continue to work without a problem. However, if you have used the IDE's Fields Editor to create field objects as part of a form, and you want to:
•make use of the correct DisplayWidth, or
•use the new DisplayText behaviour controlled by DisplayTextAsClassName, or
•process the character set type information for each column
then you will need to use the Fields Editor to remove the existing field object for each affected text field and to add it back again. This will create a field object using the new IBX subclassed field type. You may then make use of the extended properties.