IBX Strings and Character Set Handling

A Firebird Database can specify a wide range of character sets for character and text mode blob columns. A client application can choose to read each column in its native character set or to have the Firebird Client library transliterate on its behalf. For example, adding:

lc_ctype=UTF8

to a TIBDatabase's params property will cause all character data (other than character set 'none' or 'octetstring') to be transliterated into UTF8, regardless of the column's defined character set.
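
The same setting can also be made in code rather than in the Object Inspector. The following is a minimal sketch, assuming the IBX IBDatabase unit is available; the unit name CharsetSetup and the procedure RequestUTF8 are hypothetical names used only for illustration.

```pascal
unit CharsetSetup; // hypothetical unit name

{$mode objfpc}{$H+}

interface

uses
  IBDatabase; // IBX unit that declares TIBDatabase

procedure RequestUTF8(Database: TIBDatabase);

implementation

procedure RequestUTF8(Database: TIBDatabase);
begin
  // Equivalent to adding the line lc_ctype=UTF8 to the Params property.
  // Intended to be called before the database is connected.
  Database.Params.Values['lc_ctype'] := 'UTF8';
end;

end.
```

Using Params.Values replaces an existing lc_ctype entry instead of adding a second one, which is why it is preferred here over Params.Add.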

For most applications, this is probably the easiest way to handle multiple character sets. However, some more specialist applications may need to have access to native character set encoding.

From release 1.4.0, IBX provides the encoded character set width and character set name of each character mode column. It also uses this to compute the correct byte size for the field buffer without having to oversize the field's display width.

FPC 3.0.0 onwards. FPC 3.0.0 introduces AnsiStrings with the codepage as a property of the string (see http://wiki.freepascal.org/FPC_Unicode_support#DefaultSystemCodePage). From release 1.4.1 onwards:

There is also a new TIBDatabase property, UseDefaultSystemCodePage. When this is set to true, any hardcoded lc_ctype is ignored and IBX chooses the lc_ctype that best matches the DefaultSystemCodePage. This should give platform independence and keep transliteration overhead to a minimum. Note that if lc_ctype is set explicitly and is not a good match for the DefaultSystemCodePage, then a second transliteration may take place, because FPC automatically translates between the code page of the string returned by Firebird and the code page used for most strings.

Unless you have specialist string handling requirements, set UseDefaultSystemCodePage to true.
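
For applications that configure the database component in code, this recommendation might look like the sketch below. It assumes the IBX IBDatabase unit; the unit name CodePageSetup and the procedure FollowSystemCodePage are hypothetical.

```pascal
unit CodePageSetup; // hypothetical unit name

{$mode objfpc}{$H+}

interface

uses
  IBDatabase; // IBX unit that declares TIBDatabase

procedure FollowSystemCodePage(Database: TIBDatabase);

implementation

procedure FollowSystemCodePage(Database: TIBDatabase);
begin
  // Let IBX pick the lc_ctype that best matches DefaultSystemCodePage;
  // any lc_ctype entry in Params is then ignored.
  Database.UseDefaultSystemCodePage := True;
end;

end.
```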

The Theory

The TDataSet model demands that a dataset provides information about its column types in a TFieldDefs collection, comprising a TFieldDef for each column identifying its type, column name and other useful information. This is used, for example, by the Lazarus IDE's Fields Editor to create a list of dataset fields. Selecting a field name in the Object Inspector also uses the field defs.

IBX subclasses the TFieldDef to create its own extended field def (TIBFieldDef). This contains the Character Set Name and Character Set Size for string and memo type fields, where the Character Set Size is a number in the range 1..4 that gives the maximum number of bytes in which a character can be encoded. For example, UTF8 has a character set size of 4.

The TFieldDef is also used when the dataset is opened and the Field objects are created, or associated with fields added to a form by the IDE's Fields Editor. In IBX, instead of the standard field types for text fields, subclassed versions are used to provide the extended information. These are:

These all have the additional properties:

which, respectively, provide the Character Set Name and Size for the text string. The application may then process them accordingly. Note that the Wide String versions are used when the character set size is 2 (e.g. Unicode).

These field types also adjust the field size so that it returns the character width in the column multiplied by the character set size, while the DisplayWidth property should always return the character width of the column. This should ensure that the appropriate buffer size is allocated for the column whilst avoiding oversizing the DisplayWidth (and thus making automatic TDBGrid columns too wide).
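
The size calculation described above is simple arithmetic, as the following self-contained sketch shows. The function BufferBytes is a hypothetical helper written for this example and is not part of IBX.

```pascal
program FieldSizeSketch;

{$mode objfpc}{$H+}

// Hypothetical helper, not part of IBX: bytes needed to hold CharWidth
// characters when each character may take up to CharSetSize bytes.
function BufferBytes(CharWidth, CharSetSize: Integer): Integer;
begin
  Result := CharWidth * CharSetSize;
end;

begin
  // A 20-character column in a single-byte character set
  WriteLn(BufferBytes(20, 1)); // 20
  // The same column in UTF8 (character set size 4)
  WriteLn(BufferBytes(20, 4)); // 80
end.
```

In both cases the display width stays at 20 characters; only the byte size of the buffer changes.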

The Memo Field types also have a new published boolean property, DisplayTextAsClassName, which defaults to false.

When the property is false, the field's DisplayText property returns the text content of the field truncated to the DisplayWidth (number of characters to be displayed).

For IBX Memo Fields, the DisplayWidth defaults to 128 characters. If this is overridden and set to zero, then the DisplayText is not truncated.

On the other hand, if DisplayTextAsClassName is true, then the inherited behaviour is used and the DisplayText returns the classname in square brackets. The DisplayWidth is then the inherited default (currently 10).

FPC 3.0.0 onwards. TIBStringField and TIBMemoField also have a “CodePage” property. This gives the code page used for the string returned by the “AsString” property.
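
To illustrate what a per-string code page means in FPC 3.0.0 and later, the following sketch inspects the code page tag of an ordinary string without using IBX at all. StringCodePage, CP_UTF8 and DefaultSystemCodePage come from the FPC system unit; IsUTF8Tagged is a hypothetical helper written for this example.

```pascal
program CodePageSketch;

{$mode objfpc}{$H+}

// Hypothetical helper: reports whether a string is tagged as UTF-8.
// A RawByteString parameter accepts any AnsiString without conversion.
function IsUTF8Tagged(const S: RawByteString): Boolean;
begin
  Result := StringCodePage(S) = CP_UTF8;
end;

var
  U: UTF8String;
begin
  U := 'abc';
  WriteLn('Code page of U: ', StringCodePage(U));
  WriteLn('Tagged as UTF-8: ', IsUTF8Tagged(U));
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
end.
```

An application could apply the same kind of check to the string returned by a field to decide whether further conversion is needed.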

In Practice

New applications using IBX should just work “out of the box” as regards character set handling and, in most cases, you will have little need to understand the above unless you are having to process the character set type information for each column.

Existing applications should continue to work without a problem. However:

then you will need to use the Fields Editor to remove the existing field object for each affected text field and then add it back again. This will create a field object using the new IBX subclassed field type. You may then make use of the extended properties.