Universal Data Structure (UDS)

UDS (structured data) is the newObjects standard for holding and sharing data.

UDS is defined in two separate parts - the logical data structure (LDS) and the persistence mechanism (PM). In most cases the LDS refers to the in-memory structure that holds the data, and the PM refers to one or more drivers/modules/objects capable of saving such a structure to, and reading it from, certain media in certain formats.

The LDS is more strictly defined, but depending on the capabilities of the environment an implementation may not support all the features proposed below.

The PM is more relaxed. There can be different PMs: some may support full UDS persistence, while others may support only partial or limited persistence in order to accommodate the limitations of the supported media. Examples:

A PM that supports a binary stream format specially designed for UDS saves the given LDS with all its data.

A PM designed to support the Windows registry will fail on, skip or convert any data elements in the structure that are not compatible with the Windows registry.

Benefits of the relaxed PM rules: using the same API/set of objects/modules, you can use UDS to represent certain media (for example, the Windows registry). There are limitations, but on the other hand you work with the media as with any other UDS. You can therefore transfer data between such media and UDS as long as the target media supports all the data types/element types currently present in the UDS. You only need to take care to limit the data elements in the structures concerned to those supported by the most restricted persistence mechanism you are going to use.
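The "fail, skip or convert" behaviour of a restricted PM can be illustrated with a minimal Python sketch. The function name adapt_for_pm, the policy strings and the (name, type, data) tuple layout are assumptions made for this illustration; they are not part of the UDS standard or of any newObjects API.

```python
def adapt_for_pm(values, supported_types, policy="fail", convert=str):
    """Prepare (name, type, data) values for a PM that supports only some types.

    Hypothetical helper: policy is "fail", "skip" or "convert".
    Any other policy string is treated like "fail".
    """
    adapted = []
    for name, type_name, data in values:
        if type_name in supported_types:
            adapted.append((name, type_name, data))   # stored unchanged
        elif policy == "skip":
            continue                                  # silently dropped
        elif policy == "convert":
            # Assumed conversion: fall back to a string representation.
            adapted.append((name, "string", convert(data)))
        else:
            raise TypeError(f"{name}: type {type_name} is not supported by this PM")
    return adapted
```

With a PM that accepts only "int" and "string", a "double" value is dropped under "skip", stored as text under "convert", and rejected under "fail".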

The Logical Data Structure:

Elements

Named element - every element must support this feature. Named elements have a textual name that must be treated as a case-insensitive string. The name must be optional - i.e. the element can be unnamed. Names should use only characters from the ASCII charset. If extended characters are used in names, there is no guarantee that the PMs will preserve them correctly.

The elements

Value - a named element that holds data of one type at a time. A value must be a "variant" - i.e. both its data and its type can be changed. The supported types are:

Required:

int - 4-byte signed integer
int64 - 8-byte signed integer
float - 4-byte floating point
double - 8-byte floating point
string - in the default encoding for the implementation

Optional:

unicode string
unsigned int - 4 bytes
binary data - blocks of bytes
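A variant value of this kind could be sketched in Python as follows. The class name Value, the set method and the type-name strings (borrowed from the text format type names) are assumptions for illustration only. Python strings are already Unicode, so the sketch does not model "unicode string" as a separate type.

```python
import struct

# Ranges for the integer types listed above (the dictionary name is illustrative).
INT_RANGES = {
    "int": (-2**31, 2**31 - 1),      # 4-byte signed
    "int64": (-2**63, 2**63 - 1),    # 8-byte signed
    "uint": (0, 2**32 - 1),          # 4-byte unsigned (optional type)
}


class Value:
    """Hypothetical variant value: holds data of one type at a time."""

    def __init__(self, name=None):
        self.name = name          # None means the value is unnamed
        self.type_name = None
        self.data = None

    def set(self, type_name, data):
        """Replace both the type and the data, validating the data first."""
        if type_name in INT_RANGES:
            low, high = INT_RANGES[type_name]
            data = int(data)
            if not low <= data <= high:
                raise ValueError(f"{data} does not fit in {type_name}")
        elif type_name == "float":
            # Round-trip through 4 bytes so the stored number is one that
            # a 4-byte float can actually hold.
            data = struct.unpack("<f", struct.pack("<f", float(data)))[0]
        elif type_name == "double":
            data = float(data)
        elif type_name == "string":
            data = str(data)
        elif type_name == "binary":
            data = bytes(data)
        else:
            raise ValueError(f"unknown type: {type_name}")
        self.type_name, self.data = type_name, data
```

Calling set again with a different type name changes the type of the same value, which is the "variant" behaviour described above; a failed call leaves the previous type and data in place.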

Record - a named element that contains a set of values. Values in the record can be accessed by name and by 1-based index. Their count can be obtained; support for enumeration (in whatever form applies to the implementation) is recommended.

Section - a named element that contains a set of records and other sections. Contained elements can be accessed by name and by 1-based index. Their count can be obtained; support for enumeration (in whatever form applies to the implementation) is recommended. A section also carries additional information called class info: a textual and, optionally, numeric class identification.

Example purpose of the class info: a section may support creation and persistence of the class determined by the class info. The section uses the mechanisms available in the implementation environment to allow the class to read its data from the section and to store it there. For example, a COM-oriented implementation may treat the class info as a class ID/ProgID and use IPropertyBag and the related interfaces to expose the records and values to the object, or it may define a custom interface to represent them better.

Notes: Elements don't "know" their parents. An implementation can apply additional restrictions to element names - such as denying duplicate names or empty section names. Such limitations are not recommended for class hierarchy implementations (i.e. the in-memory implementation of the LDS), and if they are implemented there must be an option to switch them off.
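The element hierarchy described above might look like this in a small in-memory Python sketch. All class and attribute names (Element, Container, class_name, class_id and so on) are hypothetical; the sketch only illustrates case-insensitive optional names, 1-based index access, counting, enumeration and elements that hold no reference to their parents.

```python
class Element:
    """Base for all elements: an optional, case-insensitive name."""

    def __init__(self, name=None):
        self.name = name

    def matches(self, name):
        return self.name is not None and self.name.lower() == name.lower()


class Value(Element):
    def __init__(self, name=None, type_name="string", data=""):
        super().__init__(name)
        self.type_name = type_name
        self.data = data


class Container(Element):
    """Shared lookup logic: children by 1-based index or by name."""

    def __init__(self, name=None):
        super().__init__(name)
        self.items = []

    def add(self, item):
        self.items.append(item)    # no parent link is stored on the item
        return item

    def __len__(self):
        return len(self.items)

    def __iter__(self):
        return iter(self.items)

    def __getitem__(self, key):
        if isinstance(key, int):
            if not 1 <= key <= len(self.items):
                raise IndexError(key)
            return self.items[key - 1]
        for item in self.items:    # first match wins; duplicates are allowed
            if item.matches(key):
                return item
        raise KeyError(key)


class Record(Container):
    """Holds values."""


class Section(Container):
    """Holds records and sub-sections, plus optional class info."""

    def __init__(self, name=None, class_name=None, class_id=None):
        super().__init__(name)
        self.class_name = class_name   # textual class info
        self.class_id = class_id       # optional numeric class info
```

Duplicate names are deliberately not rejected here, in line with the recommendation that in-memory implementations should not impose such restrictions by default.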

Persistence mechanisms. Data storing and transferring.  

An implementation in a particular environment must supply a set of persistence mechanisms. In our documentation they are also called data transfer drivers or text savers. They must support storing/encoding of structured data, but they are not required to be able to encode/decode every structured data tree. Feature limitations are controlled by standard flags defined by the standard for the C++ implementations (they are not the subject of this article). Only entire sections, together with their sub-sections, can be transferred through text savers.

Developers will find the basic idea behind the standard helpful; it may help in understanding the limitations in certain situations. In short, structured data is accessed in one common way but can be stored in different formats. As a tree-based structure it is able to represent many other data formats - for example, the Windows registry. However, the Windows registry does not support all the data types supported by UDS - it defines no floating point numbers, for example. On the other hand, some of the formats supported by the structured persistence mechanisms (PMs) are designed especially for UDS, and they support all the data types and structure variations.

This makes it possible to read structured data without problems through one PM, from any source, but saving it through another PM could be a problem. For example, if the data is read from the binary format (which supports all the features), there is no guarantee that it can be saved to the registry, because it may contain data types not supported by the registry. So there are different options. If you want to keep this data in the registry, it can be saved as a single binary key, but this is of use only in some very special situations. In contrast, keeping the registry branch structure intact when saving allows us to work with the registry as with any other UDS. So the registry text saver saves the data as a registry branch - each value as a registry value, each section as a registry key.

The limitations are obvious: records must contain only one value, no duplicate names are allowed for the subsections of one section, and, even if floating point values can be saved as integers, it is recommended to use only integers, strings and binary data in order to match the registry capabilities on read and write and thus avoid the need to implement special code to cope with data types incompatible with the registry.

So the conclusion is that structured data allows us to implement different text savers, and we are free to choose the most useful form for each - the one whose functionality lets us do more, even if we need to pay some attention to the particular limitations of the media.
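The registry limitations listed above can be checked before saving. The following Python sketch assumes a hypothetical nested-dict layout for a section and a hypothetical function name, registry_problems. Which integer widths a registry text saver actually accepts is also an assumption here; the text above only says "integers, strings and binary data".

```python
# Assumed set of registry-friendly types (illustrative, not from the standard).
REGISTRY_SAFE_TYPES = {"int", "int64", "uint", "string", "binary"}


def registry_problems(section, path=""):
    """Return a list of human-readable problems; an empty list means compatible."""
    here = f"{path}/{section['name']}"
    problems = []
    for record in section.get("records", []):
        values = record["values"]            # list of (type_name, data) pairs
        if len(values) != 1:
            problems.append(
                f"{here}: record {record['name']!r} has {len(values)} values")
        for type_name, _data in values:
            if type_name not in REGISTRY_SAFE_TYPES:
                problems.append(
                    f"{here}: record {record['name']!r} uses type {type_name}")
    seen = set()
    for sub in section.get("sections", []):
        key = sub["name"].lower()            # names are case-insensitive
        if key in seen:
            problems.append(f"{here}: duplicate subsection {sub['name']!r}")
        seen.add(key)
        problems.extend(registry_problems(sub, here))
    return problems
```

An application that intends to store its data in the registry could run such a check on the in-memory structure and report the offending elements instead of discovering them when the save fails.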

The ConfigFile implementation and text format.

ConfigFile is designed for simple usage - storing and restoring configuration or application data to and from the file system, registry or binary streams. It supports 3 formats (PMs) in the current version. It is able to manage the class info data but does not support object creation and management based on it. In general it can be used for internal application purposes and for editing the configurations of other applications that use the structured data standard. There is no COM-based extension interface for custom PMs in this version (only the internal drivers are available), but one will be supplied in later versions (only free-threaded COM text savers will be supported).

We explain in detail here only the text file format, because it can be edited manually. The binary format supports everything in a much more efficient form and is compatible across versions of the component (and with the other supported platforms, such as Windows CE) and with applications that use the C++ implementation. We recommend using the text format for configuration-oriented tasks and the binary format for data transfers (between applications, over the network etc.), for large amounts of data and wherever no human-readable form is required. The registry should be used only when needed for reasons beyond the current application - e.g. compatibility with other applications, editing the settings of programs that use the registry etc.

The text file format is supported through the Read and Write methods.

Text file format:

{ section name: (class name)
    (type_name)RecordName[ValueName]=value
    (type_name)RecordName=value
    { nested section name: (other class name)
        ; ...
    } nested section name;
} section name;
Every statement is placed on a separate line.
Sections - begin with a { section name: statement and end with a } section name; statement. The section name must be the same in the begin and end statements. Leading spaces are ignored. The name ends with the : character.
Class name - defined in ( ) brackets at the end of the section begin statement. It is optional and can be omitted. Class names depend on the particular application.
Records - defined by all the non-section statements with the same RecordName, i.e. all the entries with an equal RecordName are represented in memory as one record with many values.
Values - every non-section and non-comment statement defines a value. If it has no ValueName the value is unnamed; otherwise ValueName is assigned to the value. RecordName determines which record the value belongs to - all the values defined with an equal RecordName become one record. The order of the values in the record is the same as in the configuration file.
type_name is one of the following type names: 
int - 4-byte signed integer
int64 - 8-byte signed integer
float - 4-byte floating point
double - 8-byte floating point
string - in the default encoding for the implementation
uint - 4 byte unsigned integer
Binary data uses a different syntax but is generally not recommended for configuration purposes (see below).
Comments - Comment lines begin with ";" as first non-blank character.
Notes: the end of the value is the end of the line, so take care with trailing spaces and other blank characters when editing the file in a text editor. Sections can be nested with no limit other than that imposed by the application. Class names depend on the architecture of the application that uses the file - the Jacked-Objects library contains APIs that allow automatic class creation from these names by using class creators (called class libraries). Other applications may implement their own usage of the class info; for example, VBScript (think of an ASP application, for instance) may use the class info as the name of a routine to be called to process the section, or may create COM classes and initialize them with the contents of the section.
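To make the rules above concrete, here is a minimal Python sketch of a reader for this text format. It is not the ConfigFile implementation: the function name parse_text, the nested-dict result and the regular expressions are assumptions, a type name is assumed to be present on every value line, and binary value blocks are not handled.

```python
import re

SECTION_BEGIN = re.compile(r"\{\s*(?P<name>[^:]*):\s*(?:\((?P<cls>[^)]*)\))?\s*$")
SECTION_END = re.compile(r"\}\s*(?P<name>.*);\s*$")
VALUE_LINE = re.compile(
    r"\((?P<type>\w+)\)(?P<record>[^\[=]+)(?:\[(?P<value>[^\]]*)\])?=(?P<data>.*)$"
)
CONVERTERS = {"int": int, "int64": int, "uint": int,
              "float": float, "double": float, "string": str}


def parse_text(text):
    """Parse the text format into nested dicts (binary blocks are not handled)."""
    root = {"name": None, "class": None, "records": {}, "sections": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.lstrip()              # leading blanks are ignored
        if not line or line.startswith(";"):
            continue                     # blank line or comment
        m = SECTION_BEGIN.match(line)
        if m:
            section = {"name": m["name"].strip(), "class": m["cls"],
                       "records": {}, "sections": []}
            stack[-1]["sections"].append(section)
            stack.append(section)
            continue
        m = SECTION_END.match(line)
        if m:
            if len(stack) == 1 or m["name"].strip() != stack[-1]["name"]:
                raise ValueError(f"unbalanced section end: {line!r}")
            stack.pop()
            continue
        m = VALUE_LINE.match(line)
        if not m:
            raise ValueError(f"unrecognised line: {line!r}")
        data = CONVERTERS[m["type"]](m["data"])   # value runs to end of line
        # Values sharing a RecordName form one record, in file order.
        # Record names are lower-cased because names are case-insensitive.
        record = stack[-1]["records"].setdefault(m["record"].lower(), [])
        record.append((m["value"], m["type"], data))
    if len(stack) != 1:
        raise ValueError("missing section end")
    return root


SAMPLE = """
; demo configuration
{ Settings: (AppConfig)
    (int)Window[Width]=800
    (int)Window[Height]=600
    (string)Title=My App
    { Plugins:
        (double)Timeout=2.5
    } Plugins;
} Settings;
"""

tree = parse_text(SAMPLE)
```

In the parsed tree, the two Window lines end up as one record with two named values, while Title is a record with a single unnamed value. String values keep any trailing blanks, since the value ends only at the end of the line.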

Binary data value representation:
[ RecordName[ValueName]:
  BYTE
  BYTE
  ...
  BYTE
] RecordName[ValueName];

ValueName is optional, as for all the other value types. Each line between the begin and end statements contains one hexadecimal byte.
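The binary value syntax can be produced and read back with a few lines of Python. This is a sketch only: the function names are hypothetical, and writing each byte as two upper-case hex digits with a two-space indent is an assumption, since the text above only states that each line holds one hexadecimal byte.

```python
def encode_binary(record_name, data, value_name=None):
    """Render bytes as a binary value block, one hex byte per line."""
    label = record_name if value_name is None else f"{record_name}[{value_name}]"
    lines = [f"[ {label}:"]
    lines += [f"  {byte:02X}" for byte in data]   # assumed digit style
    lines.append(f"] {label};")
    return "\n".join(lines)


def decode_binary(block):
    """Parse a binary value block back into (label, bytes)."""
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    first, last = lines[0], lines[-1]
    if not (first.startswith("[") and first.endswith(":")):
        raise ValueError("missing begin statement")
    label = first[1:-1].strip()
    if not (last.startswith("]") and last.endswith(";")
            and last[1:-1].strip() == label):
        raise ValueError("end statement does not match begin statement")
    return label, bytes(int(line, 16) for line in lines[1:-1])
```

Three bytes already take five lines in this form, which shows why the syntax is not recommended for configuration purposes and why the binary format suits larger blocks better.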

Remarks

You don't need to read the following information in order to use the ConfigFile component, but it can be useful for you as general notes about the structured data architecture.

The UDS architecture is the core technology in our applications. It simplifies data manipulation and allows standard components to be created. By assuming that any data received or read from a file/device/network is UDS data, we have built many C++ classes and components intended to handle its conversion from/to its internal representation. Thus the actual logic of every application deals with the classes that implement UDS and doesn't need more than 2-3 lines of code to specify how and in what format data is received/sent.

XML programmers will find many similarities with XML techniques, but there are also many differences. It is possible to encode UDS as XML, but XML is in fact the result of adapting a text-document-oriented standard to modern software development needs. In contrast, UDS was developed as an abstract class architecture, and the actual format used to transport it is not important unless cross-application compatibility is needed. The binary format is the standard that must be supported on all platforms and guarantees cross-application and cross-machine compatibility. We may introduce additional cross-platform compatible formats in future, but apart from these the other persistence mechanisms are intended to be a bridge between UDS and the media concerned.

For example, the text format described above is convenient for configurations; if these configurations are then used to construct data that will be transmitted to another machine, the binary format should be used. If you need to read a registry branch and send it to another application, the scenario is the same. The common in-memory representation allows you to feed it with data from different sources and save it wherever possible. When you need to store data in a format that does not support all the UDS features, you can receive it from anywhere (in binary format, for example); you only need to ensure that the application that generates it does not generate values incompatible with the target media (such as the registry).

Quick comparison with XML

The LDS - the UDS representation in memory - can be compared to the XML DOM. However, unlike XML, a UDS held in memory has no naturally defined textual representation. Once a particular persistence mechanism is chosen, though, the scenario becomes quite similar to XML. A good example is the text file persistence mechanism.

The binary representation of UDS is natural - designed to hold all the UDS features. XML uses a traditional text format, which poses certain difficulties when binary data values must be represented and requires encoding/decoding.
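The encoding step that XML requires for binary values can be seen in a short Python sketch that writes a section as XML. No XML mapping is defined for UDS: the element and attribute names (section, record, value, name, type), the nested-dict input and the choice of Base64 are all assumptions made for this illustration.

```python
import base64
import xml.etree.ElementTree as ET


def section_to_xml(section):
    """Convert a nested-dict section into an ElementTree element."""
    elem = ET.Element("section", name=section["name"])
    for record_name, values in section.get("records", {}).items():
        rec = ET.SubElement(elem, "record", name=record_name)
        for type_name, data in values:
            val = ET.SubElement(rec, "value", type=type_name)
            if type_name == "binary":
                # Raw bytes cannot appear in XML text, so they are encoded.
                val.text = base64.b64encode(data).decode("ascii")
            else:
                val.text = str(data)
    for sub in section.get("sections", []):
        elem.append(section_to_xml(sub))
    return elem
```

A reader of such a document has to decode every binary value again, whereas a format designed for UDS can store the bytes as they are.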

Standard transformations are not defined for the UDS at this moment.

newObjects Copyright 2001-2005 newObjects