Universal Data Structure (UDS)
UDS (structured data) is the newObjects standard for holding and sharing data.
UDS is defined in two separate parts: the logical data
structure (LDS) and the persistence mechanism (PM).
In most cases the LDS will point to the memory structure that holds
the data and the PM will point to one or more drivers/modules/objects
capable of saving/reading such a structure to certain media in certain
formats.
The LDS is defined more strictly, but depending on the capabilities of
the environment an implementation may not support every feature
proposed below.
The PM is more relaxed. You may have different PMs: some may support
full UDS persistence, others only partial or limited persistence in
order to fit the limitations of the supported media. Examples:
A PM that supports a binary stream format specially designed for UDS
saves the given LDS with all its data.
A PM designed to support the Windows registry will fail on, skip or
convert any data elements of the structure that are not compatible
with the Windows registry.
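The three reactions mentioned above (fail, skip, convert) can be sketched as follows. This is only an illustration under stated assumptions: the function name adapt_values, the (name, type_name, data) tuple layout and the policy strings are hypothetical and are not part of any newObjects API.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical value layout used only for this sketch: (name, type_name, data)
ValueTuple = Tuple[str, str, object]


def adapt_values(values: List[ValueTuple],
                 supported: Set[str],
                 policy: str = "fail",
                 converters: Optional[Dict[str, Tuple[str, Callable]]] = None
                 ) -> List[ValueTuple]:
    """Prepare values for a media that supports only the `supported` types.

    policy "fail"    - raise on the first unsupported value
    policy "skip"    - silently drop unsupported values
    policy "convert" - convert when a converter exists, otherwise raise
    """
    converters = converters or {}
    result: List[ValueTuple] = []
    for name, type_name, data in values:
        if type_name in supported:
            result.append((name, type_name, data))
        elif policy == "skip":
            continue
        elif policy == "convert" and type_name in converters:
            target_type, convert = converters[type_name]
            result.append((name, target_type, convert(data)))
        else:
            raise ValueError(
                f"value '{name}' of type '{type_name}' is not supported "
                "by the target media")
    return result
```

A restricted PM would apply one such policy to every value before writing, while a full-featured PM (such as the binary one) would never need it.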
Benefits of the relaxed PM rules: using the same
API/set of objects/modules you can use UDS to
represent certain media (for example the Windows registry). You have
limitations, but on the other hand you work with it as with any other
UDS. Therefore you can transfer certain media to/from
UDS as long as the target media supports all the data types/element
types you currently have in this UDS. You only need to be careful to
limit the data elements in the structures concerned to the elements
supported by the most restricted persistence mechanism you are going
to use.
The Logical Data Structure:
Elements
named element - every element must support
this feature. Named elements have a textual name that must be
treated as a case insensitive string. The name is optional, i.e.
the element can be unnamed. Names should use only the
ASCII charset. If extended characters are used in the names there
is no guarantee that the PMs will preserve them correctly.
The elements
Value - named element that holds data of one type
at a time. Value must be "variant" - i.e. its data and
its type can be changed. The types supported are:
Required:
int - 4 byte signed integer
int64 - 8 byte signed integer
float - 4 byte floating point
double - 8 byte floating point
string - in the default encoding for the implementation
Optional:
unicode string
unsigned int - 4 bytes
binary data - blocks of bytes
Record - named element that contains a set of values.
Values in the record can be accessed by name and by 1-based
index. Their count can be obtained; enumeration support (in the form
applicable to the implementation) is recommended.
Section - named element that contains a set of records
and other sections. Contained elements can be accessed by
name and by 1-based index. Their count can be obtained; enumeration
support (in the form applicable to the implementation) is recommended.
A section also carries additional class identification, textual and
optionally numeric, called class info.
Example purpose of the class info: a section may support
creation and persistence of the class determined by the class
info. The section uses the mechanisms available in the
implementation environment to allow the class to read its data
from the section and to store it there. For example a COM
oriented implementation may treat the class info as a class ID/Prog
ID and use, for example, IPropertyBag and the related interfaces to
expose the records and values to the object, or it may define a custom
interface to represent them better.
Notes: Elements don't "know" their parents.
An implementation can apply additional restrictions to the element
names, such as denying duplicate names or denying empty section
names. Such limitations are not recommended for the class
hierarchy implementations (i.e. the in-memory implementation of
the LDS), and if they are implemented there must
be an option to switch them off.
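The element hierarchy described above can be sketched in memory as follows. This is a minimal illustration, not the newObjects class hierarchy: the class and attribute names (Value, Record, Section, class_info, children) are assumptions made for the example. It shows variant values, optional case insensitive names, 1-based indexing, counting and enumeration, and elements that hold no reference to their parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical names throughout; this is not the newObjects API.
ValueData = Union[int, float, str, bytes]


@dataclass
class Value:
    """Variant element: both the data and its type may be replaced."""
    data: ValueData
    type_name: str = "string"       # "int", "int64", "float", "double", ...
    name: Optional[str] = None      # None means the value is unnamed

    def set(self, data: ValueData, type_name: str) -> None:
        self.data, self.type_name = data, type_name


def _find(items, key):
    """Look up by 1-based index or by case insensitive name (first match)."""
    if isinstance(key, int):
        if not 1 <= key <= len(items):
            raise IndexError(key)
        return items[key - 1]
    for item in items:
        if item.name is not None and item.name.lower() == key.lower():
            return item
    raise KeyError(key)


@dataclass
class Record:
    name: Optional[str] = None
    values: List[Value] = field(default_factory=list)

    def __getitem__(self, key):
        return _find(self.values, key)

    def __len__(self):
        return len(self.values)

    def __iter__(self):
        return iter(self.values)


@dataclass
class Section:
    name: Optional[str] = None
    class_info: Optional[str] = None
    # Records and nested sections share one ordered list; duplicate names
    # are allowed, and no element stores a reference to its parent.
    children: List[Union["Section", Record]] = field(default_factory=list)

    def __getitem__(self, key):
        return _find(self.children, key)

    def __len__(self):
        return len(self.children)

    def __iter__(self):
        return iter(self.children)
```

With these classes, root["window"]["HEIGHT"] and root[1][2] would address the same value if the first child of root is a record named Window whose second value is named Height.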
Persistence mechanisms. Data storing and transferring.
The implementation in a particular environment must supply a
set of persistence mechanisms. In our documentation they are also
called data transfer drivers or text savers. They must
support storing/encoding of the structured data, but they are not
required to be able to encode/decode every structured data tree.
Feature limitations are controlled by standard flags defined by
the standard for the C++ implementations (they are not the subject of this article).
Only entire sections with their sub-sections can be transferred through text savers.
Developers may find the basic idea behind the
standard helpful for understanding the limitations in certain
situations. In short, structured data is accessed in one common
way but can be stored in different formats. As a tree based
structure it is able to represent many other data formats, for
example the Windows registry. However, the Windows registry
does not support all the data types supported by UDS: it defines no
floating point numbers, for example. On the other hand, some of the
formats supported by the persistence mechanisms (PM) are designed
especially for UDS and support all the data types and structure
variations. This makes it possible to read structured data through
one PM from any source without problems, but saving it through
another PM can be a problem. For example, if the data is read
from the binary format (which supports all the features) there is
no guarantee that it can be saved to the registry, because it may
contain data types not supported by the registry.
So there are different options. If you want to keep this data in the
registry it can be saved as a single binary value, but this is
useful only in some very special situations. In contrast, keeping
the registry branch structure intact when saving allows us to
work with the registry as with any other UDS. So the
registry text saver saves the data as a registry branch: each value
as a registry value, each section as a registry key. The limitations
are obvious: records must contain only one value, no duplicate
names are allowed for the subsections of one section, and even though
floating point values can be saved as integers it is recommended
to use only integers, strings and binary data in order to match
the registry capabilities on read and write and thus avoid the need to
implement special code to cope with the data types
incompatible with the registry.
The conclusion is that
structured data allows us to implement different text savers and
leaves us free to choose the most useful form for each of them, the
one that lets us do more even if we need to
pay some attention to the particular limitations of the media.
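The registry limitations listed above can be checked before saving. The sketch below is an illustration only: the dictionary layout of a section, the function name registry_problems and the exact set of "safe" type names are assumptions (the text only says "integers, strings and binary data").

```python
from typing import Dict, List, Set

# Assumed set of type names that map directly to registry values.
REGISTRY_SAFE_TYPES: Set[str] = {"int", "string", "binary"}


def registry_problems(section: Dict, path: str = "") -> List[str]:
    """Return the reasons why `section` would not map cleanly to a
    registry branch. An empty list means no problem was found.

    Hypothetical section layout:
      {"name": str,
       "records": [(record_name, [(value_name, type_name, data), ...])],
       "sections": [nested section dicts]}
    """
    here = f"{path}/{section['name']}"
    problems: List[str] = []

    for record_name, values in section.get("records", []):
        # A registry value holds one datum, so one value per record.
        if len(values) != 1:
            problems.append(
                f"{here}: record '{record_name}' has {len(values)} values")
        for _value_name, type_name, _data in values:
            if type_name not in REGISTRY_SAFE_TYPES:
                problems.append(
                    f"{here}: record '{record_name}' uses type '{type_name}'")

    seen: Set[str] = set()
    for sub in section.get("sections", []):
        key = sub["name"].lower()       # element names are case insensitive
        if key in seen:
            problems.append(f"{here}: duplicate subsection '{sub['name']}'")
        seen.add(key)
        problems.extend(registry_problems(sub, here))
    return problems
```

An application that plans to save to the registry could run such a check on data received from a less restricted source (the binary format, for example) and decide whether to reject or adapt it.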
ConfigFile is designed for simple usage: storing/restoring configuration
or application data to the file system, registry
or binary streams. It supports 3 formats (PM) in the
current version. It is able to manage the class info data but
does not support object creation and management over it. In general
it can be used for internal application purposes and for editing the
configurations of other applications that use the structured data
standard. There is no COM based extension interface for custom PMs in this
version (only the internal drivers are available), but one will be
supplied in later versions (only free-threaded COM text savers
will be supported).
We will explain in detail here only the text
file format because it can be edited manually. The binary format
supports everything in a much more efficient form and is compatible
across the versions of the component (and with the other supported
platforms, such as Windows CE) and with the applications that use
the C++ implementation. We recommend using the text
format for configuration oriented tasks and the binary format for
data transfers (between applications, over the network etc.), large
amounts of data and wherever no human readable form is required. The
registry should be used only if needed for reasons beyond the
current application, e.g. compatibility with other applications,
editing settings of programs that use the registry etc.
The text file format is supported through the Read
and Write methods.
Text file format:
{ section name: (class name)
(type_name)RecordName[ValueName]=value
(type_name)RecordName=value
{ nested section name: (other class name)
; ...
} nested section name;
} section name;
Every statement is placed on a separate line.
Sections are defined as follows: they begin with a { section name:
statement and end with a } section name; statement. The section name
must be the same in the begin and end statements. Leading spaces are
ignored. The name ends with the : character.
Class name - is defined in ( ) brackets at the end of the
section begin statement. It is optional and can be omitted. Class
names depend on the particular application.
Records are defined by all the non-section statements with the same
RecordName, i.e. all the entries with equal RecordName will be
represented in memory as one record with many values.
Values - every non-section and non-comment statement defines
a value. If it has no ValueName the value will be unnamed;
otherwise ValueName will be assigned to the value. RecordName
determines which record the value belongs to:
all the values defined with equal RecordName become one record.
The order of the values in the record is the same as in the
configuration file.
type_name is one of the following type names:
int - 4 byte signed integer
int64 - 8 byte signed integer
float - 4 byte floating point
double - 8 byte floating point
string - in the default encoding for the implementation
uint - 4 byte unsigned integer
Binary data uses a different syntax but is generally not recommended
for configuration purposes (see below).
Comments - comment lines begin with ";" as the
first non-blank character.
Notes: the end of the value is the end of the line, so it is
important to take care of trailing spaces and other blank
characters when editing the file in a text editor. Sections can be
nested with no limit other than that of the application. Class names
depend on the architecture of the application that uses the file:
the Jacked-Objects library contains APIs that allow automatic class
creation over these names by using class creators (called class
libraries). Other applications may implement their own usage of the
class info. For example, VBScript (think of an ASP application
for instance) may use the class info as the name of a routine to be
called to process the section, or may create COM classes and initialize
them with the contents of the section.
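The rules of the text format can be illustrated with a small reader. This is a sketch under stated assumptions, not the ConfigFile implementation: the function name parse_uds_text and the dictionary layout of the result are hypothetical, the (type_name) prefix is assumed to be mandatory, records are keyed by their lower-cased name, and binary data blocks are not handled.

```python
import re
from typing import Dict, List

# "{ name: (class)" with the class part optional
_BEGIN = re.compile(r"^\{\s*(.*?)\s*:\s*(?:\((.*)\))?\s*$")
# "} name;"
_END = re.compile(r"^\}\s*(.*?)\s*;\s*$")
# "(type)RecordName[ValueName]=value" with [ValueName] optional
_VALUE = re.compile(r"^\((\w+)\)\s*([^\[\]=]+?)\s*(?:\[([^\]]*)\])?=(.*)$")
_CASTS = {"int": int, "int64": int, "uint": int,
          "float": float, "double": float, "string": str}


def _new_section(name, class_info) -> Dict:
    return {"name": name, "class_info": class_info,
            "records": {}, "sections": []}


def parse_uds_text(text: str) -> Dict:
    """Parse the text format into nested dicts (hypothetical layout)."""
    root = _new_section(None, None)
    stack: List[Dict] = [root]
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.lstrip()                  # leading spaces are ignored
        if not line.strip() or line.startswith(";"):
            continue                         # blank line or comment
        current = stack[-1]
        if m := _BEGIN.match(line):
            section = _new_section(m.group(1), m.group(2))
            current["sections"].append(section)
            stack.append(section)
        elif m := _END.match(line):
            # The end statement must repeat the name of the open section.
            if len(stack) == 1 or m.group(1).lower() != current["name"].lower():
                raise ValueError(f"line {number}: unmatched section end")
            stack.pop()
        elif m := _VALUE.match(line):
            type_name, record, value_name, data = m.groups()
            if type_name not in _CASTS:
                raise ValueError(f"line {number}: unknown type '{type_name}'")
            # Statements with equal RecordName form one record, in file order.
            values = current["records"].setdefault(record.lower(), [])
            values.append((value_name, type_name, _CASTS[type_name](data)))
        else:
            raise ValueError(f"line {number}: unrecognized statement")
    if len(stack) != 1:
        raise ValueError("unterminated section")
    return root
```

Note that the value part is taken up to the end of the line without trimming, so trailing spaces become part of a string value, as warned in the notes above.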
Binary data value representation:
[ RecordName[ValueName]:
BYTE
BYTE
...
BYTE
] RecordName[ValueName];
ValueName is optional, as for all the other value types. Each line
between the begin and the end statement contains one hexadecimal
byte.
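A binary value block can be written and read back as sketched below. The function names are hypothetical, and the exact byte notation is an assumption: the sketch writes two uppercase hexadecimal digits per line without any prefix, which the description above does not specify.

```python
from typing import List, Optional, Tuple


def format_binary_value(record: str, data: bytes,
                        value_name: Optional[str] = None) -> List[str]:
    """Return the lines of a binary value block, one byte per line."""
    label = record if value_name is None else f"{record}[{value_name}]"
    # Assumption: two uppercase hex digits per byte, no "0x" prefix.
    body = [f"{byte:02X}" for byte in data]
    return [f"[ {label}:"] + body + [f"] {label};"]


def parse_binary_value(lines: List[str]) -> Tuple[str, bytes]:
    """Read a block back; returns the label (RecordName[ValueName]) and bytes."""
    first, *body, last = [line.strip() for line in lines]
    if not (first.startswith("[") and first.endswith(":")):
        raise ValueError("missing begin statement")
    label = first[1:-1].strip()
    # The end statement must repeat the label of the begin statement.
    if not (last.startswith("]") and last.endswith(";")
            and last[1:-1].strip() == label):
        raise ValueError("end statement does not match begin statement")
    return label, bytes(int(item, 16) for item in body)
```

One byte per line makes such blocks long, which is one reason the text format is not recommended for binary data.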
Remarks
You don't need to read the following information in order to use
the ConfigFile component, but it can be useful as general
notes about the structured data architecture.
The UDS architecture is the core
technology in our applications. It simplifies data manipulation and
allows standard components to be created. By assuming that any data
received/read from a file/device/network is UDS data, we have built
many C++ classes and components that handle its conversion
from/to its internal representation. Thus the actual logic of every
application deals with the classes that implement UDS and doesn't
need more than 2-3 lines of code to specify how and in what
format data is received/sent.
XML programmers will find many similarities with XML techniques,
but there are also many differences. It is possible to encode UDS as
XML, but XML is in fact the result of adapting a text
document oriented standard to modern software development needs.
In contrast, UDS is developed as an abstract class architecture, and
the actual format used to transport it is not important unless
cross-application compatibility is needed. The binary format is the
standard that must be supported on all platforms and guarantees
cross-application and cross-machine compatibility. We may
introduce additional cross-platform compatible formats in the future,
but apart from them the other persistence mechanisms are intended to
be a bridge between UDS and the media concerned. For example, the
text format described above is convenient for configurations;
if these configurations are then used to construct data that will
be transmitted to another machine, the binary format should be used.
If you need to read a registry branch and send it to another
application, the scenario is the same. The common in-memory
representation allows you to feed it with data from different
sources and save it wherever possible. When you need to store
data in a format that does not support all the UDS features you can
receive it from anywhere (in binary format for example); you only
need to ensure that the application that generates it does not
generate values incompatible with the target media (such as the
registry).
Quick comparison with XML
The LDS, the UDS representation in memory, can be compared to the
XML DOM. However, unlike XML, UDS in memory has no
naturally defined corresponding textual representation. Once you have
a particular persistence mechanism in mind, though, the
scenario is quite similar to XML. A good example is the text file
persistence mechanism.
The binary representation of UDS is natural: it is designed to hold
all the UDS features. XML uses the traditional text format, which
poses certain difficulties when binary data values must be
represented and requires encoding/decoding.
Standard transformations are not defined for UDS at this
moment.