| unirec |
The universal ISIS record
Users of CDS/ISIS are accustomed to stuffing not only bibliographic,
but all sorts of data into ISIS records. The 1994 edition of ANSI/NISO
Z39.2 ("Information Interchange Format" alias ISO2709), after which ISIS
records are modelled (more or less), contained a "reduction of references
to 'bibliographic' data, because the standard is used for many other types".
For example, CDS/ISIS uses an ISIS database to hold the various texts
for language-specific versions. Various control files like the syspar,
FDT and FST are well suited for storage in ISIS records.
Probably some implementations of CDS/ISIS internally use ISIS records
to hold that data, but none seems to be able to read/write those to/from
ISIS master files, ISO2709 or a choice of text formats, including XML.
| nesting |
But then, there is more. I already mentioned that e-mail and many
simple XML structures are conveniently stored in ISIS records.
But even data which does not all that obviously follow a tag-value-list
scheme can be fit easily into such a record. Consider what is happening
when ASN.1/BER-encoded structures are sent down the wire to a Z39.50 server,
for example structured Type-1 queries to locate bibliographic information:
They are turned into a series of tags and values. "Twain OR Clemens" is
sent as an "OR" field followed by two term fields valued Twain and Clemens.
We can do that same serialization trick, of course:
embedding one structure (i.e. a record, i.e. a tag-value list)
into another by simply inserting its fields.
That way we can achieve nesting of arbitrary depth no less than with XML.
One problem that comes to mind: how do we tell the boundaries?
- for structures with a fixed number of fields, like the "OR" node
having two children, boundaries are implicit.
- the length (number of children) may be given with the opening field.
This may be inconvenient and/or error-prone if not computed automatically.
- a closing item like ">" may be used, e.g. a reserved field tag.
These approaches are now
discussed in more detail.
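
As a rough illustration of the second and third options, the following
Python sketch flattens the "Twain OR Clemens" query into a plain list of
(tag, value) fields and rebuilds the tree from it. The tag numbers, the
CONTAINER_TAGS set and all function names are assumptions made for this
example only; they are not taken from OpenIsis or Z39.50.

```python
# Hypothetical tags, chosen only for this illustration.
TAG_TERM = 1    # leaf field: a search term
TAG_OR = 2      # opening field of an OR node
TAG_END = 99    # reserved closing tag (the ">" of the text)
CONTAINER_TAGS = {TAG_OR}  # tags whose fields open a substructure


def flatten_counted(node):
    """Flatten a tree; each opening field carries its number of children."""
    tag, payload = node
    if isinstance(payload, str):
        return [(tag, payload)]
    fields = [(tag, str(len(payload)))]
    for child in payload:
        fields.extend(flatten_counted(child))
    return fields


def flatten_closed(node):
    """Flatten a tree; each substructure ends with a TAG_END field."""
    tag, payload = node
    if isinstance(payload, str):
        return [(tag, payload)]
    fields = [(tag, "")]
    for child in payload:
        fields.extend(flatten_closed(child))
    fields.append((TAG_END, ""))
    return fields


def unflatten_counted(fields):
    """Rebuild the tree from the output of flatten_counted."""
    def parse(pos):
        tag, value = fields[pos]
        pos += 1
        if tag not in CONTAINER_TAGS:
            return (tag, value), pos
        children = []
        for _ in range(int(value)):
            child, pos = parse(pos)
            children.append(child)
        return (tag, children), pos

    node, end = parse(0)
    if end != len(fields):
        raise ValueError("trailing fields after the structure")
    return node


def unflatten_closed(fields):
    """Rebuild the tree from the output of flatten_closed."""
    def parse(pos):
        tag, value = fields[pos]
        pos += 1
        if tag not in CONTAINER_TAGS:
            return (tag, value), pos
        children = []
        while fields[pos][0] != TAG_END:
            child, pos = parse(pos)
            children.append(child)
        return (tag, children), pos + 1  # skip the closing field

    node, end = parse(0)
    if end != len(fields):
        raise ValueError("trailing fields after the structure")
    return node


query = (TAG_OR, [(TAG_TERM, "Twain"), (TAG_TERM, "Clemens")])
print(flatten_counted(query))
print(flatten_closed(query))
```

Both variants need to know which tags open a substructure, because
otherwise a term whose value happens to be "2" could not be told from
a child count. The counted form is shorter by one field per node, while
the closed form can be written without knowing the length in advance.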
Based on such a schema, not only queries can be expressed and stored
as records, but also formats, with proper nesting of IFs, loops and so on.
This approach has a couple of advantages:
- formats may be specified in any of a couple of external representations
including XML
- the variants of the CDS/ISIS formatting language with different
names for the same functions can be supported using input filters
- formats can be stored, retrieved and exchanged using standard means
On the other hand, the formatting language could be augmented
to support substructures. A straightforward and relatively easy to
use and implement extension would be a PASCAL-style WITH r DO.
The current OpenIsis bindings, especially Tcl, the preferred formatting language,
contain such support.
| external representation |
Besides CDS/ISIS master files and ISO2709 files,
there are a couple of text based formats suitable
to store or exchange ISIS records.
Most follow a name=value style and use separators like '=',
':' and line breaks, with different quoting rules.
Among these are
- RFC 822 emails
- Java properties
- Windows-style .INI files
- character- or tab-separated values (csv/tsv)
(think of the TAB as subfield delimiter)
Then there are XML/HTML/SGML and finally freestyle languages,
like the query or formatting language, where item boundaries
are determined depending on context.
Conversion would typically be based on one or more FDTs,
mapping between names and numbers.
Such a mapping, when used with formats, could also enable the
use of symbolic field names like author instead of v24.
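
A minimal sketch of such a conversion, in Python, might look as follows.
It assumes a tiny FDT given as a dictionary from field names to tags;
only the pairing of author with 24 comes from the text above, while the
title entry, the sample data and the function names are made up for
illustration. Quoting rules and continuation lines, which differ between
the formats listed above, are deliberately ignored.

```python
import re

# Hypothetical FDT excerpt: symbolic field name -> numeric tag.
# Only author = 24 is taken from the text; title = 30 is invented.
FDT = {"author": 24, "title": 30}

# name, then '=' or ':' as separator, then the value up to end of line
LINE = re.compile(r"([^=:\s]+)\s*[=:]\s*(.*)")


def parse_text(text, fdt):
    """Turn 'name=value' or 'name: value' lines into (tag, value) fields."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = LINE.match(line)
        if match is None:
            raise ValueError("not a name/value line: %r" % line)
        name, value = match.groups()
        fields.append((fdt[name.lower()], value))
    return fields


def format_text(fields, fdt):
    """Write (tag, value) fields back as 'name=value' lines."""
    names = {tag: name for name, tag in fdt.items()}
    return "\n".join("%s=%s" % (names[tag], value) for tag, value in fields)


text = "author: Twain, Mark\ntitle=The Adventures of Tom Sawyer\n"
record = parse_text(text, FDT)
print(record)
print(format_text(record, FDT))
```

The same dictionary, read in reverse, is what would let a format refer
to author where it now has to say v24.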
The "plain" representation, as preferably used by OpenIsis,
is described in the paper on Serialized records.
$Id: unirec.txt,v 1.6 2003/04/07 13:12:43 kripke Exp $