The universal ISIS record


Users of CDS/ISIS are accustomed to stuffing not only bibliographic, but all sorts of data into ISIS records. The 1994 edition of ANSI/NISO Z39.2 ("Information Interchange Format" alias ISO2709), after which ISIS records are modelled (more or less), contained a "reduction of references to 'bibliographic' data, because the standard is used for many other types".
For example, CDS/ISIS uses an ISIS database to hold the various texts for language-specific versions. Various control files like the syspar, FDT and FST are well suited for storage in ISIS records. Probably some implementations of CDS/ISIS internally use ISIS records to hold that data, but none seems to be able to read/write those to/from ISIS master files, ISO2709 or a choice of text formats, including XML.

nesting

But then, there is more. I already mentioned that e-mail and many simple XML structures are conveniently stored in ISIS records. But even data that does not obviously follow a tag-value-list scheme can easily be fitted into such a record. Consider what happens when ASN.1/BER-encoded structures are sent down the wire to a Z39.50 server, for example structured Type-1 queries to locate bibliographic information: they are turned into a series of tags and values. "Twain OR Clemens" is sent as an "OR" field followed by two term fields valued Twain and Clemens.
We can do that same serialization trick, of course: embedding one structure (i.e. record, i.e. tag-value-list) into another by simply inserting the fields. That way we can achieve nesting of arbitrary depth, no less than with XML.
One problem that comes to mind: how do we tell the boundaries?
  • for structures with a fixed number of fields, like the "OR" node having two children, boundaries are implicit.
  • the length (number of children) may be given with the opening field. This may be inconvenient and/or error-prone, if not computed automatically.
  • a closing item may be used, e.g. a field with a reserved tag.

These approaches are now discussed in more detail.
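
As an illustration of the closing-item variant, here is a minimal Python sketch that flattens a nested query into a plain list of (tag, value) fields and rebuilds it again. The tag numbers 1, 2 and 99 and the function names are assumptions made up for this example; they are not OpenIsis or Z39.50 conventions.

```python
# Hypothetical tags, chosen only for this sketch.
TAG_OP = 1     # opens a nested structure; the value names the operator
TAG_TERM = 2   # a leaf search term
TAG_END = 99   # reserved closing tag; its value is ignored


def flatten(node):
    """Serialize a nested query into a flat list of (tag, value) fields."""
    if isinstance(node, str):
        return [(TAG_TERM, node)]
    op, children = node
    fields = [(TAG_OP, op)]
    for child in children:
        fields.extend(flatten(child))   # embed the child's fields in place
    fields.append((TAG_END, ""))        # explicit boundary of this structure
    return fields


def _parse(fields, pos):
    """Parse one structure starting at pos; return (node, next position)."""
    if pos >= len(fields):
        raise ValueError("unexpected end of record")
    tag, value = fields[pos]
    if tag == TAG_TERM:
        return value, pos + 1
    if tag != TAG_OP:
        raise ValueError(f"unexpected tag {tag} at field {pos}")
    children = []
    pos += 1
    while pos < len(fields) and fields[pos][0] != TAG_END:
        child, pos = _parse(fields, pos)
        children.append(child)
    if pos >= len(fields):
        raise ValueError("missing closing field")
    return (value, children), pos + 1   # skip the closing field


def unflatten(fields):
    """Rebuild the nested query from a flat field list."""
    node, pos = _parse(fields, 0)
    if pos != len(fields):
        raise ValueError("trailing fields after the top-level structure")
    return node


query = ("OR", ["Twain", ("AND", ["Clemens", "Samuel"])])
flat = flatten(query)
assert unflatten(flat) == query
```

With a closing field, a node may have any number of children and no count needs to be maintained; the price is one extra field per structure and a reserved tag that ordinary data must not use.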

Based on such a schema, not only queries can be expressed and stored as records, but also formats, with proper nesting of IFs, loops and so on. This approach has a couple of advantages:
  • formats may be specified in any of several external representations, including XML
  • the variants of the CDS/ISIS formatting language with different names for the same functions can be supported using input filters
  • formats can be stored, retrieved and exchanged using standard means

On the other hand, the formatting language could be augmented to support substructures. A straightforward and relatively easy to use and implement extension would be a PASCAL-style WITH r DO. The current OpenIsis bindings, especially Tcl as preferred formatting language, contain such support.

external representation

Besides CDS/ISIS master files and ISO2709 files, there are several text-based formats suitable to store or exchange ISIS records.
Most follow a name=value style and use separators like '=', ':' and linebreaks, with different quoting rules. Among these are
  • RFC 822 emails
  • Java properties
  • Windows-style .INI files
  • character or tabulator separated values (tsv/csv) (think of the TAB as subfield delimiter)

Then there are XML/HTML/SGML and finally freestyle languages, like the query or formatting language, where item boundaries are determined depending on context.
Conversion would typically be based on one or more FDTs, mapping between names and numbers. Such a mapping, when used with formats, could also enable the use of symbolic field names like author instead of v24.
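
A rough Python sketch of such a name-to-number conversion for a simple "name: value" text format follows. The FDT dictionary is a hypothetical excerpt: only author = 24 follows the v24 example above, the title tag is invented, and quoting rules and subfields are ignored entirely.

```python
# Hypothetical FDT excerpt: field name -> tag number.
# Only "author" = 24 comes from the text; "title" = 30 is invented here.
FDT = {"author": 24, "title": 30}


def parse_plain(text, fdt):
    """Turn 'name: value' lines into a list of (tag, value) fields."""
    record = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue                      # skip blank lines
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line {lineno}: no ':' separator")
        name = name.strip().lower()
        if name not in fdt:
            raise KeyError(f"line {lineno}: unknown field name {name!r}")
        record.append((fdt[name], value.strip()))
    return record


def format_plain(record, fdt):
    """Write (tag, value) fields back as 'name: value' lines."""
    names = {tag: name for name, tag in fdt.items()}
    return "\n".join(f"{names[tag]}: {value}" for tag, value in record)


text = "author: Twain, Mark\nauthor: Clemens, Samuel\ntitle: Roughing It"
record = parse_plain(text, FDT)
assert record[0] == (24, "Twain, Mark")
assert format_plain(record, FDT) == text
```

The record is kept as an ordered list rather than a dictionary so that repeated fields, such as the two author lines, survive the round trip.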
The "plain" representation as preferably used by OpenIsis is described in the paper on Serialized records.

$Id: unirec.txt,v 1.6 2003/04/07 13:12:43 kripke Exp $