Isi
xmlisis
some notes on the relation of XML and ISIS


XML is in widespread use as a lingua franca for glueing software components together. Several tools for this can be found at xml.apache.org.
What is missing here is an efficient, easy-to-use way of storing XML data. Only the most trivial cases map easily onto the relational data model, which uses flat records consisting of a fixed number of fields. The data structures modelled in XML typically have a variable number of children. Hierarchical databases like ADABAS C are well suited and are actually used by Software AG in their Tamino XML DB, but aren't widely and freely available.

ISIS to XML

ISIS records can be easily and canonically converted to XML. Anything up to the first subfield delimiter is the body (a text node); subfields become attributes (strictly speaking, this is valid XML only for non-repeated subfields). Other special subdivisions of field content, like the typical <key word>, may be split into real child nodes.
The result (as generated by make pdemo) may look like:
<isisrec id="148">
	<v69>
		<key>Educational Psychology</key>
		<key>universities</key>
		<key>Kenya</key>
	</v69>
	<v70>Okatcha, F.M.M.O.</v70>
	<v30 a="1 p."/>
	<v24>Personal statement</v24>
	<v26 c="1976"/>
	<v12 p="Tbilisi, USSR" d="1976">Symposium on the Psychological Bases of Programmed Learning </v12>
</isisrec>

Instead of tag numbers and subfield characters, symbolic names from the FDT may be used.
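
The mapping described above can be sketched in a few lines of Python. This is only an illustration, not the code behind make pdemo: the function names, the representation of a record as a list of (tag, value) pairs, and the use of '^' as subfield delimiter are assumptions made for the example. Text outside of <key word> markers is dropped when keywords are present, and subfield codes that are not valid XML names (such as digits) are not handled.

```python
import re
import xml.etree.ElementTree as ET

def field_to_element(tag, value):
    """Convert one ISIS field (tag number, raw value) to an XML element."""
    elem = ET.Element(f"v{tag}")
    # Everything before the first '^' is the body; the rest are subfields.
    body, *subfields = value.split("^")
    for sub in subfields:
        if sub and sub[0] not in elem.attrib:
            # First character is the subfield code, the rest its content.
            # A repeated code cannot become a second attribute, so this
            # sketch keeps only the first occurrence (an assumption).
            elem.set(sub[0], sub[1:])
    # <key word> markers in the body become <key> child nodes.
    keys = re.findall(r"<([^<>]+)>", body)
    if keys:
        for key in keys:
            ET.SubElement(elem, "key").text = key
    elif body:
        elem.text = body
    return elem

def record_to_xml(mfn, fields):
    """Convert a record, given as (tag, value) pairs, to an <isisrec>."""
    rec = ET.Element("isisrec", id=str(mfn))
    for tag, value in fields:
        rec.append(field_to_element(tag, value))
    return rec
```

Calling ET.tostring(record_to_xml(148, fields), encoding="unicode") on the pairs of the record above should give XML of the shape shown, without the indentation.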

XML to ISIS

XML data structures can be easily and efficiently mapped to the data model of ISO2709.
The general conversion (based on a SAX parser) works as follows:
  • When encountering an opening tag, look up its name in the FDT. If there is no FDT provided, create one on the fly. If the FDT does not contain the tag name, create a new entry using tag number max(100,1+highest tag in FDT). Create a field using the tag number found and field value '+'.
  • When encountering an attribute, look up its name in the metadata. Create a new subfield entry if needed, using code 'a' or 1+highest code used (for this tag). Append a subfield using the code found.
  • When encountering an empty tag (the current tag ends with />), change the starting '+' to '-'.
  • When encountering a text node, add a field using tag number 0 with the node's body as value.
  • When encountering a closing tag, look up its name as for opening tags and add a field with an empty value.
  • As an additional optimization, most text nodes can be eliminated by using the initial value of a node to represent an immediately following text node.
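
The steps above can be sketched with Python's standard xml.sax module. This is a minimal illustration under several assumptions, not the actual converter: the names IsisHandler and xml_to_isis are made up, whitespace-only text nodes are skipped and other text is stripped, and subfield codes simply count up from 'a' with no check for running past 'z'. SAX does not report whether an element was written as <x/> or <x></x>, so an element is treated as empty when its closing tag follows with nothing in between (or only a folded text node).

```python
import xml.sax

class IsisHandler(xml.sax.ContentHandler):
    def __init__(self, fold_text=True):
        super().__init__()
        self.fdt = {}       # element name -> tag number
        self.codes = {}     # tag number -> {attribute name -> subfield code}
        self.fields = []    # [tag, marker, body, subfields]
        self.fold_text = fold_text
        self._text = []     # buffered character data
        self._open = None   # index of a just-opened field, if still foldable

    def _tag(self, name):
        if name not in self.fdt:
            self.fdt[name] = max(100, 1 + max(self.fdt.values(), default=0))
        return self.fdt[name]

    def _code(self, tag, attr):
        codes = self.codes.setdefault(tag, {})
        if attr not in codes:
            codes[attr] = chr(1 + max(map(ord, codes.values()))) if codes else "a"
        return codes[attr]

    def _flush_text(self):
        text = "".join(self._text).strip()
        self._text = []
        if not text:
            return
        if self.fold_text and self._open is not None:
            self.fields[self._open][2] = text      # text-node elimination
        else:
            self.fields.append([0, "", text, ""])  # explicit text field
            self._open = None

    def startElement(self, name, attrs):
        self._flush_text()
        tag = self._tag(name)
        subs = "".join(f"^{self._code(tag, k)}{v}" for k, v in attrs.items())
        self.fields.append([tag, "+", "", subs])
        self._open = len(self.fields) - 1

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        if self._open is not None:
            self.fields[self._open][1] = "-"       # no closing field needed
        else:
            self.fields.append([self._tag(name), "", "", ""])
        self._open = None

def xml_to_isis(xml_text, fold_text=True):
    """Return the document as a list of (tag, value) pairs."""
    handler = IsisHandler(fold_text)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return [(t, m + b + s) for t, m, b, s in handler.fields]
```

With fold_text=False the text nodes stay separate tag 0 fields; with the default, a text node directly after an opening tag is folded into that field.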

For example, look at RDF ( http://www.w3.org/RDF , http://archive.dstc.edu.au/RDU/reports/RDF-Idiot ). A structure like
<DC:Creator parseType="Resource">
<vCard:FN> Dr Jacky J Crystal </vCard:FN>
<vCard:TITLE> Director </vCard:TITLE>
<vCard:EMAIL> jacky@dstc.com.au </vCard:EMAIL>
<vCard:ROLE> Researcher </vCard:ROLE>
</DC:Creator>
canonically maps to
100	+^aResource
101	+
0	Dr Jacky J Crystal 
101
102	+
...
or, with text-node elimination, to
100	+^aResource
101	-Dr Jacky J Crystal 
102	-Director
...
100
using about half the bytes it takes to store the original.
Had everything that can be an attribute (not substructured, not repeatable) been made an attribute instead of a child, it would read (with explicitly assigned subfield codes) much more efficiently, like
100	^pResource^fDr Jacky J Crystal^tDirector^ejacky@dstc.com.au^rResearcher
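
As a small sketch of this flattened form: given the non-repeated, unstructured parts of a node and an explicitly assigned subfield code for each, the whole node collapses into a single field. The function name flatten_to_field and the dictionary layout are assumptions made for the example.

```python
def flatten_to_field(tag, parts, codes):
    """Build one field line from name -> text parts and name -> code mapping."""
    subfields = "".join(
        f"^{codes[name]}{text.strip()}" for name, text in parts.items()
    )
    return f"{tag}\t{subfields}"

# Hypothetical explicit subfield code assignment for the example above.
codes = {"parseType": "p", "FN": "f", "TITLE": "t", "EMAIL": "e", "ROLE": "r"}
parts = {
    "parseType": "Resource",
    "FN": " Dr Jacky J Crystal ",
    "TITLE": " Director ",
    "EMAIL": " jacky@dstc.com.au ",
    "ROLE": " Researcher ",
}
line = flatten_to_field(100, parts, codes)
```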

Also see unirec and Struct

$Id: xmlisis.txt,v 1.7 2003/06/23 14:43:42 kripke Exp $