| whatabout |
| what makes ISIS ISIS ? |
Andrew Giles-Peters raised the important question
"What is it about ISIS that makes it ISIS?"
So here are some thoughts on this topic from the OpenIsis team:
- As a database used for bibliographic data (among others),
ISIS must be able to store and retrieve records as exchanged via
ISO2709 efficiently and with no or minimal loss of information.
- Besides the ability to retrieve records by number,
ISIS must support an indexing mechanism which is essentially
"function based", that is, the index entries are not the immediate
field values, but rather the values of a "view" derived from
them by some computation.
- ISIS must efficiently support typical query elements commonly
used on bibliographic databases, like looking up a value without
regard for the field or in several fields at once, and specifying
a distance within which search terms should occur.
Since these are minimal requirements,
they would not stop anybody from adding tons of features on top.
For example, it's relatively easy to store ISO2709 data in a
relational database like Sybase (used by OCLC/Pica),
each record covering several rows (mfn, field number, field occ, value),
then compute a second similar table for the index and so on.
However, there is the word "efficiently",
which practically turns out to put some restrictions on the
feature-load, especially when combined with:
- ISIS must be widely usable even in the face of *very* low budgets.
Therefore, not only must the software itself be available for at most
a nominal fee, it also must not require very new, very powerful
or otherwise expensive hardware and systems.
Even very large catalogs should get by with moderate system costs.
The OCLC/Pica system for example requires one to spend
hundreds of thousands of dollars for powerful Sun machines.
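To make the notion of a "function based" index a bit more tangible,
here is a minimal sketch in Python. It does not show how any ISIS
implementation actually works; the field numbers 24 and 70, the two
view functions and all other names are made up for illustration.

```python
from collections import defaultdict

# A record maps field numbers to lists of occurrences (repeatable fields).
# Field numbers 24 (title) and 70 (author) are made up for this sketch.
RECORDS = {
    1: {24: ["Databases for small libraries"], 70: ["Smith, Jane", "Doe, John"]},
    2: {24: ["Small is beautiful"], 70: ["Schumacher, E. F."]},
}

def title_words(record):
    """View: every word of every title occurrence, upper-cased."""
    for value in record.get(24, []):
        for pos, word in enumerate(value.upper().split(), start=1):
            yield word, 24, pos

def author_names(record):
    """View: each author occurrence as one whole, upper-cased entry."""
    for value in record.get(70, []):
        yield value.upper(), 70, 1

VIEWS = [title_words, author_names]

def build_index(records, views):
    """Index the computed view values, not the raw field values."""
    index = defaultdict(list)
    for mfn, record in records.items():
        for view in views:
            for term, field, pos in view(record):
                index[term].append((mfn, field, pos))
    return index

def lookup(index, term, fields=None):
    """Find MFNs for a term in any field, or only in the given fields."""
    return sorted({mfn for mfn, field, _ in index.get(term.upper(), [])
                   if fields is None or field in fields})

def near(index, term_a, term_b, distance):
    """MFNs where both terms occur in one field within `distance` words."""
    hits = set()
    for mfn_a, field_a, pos_a in index.get(term_a.upper(), []):
        for mfn_b, field_b, pos_b in index.get(term_b.upper(), []):
            if (mfn_a == mfn_b and field_a == field_b
                    and abs(pos_a - pos_b) <= distance):
                hits.add(mfn_a)
    return sorted(hits)

INDEX = build_index(RECORDS, VIEWS)
```

With this, lookup(INDEX, "small") searches without regard for the field,
lookup(INDEX, "small", fields={70}) restricts the search to one field,
and near(INDEX, "small", "beautiful", 2) is a simple distance query.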
| end of story ? |
Still, it would be very nice if more areas of application could
be explored for ISIS, both for the librarians in order to be able
to use their favourite DB (i.e. ISIS) for a broader range of
tasks and also to expand the user community, possibly leading
to more support for everybody.
One important question is whether ISIS needs some fundamental changes
deep in its guts, or whether it already has everything that's needed
to build a broad range of sophisticated solutions on top of it.
As you might expect, we are pretty well convinced of the latter.
| file formats |
Just like it doesn't harm a database much to be exported to and
imported from ISO2709, there is not much of a problem with different
file formats, as long as conversion tools exist.
As you know, CISIS/Unix-DBs are incompatible with WinIsis/DOS-DBs,
but may be converted via ISO files.
As long as the basic data structures are the same,
lossless conversion is just a matter of tools.
It's even less of a problem if the software itself can read
several file formats (like openisis does).
You wouldn't care much whether your word processor is reading
a .doc or an .rtf file, would you?
We did an interesting and very successful study implementing
an ISIS-like DB in pure Java using a plaintext masterfile
very similar to the Mbox mailfolder format
(hope to be able to release the code soon).
Likewise there is no reason why one should not be able to read
directly from an ISO2709 file.
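As a rough illustration of how little is needed to read such a file,
here is a sketch of an ISO2709 reader in Python. It is not OpenIsis
code. It assumes numeric tags, the standard field and record
terminators (hex 1E and 1D), no indicators and no line wrapping;
exports that differ in these points would need some preprocessing.
The function names and the small build_record helper are invented
for this example.

```python
import io

FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator

def build_record(fields):
    """Serialize (tag, value) pairs into one ISO 2709 record (demo helper)."""
    directory = b""
    data = b""
    for tag, value in fields:
        body = value.encode("utf-8") + FIELD_END
        # Directory entry: 3-digit tag, 4-digit length, 5-digit start position.
        directory += b"%03d%04d%05d" % (tag, len(body), len(data))
        data += body
    base = 24 + len(directory) + 1    # leader + directory + its terminator
    total = base + len(data) + 1      # ... + data + record terminator
    leader = (b"%05d" % total + b"n" + b" " * 4 + b"00"
              + b"%05d" % base + b" " * 3 + b"4500")
    return leader + directory + FIELD_END + data + RECORD_END

def parse_record(raw):
    """Return the list of (tag, value) pairs of one ISO 2709 record."""
    base = int(raw[12:17])            # base address of data
    len_len = int(raw[20:21])         # digits used for "length of field"
    start_len = int(raw[21:22])       # digits used for "starting position"
    impl_len = int(raw[22:23])        # implementation-defined part
    entry_len = 3 + len_len + start_len + impl_len
    directory = raw[24:base - 1]      # directory without its terminator
    fields = []
    for off in range(0, len(directory), entry_len):
        entry = directory[off:off + entry_len]
        tag = int(entry[:3])
        length = int(entry[3:3 + len_len])
        start = int(entry[3 + len_len:3 + len_len + start_len])
        value = raw[base + start:base + start + length - 1]  # drop terminator
        fields.append((tag, value.decode("utf-8")))
    return fields

def read_records(stream):
    """Yield parsed records from a binary stream of concatenated records."""
    while True:
        head = stream.read(5)         # first five bytes hold the record length
        if len(head) < 5:
            return
        yield parse_record(head + stream.read(int(head) - 5))

DEMO = build_record([(24, "Small is beautiful"), (70, "Schumacher, E. F.")])
DEMO_RECORDS = list(read_records(io.BytesIO(DEMO + DEMO)))
```

The reader never needs more than one record in memory, so the ISO file
can serve as a (read-only) masterfile as it is.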
Besides convertible masterfile formats, one might well use other
formats for xref and index, which always can be reconstructed as needed.
There are several reasons to do so, such as improved performance or robustness.
So I don't think ISIS is defined in terms of detailed file formats,
but rather in terms of the basic data structures.
One problem that might come to mind when talking about file formats
is that of limits. While the maximum number of records per DB as well
as the maximum total file sizes are bypassed relatively easily
by logically joining several databases, the maximum record size of
about 32K is a limit which might be unacceptable for some applications.
(Although it can partly be resolved by deploying external files
like OCLC/Pica does to circumvent Sybase's varchar limits).
Raising this limit would clearly restrict lossless conversion to one way,
from small to large DB. Where a large DB model is needed,
all parties developing ISIS software should agree on one format
to allow for as-painless-as-possible interoperability.
| so what kind of database is ISIS ? |
Classical database theory basically distinguishes ISAM,
network, hierarchical and relational database systems.
ISIS is strongly related to ISAM DBs, however its flexible
indexing is rarely paralleled by any of these systems
and its non-flat data model is targeted by hierarchical DBs
only (in greater generality and with much higher costs).
- Although direct joins by MFN shouldn't be too costly,
ISIS is not the database of choice when several records
typically need to be combined in queries or transactions.
However, in many application cases, only one ISIS record is
needed as opposed to several relational table rows.
In such situations, ISIS is even an excellent and efficient
transaction (OLTP) database (since safe writing of an ISIS
record is much simpler than other DBs' undo/redo logs).
- ISIS is not the database of choice when records are updated
by the hour. However, where only about 10% of records are
changed between two (monthly, weekly or daily) runs of backup
and compactification, the space overhead is not a big problem.
Where old versions of data need to be retained anyway
(as often needed and supported, for example, by postgres history),
you would hardly find a more efficient solution.
- ISIS is not the database of choice when it comes to high volume online
analytical processing (querying statistics on several dimensions, OLAP).
However, after reading some database books and Oracle manuals,
one learns that OLAP requires a well designed ("star schema")
database separate from the transactional one, anyway.
- ISIS does not, in itself, provide any concurrency control
(actual implementations do, to some extent). This doesn't
hurt when running a read-only multi-user catalogue,
a stand-alone application and in some insert-only situations.
For distributed multi-client update, there are mechanisms based
on timestamps or stored procedures that need to be supported
by some ISIS server to come.
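The remarks above on safe writing, space overhead and retained old
versions can be illustrated by a toy model in Python. It assumes that
an update appends a new version of the record and only then repoints
the cross-reference entry; the class and its method names are invented
for this sketch and do not describe any actual ISIS file layout.

```python
class AppendOnlyStore:
    """Toy masterfile: writes only append; the xref maps MFN -> newest offset."""

    def __init__(self):
        self.master = []   # stands in for the masterfile; list index = "offset"
        self.xref = {}     # MFN -> offset of the current version

    def write(self, mfn, record):
        # Append the new version first, then switch the xref entry.
        # The old version stays untouched, so there is nothing to undo
        # if the write is interrupted before the xref is updated.
        self.master.append((mfn, record))
        self.xref[mfn] = len(self.master) - 1

    def read(self, mfn):
        """Return the current version of a record."""
        return self.master[self.xref[mfn]][1]

    def history(self, mfn):
        """All stored versions of a record, oldest first."""
        return [rec for m, rec in self.master if m == mfn]

    def overhead(self):
        """Fraction of the masterfile taken up by superseded versions."""
        return 1 - len(self.xref) / len(self.master) if self.master else 0.0

    def compact(self):
        """Keep only current versions (drops history) and rebuild the xref."""
        current = [(mfn, self.master[off][1])
                   for mfn, off in sorted(self.xref.items())]
        self.master = current
        self.xref = {mfn: off for off, (mfn, _) in enumerate(current)}
```

In this model, changing one out of ten records between two compaction
runs leaves one superseded version among eleven stored ones, and
history() hands out the old versions for free until compact() is run.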
While these data models are strongly tied to the logical
nature and physical organisation of the data,
newer notions like that of an 'object oriented' or 'XML'
database rather describe a way to use and access a database.
Actually OO or XML DBs are usually based on one of
the above mentioned systems (mostly relational ones).
For the most part, using a DB as OO or XML storage requires nothing
but some libraries and optionally precompilers for C++ or Java
-- these can be built on top of existing ISIS without changing it,
and ISIS will be an excellent choice for many applications.
Some aspects of increased functionality and performance will
require some sort of "stored procedures" running inside the database.
In the case of an XML DB they are used for example to decompose structures,
in the OO case they might need some sort of "magic switch" (method
overriding) to perform differently for some records than for others.
We believe that all this magic can be achieved based on ISIS.
The concepts of an ISIS database server and a scripting language as an
alternative to formatting exits are to be discussed elsewhere ...
First we want to shed some more light on the great flexibility the
ISIS database system has by its very nature.
| ISIS is a mail database |
Looking at http://www.faqs.org/rfcs/rfc822.html (or its updates)
one will find many similarities between ISO2709 records and internet mails,
which are, after all, essentially a series of header names and values.
After assigning numbers to the 100 or 200 most commonly used headers
and some sort of subfield encoding (e.g. "^nname^vvalue",
"namevalue" or simply "name: value") to store other header lines
with a special field number, mails are easily and very efficiently
stored in an ISIS database. Given the enormous number of communication,
groupware and workflow systems that are nowadays built upon standard plain
internet mails (typically using a set of special mail headers),
this is a very large area to be served by ISIS databases.
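The mapping just described might look like this in Python, using the
standard email package. The field numbers, the catch-all field 900 and
the function name are assumptions for the sake of the example; the
"^n...^v..." subfield encoding is the first of the variants named above.

```python
from email import message_from_string

# Hypothetical field numbers for a few common headers; a real table
# would cover the 100 or 200 most frequently used ones.
HEADER_FIELDS = {"from": 1, "to": 2, "subject": 3, "date": 4, "message-id": 5}
OTHER_HEADER = 900   # hypothetical catch-all field for unnumbered headers

def mail_to_record(text):
    """Map the headers of an RFC 822 message to {field number: [values]}."""
    record = {}
    for name, value in message_from_string(text).items():
        field = HEADER_FIELDS.get(name.lower())
        if field is None:
            # Unnumbered header: keep the name in subfield n, the value in v.
            field, value = OTHER_HEADER, "^n%s^v%s" % (name, value)
        # Repeated headers simply become repeated field occurrences.
        record.setdefault(field, []).append(value)
    return record

MAIL = """\
From: alice@example.org
To: bob@example.org
To: carol@example.org
Subject: new acquisitions
X-Thesaurus-Term: cataloguing

See the attached list.
"""

RECORD = mail_to_record(MAIL)
```

Here the two To: lines end up as two occurrences of field 2, and the
unnumbered X-Thesaurus-Term header lands in field 900 with its name
preserved, ready to be picked up by the index.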
The above mentioned Mbox-style implementation of ISIS tends towards
that direction, building upon the javax.mail standard.
IMAP mail servers could greatly benefit from the powerful indexing
and retrieval system of ISIS databases.
If the mail-sending application also allows users to select special headers
from an entry form prepared by a skilled librarian with thesauri
and systematics, an institution or company could really come to a new
way of using mail as a system of qualified, living information.
| ISIS is a multimedia database |
After all, a mail has not only headers but also a body.
A plaintext body of reasonable length (some KB, like sent by nice people),
fits without problem in a field whose number means "body".
A multipart body is easily decomposed to a series of body fields.
Whether larger or non-plaintext bodies are stored within or outside
the masterfile is a matter of the actual implementation and doesn't
need to be discussed here; both approaches have their pros and cons.
Anyway, the MIME standard, up and running since 1992,
allows for storage and transmission of anything that uses bytes,
and is easily integrated with ISIS databases
(we partly did it, code to be released).
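Until that code is available, the decomposition of a multipart body
into a series of body fields can be sketched with Python's standard
email package. The field number 1000 for "body" and the function names
are invented here, and the sketch leaves open whether the decoded
bytes end up inside or outside the masterfile.

```python
from email import message_from_bytes
from email.message import EmailMessage

BODY_FIELD = 1000   # hypothetical field number meaning "body"

def body_fields(raw):
    """Return {BODY_FIELD: [(content type, decoded bytes), ...]}."""
    occurrences = []
    for part in message_from_bytes(raw).walk():
        if part.is_multipart():
            continue   # containers carry no content of their own
        # decode=True undoes base64 or quoted-printable transfer encoding.
        occurrences.append((part.get_content_type(),
                            part.get_payload(decode=True)))
    return {BODY_FIELD: occurrences}

def demo_message():
    """Build a small two-part message to try the function on."""
    msg = EmailMessage()
    msg["Subject"] = "scan of title page"
    msg.set_content("See attachment.")
    msg.add_attachment(b"\x89PNG...", maintype="image", subtype="png",
                       filename="title.png")
    return msg.as_bytes()

DEMO_BODY = body_fields(demo_message())
```

Each leaf part becomes one occurrence of the body field, so a plain
single-part mail and a multipart mail are handled by the same code.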
| ISIS is an XML database |
Likewise XML, which basically is text, can be stored in an ISIS database
(with respect to the implementation's maximum record length).
Add some formatting exits to address the XML node content via
a DOM-style a.b.c notation as used in JavaScript, use them in your FST
and you will for sure have one of the world's best indexed and fastest
XML databases -- most others are using a relational DB as basis.
So indexing, retrieving and displaying XML data is more or less
simply a matter of some formatting functions.
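What such a formatting function could do is sketched below in Python
with the standard ElementTree module. It is only an illustration of
the a.b.c addressing idea, not an actual formatting exit; the function
name xml_path and the sample document are made up.

```python
import xml.etree.ElementTree as ET

def xml_path(xml_text, path):
    """Return the text of every node addressed by a dotted path like "a.b.c".

    The first path component must match the root element; every further
    component selects the direct children with that name.
    """
    root = ET.fromstring(xml_text)
    first, *rest = path.split(".")
    if root.tag != first:
        return []
    nodes = [root]
    for name in rest:
        nodes = [child for node in nodes for child in node.findall(name)]
    return [(node.text or "").strip() for node in nodes]

DOC = """<book>
  <title>Small is beautiful</title>
  <author><name>Schumacher</name></author>
  <author><name>Doe</name></author>
</book>"""

NAMES = xml_path(DOC, "book.author.name")
```

Since a path may match several nodes, the function returns a list,
which corresponds nicely to the occurrences of a repeatable field
when the values are handed on to the index.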
However, when thinking about data entry forms, for example,
the dark side of the force shows up:
Even with a very sophisticated database system with the ability to make
sense out of XML DTDs, data entry is potentially much more complicated.
XML was meant to provide arbitrary complexity in the first place.
And when it comes to DTDs like that of XHTML, which will carry just about
the same content as any HTML page, one easily understands that reasonable
automatic processing becomes nearly impossible -- that's the reason why
HTML pages are largely beefed up with headers (Dublin Core and others).
If you really desperately need it, it's good to have it,
but otherwise using it might be asking for trouble.
When having to work with XML structures for one reason or another,
typically because they should be imported or exported,
one should think of a mapping between XML and ISIS structures.
In many situations XML structures are shallow and can be ISIfied by
simply mapping the first level of sister nodes to ISIS fields and the
second level to subfields (may require repeated subfield support).
In other situations a closer look at the data structure may reveal
that it is not well designed with regard to Ockham's razor but contains
totally unnecessary depth which may be collapsed to the first case.
Actually, during several years of work with XML structures as
suggested by several "standards", I rarely found a reasonable
structure which cannot be mapped to a field-subfield-schema.
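For a shallow structure, the mapping of first-level nodes to fields
and second-level nodes to subfields could be sketched in Python as
follows. The element names, field numbers and subfield codes are all
hypothetical, and the "^" subfield notation is used only as a
convenient way to write the result down.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from first-level element names to field numbers.
FIELD_NUMBERS = {"title": 24, "author": 70}
# Hypothetical mapping from second-level element names to subfield codes.
SUBFIELD_CODES = {"name": "a", "role": "r"}

def xml_to_record(xml_text):
    """Map first-level nodes to fields and second-level nodes to subfields."""
    record = {}
    for child in ET.fromstring(xml_text):
        field = FIELD_NUMBERS[child.tag]
        if len(child) == 0:
            # Leaf node: its text is the field value.
            value = (child.text or "").strip()
        else:
            # One subfield per second-level node; repeats are kept in order.
            value = "".join(
                "^%s%s" % (SUBFIELD_CODES[sub.tag], (sub.text or "").strip())
                for sub in child)
        record.setdefault(field, []).append(value)
    return record

DOC = """<book>
  <title>Small is beautiful</title>
  <author><name>Schumacher</name><role>author</role></author>
  <author><name>Doe</name><role>editor</role><role>translator</role></author>
</book>"""

RECORD = xml_to_record(DOC)
```

The second author shows why repeated subfield support may be required:
the two role nodes become two ^r subfields within one field occurrence.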
But even if you really need XML structures "as is",
they can be stored very efficiently in ISIS, with all the benefits
of the flexible index (cf. the universal ISIS record).
Anyway, Dublin Core metadata or other RDF (resource description framework)
headers are conveniently stored in ISIS just like mail headers.
Maybe, as this schema was created to suit the needs of the very
old science of bibliographic knowledge management, much of that
experience was built into it.
On the other hand, XML's ancestor SGML was conceived for a document's body,
not the head, and I guess it still has its place there in spite of the
programming industry's hype. The use of XML for structuring documents
that are meant to be read by humans rather than machines of course
is perfectly reasonable. Transparent access to file based data associated
with a record and an XML add-on to the formatting language could aid
in converting extracts of document contents to metadata accessible
in the ISIS database and/or its index.
To wrap it up, I'd suggest looking at XML as an optional add-on to ISIS
rather than an integral part. ISIS already has all the functionality
needed to support any reasonable use of XML. ISIS data can much more
efficiently contain XML structures than the other way round.
| ISIS is a database for document/content management systems |
It follows that ISIS may very well support the needs of
systems for XML documents or website content in XML or HTML.
With increasing experience with such systems, people tend to
understand that content metadata should be organized according
to bibliographic principles. (Not that surprising, is it?)
In cooperation with oc4science.org there are projects at German
universities to integrate publishing, document management and website CMS,
based on an (Open)ISIS DB and directed by the librarian.
$Id: whatabout.txt,v 1.8 2003/02/14 17:30:33 kripke Exp $