| Views |
NOTE: this document describes early OpenIsis proposals.
We are working to reduce this to an easy-to-handle subset for implementation in
Malete.
Using views in OpenIsis.
A "view", like a VIEW in SQL, creates new, typically temporary records based on
existing ones by means of some transformation like selecting a subset of the
available fields (a projection), retagging fields or manipulating field values.
As a general concept, a view can be implemented using any algorithm
in any of the available programming languages to create new records
(and need not refer only to record contents, but may also access other
resources like files).
In a more narrow sense, however, a view is a special kind of transformation
defined by a "view record". The fields of a view record have tags
as they should appear in the target, typically some valid tags of the source
plus, for example, index control tags, if the view describes indexing.
In the following, the term "alphanumeric" denotes any ASCII letter or digit,
or any non-ASCII character.
"Word character" denotes any alphanumeric, hyphen '-' or underscore '_'.
The value of each field in a view record can have one of several forms:
- if it is empty,
the tag is passed to the source record's v command (see below).
- if it starts with a %,
the rest of the value (without the %) is passed to the source record's v command.
If the tag is not 0, '=tag;' is prepended.
- if the value starts with any word character,
it is used literally.
- if it starts with a quote,
the rest of the value is used literally (without the quote).
If the value's last character is a quote, it is discarded.
- if it starts with an @,
the rest of the value names a view to be included.
- if it starts with an &,
the rest of the value is the name of an extension exit to call.
- if it starts with a {,
the rest of the value is a script to be executed in the host language
(after stripping an optional } as last character).
- any other form
(i.e. starting with other ASCII punctuation) is reserved for future use.
Example: the view
24
70
is a simple projection selecting fields 24 and 70 from the source.
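The value forms listed above can be illustrated with a small classifier.
This is only a sketch in Python, not OpenIsis code: the function names and
the returned kind labels are made up for illustration, and "quote" is
assumed to mean the double quote.

```python
def is_word_char(ch):
    """Word character as defined above: an alphanumeric (ASCII letter or
    digit, or any non-ASCII character), a hyphen or an underscore."""
    return ch in "-_" or not ch.isascii() or ch.isalnum()


def classify_view_value(tag, value):
    """Return a (kind, payload) pair for one field of a view record.

    The kind labels are hypothetical names chosen for this sketch.
    """
    if value == "":
        # empty value: the tag itself is the format for the v command
        return ("format", str(tag))
    first, rest = value[0], value[1:]
    if first == "%":
        # '=tag;' is prepended unless the tag is 0
        prefix = "" if str(tag) == "0" else "=%s;" % tag
        return ("format", prefix + rest)
    if is_word_char(first):
        return ("literal", value)
    if first == '"':
        # assumption: "quote" means the double quote character
        return ("literal", rest[:-1] if rest.endswith('"') else rest)
    if first == "@":
        return ("include", rest)
    if first == "&":
        return ("exit", rest)
    if first == "{":
        return ("script", rest[:-1] if rest.endswith("}") else rest)
    # other ASCII punctuation is reserved for future use
    return ("reserved", value)
```

For the projection example, classify_view_value(24, "") yields
("format", "24"), i.e. field 24 is produced by the v command format "24".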
| the v command |
The v command is described here as an abstract command.
It is available in the C-API as well as from the language bindings,
possibly with language specific variations.
It resembles the core concepts of traditional formatting,
including access to and looping over fields and subfields,
selecting substrings and attaching optional literals.
It is sort of the record's printf.
Like printf, and unlike traditional formatting,
it neither supports flow control nor screen rendering.
It takes a source and target record plus a string specifying a format.
Depending on the language environment, the source and/or target may be implicit.
If the format starts with '=tag;', where tag is a tag,
this tag is used in the target and as the default tag.
Otherwise, tags from the source are used in the target and the default is *.
The first (next) character is then checked for an encoding mode, see below.
The format is a series of output specifications,
consisting of a field tag (word characters, either numerical or by field name),
selectors and modifiers. The special tag * selects all fields.
Each spec may contain several subspecs, separated by commas,
using the same child context (otherwise, specs and subspecs are the same).
So the format is spec[;spec...], and a spec is subspec[,subspec...].
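The outer structure of a format can be sketched as follows. This is a
deliberately naive Python illustration, not the OpenIsis parser: the
function name parse_format is hypothetical, and the split on ';' and ','
ignores literals, which in a real format may themselves contain these
characters.

```python
ENCODING_MODES = "?!&%"  # encoding mode operators of the v command


def parse_format(fmt):
    """Split a format into (target_tag, encoding_mode, specs).

    specs is a list of specs, each being a list of subspec strings.
    Naive sketch: separators inside literals are NOT recognized.
    """
    target_tag = None
    if fmt.startswith("="):
        end = fmt.find(";")
        if end != -1:
            # '=tag;' prefix gives the target (and default) tag
            target_tag = fmt[1:end]
            fmt = fmt[end + 1:]
    encoding = None
    if fmt and fmt[0] in ENCODING_MODES:
        # the first (next) character may select an encoding mode
        encoding, fmt = fmt[0], fmt[1:]
    specs = [spec.split(",") for spec in fmt.split(";")] if fmt else []
    return target_tag, encoding, specs
```

For instance, parse_format("=24;70^a,^b;100") gives the target tag "24",
no encoding mode, and the two specs ["70^a", "^b"] and ["100"].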
The general operation of the v command is to loop over the record
until the last occurrence has been seen for all tags.
In the nth repetition, for each tag in any spec,
the (n+i)th occurrence of a field with this tag is used,
where i is an offset given by an occurrence selector,
and it is determined whether this is the last occurrence.
For every iteration, a new output field is started,
and the format is processed as follows:
- loop over the (main) specifications
- loop over children (or use the given field)
- loop over subspecs
- loop over subfields (or use the whole field)
- apply decoding
- apply substring
- apply encoding
- attach literals
- append the result to the target record
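The outer iteration over occurrences might look like the following Python
sketch. It makes several simplifying assumptions that are not part of the
description above: a record is modelled as a list of (tag, value) pairs,
all occurrence offsets i are 0, and the name iterate_occurrences is made up.

```python
def iterate_occurrences(record, tags):
    """Yield, for each repetition n, a dict mapping every tag to its
    nth occurrence in the record, or None if the tag has run out.

    record: list of (tag, value) pairs (a simplified stand-in).
    The loop ends once the last occurrence was seen for all tags.
    """
    by_tag = {t: [v for (tag, v) in record if tag == t] for t in tags}
    rounds = max((len(vals) for vals in by_tag.values()), default=0)
    for n in range(rounds):
        yield {t: (vals[n] if n < len(vals) else None)
               for t, vals in by_tag.items()}
```

With one field 24 and two fields 70, this produces two iterations; in the
second one, tag 24 has no occurrence left.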
Each spec starts with an optional decoding mode,
optionally followed by a tag,
optionally followed by a child selector,
optionally followed by a subfield selector,
optionally followed by string modifiers,
optionally intermingled with occurrence selectors and literals:
- , starts a new subspec
- ; starts a new spec with default context reset to the last tag seen
- . starts a child selector
- ^ or % starts a subfield selector
- ( or [ starts an occurrence selector
- any of /~"'`|+ starts a literal
- : starts a substring selector
- & calls an extension
- { evaluates a script
| encoding mode |
One of the following operators as first character of the format
can select an output "encoding":
- ? outputs a 1 if the selected entity exists, 0 otherwise
- ! the opposite of ?
- & applies HTML encoding
- % applies URL encoding
The test encodings ? and ! inhibit normal processing;
they return immediately after checking the first occurrence of the first tag.
For example, using a default of all tags (*), the format consisting
solely of a '?' checks whether a record is empty.
More special characters (but not the '*') may be designated in the future,
so a format should always start with a tag (possibly explicit *).
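The effect of the four encoding modes on a single value can be sketched in
Python. This is an illustration only: apply_encoding is a hypothetical
name, a missing entity is represented by None, and the standard library's
html.escape and urllib.parse.quote merely stand in for whatever HTML and
URL encoding OpenIsis actually applies.

```python
import html
from urllib.parse import quote


def apply_encoding(mode, value):
    """Apply one encoding mode to a value (None = entity does not exist)."""
    if mode == "?":
        return "1" if value is not None else "0"
    if mode == "!":
        return "0" if value is not None else "1"
    if value is None:
        return ""
    if mode == "&":
        return html.escape(value)      # HTML encoding (stand-in)
    if mode == "%":
        return quote(value, safe="")   # URL encoding (stand-in)
    return value                       # no encoding mode given
```

For example, apply_encoding("&", "a<b & c") returns "a&lt;b &amp; c", and
apply_encoding("%", "a b/c") returns "a%20b%2Fc".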
| decoding mode |
An uppercase character before the tag may denote a decoding mode:
- H heading mode:
^x is replaced by ';' for x=a, ',' for x=b..i, '.' for others;
angle brackets are removed (>< is replaced by '; '), and <a> or <a=b> evaluates to a
- D data mode:
in addition to heading mode, if there is no explicit literal after this field,
append ' ' if it ends in "punctuation", or '. ' otherwise.
- X index mode:
like heading mode, but <a> evaluates to nothing and <a=b> to b
- M traditional:
For compatibility, specs reading MHx or MDx (x = L or U) set heading
or data mode, resp., as default processing (before substringing).
The case directive is ignored.
| child selector |
If a tag is immediately followed by a dot '.' and an optional tag,
the field context is switched, for this spec and following specs separated by ',',
to loop over the children with the given tag.
Tag defaults to 0, selecting text nodes in the canonical XML representation.
A * selects all children, a second . recursively selects all children.
| subfield selectors |
The primary subfield selector is the hat '^', followed by one character.
It can produce multiple items, like repetitions of a subfield or keywords.
If the selector character is
- alphanumeric
select the (repetitions of the) subfield tagged with this character.
- an opening pairing brace
i.e. one of '(','{','[' or the angle bracket '<',
words between pairs of this brace are selected (commonly keywords).
- a *
selects the part up to the first subfield delimiter
- a space
selects naive words as sequences of alphanumerics
- a )
selects parts between TABs (array mode)
- other punctuation
like / or | selects parts between pairs of this character
The percent sign '%' (think printf) works basically like the hat, but
- removes quotes surrounding values
- by default treats the TAB as subfield delimiter
- if followed by a punctuation character or space,
treats this plus surrounding whitespace as delimiter,
not separating within quotes.
- if followed by a ),
(optionally after another punctuation character) goes to array mode,
that is, no subfield indicator is stripped from the values
- if followed by multiple word characters,
(including '-' and '_', optionally after an initial punctuation)
searches for subfields starting with that sequence followed by '=' or ':'
Examples:
- '^)' splits at TABs
- '%)' splits at TABs with quote removal
- '%a' selects a sequence following a TAB and 'a'
- '%,)' splits a line of comma separated values
- '%;*' selects the primary value of a MIME property
- '%;charset' selects the charset attribute of a MIME property
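The two simplest forms of the hat selector, '^x' with an alphanumeric x and
'^*', can be sketched in Python as follows. The function name is
hypothetical, the subfield delimiter is assumed to be the '^' character in
the stored value, and the other selector characters and the '%' variant
are not covered.

```python
def select_subfield(value, selector):
    """Return the list of items selected by '^' + selector.

    selector: a single alphanumeric (subfield tag) or '*'.
    Assumes subfields are stored as '^' + tag character + contents.
    """
    head, *parts = value.split("^")
    if selector == "*":
        # the part up to the first subfield delimiter, if any
        return [head] if head else []
    # all repetitions of the subfield tagged with this character
    return [p[1:] for p in parts if p[:1] == selector]
```

For the value "^aFoo^bBar^aBaz", the selector 'a' yields both repetitions
"Foo" and "Baz".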
| occurrence selector |
By default, all occurrences of fields, children and subfields are used.
One or multiple occurrences can be selected explicitly following a tag,
child selector or subfield selector using brackets [] (counting from 1)
or parentheses () (counting from 0), like (i) or (i..j).
- If i is omitted, it defaults to the first (1 or 0, resp.).
- If j is omitted, it defaults to the last.
Alternatively, occurrences may be selected by contents.
The general format is an optional subfield selector,
followed by a comparison operator, followed by a literal.
Only occurrences where the field or specified subfield matches
the literal according to the comparison are selected.
Parentheses select all such occurrences,
while brackets select the first match
and default to the first occurrence if none matches.
Operators are
- = for equality
- ~ for contains
- * for starts with
- + for ends with
The equality operator may be omitted where unambiguous.
If some key subfield is known to occur at the start or end of field,
it is probably more efficient to test for +^zen than for ^z=en.
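Both kinds of occurrence selection can be sketched in Python. The function
names are hypothetical, and two points are assumptions rather than part of
the description above: ranges i..j are taken to be inclusive, and a
selector without '..' such as (i) is taken to select a single occurrence.

```python
def select_by_index(items, selector):
    """Select by position: '[i..j]' counts from 1, '(i..j)' from 0."""
    base = 1 if selector[0] == "[" else 0
    body = selector[1:-1]
    if ".." in body:
        lo, hi = body.split("..", 1)
        start = int(lo) - base if lo else 0               # omitted i: first
        stop = int(hi) - base + 1 if hi else len(items)   # omitted j: last
    else:
        start = int(body) - base if body else 0
        stop = start + 1
    return items[start:stop]


# comparison operators for selection by contents
OPERATORS = {
    "=": lambda value, lit: value == lit,
    "~": lambda value, lit: lit in value,
    "*": lambda value, lit: value.startswith(lit),
    "+": lambda value, lit: value.endswith(lit),
}


def select_by_content(items, op, literal, first_only=False):
    """Select by contents; first_only=True models the bracket form."""
    matches = [v for v in items if OPERATORS[op](v, literal)]
    if not first_only:
        return matches        # parentheses: all matching occurrences
    if matches:
        return matches[:1]    # brackets: the first match ...
    return items[:1]          # ... or the first occurrence if none matches
```

With the items a, b, c, d, the selectors (1..2) and [2..3] both pick b and c.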
| literals |
Each tag, child or subfield selector may be followed by one or more literals.
Every literal but the / extends to the next occurrence of the same
special character by which it is introduced.
This special character may be escaped using a backslash.
A literal backslash may be escaped as two (but need not, except at the end).
The special character governs when and where the literal is output:
- " before the first occurrence
(of the entity in question; i.e. field, child or subfield)
- ' before each
- ` after each
- | in between (after each but the last)
- + after the last
- / this single-character literal starts a new output field after each occurrence
- ~ this literal is used if the given entity does NOT occur
Literals are not subject to the string modifiers.
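How the literal kinds combine around the occurrences of one entity can be
sketched in Python. The function and parameter names are hypothetical, the
'/' literal (which starts a new output field) is left out, and the relative
order of the "after each" and "in between" literals is an assumption.

```python
def attach_literals(values, first="", before="", after="",
                    between="", last="", absent=""):
    """Join the occurrences of one entity with their literals.

    first   ~ the " literal (before the first occurrence)
    before  ~ the ' literal (before each)
    after   ~ the ` literal (after each)
    between ~ the | literal (after each but the last)
    last    ~ the + literal (after the last)
    absent  ~ the ~ literal (used if the entity does not occur)
    """
    if not values:
        return absent
    out = [first]
    for n, value in enumerate(values):
        out.append(before + value + after)
        if n < len(values) - 1:
            out.append(between)
    out.append(last)
    return "".join(out)
```

For two occurrences a and b,
attach_literals(["a", "b"], first="[", before="<", after=">", between=", ", last="]")
gives "[<a>, <b>]".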
| substring selector |
Introduced by a colon ':', it has the form :l or :o.l, where o and
l are integers denoting an offset and length to cut from the currently
selected value.
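A substring selector could be applied as in the following Python sketch.
The function name is hypothetical, and the offset is assumed to count from
0, which the description above does not state.

```python
def apply_substring(value, selector):
    """Apply a ':l' or ':o.l' substring selector to a value.

    Assumption: the offset o counts from 0; ':l' means offset 0.
    """
    body = selector[1:]  # strip the leading ':'
    if "." in body:
        offset_text, length_text = body.split(".", 1)
        offset, length = int(offset_text), int(length_text)
    else:
        offset, length = 0, int(body)
    return value[offset:offset + length]
```

Under this assumption, ':4' applied to "OpenIsis" gives "Open" and ':4.4'
gives "Isis".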
| extension exits |
An exit is a C-function (i.e., using C calling convention) in a dynamic library.
TODO: describe interface.
| script evaluation |
If a scripting environment like Tcl is available,
a {} block may contain a script to be evaluated.
TODO: describe interface.
$Id: Views.txt,v 1.4 2004/06/10 12:52:29 kripke Exp $