QueryLanguages v1.0
ProLearn Query Language Definition, V.1.0
Alessandro Campi, Stefano Ceri, Peter Dolog, Erik Duval, Sam Guinea, Geert-Jan Houben, David Massart, Mikael Nilsson, Stefaan Ternier, Zhou Xuan
Introduction
This report provides the definition of the ProLearn Query Language (PLQL), a “query language for repositories of learning objects” defined in the context of the ProLearn Network of Excellence (WP4).
Prior to PLQL, ProLearn produced the definition of SQI, the "Simple Query Interface", a query transport standard that is becoming widely used within the e-learning community. One of SQI’s distinguishing features is that it is deliberately agnostic about query languages; as such, it can be used together with any query language, but does not include any provision for expressing the query semantics. Going beyond SQI requires adding support for a specific query language, designed with the specific objective of retrieving learning objects (LOs) from collections of possibly heterogeneous learning object repositories.
PLQL aims at filling this gap. It is primarily a “query interchange format”, used by source e-learning applications (or PLQL-clients) for querying LO repositories (or PLQL-servers); each LO can be described by means of metadata, which may be compliant with the most popular standards, such as Dublin Core, LOM, or MPEG. For submitting the query and retrieving the results, the application can use the SQI protocol as well as other interoperability standards. It is up to the PLQL-client to define user interfaces; these can range from highly sophisticated interfaces to simple keyword-based forms. It is up to the PLQL-server to build PLQL-adapters to the local repository engines.
In defining PLQL, we aim at combining exact search, used for selecting LOs by using their metadata, and approximate search, used for extracting LOs by means of approximate descriptions of their content (e.g., obtained by indexing text of the learning object itself). Thus, a query in PLQL contains both "exact clauses" and "approximate clauses", where each clause is syntactically well-defined.
We give a precise description of the language's semantics concerning both kinds of clauses and their mutual relationship; such semantics is classic within each context, while the relationship between clauses is very simple. A PLQL query is then capable of performing exact search, when the structure of the metadata is known, as well as ranked retrieval, when the structure of the metadata is not known; perhaps the most challenging aspect of the language's semantics is to define the meaning of a query when both query aspects should be taken into consideration.
The implementation of PLQL will use existing technology for both exact search (using, e.g., relational technology) and approximate search (using, e.g., an information retrieval engine). We envision language implementations that consist primarily of syntax-directed translators of the two kinds of clauses to the corresponding engines, while the adopters of PLQL will not be concerned with "query optimization". Sophisticated query processing capabilities can instead be supported by existing products; for instance, database engines may achieve high performance (e.g., through parallelism), while search engines may go well beyond keyword matching (e.g., by measuring semantic distance between keywords or using fuzzy word matching).
Due to its nature, PLQL is based upon existing concepts, and actually aims at minimizing the need to introduce new concepts. Specifically, we have borrowed approximate search concepts from CQL [1], a well-established language used for library search. Given that an XML description is available for all dominant metadata standards of learning objects, we have then decided to use object-oriented paths to navigate hierarchies, which are easily translated to XPath when needed.
PLQL levels
In designing PLQL, we also aim at supporting very simple repositories. Thus, one of the main concerns of the language design is to provide progressive levels of increasing expressive power, so that even simple repositories can support the lower levels. The structure of the result returned by a PLQL query is also defined by levels, and it is described in a dedicated section of this document.
Level zero is very basic and corresponds to simple approximate search. Level five is the richest level; for the time being it is undefined, but we assume it will eventually comprise all desired features for a query interchange format. This report presents the first three levels of PLQL; future versions of the specifications will address levels 3, 4, and 5. For each level we indicate:
1. Expressive power
2. Syntax
3. Examples
Level 0
Expressive Power
This level enables the expression of conjunctive approximate queries. The target must contain all the search terms specified in the query, which can be present either in the metadata descriptions or in the LO as represented through suitable information retrieval structures (e.g., indexes). When several approximate clauses are present in the same query, they are considered in conjunction; therefore, an LO must contain all the keywords of the approximate clauses in order to be selected. Selected LOs are ranked according to the cumulative relevance of the keywords; the ranking is performed by the search engine supported on the server.
Level zero offers the same expressive power as VSQL, the query language that is supported by a whole range of SQI targets. As an example, a VSQL query looks like:
<simpleQuery> <term>learning object</term> <term>dog</term> </simpleQuery>
The query above is equivalent to:
"learning object" and "dog"
Syntax
The Backus-Naur Form (BNF) definition for level 0 follows; it is based on [2].
0-1: PLQLQuery ::= approximateClause
0-2: approximateClause ::= operand | '(' approximateClause ')' | approximateClause 'and' approximateClause
0-3: operand ::= term1 | term2 | integer | real
0-4: term1 ::= charString1
0-5: term2 ::= charString2
0-6: charString1 ::= Any sequence of characters that does not include any of the following:
- whitespace
- tab
- ( (open parenthesis )
- ) (close parenthesis)
- =
- <
- >
- '"' (double quote)
- /
- \
- .
If the final sequence is the reserved word 'OR' (case insensitive), its token is returned instead.
0-7: charString2 ::= Double quotes enclosing a sequence of any characters except double quote (unless preceded by backslash (\)). Backslash escapes the character following it. The resultant value includes all backslash characters except those releasing a double quote (this allows other systems to interpret the backslash character). The surrounding double quotes are not included.
0-8: integer ::= [0-9]+
0-9: real ::= [0-9]*\.[0-9]+
This level of PLQL is identified with the following URI: http://www.prolearn-project.org/PLQL/l0
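As a rough illustration, the level 0 productions above can be sketched as a small validator. This is a minimal sketch rather than a reference parser: the names `tokenize` and `is_level0` are hypothetical, and treating both 'and' and 'or' as case-insensitive reserved words is an assumption (the grammar only states this for 'OR'). Integers need no separate token because a run of digits already satisfies charString1.

```python
import re

# One token per match: quoted string (charString2), real, parenthesis,
# or a bare word (charString1, which also covers integers).
TOKEN = re.compile(r'''
    \s*(?:
        (?P<string>"(?:[^"\\]|\\.)*")
      | (?P<real>[0-9]*\.[0-9]+)
      | (?P<paren>[()])
      | (?P<word>[^\s()=<>"/\\.]+)
    )''', re.VERBOSE)


def tokenize(query):
    """Split a query into (kind, text) pairs; raise ValueError on illegal input."""
    pos, tokens = 0, []
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if not m:
            raise ValueError(f"unexpected character at offset {pos}")
        kind = m.lastgroup
        text = m.group(kind)
        # Assumption: 'and' / 'or' are reserved regardless of case.
        if kind == "word" and text.lower() in ("and", "or"):
            kind = text.lower()
        tokens.append((kind, text))
        pos = m.end()
    return tokens


def is_level0(query):
    """Return True if the query matches productions 0-1 to 0-9."""
    try:
        tokens = tokenize(query.strip())
    except ValueError:
        return False
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("operand expected")
        kind, text = tokens[pos]
        if kind == "paren" and text == "(":
            pos += 1
            clause()
            if pos >= len(tokens) or tokens[pos] != ("paren", ")"):
                raise ValueError("missing closing parenthesis")
            pos += 1
        elif kind in ("string", "real", "word"):
            pos += 1
        else:
            raise ValueError("operand expected")

    def clause():
        nonlocal pos
        primary()
        # Level 0 only allows conjunction between operands.
        while pos < len(tokens) and tokens[pos][0] == "and":
            pos += 1
            primary()

    try:
        clause()
    except ValueError:
        return False
    return pos == len(tokens)
```

Under these assumptions, `is_level0('(dog and cat) and jaguar')` is accepted, while `is_level0('lom.general.title = "dog"')` is rejected because a dot cannot appear in an unquoted term.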
Examples
Correct Queries
The following queries are correct PLQL level zero expressions:
"dog"
"learning object" and dog
dog and cat and jaguar
(dog and cat) and jaguar
"lom.general.title" and "my dog"
1.2 and dog
test and 1024
"12.25 dog"
"\"hello\" he said"
Incorrect Queries
These examples are incorrect expressions (that cannot be submitted to a repository using PLQL level 0):
"learning object" or "dog"
"learning object" dog
lom.general.title = "dog"
lom.general.title or dog
wrong\"
Level 1
Expressive power
In level one, in addition to the approximate searches supported by level 0, PLQL queries can express exact searches on metadata fields. The latter are denoted by means of paths. Level 1 supports paths as simple concatenations of elements (separated by dots), starting from the root, with no omission; expressions and parentheses are not allowed.
Level 1 only supports the following roots (lowercase): 'dc' (Dublin Core Metadata Element Set [3]), 'lom' (Learning Object Metadata [4]), and 'mpeg' (Moving Picture Experts Group [5]). Generic namespaces are not supported.
This level is unaware of “types”, and attribute values cannot be composed. However, encoded strings that represent diverse dataTypes are allowed, given that their meanings can be clarified by referencing URI-identified meta-schemas. Similarly, we allow for the meaning of certain expressions to be clarified by referring to these meta-schemas.
When several exact clauses are presented in the same query, they are considered in conjunction. When both exact and approximate clauses are present in a single query, it is assumed that the exact search has a higher priority than the approximate search. The semantics of PLQL when both exact and approximate clauses are present is to apply the exact clauses first to build an initial result set, then to apply the approximate clauses to the initial result set. This produces a final result set.
However, exact queries might not parse correctly against the metadata available at the storage server. When some exact clauses cannot be parsed by the server, a return code should indicate each of them as "not executed". In particular, if no exact clauses can be parsed in the repository metadata, then the effect of the exact search is null; the repository should operate on the entire set of LOs as if no exact search had been performed.
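The evaluation order described above (exact clauses first, then approximate clauses on the initial result set, with unparsable exact clauses reported as "not executed") can be sketched as follows. This is only an illustration under stated assumptions: the function `evaluate`, the in-memory repository layout, the `known_paths` set, and the use of keyword counts as a relevance score are all hypothetical stand-ins for a real repository and search engine.

```python
def evaluate(repository, known_paths, exact_clauses, keywords):
    """Apply exact clauses first, then rank the survivors by keyword relevance.

    repository    -- list of dicts with 'uri', 'metadata' (path -> value), 'text'
    known_paths   -- metadata paths the (hypothetical) server can evaluate
    exact_clauses -- list of (path, value) pairs, taken in conjunction
    keywords      -- approximate terms, taken in conjunction
    """
    executed = [(p, v) for p, v in exact_clauses if p in known_paths]
    not_executed = [(p, v) for p, v in exact_clauses if p not in known_paths]

    # Step 1: exact search builds the initial result set.
    # If no clause could be executed, all() over an empty list is True,
    # so the whole repository is kept, as the text prescribes.
    initial = [lo for lo in repository
               if all(lo["metadata"].get(p) == v for p, v in executed)]

    # Step 2: approximate search on the initial result set.
    def score(lo):
        text = lo["text"].lower()
        # Assumption: relevance is the total number of keyword occurrences.
        return sum(text.count(k.lower()) for k in keywords)

    hits = [lo for lo in initial
            if all(k.lower() in lo["text"].lower() for k in keywords)]
    hits.sort(key=score, reverse=True)
    return [lo["uri"] for lo in hits], not_executed


repo = [
    {"uri": "urn:example:1", "metadata": {"lom.general.language": "en"},
     "text": "design patterns in java: patterns everywhere"},
    {"uri": "urn:example:2", "metadata": {"lom.general.language": "fr"},
     "text": "patterns de conception"},
    {"uri": "urn:example:3", "metadata": {"lom.general.language": "en"},
     "text": "an introduction to design patterns"},
]
uris, skipped = evaluate(
    repo,
    known_paths={"lom.general.language"},
    exact_clauses=[("lom.general.language", "en"), ("lom.rights.cost", "no")],
    keywords=["patterns"],
)
```

In this toy run, the clause on `lom.rights.cost` is returned in `skipped` because the hypothetical server does not know that path, and only the English-language objects are ranked.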
As a variant to this semantics, requested by the application at query presentation time, the repository could be allowed to use the constant values in the exact clauses that are not parsed correctly as free keywords, so as to perform an approximate search based upon the terms indicated in the exact clauses; a return code should then indicate to the application that this case has occurred. Such variant should be evaluated experimentally, to see if it can be useful at least in certain contexts. Note that a repository unable to process exact queries against certain metadata could always resort to such query interpretation.
Syntax
Note that productions with the same number as productions at lower levels substitute for them, e.g. production 1-1 substitutes (generally extends) production 0-1.
1-1: PLQLQuery ::= exactclause | keywordclause | exactclause ';' keywordclause
1-10: exactclause ::= pathexpr |'(' exactclause ')' | exactclause 'and' exactclause
1-11: pathexpr ::= standard '.' path operator operand
1-12: path ::= term1 | path '.' path
1-13: operator ::= '='
1-14: standard ::= 'dc' | 'lom' | 'mpeg'
This level of PLQL is identified with the following URI: http://www.prolearn-project.org/PLQL/l1
Examples
Correct Queries
The following queries are correct PLQL level 1 expressions:
dc.title = "SQL" and lom.general.title = "SQL"
lom.general.title = "Design Patterns" and lom.general.language = "en"
lom.general.title = "Design Patterns" and lom.technical.format = "video/mpeg" and lom.technical.duration = "PT1H" and lom.rights.cost="no"
lom.general.title = "Design Patterns" and lom.educational.intendedEndUserRole = "learner" and lom.educational.typicalAgeRange = "15-18"
((lom.general.title = abc) and (lom.general.language="fr")) ; test
tiger
keyword1 and keyword2 and (lom.general.language = "fr" ) and (lom.educational.ageRange="10-12")
keyword1 and keyword2 and (lom.general.language = en ) and (lom.educational.ageRange=10-12)
Incorrect Queries
The following queries are incorrect PLQL level 1 expressions:
keyword1 and keyword2 and (lom.general.language = "fr") or (lom.general.language = "en" ) and (lom.educational.ageRange="10-12")
dc = 12
lom.general.title
lom.general.(title = "abc")
lom.general.(title = "abc" and language="fr")
tiger or lom.general.title = "abc"
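To make the shape of a single level 1 path expression (productions 1-11 to 1-14) concrete, the sketch below parses one `pathexpr` and renders it as an XPath-like string. The helper names `parse_pathexpr` and `to_xpath` are hypothetical, and the mapping of dotted paths to XPath steps is an assumption for illustration only: it ignores XML namespaces and does not escape quotes inside the value.

```python
import re

# charString1: no whitespace, parentheses, '=', '<', '>', '"', '/', '\' or '.'
TERM1 = r'[^\s()=<>"/\\.]+'

PATHEXPR = re.compile(
    rf'^(?P<standard>dc|lom|mpeg)(?P<path>(?:\.{TERM1})+)\s*=\s*'
    rf'(?P<operand>"(?:[^"\\]|\\.)*"|[0-9]*\.[0-9]+|{TERM1})$'
)


def parse_pathexpr(expr):
    """Return (steps, value) for a level 1 path expression, or raise ValueError."""
    m = PATHEXPR.match(expr.strip())
    if not m:
        raise ValueError(f"not a level 1 path expression: {expr!r}")
    steps = [m.group("standard")] + m.group("path").lstrip(".").split(".")
    operand = m.group("operand")
    if operand.startswith('"'):
        # Drop the surrounding quotes and release escaped double quotes.
        operand = operand[1:-1].replace('\\"', '"')
    return steps, operand


def to_xpath(expr):
    """Render the expression as an XPath-like string (illustrative mapping only)."""
    steps, value = parse_pathexpr(expr)
    return "/" + "/".join(steps) + '[.="' + value + '"]'
```

For instance, `to_xpath('lom.general.title = "Design Patterns"')` yields `/lom/general/title[.="Design Patterns"]`, while `parse_pathexpr('dc = 12')` raises `ValueError` because the root must be followed by at least one path element.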
Level 2
Expressive power
Compared to levels 0 and 1, level 2 increases the expressive power of supported queries by enabling disjunction in addition to conjunction; moreover, clauses may use arbitrary comparison predicates. With approximate search, we use the "=" symbol to denote the 'includes' operator and the "exact" symbol to denote exact string matching. Level 2 enables a limited amount of structuring of exact clauses by supporting parenthesization within path expressions; in this way, it is possible to descend a hierarchical structure down to given nodes and then build conditions based upon the properties of several descendants of those nodes. Finally, level 2 opens up to generic namespaces.
Syntax
2-2: approximateClause ::= operand | '(' approximateClause ')' | approximateClause boolean approximateClause
2-10: exactclause ::= pathexpr |'(' exactclause ')' | exactclause boolean exactclause
2-11: pathexpr ::= path operator operand | pathExp
2-13: operator ::= '=' | '>' | '>=' | '<' | '<=' | 'exact'
2-14: standard ::= 'dc' | 'lom' | 'mpeg' | term1
2-15: pathExp ::= path '.' pathExp | '(' selector boolean selector ')'
2-16: selector ::= path operator operand | selector boolean selector | '(' selector ')' | '(' selector boolean selector ')'
2-17: boolean ::= 'and' | 'or'
This level of PLQL is identified with the following URI: http://www.prolearn-project.org/PLQL/l2
Examples
Correct Queries
The following queries are correct PLQL level 2 expressions:
lom.general.identifier.(catalog=isbn and entry=xxxxx)
lom.general.(title = "Design Patterns" and (language = "it" or language = "en"))
lom.general.title = "Design Patterns" and lom.technical.(format = "video/mpeg" and duration <= "PT1H") and lom.rights.cost="false"
lom.general.title = "Design Patterns" and lom.educational.(intendedEndUserRole = "learner" or typicalAgeRange = "15-18")
Incorrect Queries
The following queries are incorrect PLQL level 2 expressions:
lom.general.(title = "abc")
lom.general.title = ("abc" and tiger)
Query Results
The result produced after invoking PLQL on a repository is also built by means of “levels”; the information returned by the server may range from just the number of selected items in the result to more specific meta-information about each item. Currently, PLQL supports four result levels (ranging from 0 to 3).
Input
Any query issued in PLQL should be associated with (through the query transport method) an input parameter (ResultLevel:0-3) indicating to the target repository the level of the result that should be returned by the query. In addition, three optional parameters can specify:
- the maximum cardinality (MaxCard:integer) of the result;
- the name of the search method (Method:string) to be used at server side for result extraction, when the client has such a choice;
- the name of the standard (Standard:'dc'|'lom'|'mpeg') used for returning meta-information about the items in the results.
Result
Results are defined, as with the queries, by means of progressive levels.
At level zero, sources return at least the cardinality of the result.
At level one, sources return in addition at least the list of URIs of the elements which are extracted by the query. Retrieving the actual object referenced through the URI is left to the application. If the result is ranked, the best results must appear first.
At level two, sources return in addition some specific metadata in the requested metadata format (e.g., lom, dc, etc.). We do not define the metadata as part of the PLQL standard, but we expect them to include the title, author, and language.
At level three, sources return in addition a numeric ranking value and, if the source supports it, a reference (identifier) to the ranking method.
This information is summarized below:
ResultLevel 0: result.cardinality            type integer - size of result
ResultLevel 1: result.list                   type array of 0-MaxCard ranked records
               result.list[i].meta           URI of selected resource
ResultLevel 2: result.list[i].meta           record of metadata fields, normally: position, title, language, author
ResultLevel 3: result.method                 type string - method used for scoring
               result.list[i].rankingValue   type integer - in 0-100 range, giving matching score
We expect most sources to be able to support at least level one (i.e., to return URIs ordered according to their ranking).
Syntax
Using the method setResultFormat of SQI, a source can use a URI to inform a target of the expected level of the result. Optionally, the URI also indicates the expected format of metadata.
Its BNF is:
1: PLQLRES ::= 'http://www.prolearn-project.org/PLRF/' level '/' standard [ '/' method ]
2: level ::= '0'|'1'|'2'|'3'
3: standard ::= 'dc'| 'lom' | 'mpeg'
4: method ::= string
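The result-format URI defined by the productions above can be built and checked with a few lines of code. The following is a minimal sketch: the function names `build_result_format` and `parse_result_format` are hypothetical, and the assumption that a method name contains no '/' is ours, since the grammar only says `method ::= string`.

```python
import re

PLRF_BASE = "http://www.prolearn-project.org/PLRF/"
STANDARDS = ("dc", "lom", "mpeg")

# Assumption: the optional method segment is a non-empty string without '/'.
PLRF_URI = re.compile(
    r"^http://www\.prolearn-project\.org/PLRF/"
    r"(?P<level>[0-3])/(?P<standard>dc|lom|mpeg)(?:/(?P<method>[^/]+))?$"
)


def build_result_format(level, standard, method=None):
    """Build the URI that tells a target which result level and standard to use."""
    if level not in (0, 1, 2, 3):
        raise ValueError("level must be 0, 1, 2 or 3")
    if standard not in STANDARDS:
        raise ValueError("standard must be one of 'dc', 'lom', 'mpeg'")
    uri = f"{PLRF_BASE}{level}/{standard}"
    if method:
        uri += "/" + method
    return uri


def parse_result_format(uri):
    """Return (level, standard, method); method is None when absent."""
    m = PLRF_URI.match(uri)
    if not m:
        raise ValueError(f"not a PLRF result-format URI: {uri!r}")
    return int(m.group("level")), m.group("standard"), m.group("method")
```

A source would pass the string returned by `build_result_format(2, "lom")` to setResultFormat; a target could use `parse_result_format` to recover the requested level and standard.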
Result example
We provide an example below of result data in XML format. Suppose that a source wants its query to return level 2 ranked results using standard lom metadata. Such a source would select the results format using the following URI:
http://www.prolearn-project.org/PLRF/2/lom
(where 2 indicates the level and 'lom' indicates the selected standard)
After the query is submitted, the target could return results like the following example:
<?xml version="1.0" encoding="UTF-8"?>
<Results xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.prolearn-project.org/PLRF/ http://www.cs.kuleuven.be/~stefaan/plql/plql.xsd http://ltsc.ieee.org/xsd/LOM http://ltsc.ieee.org/xsd/lomv1.0/lom.xsd"
         xmlns="http://www.prolearn-project.org/PLRF/">
  <ResultInfo>
    <ResultLevel>http://www.prolearn-project.org/PLRF/3/lom</ResultLevel>
    <RankingMethod>NrOfDownloads</RankingMethod>
    <QueryMethod>http://www.prolearn-project.org/PLQL/l1</QueryMethod>
  </ResultInfo>
  <Record position="1" rankingValue="80">
    <Metadata>
      <lom xmlns="http://ltsc.ieee.org/xsd/LOM">
        <general>
          <identifier>
            <entry>ARID43_12395</entry>
            <catalog>ARIADNE</catalog>
          </identifier>
          <title>
            <string language="en">The history of art theft</string>
          </title>
          <language>en</language>
        </general>
        <technical>
          <location></location>
        </technical>
      </lom>
    </Metadata>
  </Record>
</Results>
More examples are available at: http://www.cs.kuleuven.be/~stefaan/plql/
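A client receiving a result document like the one above could read it with a standard XML library. The sketch below uses Python's `xml.etree.ElementTree`; the function name `read_results`, the abbreviated `SAMPLE` document, and the choice of fields to extract are assumptions made for illustration, not part of the specification.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes are local to this sketch; the URIs come from the example.
NS = {
    "plrf": "http://www.prolearn-project.org/PLRF/",
    "lom": "http://ltsc.ieee.org/xsd/LOM",
}

# Abbreviated copy of the example result document.
SAMPLE = """<Results xmlns="http://www.prolearn-project.org/PLRF/">
  <ResultInfo>
    <ResultLevel>http://www.prolearn-project.org/PLRF/3/lom</ResultLevel>
    <RankingMethod>NrOfDownloads</RankingMethod>
  </ResultInfo>
  <Record position="1" rankingValue="80">
    <Metadata>
      <lom xmlns="http://ltsc.ieee.org/xsd/LOM">
        <general>
          <title><string language="en">The history of art theft</string></title>
          <language>en</language>
        </general>
      </lom>
    </Metadata>
  </Record>
</Results>"""


def read_results(xml_text):
    """Return (ranking method, list of records) from a PLRF result document."""
    root = ET.fromstring(xml_text)
    method = root.findtext("plrf:ResultInfo/plrf:RankingMethod", namespaces=NS)
    records = []
    for rec in root.findall("plrf:Record", NS):
        ranking = rec.get("rankingValue")  # only present at result level 3
        general = "plrf:Metadata/lom:lom/lom:general/"
        records.append({
            "position": int(rec.get("position")),
            "rankingValue": int(ranking) if ranking is not None else None,
            "title": rec.findtext(general + "lom:title/lom:string", namespaces=NS),
            "language": rec.findtext(general + "lom:language", namespaces=NS),
        })
    return method, records


method, records = read_results(SAMPLE)
```

Because the LOM fragment lives in its own namespace, each path step has to carry the matching prefix; omitting it makes the lookups return nothing.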
Parsing of PLQL
Parsers that accept PLQL queries as input have been developed for levels 0-2 and can be downloaded from [6]; these are distinct parsers, each designed to accept queries up to a given level (therefore, Level2-Parser supports queries of levels 0-2, Level1-Parser supports levels 0-1, and Level0-Parser can only parse level 0 queries).
Overview of the full PLQL specification
We have only partially discussed the features of levels 3, 4, and 5. We anticipate the use of: disjunction and negation, types, arbitrary path expressions in level 3; joins, quantifiers, nested queries in level 4, recursion, proximity, and other highly expressive clauses in level 5.
In this version of the language, we concentrate on one-time queries rather than a query protocol. We anticipate, however, that future versions of the specifications may include a protocol by means of which the application can interact with the user, who will indicate preferences about the query result so that the result itself can be reused for interaction with the repositories, asking for “more results of the same kind”. Merging results from various repositories will also be part of the protocol. In addition, we should study how to notify clients of server-side errors.
Specifications should now take advantage of implementations and experiments which took place during the summer (EUNLRE, ARIADNE/GLOBE, KnowledgeMarkets); another release of the specifications is planned for March 1st, 2007. The final deliverable is due on July 1st, 2007.

