Setting Up OAI-PMH

Check this link for the old version with support for Java 1.4.2.

Introduction
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a protocol designed to collect metadata from various repositories.

There are two actors in this approach. On one side there is the Service Provider, who wants to collect the metadata (e.g. to enhance it or make it easier to find), and on the other side are the Metadata Providers, who want to make their metadata available.

On this page we explain how to set up an OAI-PMH service on top of a repository (also called an OAI-PMH target). The programming language used is Java, and the web server is Apache Tomcat.

The picture below shows where the OAI-PMH target software is located in the architecture, and what functionality this service offers.



How OAI Works
To be able to offer an OAI target, one has to set up a web service that supports a series of so-called "verbs":

- Identify: returns the identification of the OAI repository.
- ListMetadataFormats: returns a list of supported metadata formats.
- GetRecord: returns the metadata of the requested item.
- ListIdentifiers: returns a list of identifiers.
- ListRecords: returns a list of metadata objects.
- ListSets: returns a list of available sets.

Note that all returned data is formatted in XML. For harvesting purposes, the Service Provider will usually invoke the ListRecords verb (with a metadata prefix and usually a date span) on the Metadata Provider.
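As an illustration of such a harvesting request, the following minimal sketch builds a ListRecords URL with a metadata prefix and an optional date span. The class name ListRecordsUrl and its build method are hypothetical helpers, not part of the OAI target software; "from" and "until" are the request parameters that carry the date span.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of the OAI target software.
class ListRecordsUrl {

    /** Builds a ListRecords request. from and until may be null to leave the date span open. */
    static String build(String baseUrl, String metadataPrefix, String from, String until) {
        StringBuilder url = new StringBuilder(baseUrl);
        url.append("?verb=ListRecords");
        url.append("&metadataPrefix=").append(encode(metadataPrefix));
        if (from != null) {
            url.append("&from=").append(encode(from));
        }
        if (until != null) {
            url.append("&until=").append(encode(until));
        }
        return url.toString();
    }

    // URL-encodes a parameter value so that special characters survive the request.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // The base URL assumes a local Tomcat on port 8080, as used elsewhere on this page.
        System.out.println(build("http://localhost:8080/oaitarget/OAIHandler",
                "oai_lom", "2008-01-01", "2008-01-31"));
    }
}
```

The dates in the usage example are arbitrary; a harvester would normally pass the date of its previous harvest as "from".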

Getting Started
If you start from scratch, there are some steps you have to take before coding.

Download
All needed files (except for Java) can be found here.

The following software is needed to start:

- Java SDK 1.5.0: download from Sun
- Jakarta Tomcat 5.5.25: just extract the archive to a directory
- Apache Ant: just extract the archive to a directory
- Source code of the OAI repository software: oaicat_5.0.zip
- Source code of the OAI repository software in PHP: phpoai2-1.8.0.zip

Setting Environment Variables
set JAVA_HOME=C:\j2sdk1.5.0
set ANT_HOME=C:\apache-ant-1.6.5
set PATH=%JAVA_HOME%\bin;%ANT_HOME%\bin;%PATH%

Building and Deploying
To build the code, open a command prompt or terminal, go to the oaicat_5.0 directory (where the file build.xml is located) and type "ant" (without the quotes). This produces a file named oaitarget.war in the directory /oaicat_5.0/build/. To install the OAI target, copy the oaitarget.war file to the "webapps" directory of your Tomcat installation (and start Tomcat if you haven't already done so).

You can now test whether your web service is running by invoking the "Identify" verb. If Tomcat is running on the local host on the default port (8080), the URL to perform this task is the following: http://localhost:8080/oaitarget/OAIHandler?verb=Identify

If everything is running as it should, you will get an XML page with information about the OAI repository.

Similarly, the verb "ListMetadataFormats" will already show the supported metadata formats.
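If you want to check the Identify response from code rather than in a browser, a sketch along the following lines could be used. It reads the repositoryName element, which OAI-PMH requires in every Identify response, from an XML string. The class name IdentifyResponseCheck is hypothetical, and the sample response in main is made up for the example; fetching the XML from the URL above is left out.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of the OAI target software.
class IdentifyResponseCheck {

    /** Returns the repository name from an Identify response, or null if the element is missing. */
    static String repositoryName(String responseXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
            // "*" matches the element in any namespace.
            NodeList names = doc.getElementsByTagNameNS("*", "repositoryName");
            return names.getLength() == 0 ? null : names.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up, heavily shortened response; a real one contains more elements.
        String sample = "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\">"
                + "<Identify><repositoryName>Sample Repository</repositoryName></Identify>"
                + "</OAI-PMH>";
        System.out.println(repositoryName(sample));
    }
}
```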

Sample Databases
To give new users a better understanding of how setting up an OAI target works, fully working examples have been set up.

!! Before you start !!

The properties files can be found in the WEB-INF folder. By default, the application uses the oaicat.properties file. So if you choose one of the three example databases, you need to use the appropriate properties file (for the file system database use the "oaicat_filesystem.properties" file, etc.). There are two ways to accomplish this:

1. Rename the correct properties file to oaicat.properties.
2. Change the "param-value" property in the WEB-INF/web.xml file:
- Open the web.xml file located at "oaicat_5.0/WEB-INF/" with a text editor.
- Look for the line that contains oaicat.properties.
- Change oaicat.properties to the name of the properties file you will use.
- Save and close the file.

Please keep the name of the properties file you will use in mind when reading the rest of this section.

File System

The sample metadata in XML files can be found here.

- Extract the folder somewhere.
- Enter the full path of this folder in the properties file, as the value of the "FileSystemLomCatalog.basePath" property. Please note that one should use forward slashes "/" in the path (e.g. "c:/my/path/" or "/home/user/path/").
- Change the "FileSystemLomCatalog.ext" property if needed. With the value ".xml", only files with a ".xml" extension will be used.
- Reload the oaitarget application or restart the Tomcat server it is running on.
- Go to http://localhost:8080/oaitarget/OAIHandler?verb=ListRecords&metadataPrefix=oai_lom to test the target.
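The advice about forward slashes can be illustrated with a short sketch. It assumes the OAI target reads its properties file with java.util.Properties, where a backslash acts as an escape character and is silently dropped in front of an ordinary letter. The class name BasePathProperty is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper for illustration; not part of the OAI target software.
class BasePathProperty {

    /** Loads properties text and returns the configured base path. */
    static String readBasePath(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read properties text", e);
        }
        return props.getProperty("FileSystemLomCatalog.basePath");
    }

    public static void main(String[] args) {
        // Forward slashes survive unchanged: prints c:/my/path/
        System.out.println(readBasePath("FileSystemLomCatalog.basePath=c:/my/path/"));
        // Backslashes are treated as escapes and dropped: prints c:mypath
        System.out.println(readBasePath("FileSystemLomCatalog.basePath=c:\\my\\path"));
    }
}
```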

Lucene Index

The sample database can be found here.

- Extract the folder somewhere.
- Enter the full path of this folder in the properties file, as the value of the "LuceneLomCatalog.lucenePath" property. Please note that one should use forward slashes "/" in the path (e.g. "c:/my/path/" or "/home/user/path/").
- Reload the oaitarget application or restart the Tomcat server it is running on.
- Go to http://localhost:8080/oaitarget/OAIHandler?verb=ListRecords&metadataPrefix=oai_lom to test the target.

MySql

If you are familiar with MySql, there is SQL code available for setting up an example MySql database, which works with the sample code given in the oaitarget software. The SQL code can be found here.

- Open the properties file with a text editor.
- Change the "MySqlLomCatalog" properties according to your own database.
- Reload the oaitarget application or restart the Tomcat server it is running on.
- Go to http://localhost:8080/oaitarget/OAIHandler?verb=ListRecords&metadataPrefix=oai_lom to test the target.

Binding The Code
The only thing that has to be implemented to get the OAI target fully operational is the binding with your database. Here we give you some starting points to get this done easily.

Verbs
The verbs that have to be implemented can all be found in a single class, located in the source code at org.ariadne.oai.server.%INDEXATION_TYPE%.catalog.%INDEXATION_TYPE%Catalog.java, where %INDEXATION_TYPE% stands for the type of database. This class must always extend the abstract class org.oclc.oai.server.catalog.AbstractCatalog.

In the given oaicat source code there are two examples available, one for a Lucene index (org.ariadne.oai.server.lucene.catalog.LuceneLomCatalog) and one for a MySql database (org.ariadne.oai.server.mysql.catalog.MySqlLomCatalog).

GetRecord
With this verb one can ask for the metadata of one specific item. If you have the sample Lucene database installed, you can try the following query: go to the URL http://localhost:8080/oaitarget/OAIHandler?verb=GetRecord&identifier=oai:oaicat.ariadne.org:BLKLKP1961&metadataPrefix=oai_lom and the metadata of the item with identifier "BLKLKP1961" is returned.

The identifier sent along with the verb request is constructed from three pieces, separated by ":". The "oai" is added automatically to indicate that it comes from an OAI repository, "oaicat.ariadne.org" is the identifier of the OAI repository itself, and "BLKLKP1961" is the identifier of the item in question. The last parameter that is sent along is the metadataPrefix, which specifies the format in which the metadata should be returned.
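The three-part structure of the identifier can be sketched as follows. The class OaiIdentifier is a hypothetical helper written for this page, not a class from the OAI target source code; it only shows how the request identifier splits into the repository identifier and the identifier of the item.

```java
// Hypothetical helper for illustration; not part of the OAI target source code.
class OaiIdentifier {
    final String repositoryIdentifier;
    final String localIdentifier;

    private OaiIdentifier(String repositoryIdentifier, String localIdentifier) {
        this.repositoryIdentifier = repositoryIdentifier;
        this.localIdentifier = localIdentifier;
    }

    /** Splits "oai:<repository>:<item>" into its pieces. */
    static OaiIdentifier parse(String identifier) {
        // Limit 3 keeps any further ":" characters inside the item identifier.
        String[] parts = identifier.split(":", 3);
        if (parts.length != 3 || !"oai".equals(parts[0])) {
            throw new IllegalArgumentException("Not an OAI identifier: " + identifier);
        }
        return new OaiIdentifier(parts[1], parts[2]);
    }

    public static void main(String[] args) {
        OaiIdentifier id = parse("oai:oaicat.ariadne.org:BLKLKP1961");
        System.out.println(id.repositoryIdentifier + " / " + id.localIdentifier);
    }
}
```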

So to bind this to your database, all you have to do is perform a query to get the metadata of the requested item and put the result in the nativeItem object in the code (which is passed to the constructRecord method at the end of the getRecord method). Please note that this can be any kind of object; just use an object you can easily extract data from (in the sample code a HashMap is used). This object will be reused in the crosswalk class (see Crosswalks).

In the image below, there is an overview of the flow of the code in the getRecord method. The orange circle represents the piece of code you have to replace with your own code to bind this to your database.
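The part you replace could look roughly like the sketch below. It is not the sample code itself: the class GetRecordBinding, the method loadNativeItem and the keys "identifier" and "title" are hypothetical, and an in-memory map with a made-up title stands in for the database query. How a missing item is reported back to the framework is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the database binding; not the sample code shipped with the software.
class GetRecordBinding {
    // Stand-in for the database: local identifier -> title.
    private final Map<String, String> titlesById = new HashMap<String, String>();

    GetRecordBinding() {
        titlesById.put("BLKLKP1961", "Example title"); // made-up sample row
    }

    /** Fetches one item and returns it as a nativeItem, or null if the identifier is unknown. */
    Map<String, String> loadNativeItem(String localIdentifier) {
        // A real implementation would query the database here.
        String title = titlesById.get(localIdentifier);
        if (title == null) {
            return null;
        }
        Map<String, String> nativeItem = new HashMap<String, String>();
        nativeItem.put("identifier", localIdentifier);
        nativeItem.put("title", title);
        return nativeItem;
    }

    public static void main(String[] args) {
        System.out.println(new GetRecordBinding().loadNativeItem("BLKLKP1961"));
    }
}
```

A HashMap is used here only because the sample code does the same; any object that the crosswalk class can read would do.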

ListRecords
This verb is the one that is used the most when harvesting. It is similar to the GetRecord verb, but now a date span is passed as a parameter. The result of this request should be a list of the metadata that has been added or updated within the specified date span.

The listRecords method has four arguments: from and until specify the date span, metadataPrefix is analogous to the one in getRecord, and set is an argument used when your data is divided into sets.

In this implementation there is an option to split large lists into separate requests, to reduce the load on the database. This is done by using a resumption token, so there is a second listRecords method that has only one argument, the resumption token. To prevent performance and load issues, it is advised to implement this resumption token. However, unlike the implementation in the sample code, the full result list should not be cached; instead, each partial result should be queried from the database whenever a listRecords request with a resumption token comes in.
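One way to avoid caching the full result list is to store everything needed to repeat the query in the resumption token itself. The sketch below shows the idea under several assumptions: the class ResumptionPaging, the token layout "from|until|set|metadataPrefix|offset" and the page size are all hypothetical, and an in-memory list stands in for a database query with a limit and an offset (the stand-in does not filter by date or set).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of token-based paging; not the sample code shipped with the software.
class ResumptionPaging {
    static final int PAGE_SIZE = 2; // small on purpose, to keep the example short

    /** One page of results plus the token for the next page (null when the list is complete). */
    static class Page {
        final List<String> items;
        final String nextToken;

        Page(List<String> items, String nextToken) {
            this.items = items;
            this.nextToken = nextToken;
        }
    }

    private final List<String> allIds; // stand-in for the database

    ResumptionPaging(List<String> allIds) {
        this.allIds = allIds;
    }

    /** First request: start at offset 0. */
    Page listRecords(String from, String until, String set, String metadataPrefix) {
        return page(from, until, set, metadataPrefix, 0);
    }

    /** Follow-up request: the query parameters and the offset are decoded from the token. */
    Page listRecords(String resumptionToken) {
        String[] p = resumptionToken.split("\\|", -1);
        if (p.length != 5) {
            throw new IllegalArgumentException("Bad resumption token: " + resumptionToken);
        }
        return page(p[0], p[1], p[2], p[3], Integer.parseInt(p[4]));
    }

    private Page page(String from, String until, String set, String metadataPrefix, int offset) {
        if (offset < 0 || offset > allIds.size()) {
            throw new IllegalArgumentException("Offset out of range: " + offset);
        }
        // A real catalog would query only this slice from the database.
        int end = Math.min(offset + PAGE_SIZE, allIds.size());
        List<String> items = new ArrayList<String>(allIds.subList(offset, end));
        String next = end < allIds.size()
                ? from + "|" + until + "|" + set + "|" + metadataPrefix + "|" + end
                : null;
        return new Page(items, next);
    }

    public static void main(String[] args) {
        ResumptionPaging paging = new ResumptionPaging(Arrays.asList("a", "b", "c"));
        Page page = paging.listRecords("2008-01-01", "2008-01-31", "", "oai_lom");
        while (true) {
            System.out.println(page.items);
            if (page.nextToken == null) {
                break;
            }
            page = paging.listRecords(page.nextToken);
        }
    }
}
```

Because the token carries the offset, the server keeps no state between requests; the trade-off is that records added or removed during a harvest can shift the pages.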

ListIdentifiers
This is basically the same verb as ListRecords, but only returns the identifiers of the items instead of all the metadata.

Crosswalks
The constructRecord call is passed along through the framework and eventually invokes a method createMetadata in a so-called crosswalk class. Crosswalks are used in this framework to ensure that records are created in the same way, no matter what verb is invoked. '''This means that you only need to do the mapping to the LOM standard once, whatever standard you're using, AND that this mapping can easily be adapted just by modifying the crosswalk class.'''

This class can be found in a way analogous to the %INDEXATION_TYPE%Catalog.java class, more specifically at org.ariadne.oai.server.%INDEXATION_TYPE%.crosswalk.%INDEXATION_TYPE%2oai_lom.java. A crosswalk class must always extend the abstract class org.oclc.oai.server.crosswalk.Crosswalk. Again, in the given oaicat source code there are two examples available, one for a Lucene index (org.ariadne.oai.server.lucene.crosswalk.Lucene2oai_lom) and one for a MySql database (org.ariadne.oai.server.mysql.crosswalk.MySql2oai_lom).

An overview of this flow can be found in the following image. The orange circle again represents the code you have to edit to bind this request to your database, in this case to map the object you created onto a LOM object.

- The Object nativeItem must be cast to the object you are using to hold the metadata (in the sample code a HashMap).
- All "fields" in your object must be mapped onto a LOM field. As an example, in the sample code the identifier and the title are already set, using the data from the nativeItem object.
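The two steps in the list above (cast the nativeItem, then map its fields) can be sketched as follows. This is a standalone illustration, not the sample crosswalk: the class LomCrosswalkSketch and the map keys "identifier" and "title" are hypothetical, the LOM is written out as a plain XML string instead of a LOM object, and the element names are meant to follow the IEEE LOM XML binding and should be checked against the schema you expose.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a crosswalk mapping; not the sample code shipped with the software.
class LomCrosswalkSketch {

    /** Maps the nativeItem onto a minimal LOM fragment holding only identifier and title. */
    static String createMetadata(Object nativeItem) {
        // Step 1: cast the nativeItem to the type used to hold the metadata.
        @SuppressWarnings("unchecked")
        Map<String, String> item = (Map<String, String>) nativeItem;

        // Step 2: map each field onto a LOM field.
        StringBuilder lom = new StringBuilder();
        lom.append("<lom xmlns=\"http://ltsc.ieee.org/xsd/LOM\"><general>");
        lom.append("<identifier><entry>").append(escape(item.get("identifier")))
           .append("</entry></identifier>");
        lom.append("<title><string>").append(escape(item.get("title")))
           .append("</string></title>");
        lom.append("</general></lom>");
        return lom.toString();
    }

    // Escapes the characters that are not allowed in XML text content.
    private static String escape(String text) {
        if (text == null) {
            return "";
        }
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> nativeItem = new HashMap<String, String>();
        nativeItem.put("identifier", "BLKLKP1961");
        nativeItem.put("title", "Example title"); // made-up value
        System.out.println(createMetadata(nativeItem));
    }
}
```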

RecordFactory
The third and last class you have to implement or adapt when you use a different database is a RecordFactory class. This class must always extend the abstract class org.oclc.oai.server.catalog.RecordFactory, and is usually located at org.ariadne.oai.server.%INDEXATION_TYPE%.catalog.%INDEXATION_TYPE%RecordFactory.java.

In the given oaicat source code there are two examples available, one for a Lucene index (org.ariadne.oai.server.lucene.catalog.LuceneLomRecordFactory) and one for a MySql database (org.ariadne.oai.server.mysql.catalog.MySqlLomRecordFactory).

In this class some methods are used for creating a record. The methods that may have to be altered are getLocalIdentifier and getDatestamp. The adaptations are very similar to the ones in the crosswalk class (see Crosswalks).
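A sketch of what these two methods typically do is given below. It is a standalone class, not a subclass of the real RecordFactory: the class name RecordFactorySketch, the map keys "identifier" and "modified" and the helper getOAIIdentifier as written here are assumptions for the example. The datestamp is formatted in UTC as YYYY-MM-DDThh:mm:ssZ, one of the two granularities that OAI-PMH allows.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical sketch; it does not extend the real RecordFactory class.
class RecordFactorySketch {
    private final String repositoryIdentifier;

    RecordFactorySketch(String repositoryIdentifier) {
        this.repositoryIdentifier = repositoryIdentifier;
    }

    /** Extracts the identifier of the item from the nativeItem. */
    String getLocalIdentifier(Object nativeItem) {
        return (String) asMap(nativeItem).get("identifier");
    }

    /** Formats the last modification date of the item as a UTC datestamp. */
    String getDatestamp(Object nativeItem) {
        Date modified = (Date) asMap(nativeItem).get("modified");
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(modified);
    }

    /** Builds the three-part identifier that is exposed to harvesters. */
    String getOAIIdentifier(Object nativeItem) {
        return "oai:" + repositoryIdentifier + ":" + getLocalIdentifier(nativeItem);
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> asMap(Object nativeItem) {
        return (Map<String, Object>) nativeItem;
    }

    public static void main(String[] args) {
        Map<String, Object> nativeItem = new HashMap<String, Object>();
        nativeItem.put("identifier", "BLKLKP1961");
        nativeItem.put("modified", new Date());
        RecordFactorySketch factory = new RecordFactorySketch("oaicat.ariadne.org");
        System.out.println(factory.getOAIIdentifier(nativeItem) + " " + factory.getDatestamp(nativeItem));
    }
}
```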

Properties File
By default, the software will search for the properties file in the WEB-INF directory.

Note: In the sample code you can easily switch between using a Lucene index and a MySql database by changing the properties file that is used. This is done in the web.xml file in the WEB-INF directory, where the filename of the properties file is specified in the "param-value" property.

Note: (concerning the Catalog, RecordFactory and Crosswalk classes) If you use a class different from the ones provided in the example code, you have to specify this in the properties file. This can be done in one of the following properties: AbstractCatalog.oaiCatalogClassName, AbstractCatalog.recordFactoryClassName or Crosswalks.oai_lom.

Note: When using the sample code to set up an OAI target, please change the data that is exposed by invoking the "Identify" verb. This data is extracted from the properties file. It concerns the properties beginning with "Identify." and the repositoryIdentifier property (used by the RecordFactory class).