Harvesting metadata

=Harvesting Tool=

The ARIADNE Harvester has been developed to manage the harvesting of one or more OAI-PMH targets in a way that is automated as much as possible. Its development has mainly been driven by European eContent+ projects, which use metadata in IEEE LOM format and require validation against a specific Application Profile. The tool is agnostic about the type of metadata to be harvested (DC, LOM, ...) and can be configured and extended for advanced use, such as creating validation reports, pushing metadata through a custom storage connector using an arbitrary protocol, or transforming XML metadata to another structure on the fly.

Recent build
The most recent build can be fetched here : http://www.cs.kuleuven.be/~bramv/oaiharvester.war

Source on SourceForge
The code can be browsed here : http://ariadnekps.svn.sourceforge.net/viewvc/ariadnekps/OaiHarvester/

You can check out the code using svn with this command : svn co https://ariadnekps.svn.sourceforge.net/svnroot/ariadnekps/OaiHarvester OaiHarvester

Prerequisites
Java 5 or higher needs to be installed. Also a working Apache Tomcat v5.0 or higher installation is needed.

Installation
Just put the .war file in the Tomcat webapps directory (or upload it through the Tomcat manager).

Configuration


When you browse to the harvesting application (typically something like http://localhost:8080/oaiharvester/) for the first time, you will automatically be redirected to the Configuration Wizard. If you want to change the configuration later on, go to the menu : 'Configuration' > 'Change configuration'.

This wizard helps you set up the following parameters. Follow the steps on the screen until you reach the 'Finish' page, then hit the 'Home' button to go to the main screen.
 * The first parameter is the location where the harvested metadata is stored (File Store and SPI are currently supported out of the box). You will be asked to enter the details of the chosen storage option.
 * After this, the destination directory of the log files needs to be entered.
 * In the final step some extra options can be chosen, including metadata validation. For more details about this see the next section, Metadata Validation.

Advanced users can change all settings manually by using the menu : 'Configuration' > 'Advanced Configuration'. It is also possible to upload or download a configuration file through the Configuration Menu.



Metadata Validation


The validation service, which can also be used online, is integrated in the harvester. To validate all targets against a single validation scheme, select that scheme in the configuration wizard (global validation settings).

A second, complementary option is to choose a validation scheme for an individual harvesting target. To do so, set the "Validation URI" in the details of the target itself, or specify it when the target is added (see 'Add/Edit/Delete an OAI-PMH target').

NOTE : If a validation scheme is selected for an individual target, this scheme will be used for validation instead of the scheme selected in the configuration wizard.

To generate validation reports, go to "Configuration" > "Advanced Configuration" and make sure the following settings are correct :


 * repositoryLogs = true
 * afterHarvestJob.outputDir = an existing directory
 * afterHarvestJob.enabled = true
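
The three settings above can be sanity-checked before a harvesting round. The following is a minimal sketch, assuming the configuration is available as a standard Java properties file; the class ReportSettingsCheck and its method problems are hypothetical and not part of the harvester.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of the harvester: lists which of the
// three report-related settings are missing or wrong.
final class ReportSettingsCheck {

    static List<String> problems(Properties config) {
        List<String> problems = new ArrayList<>();
        if (!"true".equalsIgnoreCase(config.getProperty("repositoryLogs", "").trim())) {
            problems.add("repositoryLogs must be true");
        }
        if (!"true".equalsIgnoreCase(config.getProperty("afterHarvestJob.enabled", "").trim())) {
            problems.add("afterHarvestJob.enabled must be true");
        }
        String outputDir = config.getProperty("afterHarvestJob.outputDir", "").trim();
        if (outputDir.isEmpty() || !Files.isDirectory(Paths.get(outputDir))) {
            problems.add("afterHarvestJob.outputDir must be an existing directory");
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample values; replace them with your own configuration.
        String tmpDir = System.getProperty("java.io.tmpdir").replace("\\", "/");
        Properties config = new Properties();
        config.load(new StringReader(
                "repositoryLogs = true\n"
                + "afterHarvestJob.enabled = true\n"
                + "afterHarvestJob.outputDir = " + tmpDir + "\n"));
        // An empty list means all three settings look correct.
        System.out.println(problems(config));
    }
}
```

An empty result only says that the values are plausible; whether reports are actually produced still depends on the harvester itself.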

Add/Edit/Delete an OAI-PMH target




If you want to add an OAI-PMH target, go to the menu : 'Targets' > 'Add new OAI-PMH Target'. You will go to the 'add new target' page (as shown on the right), where you can fill in the details of the new target. The only mandatory parameter is the baseUrl of the OAI-PMH target. All other values are either optional or have a default value filled in.
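
For context, the baseUrl is the address to which OAI-PMH requests are sent as query parameters. The following sketch shows what two such requests look like, assuming a hypothetical target at http://example.org/oai; the class OaiRequestUrls is illustrative only and not part of the harvester. The prefix oai_dc is the Dublin Core format that every OAI-PMH repository has to support.

```java
// Sketch only: builds OAI-PMH request URLs from a target's baseUrl.
final class OaiRequestUrls {

    // "Identify" asks the repository to describe itself.
    static String identify(String baseUrl) {
        return baseUrl + "?verb=Identify";
    }

    // "ListRecords" returns records in the requested metadata format.
    // Simple prefixes such as oai_dc need no URL encoding.
    static String listRecords(String baseUrl, String metadataPrefix) {
        return baseUrl + "?verb=ListRecords&metadataPrefix=" + metadataPrefix;
    }

    public static void main(String[] args) {
        String baseUrl = "http://example.org/oai"; // hypothetical target
        System.out.println(identify(baseUrl));
        System.out.println(listRecords(baseUrl, "oai_dc"));
    }
}
```

Opening the Identify URL in a browser is a quick way to check that a baseUrl is correct before adding it as a target.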

For editing the details of a target afterwards, go to : 'Targets' > 'View all OAI-PMH Targets', and click the 'View Details' button next to the target you want to edit.

To delete a target completely, press the red X next to it on the 'all targets' page.

Harvesting
To perform a harvesting round immediately, go to the menu : 'Harvesting' > 'Manually start harvesting'. This will also take you to the 'History' page, where you can see what the harvester is up to.

Alternatively you can set up a harvesting schedule. To use this feature, go to : 'Harvesting' > 'View/Edit harvesting schedule'. The schedule uses a crontab-like syntax as used on unix machines. See the harvesting schedule figure.
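
As an illustration of crontab-style expressions, the sketch below labels the fields of a classic five-field unix crontab entry (minute, hour, day of month, month, day of week). This layout is an assumption: the scheduler inside the harvester may expect a different number of fields, so check the schedule page before copying an expression. The class CrontabFields is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: names the fields of a five-field unix crontab expression.
final class CrontabFields {

    private static final String[] NAMES = {
        "minute", "hour", "day of month", "month", "day of week"
    };

    static Map<String, String> describe(String expression) {
        String[] parts = expression.trim().split("\\s+");
        if (parts.length != NAMES.length) {
            throw new IllegalArgumentException(
                    "expected " + NAMES.length + " fields but got " + parts.length);
        }
        // LinkedHashMap keeps the fields in crontab order.
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            fields.put(NAMES[i], parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // In unix crontab, "0 2 * * *" means every day at 02:00.
        System.out.println(describe("0 2 * * *"));
    }
}
```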

Add your own storage connector
TODO : refine

Extend the abstract class org.ariadne.oai.harvestWriter.GenericWriter and implement the following methods :

 * void CreateTarget(String location);
 * void pushAway(Vector lomVector, String repositoryIdentifier);
 * void disconnect();
 * void connect();
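
A connector built this way could look roughly like the sketch below. So that the example compiles on its own, it declares a stand-in GenericWriter with the four methods listed above; the real class in org.ariadne.oai.harvestWriter may differ in visibility, declared exceptions or additional members. InMemoryWriter is a hypothetical connector, and what each method is expected to do is guessed from its name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical connector that simply keeps pushed records in memory.
@SuppressWarnings("rawtypes")
final class InMemoryWriter extends GenericWriter {

    private final List<String> stored = new ArrayList<>();
    private boolean connected;

    @Override
    public void CreateTarget(String location) {
        // Assumed to prepare storage for a target; nothing to do in memory.
    }

    @Override
    public void connect() {
        connected = true;
    }

    @Override
    public void disconnect() {
        connected = false;
    }

    @Override
    public void pushAway(Vector lomVector, String repositoryIdentifier) {
        if (!connected) {
            throw new IllegalStateException("connect() must be called first");
        }
        for (Object record : lomVector) {
            stored.add(repositoryIdentifier + ": " + record);
        }
    }

    List<String> stored() {
        return stored;
    }

    public static void main(String[] args) {
        InMemoryWriter writer = new InMemoryWriter();
        Vector<String> records = new Vector<>();
        records.add("<lom>example record</lom>"); // placeholder metadata
        writer.connect();
        writer.pushAway(records, "exampleRepository");
        writer.disconnect();
        System.out.println(writer.stored());
    }
}

// Stand-in for org.ariadne.oai.harvestWriter.GenericWriter (assumption).
@SuppressWarnings("rawtypes")
abstract class GenericWriter {
    public abstract void CreateTarget(String location);
    public abstract void pushAway(Vector lomVector, String repositoryIdentifier);
    public abstract void disconnect();
    public abstract void connect();
}
```

In a real connector the stand-in class is dropped and the harvester's own GenericWriter is imported instead.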

Create properties analogous to these (the properties needed to harvest to disk) :

harvestToDisk.writerClassName = org.ariadne.oai.harvestWriter.FileWriter
harvestToDisk.URI = c:/harvester/harvester logs/

Change the storeTo property accordingly :

Harvest.storeTo = harvestToDisk
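
The naming pattern suggests that the value of Harvest.storeTo selects a group of properties by prefix, and that the writerClassName of that group names the connector class. The sketch below follows this reading for a hypothetical connector; the group name harvestToMyStore, the class com.example.MyStoreWriter and the URI value are made up, and how the harvester actually resolves these properties is not documented here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Sketch only: looks up the writer class selected by Harvest.storeTo.
final class StoreToLookup {

    static String writerClassName(Properties config) {
        String group = config.getProperty("Harvest.storeTo");
        if (group == null) {
            throw new IllegalArgumentException("Harvest.storeTo is not set");
        }
        String key = group.trim() + ".writerClassName";
        String className = config.getProperty(key);
        if (className == null) {
            throw new IllegalArgumentException(key + " is not set");
        }
        return className.trim();
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        // Hypothetical property group for a custom connector.
        config.load(new StringReader(
                "harvestToMyStore.writerClassName = com.example.MyStoreWriter\n"
                + "harvestToMyStore.URI = http://example.org/store\n"
                + "Harvest.storeTo = harvestToMyStore\n"));
        System.out.println(writerClassName(config));
    }
}
```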