Net::OAI::Harvester

Overview

Net::OAI::Harvester is a Perl module for querying repositories that support the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Using a set of six verbs, OAI-PMH lets data providers expose their digital objects so that interested parties can harvest them easily. The protocol is essentially XML over HTTP. Net::OAI::Harvester issues the HTTP requests, parses the XML responses, and manages resumption tokens for you. You can get at the raw XML if you want to do your own XML processing, and you can drop in your own XML::SAX handler if you would like to parse the metadata elements yourself.
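
To make "XML over HTTP" concrete, here is a minimal sketch of how an OAI-PMH request maps onto a URL: the verb and its arguments are sent as query parameters on the repository's base URL. The helpers escape_value and build_request_url are hypothetical names invented for this illustration and are not part of Net::OAI::Harvester, which builds and issues these requests for you.

```perl
use strict;
use warnings;

# The six request types ("verbs") defined by OAI-PMH.
my @VERBS = qw(Identify ListMetadataFormats ListSets
               ListIdentifiers ListRecords GetRecord);

# Percent-encode every character outside the URI "unreserved" set.
# Assumption: values are plain ASCII, which is enough for this sketch.
sub escape_value {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Hypothetical helper: turn a base URL, a verb, and its arguments
# into the GET URL for that request.
sub build_request_url {
    my ($base_url, $verb, %args) = @_;
    die "unknown OAI-PMH verb: $verb\n" unless grep { $_ eq $verb } @VERBS;

    my @pairs = ("verb=$verb");
    for my $name (sort keys %args) {
        push @pairs, $name . '=' . escape_value($args{$name});
    }
    return $base_url . '?' . join('&', @pairs);
}

print build_request_url(
    'http://www.archive.org/services/oai2.php',
    'ListRecords',
    metadataPrefix => 'oai_dc',
), "\n";
```

The repository answers such a request with an XML document, which is what the harvester parses on your behalf.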

Install

If you have Perl, you probably have the cpan client on your command line, so try this:

	cpan Net::OAI::Harvester
	

If you use ActivePerl on Windows, you'll probably want to use ppm:

	ppm install Net::OAI::Harvester
	

Alternatively, you can check out the code from the Subversion repository at SourceForge:

	svn co https://oai-harvester.svn.sourceforge.net/svnroot/oai-harvester/
	

Simple Example

This example performs a ListRecords operation on the Internet Archive's OAI-PMH server and iterates through all the records, printing the title of each one. Note that it uses listAllRecords, which handles resumption tokens for you behind the scenes.


    use Net::OAI::Harvester;

    # create a harvester for the Internet Archive
    my $harvester = Net::OAI::Harvester->new(
      baseURL => 'http://www.archive.org/services/oai2.php'
    );

    # perform a listAllRecords operation 
    my $records = $harvester->listAllRecords(
      metadataPrefix => 'oai_dc'
    );

    # iterate through the results
    while (my $record = $records->next())
    {
      # output the DC title from the record metadata
      print $record->metadata()->title(), "\n";
    }
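
To show what handling resumption tokens involves, the following self-contained sketch simulates the paging loop against a pretend in-memory repository. It is not how Net::OAI::Harvester is implemented internally, and fake_list_records, harvest_all, the seven titles, and the page size are all assumptions made up for this illustration. The idea is that a repository may return a long list in pieces, each incomplete piece carrying a token that you send back to get the next piece, until no token is returned.

```perl
use strict;
use warnings;

# A pretend repository of seven titles, served three per response.
my @TITLES    = map { "Title $_" } 1 .. 7;
my $PAGE_SIZE = 3;

# Hypothetical stand-in for a single ListRecords request. Returns a
# page of titles and a resumption token; the token is undef once the
# list is complete. Here the token is simply the next start index.
sub fake_list_records {
    my (%args) = @_;
    my $start = defined $args{resumptionToken} ? $args{resumptionToken} : 0;
    my $end   = $start + $PAGE_SIZE - 1;
    $end = $#TITLES if $end > $#TITLES;
    my $token = $end < $#TITLES ? $end + 1 : undef;
    return ([ @TITLES[$start .. $end] ], $token);
}

# Keep requesting pages until no resumption token comes back.
sub harvest_all {
    my @all;
    my ($page, $token) = fake_list_records(metadataPrefix => 'oai_dc');
    push @all, @$page;
    while (defined $token) {
        # A follow-up request carries only the token.
        ($page, $token) = fake_list_records(resumptionToken => $token);
        push @all, @$page;
    }
    return @all;
}

print "$_\n" for harvest_all();
```

With listAllRecords you never write this loop yourself; the while loop over next() in the example above simply keeps yielding records.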