Overview
Net::OAI::Harvester is a Perl module for easily querying repositories that support the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Using a set of six verbs, OAI-PMH allows data providers to expose their digital objects so that interested parties can easily harvest them. The protocol is essentially XML over HTTP. Net::OAI::Harvester issues the HTTP requests, parses the XML for you, and manages resumption tokens. You can get at the raw XML if you want to do your own XML processing, and you can drop in your own XML::SAX handler if you would like to parse metadata elements yourself.
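To make "XML over HTTP" concrete: each OAI-PMH request is an HTTP request to the repository's base URL carrying a verb argument that names one of the six verbs (Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord), plus any arguments that verb takes. The minimal sketch below builds such a GET URL in core Perl. The helpers oai_request_url and uri_escape_simple are hypothetical illustrations, not part of Net::OAI::Harvester, which builds these requests for you.

```perl
use strict;
use warnings;

# The six OAI-PMH verbs.
my @VERBS = qw(Identify ListMetadataFormats ListSets
               ListIdentifiers ListRecords GetRecord);
my %IS_VERB = map { $_ => 1 } @VERBS;

# Hypothetical helper: percent-encode everything except RFC 3986
# unreserved characters (assumes plain ASCII/byte strings).
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: build the GET URL for one OAI-PMH request.
# Arguments are sorted only to make the output predictable.
sub oai_request_url {
    my ($base_url, $verb, %args) = @_;
    die "unknown OAI-PMH verb: $verb\n" unless $IS_VERB{$verb};
    my @pairs = ('verb=' . uri_escape_simple($verb));
    for my $key (sort keys %args) {
        push @pairs, uri_escape_simple($key) . '=' . uri_escape_simple($args{$key});
    }
    return $base_url . '?' . join('&', @pairs);
}

print oai_request_url('http://www.archive.org/services/oai2.php',
                      'ListRecords', metadataPrefix => 'oai_dc'), "\n";
```

Against a live repository, fetching a URL like the printed one returns the raw XML response that Net::OAI::Harvester would otherwise parse for you.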
Install
If you've got Perl, you've probably got cpan on your command line, so try this:
cpan Net::OAI::Harvester
If you use ActivePerl on Windows, you'll probably want to use ppm instead:
ppm install Net::OAI::Harvester
Alternatively, you can check out the code from the Subversion repository at SourceForge:
svn co https://oai-harvester.svn.sourceforge.net/svnroot/oai-harvester/
Simple Example
This example performs a ListRecords operation on the Internet Archive's OAI-PMH server and iterates through all the records, printing the title of each one. Note that it uses listAllRecords, which handles resumption tokens for you behind the scenes.
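Here is roughly what "handling resumption tokens" means at the protocol level. When a list response is incomplete, the repository includes a resumptionToken, and the harvester reissues the same verb with only that token (resumptionToken is an exclusive argument in OAI-PMH) until a response comes back with an empty token. The sketch below shows that loop in plain Perl against a fake three-page repository. harvest_all and the $fetch callback are hypothetical stand-ins for the HTTP and XML work; this is not Net::OAI::Harvester's actual internal code.

```perl
use strict;
use warnings;

# Hypothetical sketch of a "list all" loop. $fetch is a callback that takes
# a hashref of OAI-PMH arguments and returns (\@records, $token) for one
# page; $token is undef or '' once the list is complete.
sub harvest_all {
    my ($fetch, %first_args) = @_;
    my @all;
    my %args = (verb => 'ListRecords', %first_args);
    while (1) {
        my ($records, $token) = $fetch->(\%args);
        push @all, @$records;
        last unless defined $token && length $token;
        # Follow-up requests carry only the verb and the resumption token.
        %args = (verb => 'ListRecords', resumptionToken => $token);
    }
    return @all;
}

# Fake repository: three pages keyed by resumption token ('' = first page).
my %pages = (
    ''   => [ [qw(rec1 rec2)], 'tok1' ],
    tok1 => [ [qw(rec3 rec4)], 'tok2' ],
    tok2 => [ [qw(rec5)],      ''     ],
);
my @seen_requests;
my $fake_fetch = sub {
    my ($args) = @_;
    push @seen_requests, { %$args };    # copy, since %args is reused
    my $page = $pages{ $args->{resumptionToken} // '' };
    return @$page;
};

my @records = harvest_all($fake_fetch, metadataPrefix => 'oai_dc');
print scalar(@records), " records in ", scalar(@seen_requests), " requests\n";
```

Only the first request carries metadataPrefix; every later request is just the verb plus the token the server handed back, which is why a listAll style call can hide the paging entirely.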
use strict;
use warnings;
use Net::OAI::Harvester;

# create a harvester for the Internet Archive
my $harvester = Net::OAI::Harvester->new(
    baseURL => 'http://www.archive.org/services/oai2.php'
);

# perform a listAllRecords operation
my $records = $harvester->listAllRecords( metadataPrefix => 'oai_dc' );

# iterate through the results
while (my $record = $records->next()) {
    # output the DC title from the record metadata
    print $record->metadata()->title(), "\n";
}