Tag Archives: jisc

ResourceSync: Making things happen with callbacks

In a previous blog post I introduced the ResourceSync PHP API library.  This is a code library written in PHP that makes it easy to interact with web sites that support the new ResourceSync standard.  The default behavior of the code when synchronising with a server, either during a baseline sync (a complete sync) or an incremental sync (of only the files changed since the last baseline sync), is to simply download the files and store them on disk in the same directory structure as they have on the server.

However, unless you want to just store the files for backup purposes, the chances are that you’ll want to process them in some way.  There are two ways to do this: either perform the synchronisation and then process the files, or process them as they are downloaded.

From the last post, you’ll know that by using the ResourceSync PHP library, performing a sync can be as simple as:

[php]
include 'ResyncResourcelist.php';
$resourcelist = new ResyncResourcelist('http://example.com/resourcelist.xml');
$resourcelist->baseline('/resync');
[/php]

This will process the resourcelist file by file, and download each file to the /resync/ directory.

In order to process these, you need to register a ‘callback’ function with the library.  Each time an item is synchronised, the code in the callback function will be executed.

The following code snippet shows a very simple example of a callback.  This example displays the filename of the resource that has been downloaded, and prints the XML that describes the file in the ResourceSync resourcelist.  The XML can be useful as it provides contextual information about the file, such as its size, checksum, last modified date, and links to related items.  Of course some of these will have already been checked by the library (such as the last modified date when using the date range option, and the checksum to make sure the file has been retrieved successfully).

[php]
$resourcelist->registerCallback(function($file, $resyncurl) {
    echo '  - Callback given value of ' . $file . "\n";
    echo '   - XML:' . "\n" . $resyncurl->getXML()->asXML() . "\n";
});
[/php]
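A callback does not have to print anything; it can just as easily collect information for later processing.  The following is a minimal sketch, not part of the library: the helper name makeExtensionCollector is hypothetical, and it only assumes the ($file, $resyncurl) callback signature shown above.  It groups the synchronised files by file extension, and is invoked by hand here so that it runs without the library installed.

```php
<?php
// Hypothetical helper (not part of the ResourceSync PHP library): builds a
// callback that records each synchronised file under its lower-cased extension.
// The ($file, $resyncurl) signature mirrors the callback shown in this post;
// $resyncurl is not used in this sketch.
function makeExtensionCollector(array &$byExtension): callable
{
    return function ($file, $resyncurl = null) use (&$byExtension) {
        $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
        if ($ext === '') {
            $ext = '(none)';
        }
        $byExtension[$ext][] = $file;
    };
}

$byExtension = [];
$collector = makeExtensionCollector($byExtension);

// With the library this would be registered as:
//   $resourcelist->registerCallback($collector);
// Here the callback is called directly with made-up paths.
$collector('/resync/records/item1.xml');
$collector('/resync/files/report.PDF');
$collector('/resync/README');

foreach ($byExtension as $ext => $files) {
    echo $ext . ': ' . count($files) . " file(s)\n";
}
```

Once the sync has finished, the $byExtension array can be used to decide which files need further processing.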

When performing a baseline sync using the ResyncResourcelist class it is only possible to register a single callback.  This is called whenever any file is downloaded.

However the ResyncChangelist class allows three different callbacks to be registered, depending on the action: CREATED, UPDATED, or DELETED.

[php]
$changelist->registerCreateCallback(function($file, $resyncurl) {
    echo '  - CREATE Callback given value of ' . $file . "\n";
    echo '   - XML:' . "\n" . $resyncurl->getXML()->asXML() . "\n";
});

$changelist->registerUpdateCallback(function($file, $resyncurl) {
    echo '  - UPDATE Callback given value of ' . $file . "\n";
    echo '   - XML:' . "\n" . $resyncurl->getXML()->asXML() . "\n";
});

$changelist->registerDeleteCallback(function($file, $resyncurl) {
    echo '  - DELETE Callback given value of ' . $file . "\n";
    echo '   - XML:' . "\n" . $resyncurl->getXML()->asXML() . "\n";
});
[/php]

Depending on the purpose of your code, it is likely that you would want to handle these three types of events in different ways, hence the three callback options.
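As a rough illustration of handling the three events differently, the sketch below keeps a simple in-memory index of local files up to date.  The $index array and the three closure names are assumptions made for this example; only the ($file, $resyncurl) signature comes from the snippets above.  The closures are called by hand so that the sketch runs without the library.

```php
<?php
// Hypothetical in-memory index of local files, keyed by path.
$index = [];

// Each closure follows the ($file, $resyncurl) callback signature from this
// post; $resyncurl is ignored in this sketch.
$onCreate = function ($file, $resyncurl = null) use (&$index) {
    $index[$file] = 'created';
};
$onUpdate = function ($file, $resyncurl = null) use (&$index) {
    $index[$file] = 'updated';
};
$onDelete = function ($file, $resyncurl = null) use (&$index) {
    unset($index[$file]);   // forget files that no longer exist on the server
};

// With the library these would be registered as:
//   $changelist->registerCreateCallback($onCreate);
//   $changelist->registerUpdateCallback($onUpdate);
//   $changelist->registerDeleteCallback($onDelete);

// Simulate a small change list with made-up paths.
$onCreate('/resync/a.xml');
$onCreate('/resync/b.xml');
$onUpdate('/resync/a.xml');
$onDelete('/resync/b.xml');

print_r($index);
```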

In the next blog post, I’ll show an example of this code in action, as it uses the callback to look at each resource’s XML to discover whether it is a metadata file or a related resource.  It then uses this information to deposit the item into a repository using SWORD.

The ResourceSync PHP Library

Over the past year, thanks to funding from the Jisc, I’ve been involved with the NISO / OAI ResourceSync initiative.  The aim of ResourceSync is to provide mechanisms for large-scale synchronisations of web resources.  There are lots of use cases for this, and many reasons why it is an interesting problem.  For some background reading, I’d suggest:

The specification itself can be read at http://www.openarchives.org/rs, and a quick read will highlight very quickly that the specification is based on sitemaps (http://www.sitemaps.org/) which is no surprise, given that they were developed for the easy and efficient listing of web resources for search engine crawlers to harvest – which in itself is a specialised form of resource synchronisation.

As with anything new, the proof is always in the pudding, which in this context means that reference implementations are required, both to test that a standard can be implemented and can fulfil the use cases it was designed for, and to smooth off any rough edges that only appear once you use it in anger.

My role therefore has been to develop a PHP ResourceSync client library.  The role of a client library is to allow other software systems to easily interact with a technology – in this case, web servers that support ResourceSync.  The client library therefore provides the facility to connect to a web server and synchronise the contents, and then to stay up to date by loading lists of resources that have been created, updated, or deleted.

The PHP library can be downloaded from: https://github.com/stuartlewis/resync-php

The rest of this blog post will step through the different parts of ResourceSync, and show how they can be accessed by the PHP client library:

The first step is to discover whether a site supports ResourceSync.  The mechanism for this is the well-known URI specification (see: RFC 5785).  Put simply, if a server supports ResourceSync, it places a file at http://www.example.com/.well-known/resourcesync which then points to where the capability list exists.

The first function of the PHP ResourceSync library is therefore to support this discovery:

[php]
include('ResyncDiscover.php');
$resyncdiscover = new ResyncDiscover('http://example.com/');
$capabilitylists = $resyncdiscover->getCapabilities();
echo ' - There were ' . count($capabilitylists) .
    ' capability lists found:' . "\n";
// Print the URI of each capability list that was discovered
foreach ($capabilitylists as $capabilitylist) {
    echo ' - ' . $capabilitylist . "\n";
}
[/php]
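To make the discovery step a little more concrete, here is a minimal sketch of what it involves: building the well-known URI for a site, and pulling the capability list locations out of the document found there.  This is not the library’s own code.  The function names are hypothetical, the sample XML is hand-written in the sitemap-based style that the ResourceSync specification describes, and the exact document layout may differ between versions of the specification.

```php
<?php
// Hypothetical helper: assumes $baseUrl is the root of the site, e.g.
// 'http://example.com/'.
function wellKnownUri(string $baseUrl): string
{
    return rtrim($baseUrl, '/') . '/.well-known/resourcesync';
}

// Hypothetical helper: returns the <loc> value of every <url> entry in a
// sitemap-style document.
function capabilityListUris(string $xml): array
{
    $sitemapNs = 'http://www.sitemaps.org/schemas/sitemap/0.9';
    $doc = new SimpleXMLElement($xml);
    $uris = [];
    foreach ($doc->children($sitemapNs)->url as $url) {
        $uris[] = trim((string) $url->children($sitemapNs)->loc);
    }
    return $uris;
}

// Hand-written sample of a discovery document (an assumption for
// illustration, not fetched from a real server).
$sample = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="description"/>
  <url>
    <loc>http://example.com/capabilitylist.xml</loc>
    <rs:md capability="capabilitylist"/>
  </url>
</urlset>
XML;

echo wellKnownUri('http://example.com/') . "\n";
foreach (capabilityListUris($sample) as $uri) {
    echo ' - ' . $uri . "\n";
}
```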

Zero, one, or more capability list URIs are returned.  If none are returned, then the site doesn’t support ResourceSync.  If at least one is returned, the next step is to examine the capability list to see which parts of the ResourceSync protocol are supported:

[php]
include('ResyncCapabilities.php');
$resynccapabilities = new ResyncCapabilities('http://example.com/capabilitylist.xml');
$capabilities = $resynccapabilities->getCapabilities();
echo 'Capabilities' . "\n";
foreach ($capabilities as $capability => $type) {
    echo ' - ' . $capability . ' (capability type: ' . $type . ')' . "\n";
}
[/php]

This prints the specific ResourceSync capabilities supported by that server.  Typically a resourcelist and a changelist will be shown.

The next step is often to perform a baseline sync (complete download of all resources).  Again, the PHP library supports this:

[php]
include 'ResyncResourcelist.php';
$resourcelist = new ResyncResourcelist('http://example.com/resourcelist.xml');
$resourcelist->enableDebug(); // Show progress
$resourcelist->baseline('/resync');
[/php]

It is possible to ask the library how many files it has downloaded, and how large they were:

[php]
echo $resourcelist->getDownloadedFileCount() . ' files downloaded, and ' .
    $resourcelist->getSkippedFileCount() . ' files skipped' . "\n";
echo $resourcelist->getDownloadSize() . 'Kb downloaded in ' .
    $resourcelist->getDownloadDuration() . ' seconds (' .
    ($resourcelist->getDownloadSize() /
    $resourcelist->getDownloadDuration()) . ' Kb/s)' . "\n";
[/php]
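One thing to watch for when calculating the transfer rate is a duration of zero, which would make the division fail.  Whether the library can actually report a zero duration is an assumption on my part (for example, for a very quick sync), so the following is only a defensive sketch.  The helper name formatRate is hypothetical, and its two arguments stand in for the values returned by getDownloadSize() and getDownloadDuration().

```php
<?php
// Hypothetical helper: formats a transfer rate, avoiding a division by zero
// when the reported duration is zero (or negative).
function formatRate(float $sizeKb, float $seconds): string
{
    if ($seconds <= 0) {
        return 'n/a';
    }
    return number_format($sizeKb / $seconds, 1) . ' Kb/s';
}

// With the library this might be called as:
//   formatRate($resourcelist->getDownloadSize(),
//              $resourcelist->getDownloadDuration());
echo formatRate(2048, 4) . "\n";
echo formatRate(10, 0) . "\n";
```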

It is possible to also restrict the files to be downloaded to those from a certain date.  This can be useful if you only want to synchronise recently created files:

[php]
$from = new DateTime("2013-05-18 00:00:00.000000");
$resourcelist->baseline('/resync', $from);
[/php]

Once a baseline sync has taken place, all of the files exposed via the ResourceSync interface will now exist on the local computer.  The next step is to routinely keep this set of resources up to date.  To do this, depending on the frequency at which the server produces change lists, these should be processed to download new or updated files, and to delete old files:

[php]
include 'ResyncChangelist.php';
$changelist = new ResyncChangelist('http://example.com/changelist.xml');
$changelist->enableDebug(); // Show progress
$changelist->process('/resync');
[/php]

Again, there are options to see what files have been processed:

[php]
echo ' - ' . $changelist->getCreatedCount() . ' files created' . "\n";
echo ' - ' . $changelist->getUpdatedCount() . ' files updated' . "\n";
echo ' - ' . $changelist->getDeletedCount() . ' files deleted' . "\n";
echo $changelist->getDownloadedFileCount() . ' files downloaded, and ' .
    $changelist->getSkippedFileCount() . ' files skipped' . "\n";
echo $changelist->getDownloadSize() . 'Kb downloaded in ' .
    $changelist->getDownloadDuration() . ' seconds (' .
    ($changelist->getDownloadSize() /
    $changelist->getDownloadDuration()) . ' Kb/s)' . "\n";
[/php]

Again, it is also possible to process only the changes made since a particular date.  If you keep a note of when the sync was last attempted, only the changes made since then need to be processed:

[php]
$from = new DateTime("2013-05-18 00:00:00.000000");
$changelist->process('/resync', $from);
[/php]
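One way to keep that note between runs is to store the time of the last sync in a small state file.  The sketch below is an assumption about how this could be done rather than a feature of the library: the function names and the state file location are hypothetical, and the calls to the library are left as comments so that the sketch runs on its own.

```php
<?php
// Hypothetical helper: returns the time of the last sync, or null if no
// usable state file exists yet.
function loadLastSync(string $stateFile): ?DateTime
{
    if (!is_file($stateFile)) {
        return null;
    }
    $parsed = DateTime::createFromFormat(
        DateTime::ATOM,
        trim(file_get_contents($stateFile))
    );
    return $parsed ?: null;
}

// Hypothetical helper: records the given time as the last sync time.
function saveLastSync(string $stateFile, DateTime $when): void
{
    file_put_contents($stateFile, $when->format(DateTime::ATOM));
}

$stateFile = sys_get_temp_dir() . '/resync-last-sync.txt'; // assumed location
$startedAt = new DateTime('now', new DateTimeZone('UTC'));
$from = loadLastSync($stateFile);

// With the library, the change list would be processed here:
//   if ($from === null) {
//       $changelist->process('/resync');
//   } else {
//       $changelist->process('/resync', $from);
//   }

// Store the time the run started, not the time it finished.
saveLastSync($stateFile, $startedAt);
```

Recording the start time means that any changes made on the server while the sync was running are picked up again by the next run, rather than being missed.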

In a few steps, each consisting of a few lines of code, the PHP library allows a local copy to be kept in sync with the contents of a ResourceSync-enabled server.

A further two blog posts will be published in this series.  The next will show how to interact with the library so that more complex actions can be performed when resources are created, updated, or deleted.  The final blog post will show this in action, with an application of the PHP ResourceSync library making use of the resources it processes.

Thoughts on the Elevator

The JISC have been running an experimental funding system known as the JISC Elevator.  The introduction on the site’s homepage describes the concept well:

JISC elevator is a new way to find and fund innovative ways to use technology to improve universities and colleges. Anyone employed in UK higher or further education can submit an idea. If your idea proves popular then JISC will consider it for funding. The elevator is for small, practical projects with up to £10,000 available for successful ideas. So if you have a brainwave, why not pitch it on the elevator?

A small team of us from the University of Edinburgh Digital Library submitted a proposal: The Open Access Index #oaindex.  The video submission is shown below…

I’ve previously blogged about the experience of creating this submission.  This post however contains a few observations about the Elevator concept, and the proposals that have been submitted.

First off – I’m a big fan of this system for a number of reasons:

  • It gave us an avenue to submit this type of proposal for a small amount of funding (only a few thousand pounds)
  • It provided us with a public platform and forum to socialise and discuss the idea
  • It adds a more open peer review stage to the process
  • It could encourage proposals from first-time bidders (although the public nature of it might put some people off?)

It will be interesting to see if or how the concept evolves over time.  Last week I got to chat about this with Andy McGregor, the JISC Programme Manager in charge of this, and Owen Stephens.  A few ideas that arose include:

  • Restrict the number of votes that any one person can place – make voters think harder about which ideas are most worthy of funding as there is only a limited number of projects that can be funded
  • Perhaps allocate each voter a set of votes or mock money or shares – they decide how they invest them across the proposals (all to one great idea, or spread across a few)
  • Be more transparent about the funding each project has requested and who has voted

I’ve been following the different ideas as they’ve been submitted, and a few trends have surprised me.

The first relates to the funding band that the proposal falls into.  There are three funding bands: up to £2,500, up to £5,000, and up to £10,000.  There is a total of £30,000 being made available to fund some of the submissions.  I can’t tell which band the proposals that have already received enough votes fall into, but for those that are still collecting votes, the breakdown is as follows:

I was surprised at the number that requested the full £10,000.  Of course, it could be that those in the ‘unknown’ category (those which have already received the number of votes they require) are all in the lower bands, therefore required fewer votes, and are therefore now fully voted for.  When pitching an idea, I always consider the amount of money available, and therefore the likelihood of receiving a given share of that money.  In this case, due to the very limited funding, we chose to submit a proposal in the lowest band to (hopefully) increase our chances.

The second aspect of the submissions that struck me was the domain to which the submission relates. I’ve split these up into three very broad (and arguably very bad) categories: Learning (students / learning enhancement), IT (systems, development) and Library (materials, metrics).

The ‘education’ category received by far the most submissions, with IT and Library lagging far behind.  Indeed our #oaindex proposal seems to be the only one in the library domain.  Why is this?  Perhaps the amounts available are much lower than those we are used to bidding for in the IT and Library domains?  Perhaps there are fewer opportunities for funding in the education domain?  Are those in the education domain better at seizing these new and innovative funding opportunities than those in the library or IT domain?  Discuss…!

When we created our video, we ensured that we mentioned who we were and which institution we worked for.  However it didn’t cross our minds to include any sort of branding in our submission.  I only thought of this when watching some of the others.

We were not alone: only a few submissions included any branding.  Did we miss an opportunity here, or is the brand somewhat irrelevant to the format of submitting elevator pitches: should voters be influenced by the idea more than by the host institution?

Our submission took the format of ideas being drawn on a whiteboard, with a voice-over in the background.  I’ll openly admit that this was because none of us really wanted to stand in front of a camera for 3 minutes.  Given how much we laughed during the simple voice recording, I think doing this in front of the camera would have taken even longer.  Sorry – we didn’t keep the out-takes!!!

Voting for the proposals ends in a few days, and I’m looking forward to seeing which get funded, which don’t get enough votes, and whether or not the concept continues.  But the scheme certainly gets my vote for the periodic allocation of small amounts of funding for great ideas!

[ The data that I’ve collected on the proposals can be seen at: https://docs.google.com/spreadsheet/ccc?key=0AgXAkDGxqBWYdHR1c0l6Uzk5aDFfQzlaM0ZtV04wcVE I’d be happy to receive updates or corrections.]

A tale of two bids

This is a tale of two bids; two recent JISC bids to be precise.  One was submitted via the ‘traditional’ route, and one via the experimental ‘Elevator’ route.  This blog post is a brief reflection on these, and a comparison of the experience, in particular comparing the effort involved.

First, I should provide a brief explanation of the two routes:

  • The ‘traditional’ route: Traditionally JISC requires bid proposals to be submitted as text documents, usually in the range of 6 to 12 pages.  These include cover sheets, budgets, benefits and risks, a bit about the people involved, and of course an explanation of the problem that will be investigated.  On top of that, there are letters of support and FOI checklists.  As part of the recent JISC Digital Infrastructure call, we submitted a couple of bids.  What we bid for is somewhat irrelevant, but I will disclose that the two bids we submitted were requesting funding of approx £30,000 each.  These proposals will now be marked by internal and external markers, followed by a panel decision.
  • The Elevator route: JISC are currently running an experimental funding stream, known as the JISC Elevator.  The idea is that proposals should be lightweight, consisting of a brief video presentation, along with a few words.  No budgets, no letters of support, no FOI statements – just an elevator pitch about the idea.  This is the first difference.  The second difference is that ‘the crowd’, which in this case consists of anyone with a .ac.uk email address, is allowed to vote on which projects should be considered for funding.  Any that get enough votes will go forward for consideration by a panel.  The number of votes required is proportional to the amount requested, with the three bands being up to £2,500, up to £5,000, and up to £10,000.  We pitched at the bottom end of this scale, meaning that we required 50 votes (which we received in less than 24 hours).

I’ll openly admit that the traditional route is often stressful.  It takes about one week of full-time effort, usually spread over 3 or 4 weeks.  The final days tend to get quite frantic as we go through internal reviews and consents, seek letters of support, and pull the bid together for final submission.

In comparison, our pitch for the elevator took about half a day – an hour to refine the idea and seek approval, an hour to write a script, an hour to record the voices, an hour to make the video, and a few minutes to upload it.

The feeling at the end of the elevator process was markedly different to the end of the traditional process – and this felt good.  However, when you look at the sums (adjusted slightly to make the numbers easier)…

Elevator: 1/2 day = £3,000 potential funding
Traditional: 5 days = £30,000 potential funding

So the actual potential return per hour invested in bid writing is the same (£6,000 per day of effort in both cases)!

However if I extrapolate this to other bids I have written in the past, some of which have been for higher amounts, the trend does not seem to continue in a linear fashion.

My personal experience (your mileage may vary – it would be good to compare notes!) is that bids in the range of thirty thousand to perhaps two hundred thousand pounds take a similar amount of time: a week or so for a primary proposal author, plus various time commitments from other parties.  But bids above this amount then start taking longer again, as project complexities, often around collaboration and external involvement, kick in.

What can I conclude from this?  I’m not sure really!  Feel free to draw your own conclusions and to make comparisons with your own experience.

What I can say is that we enjoyed the process of making, submitting, and then publicising our elevator pitch.  We felt that we had more freedom to be inventive with our interpretation of the submission requirements, and felt quite refreshed at the end of the process, rather than frazzled!

Now we await the outcomes of both…

How the West was won – 12 repositories for Wales

Last week, on the 19th January 2009 we held the launch event of the ‘Welsh Repository Network‘. It was a project funded by the JISC in their ‘Start-Up & Enhancement‘ stream and run in association with WHELF (the Wales Higher Education Libraries Forum) to create a network of twelve repositories across Wales – one repository for each higher education institution. We have been rolling these out over the past two years, and this process is now complete! 🙂

The twelve repositories are at:

Another very useful output from the project is a set of twelve case studies detailing the hardware purchased by each university to run their repository, and the rationale behind the decisions. There are a wide variety of universities in Wales in terms of size and profile, so hopefully these will be useful to other people. The case studies can be downloaded from http://hdl.handle.net/2160/1881

The launch event was held in the Drwm at the National Library of Wales. The day started with interesting talks by Glen Robson and Dan Field who are repository programmers at the National Library of Wales about some of their current projects. These were followed by a behind-the-scenes tour of the library, and lunch. The afternoon consisted of a launch speech about ‘The importance of the WRN to Wales’ given by the Librarian of the National Library, Andrew Green. This was followed by a talk entitled ‘Institutional Repositories: Essential tools for the modern research environment’ given by Professor Lyn Pykett the Pro-Vice Chancellor for Research at Aberystwyth University. Next up was Dr. Andrew Prescott the manager of Library Services at the University of Wales Lampeter who spoke about ‘Repositories and University Information Services’. Finally the presentations were concluded with a talk by the third Andrew of the day, Andrew McGregor, programme manager from JISC on ‘Looking to the future: The impact of the Start-Up & Enhancement (SUE) Projects’.

Finally, an official press release is available (in English or Welsh), as is a paper detailing the process taken over the past couple of years to get to this position:

Lewis, S., Payne, H., How the West was won, ALISS Quarterly, ISSN 1747-9258, Vol 4, no. 2, pp 18-23 (http://hdl.handle.net/2160/1882)


JISC repository aggregator site

It has been announced that JISC have commissioned the creation of a new repository aggregator site:

JISC Repository Aggregator Website 

JISC funds a wide variety of development projects on behalf of its funding bodies. These projects include consultancies and supporting studies where the main deliverable is a report and projects where the deliverables include products and services as well as these reports. 

The project involves developing a small user community to guide the development of the site, to produce the site and to develop a series of bespoke widgets to draw information from readily available sources of information. 

The overall aim of this demonstrator site will be to enable a user to search for, organise and hand submit information about a range of relevant information about repositories. The repository aggregator will provide a single destination where people interested in repositories can get information about digital repositories. 

Aims and Objectives 

The objectives of the aggregator website are to: 

  • Produce a demonstrator website that can be shown to some members of the repository community to gauge whether they would find such a service useful. Then, make the service available as a public beta offering while plans are made to develop the site further. 
  • Create a customizable and personalisable solution that can adapt to the wide range of information that a user might like to aggregate. 
  • Specifically ensure the service can aggregate RSS feeds from relevant blogs, the Intute Repository Search service, information from the RSP site including support contacts, statistics from OpenDOAR and ROAR, Sherpa RoMEO and JULIET, brief explanations of key topics, persistent aggregated search of sources like Google Scholar and Technorati, subject based collection details from IESR, descriptions of useful repository software (e.g. IRstats, feedforward, SWORD client, Manakin), and RSS feeds from relevant repositories. 
  • Create focus groups in a structured way to help manage the feedback from the user community at all stages of development. 
  • Specific development requirements include the consideration of the Netvibes Universal Widget API, Netvibes Universe, an authentication system, and cross-browser compatibility. 
  • (No, not ‘widgets as found in cans of beer‘, but widgets as in ‘web widgets‘!)

It is an interesting development, and with my repository stats hat on (http://maps.repository66.org/) I’m particularly looking forward to seeing what this aggregation can offer, and the value it will provide.