Tag Archives: eprints

The SWORD course videos now online

I recently blogged about ‘The SWORD Course’, as the slides had been put onto Slideshare. Now, thanks to UKOLN’s Adrian Stevenson, the videos are available too:

  1. An Introduction to SWORD: Gives an overview of SWORD, the rationale behind its creation, and details of the first three funded SWORD projects
  2. SWORD Use Cases: Provides an introduction to use cases, and examines some of the use cases that SWORD can be used for
  3. How SWORD Works: A high-level overview of the SWORD protocol, lightly touching on a few technical details in order to explain how it works
  4. SWORD Clients: The reasons for needing SWORD clients are shown, followed by a tour of some of the current SWORD clients
  5. Create Your Own SWORD Client: An overview of the EasyDeposit SWORD client creation toolkit, including the chance to try it out

The complete set of videos can be found at http://vimeo.com/channels/swordapp

The SWORD course slides now online

As part of the JISC-funded SWORD 3 project, I created ‘The SWORD Course’ and presented it during a two-hour workshop at the recent Open Repositories 2010 conference in Madrid. The aim of the course was to empower repository managers and developers who know what SWORD is, but are not currently using it, to go back to their institutions and start using it.
The course, entitled ‘Adding SWORD To Your Repository Armoury’, is made up of 5 modules:
  1. An Introduction to SWORD: Gives an overview of SWORD, the rationale behind its creation, and details of the first three funded SWORD projects
  2. SWORD Use Cases: Provides an introduction to use cases, and examines some of the use cases that SWORD can be used for
  3. How SWORD Works: A high-level overview of the SWORD protocol, lightly touching on a few technical details in order to explain how it works
  4. SWORD Clients: The reasons for needing SWORD clients are shown, followed by a tour of some of the current SWORD clients
  5. Create Your Own SWORD Client: An overview of the EasyDeposit SWORD client creation toolkit, including the chance to try it out

The slides from each presentation have now been uploaded to Slideshare under a Creative Commons Attribution-NonCommercial-ShareAlike licence. The workshop was also video recorded, and hopefully the recordings will be posted online soon.


About to load test DEF repositories

One of the core aims of the ROAD project is to load test DSpace, EPrints and Fedora repositories to see how they scale when used to archive large amounts of data (in the form of experimental results and metadata). According to ROAR, the largest repositories (housing open access materials) based on these platforms hold 191,510, 59,715 and 85,982 items respectively (as of 18th July 2008). We want to push them further and see how they fare.

DSpace, for instance, has in the past suffered from ongoing bad publicity, not helped by its own honesty about issues in early versions, which suffered from some instability and slowness under load (both user load and content load). One of the downsides of the web (or rather, of some of its users) is that old reports stay archived online and are read and believed with no consideration of changes that may have taken place in the interim. Many, if not most, of these issues have now been resolved at the scale that used to cause problems (100,000 to a quarter of a million items), and we need to re-evaluate the platform to see where it now breaks. Indeed, the following report set out to test DSpace with 1 million items, and found no particular issues:

Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland, USA

I’ve not looked very hard, but there was nothing obvious on the first page of Google results about EPrints scalability. For Fedora, however, I found this useful page: http://fedora.fiz-karlsruhe.de/docs/Wiki.jsp?page=Main

Our new load testing hardware has arrived. We have a standard spec server to perform the testing, and a beefy little number on which to run the repositories:

  • Two quad-core XEON processors
  • 16GB RAM
  • 6TB raw SATA disk (yes, it’s slow, but cheap!)

We’ve not yet decided which tests we’ll run (get in contact if you have any suggestions!), but we have decided to use SWORD to perform the test deposits: it allows us to throw identical packages at all three repositories, which gives us a level playing field.

Initial work showed that some of the repositories fell down as soon as we tried to deposit more than a couple of items concurrently using SWORD, and others fell down at 50 concurrent deposits. These were small implementation issues which have now been fixed, so full testing can start.
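For readers wanting to try something similar, here is a minimal sketch of a concurrent SWORD deposit test. It is not the ROAD project’s actual harness. It assumes SWORD 1.x-style deposits (an HTTP POST of a zip package to a collection URI, with Content-Disposition and X-Packaging headers and HTTP Basic authentication), and the collection URL, credentials, filenames and METS DSpace SIP packaging choice are all hypothetical placeholders. The deposit function is passed in, so the same runner can target each repository in turn with identical packages.

```python
import base64
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

# SWORD 1.3 packaging identifier for a METS-wrapped DSpace SIP (assumption:
# swap in whatever format each target repository is configured to accept).
METS_DSPACE_SIP = "http://purl.org/net/sword-types/METSDSpaceSIP"


def build_deposit(collection_url: str, package: bytes, filename: str,
                  user: str, password: str,
                  packaging: str = METS_DSPACE_SIP) -> urllib.request.Request:
    """Build one SWORD deposit: an authenticated HTTP POST of a zip package."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        collection_url,
        data=package,
        method="POST",
        headers={
            "Content-Type": "application/zip",
            "Content-Disposition": f"filename={filename}",
            "X-Packaging": packaging,
            "Authorization": f"Basic {token}",
        },
    )


def send_deposit(req: urllib.request.Request) -> int:
    """Send a deposit and return the HTTP status (201 Created means success)."""
    try:
        with urllib.request.urlopen(req, timeout=300) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def run_load_test(deposit: Callable[[int], int], total: int,
                  concurrency: int) -> Tuple[int, int, float]:
    """Run `total` deposits, at most `concurrency` at a time.

    Returns (succeeded, failed, elapsed_seconds). Any exception, such as a
    refused connection, counts as a failure rather than stopping the run.
    """
    def attempt(i: int) -> bool:
        try:
            return deposit(i) == 201
        except Exception:
            return False

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(attempt, range(total)))
    ok = sum(results)
    return ok, total - ok, time.perf_counter() - start
```

To reproduce a 50-way concurrent run against one repository, pass a deposit function such as lambda i: send_deposit(build_deposit(url, package, f"item-{i}.zip", user, password)) with concurrency=50. Counting failures instead of aborting lets a run show where a platform starts to break.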

More details will be blogged once we start getting some useful comparative data. However, since the report cited above took about 10 days to deposit 1 million items, it may be some weeks before we’re able to report data from meaningful tests on each platform.
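A quick back-of-the-envelope check of that estimate: 1 million items in about 10 days is roughly 1.16 deposits per second. The sketch below assumes, purely for illustration, that each of the three platforms deposits at a similar rate and that the runs happen one after another on the same hardware.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def deposit_rate(items: int, days: float) -> float:
    """Average deposits per second over a run."""
    return items / (days * SECONDS_PER_DAY)


def days_needed(items: int, rate: float) -> float:
    """Days to deposit `items` at a constant `rate` (items per second)."""
    return items / rate / SECONDS_PER_DAY


# Figures from the NLM report: about 1 million items in about 10 days.
rate = deposit_rate(1_000_000, 10)
# Assumption: similar rate on all three platforms, tested sequentially.
total_days = 3 * days_needed(1_000_000, rate)
print(f"{rate:.2f} items/s; about {total_days:.0f} days for all three platforms")
```

Under those assumptions a million-item test on each platform would take about a month, which is why results will be some weeks away.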

These results will inform the next stage of the ROAD project, which is to choose one of the repositories upon which to build a repository for the Robot Scientist, so the stakes are high!

Tracking repository searches from the inside

One of the many great features of Google Analytics is that it can show the search terms that visitors to your site have used in search engines. This is a great tool for finding out what brings users to your repository.

Seven months ago Google launched a new feature in Google Analytics that also allows you to track the search terms used by visitors within your repository. It’s very easy to set up: all you need to do is enable the feature and set the query parameter used by your repository. Follow these steps from the help pages:

  1. Log in to your Google Analytics account.
  2. Click ‘Edit’ under Website Profiles for the profile you would like to enable Site Search for.
  3. Click ‘Edit’ from the ‘Main Website Profile Information’ section of the Profile Settings page.
  4. Select the ‘Do Track Site Search’ radio button in the Site Search section of the Edit Profile Information page.
  5. Enter your ‘Query Parameter’ in the field provided. Please enter only the word or words that designate an internal query parameter such as “term,search,query”. Sometimes the word is just a letter, such as “s” or “q”. You may provide up to five parameters, separated by a comma.
  6. Select whether or not you want Google Analytics to strip out the query parameter from your URL. Please note that this will only strip out the parameters you provided, and not any other parameters in the same URL. This has the same functionality as excluding URL Query Parameters in your Main Profile – if you strip the query parameters from your Site Search Profile, you don’t have to exclude them again from your Main Profile.

Google Analytics Site Search
For DSpace you need to set the query parameter to “query”, and for EPrints set it to “simple”.
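To make the effect of these settings concrete, here is a small sketch of what they describe: pulling the search term out of a configured query parameter, and optionally stripping only those parameters from the URL. This is an illustration, not how Google Analytics is implemented, and the repository hostname in the example URL is hypothetical (the path mimics a DSpace simple search).

```python
from typing import Optional, Sequence
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def site_search_term(url: str, query_params: Sequence[str]) -> Optional[str]:
    """Return the search term carried by any configured query parameter.

    Returns None if the URL has none of the parameters, and "" if the
    visitor submitted an empty search box.
    """
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if key in query_params:
            return value
    return None


def strip_query_params(url: str, query_params: Sequence[str]) -> str:
    """Drop only the configured parameters, keeping any others in the URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in query_params]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# "query" for DSpace and "simple" for EPrints, as configured above.
params = ("query", "simple")
url = "http://repository.example.ac.uk/simple-search?query=robot+scientist&start=0"
print(site_search_term(url, params))
print(strip_query_params(url, params))
```

Note that an empty search box yields an empty term rather than no term, which is how searches without a search term can still be counted.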

To view the results, follow the links shown in the image (Content -> Site Search) and explore.

Here are some interesting statistics from our repository, as an example of the extra information it can provide:

  • 89% of visits did not make use of site search, whilst the remaining 11% did.
  • 39% of searchers left the site straight after performing a search, without going any further (e.g. looking at one of the items the search found).
  • 22% of searches were followed by a refinement of the search by the searcher.
  • 50% of searches were performed from the repository homepage, and the remainder from item, collection and community pages.
  • Following a search, the average visitor stayed on the site for a further 1 minute and 30 seconds.
  • 8% of searches were performed without the visitor having entered a search term.
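The figures above are ratios over either visits or searches. As a rough sketch of how such figures could be computed from raw visit data, the code below uses a hypothetical record of each visit’s searches and illustrative definitions of its own; these are not Google Analytics’ exact formulas, and the toy data is invented purely to exercise the code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Search:
    term: str         # "" when the visitor searched with an empty box
    start_page: str   # page type the search was made from, e.g. "home", "item"
    next_action: str  # what followed: "exit", "refine" or "view"


@dataclass
class Visit:
    searches: List[Search] = field(default_factory=list)


def pct(part: int, whole: int) -> float:
    return 100.0 * part / whole if whole else 0.0


def site_search_summary(visits: List[Visit]) -> Dict[str, float]:
    """Percentages in the spirit of the figures above (illustrative definitions)."""
    searches = [s for v in visits for s in v.searches]
    n = len(searches)
    return {
        "visits_with_search": pct(sum(1 for v in visits if v.searches), len(visits)),
        "search_exits": pct(sum(s.next_action == "exit" for s in searches), n),
        "refinements": pct(sum(s.next_action == "refine" for s in searches), n),
        "from_homepage": pct(sum(s.start_page == "home" for s in searches), n),
        "empty_term": pct(sum(s.term.strip() == "" for s in searches), n),
    }


# Toy data: three visits without a search, one visit with two searches.
visits = [Visit(), Visit(), Visit(), Visit([
    Search("robot scientist", "home", "refine"),
    Search("", "item", "exit"),
])]
print(site_search_summary(visits))
```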
