About to load test DEF repositories

One of the core aims of the ROAD project is to load test DSpace, EPrints and Fedora repositories to see how they scale when used to archive large amounts of data (in the form of experimental results and metadata). According to ROAR, the largest repositories (housing open access materials) based on these platforms hold 191,510, 59,715 and 85,982 items respectively (as of 18th July 2008). We want to push them further and see how they fare.

DSpace, for instance, has in the past suffered from ongoing bad publicity, partly due to its own candour about issues in early versions, which showed some instability and slowness under load (both user load and content load). One of the downsides of the web (well, of some of its users really) is that old reports stay archived and are read and believed with no consideration of changes that may have taken place in the interim. Many, if not most, of these issues have now been resolved at the scale that used to cause problems (100,000 to 1/4 million items), and we need to re-evaluate the platform to see where it now breaks. Indeed, the following report set out to test DSpace with 1 million items and found no particular issues:

Testing the Scalability of a DSpace-based Archive, Dharitri Misra, James Seamans, George R. Thoma, National Library of Medicine, Bethesda, Maryland, USA

I’ve not looked very hard, but there was nothing obvious on the first page of Google results about EPrints scalability. For Fedora, however, I found this useful page: http://fedora.fiz-karlsruhe.de/docs/Wiki.jsp?page=Main

Our new load testing hardware has arrived: a standard-spec server to perform the testing, and a beefy little number on which to run the repositories.

We’ve not yet decided exactly which tests we’ll run (get in contact if you have any suggestions!), but we have decided to use SWORD to perform the test deposits. It allows us to throw identical packages at all three repositories, which gives us a level playing field.
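For anyone unfamiliar with SWORD: a deposit is essentially an HTTP POST of a packaged item (typically a zip file) to a collection's deposit URL, with a few extra headers describing the package. The Python sketch below builds, but does not send, one such request per repository from the same package bytes, which is where the level playing field comes from. The collection URLs and credentials are hypothetical placeholders, and using the METSDSpaceSIP packaging format for all three platforms is an assumption; each repository's SWORD service document says which formats it actually accepts.

```python
import base64
import urllib.request

# SWORD 1.x packaging identifier for a METS-based DSpace SIP. Whether every
# platform under test accepts it is an assumption to check per repository.
METS_DSPACE_SIP = "http://purl.org/net/sword-types/METSDSpaceSIP"

# Hypothetical deposit endpoints, one per platform under test.
COLLECTIONS = {
    "dspace": "http://test-server.example/dspace/sword/deposit/123456789/2",
    "eprints": "http://test-server.example/eprints/sword-app/deposit/inbox",
    "fedora": "http://test-server.example/fedora/sword/deposit/demo",
}


def build_sword_deposit(collection_url: str, package: bytes, filename: str,
                        username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a SWORD 1.x style deposit request."""
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": METS_DSPACE_SIP,
        "Authorization": f"Basic {credentials}",  # HTTP Basic auth
    }
    return urllib.request.Request(collection_url, data=package,
                                  headers=headers, method="POST")


def build_all(package: bytes, filename: str, username: str,
              password: str) -> dict:
    """One request per repository, all carrying byte-identical packages."""
    return {name: build_sword_deposit(url, package, filename, username, password)
            for name, url in COLLECTIONS.items()}
```

Each request could then be sent with urllib.request.urlopen(request) and judged by its HTTP status; building the requests separately from sending them makes it easy to confirm that every platform really receives the same bytes.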

Our initial work showed that some of the repositories fell down as soon as we tried to deposit more than a couple of items concurrently using SWORD, and others fell down at 50 concurrent deposits. These were small implementation issues which have now been fixed, so full testing can begin.
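One simple way to find the point at which a repository falls down is to ramp up the number of simultaneous deposits and count the failures at each level. The Python sketch below assumes a hypothetical deposit(item_id) callable that returns True on success (in practice it would send one SWORD request and check the response status); any exception is counted as a failure. make_fake_repository is likewise a hypothetical stand-in that refuses work beyond a fixed number of in-flight deposits, so the ramp can be tried without a real server.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional

Deposit = Callable[[int], bool]


def _attempt(deposit: Deposit, item_id: int) -> bool:
    """Run one deposit, treating any exception as a failure."""
    try:
        return bool(deposit(item_id))
    except Exception:
        return False


def run_level(deposit: Deposit, concurrency: int) -> int:
    """Fire `concurrency` deposits at once and return how many failed."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda i: _attempt(deposit, i), range(concurrency)))
    return results.count(False)


def ramp(deposit: Deposit,
         levels: Iterable[int] = (1, 2, 5, 10, 25, 50)) -> Dict[int, int]:
    """Try each concurrency level in turn; map level -> failed deposits."""
    return {level: run_level(deposit, level) for level in levels}


def first_failing_level(failures: Dict[int, int]) -> Optional[int]:
    """Lowest concurrency level with any failure, or None if all passed."""
    failing = [level for level, count in failures.items() if count > 0]
    return min(failing) if failing else None


def make_fake_repository(max_in_flight: int, latency: float = 0.2) -> Deposit:
    """Hypothetical stand-in that rejects deposits beyond `max_in_flight`."""
    slots = threading.BoundedSemaphore(max_in_flight)

    def deposit(item_id: int) -> bool:
        if not slots.acquire(blocking=False):
            return False  # simulated overload: the "server" refuses the deposit
        try:
            time.sleep(latency)  # simulated ingest time
            return True
        finally:
            slots.release()

    return deposit
```

For example, ramp(make_fake_repository(max_in_flight=10)) should report no failures up to 10 concurrent deposits and failures from 25 upwards. Running each level to completion before starting the next keeps one level's stragglers from skewing the following one.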

More details will be blogged once we start getting some useful comparative data. However, since the report cited above took about 10 days to deposit 1 million items, it may be some weeks before we’re able to report data from meaningful tests on each platform.
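To put the 10-day figure in perspective, here is the back-of-an-envelope arithmetic as a small Python sketch. It assumes a constant deposit rate, which real runs won't have (the rate may well change as a repository grows), and it assumes the platforms are loaded one after another rather than in parallel.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def deposit_rate(items: int, days: float) -> float:
    """Average deposits per second over a run of the given length."""
    return items / (days * SECONDS_PER_DAY)


def days_needed(items: int, items_per_second: float) -> float:
    """Days needed to deposit `items` at a constant rate."""
    return items / items_per_second / SECONDS_PER_DAY


def days_for_platforms(items: int, items_per_second: float, platforms: int) -> float:
    """Total days if each platform is loaded in turn at the same rate."""
    return platforms * days_needed(items, items_per_second)


if __name__ == "__main__":
    rate = deposit_rate(1_000_000, 10)  # roughly the rate in the report above
    print(f"{rate:.2f} items/second, {rate * SECONDS_PER_DAY:,.0f} items/day")
    print(f"{days_for_platforms(1_000_000, rate, 3):.0f} days for three platforms")
```

That works out at 1,000,000 / (10 × 86,400 s), or about 1.16 items per second (roughly 100,000 items a day), so taking all three platforms to a million items each at a similar rate would need about 30 days of deposit time.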

These results will inform the next stage of the ROAD project, which is to choose one of the repositories on which to build a repository for the Robot Scientist, so the stakes are high!

Posted on July 18, 2008 at 1:16 pm by Stuart
In: Uncategorized

4 Responses


  1. Written by Les Carr
    on July 21, 2008 at 2:14 pm

    Fantastic! We’ve wanted to do this for ages but haven’t had the funding / resources. I’d be really grateful if you can keep us in the picture and let us know where we can improve.

  2. Written by Tom De Mulder
    on October 1, 2008 at 2:34 pm

    Could I just point out that the only reason our DSpace instance here at Cambridge (http://www.dspace.cam.ac.uk/) doesn’t crumple under the scale and load is that we’re running several local patches to cope with the very bad design of DSpace’s thumbnails, indexing, etc.

    And still we have to restart the servlet container every night, lest we run out of memory (because there’s still at least one memory leak in the code which we haven’t managed to find yet).

  3. Written by stuart
    on October 8, 2008 at 7:34 am

    Tom: Thanks for your comments. Would you be willing to share data relating to the load that you have to cope with, so that we can accurately factor this into our testing?

  4. Written by Stuart Lewis’ Blog » DSpace at a third of a million items
    on January 19, 2009 at 10:33 am

    [...] As part of the JISC-funded ROAD (Robot-generated Open Access Data) project we are load testing DSpace, EPrints and Fedora to see how they cope with holding large numbers of items. For a bit of background, see an earlier blog post: ‘About to load test DEF repositories’ [...]
