Daily Archives: February 10, 2010

DSpace 1.6 – What will be in it for me?

Soon after the release of DSpace 1.5.2 in April 2009 I wrote a blog article ‘DSpace 1.5.2: What’s in it for me?’. The final release of DSpace 1.6 is due shortly, and as the release co-ordinator I thought it might be good to write a similar blog post outlining the key changes and new features that will make it into DSpace 1.6. Soon after 1.5.2 was released we issued a survey asking the DSpace community what three new features they would like to see in DSpace. We shortlisted the responses and there were three clear winners for features that people were asking for. We therefore decided to base the release of DSpace 1.6 around those features. Once those features had been developed and tested, we’d release 1.6.

Those three features were:

Better statistics: The current statistical reporting capabilities of DSpace, whilst sufficient when they were developed, have become a bit long in the tooth. They are limited to basic reports of metrics such as how many items are in a repository, how many times each item has been downloaded (with no filtering out of automatic search engine spiders, which often account for over two thirds of the hits), or how many times different search terms have been used.

When we analysed what users had asked for, the biggest requirement was item-level statistics. This feature has now been developed (by @mire) and works in an innovative way that we’d not thought of before they developed it. Rather than storing item views in a log file or in a database table, they store the item view data in a Solr index. What does that mean? Basically, the views are stored in a search engine index that can be queried quickly, efficiently, and in powerful ways.
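To give a feel for what querying a search index for statistics looks like, here is a minimal Python sketch that builds a Solr request counting the views of one item, grouped by country. The parameters `q`, `rows`, `facet`, `facet.field` and `wt` are standard Solr query parameters, but the core URL and the field names (`type`, `id`, `countryCode`) are assumptions made for illustration and may not match the schema that ships with DSpace 1.6.

```python
from urllib.parse import urlencode

# Hypothetical location of the statistics index; adjust for your installation.
SOLR_SELECT_URL = "http://localhost:8080/solr/statistics/select"


def build_item_views_query(item_id, base_url=SOLR_SELECT_URL):
    """Build a Solr URL that counts views of one item, grouped by country.

    The field names used here (type, id, countryCode) are assumed for the
    purposes of this sketch.
    """
    params = [
        ("q", f"type:item AND id:{item_id}"),  # match view events for this item
        ("rows", "0"),                         # only counts are wanted, no documents
        ("facet", "true"),                     # switch faceting on
        ("facet.field", "countryCode"),        # one count per country value
        ("wt", "json"),                        # ask for a JSON response
    ]
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_item_views_query(123))
```

Because the counting is done by the index itself through faceting, the same request shape could in principle be reused for collections or communities by changing the query.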

Simple out-of-the-box statistical views are available for each item, collection, and community in both the JSPUI and the XMLUI. Information is given about item views, bitstream downloads, and user metadata such as the location the users of the repository came from. The reports are quite basic, but fulfil the requirements we were given. In future versions there will no doubt be work undertaken to make the reports look better and provide more information. The Solr index holds a lot of statistical information; we just need to find the best way of displaying it. Along with the new statistics feature comes a script to convert your old dspace.log files into the new format. This means that you can import statistics from old log files, going back as far as you have kept them.

Embargo feature: The lack of embargo functionality in DSpace has been a problem for a long time. Universities in particular often need it, either to manage open access journal articles that may be under a 6, 9 or 12 month embargo, or to handle theses that cannot be made public for a certain period. However, when we listened to further input about the requirements, it became obvious that lots of people require subtly different kinds of embargo.

The embargo feature written by Richard Rodgers and Larry Stone (MIT and Harvard respectively) takes this into account. It has been written as a framework rather than a fixed implementation, which means that it is possible to write your own embargo rules (in Java classes). A simple implementation is included out of the box that should fulfil the needs of many users by allowing an embargo lift date to be set during the submission of the item. The bitstreams (but not the item metadata) are locked from public view until that date has passed.
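The idea of a pluggable rule plus a simple lift-date default can be sketched in a few lines. This is an illustration in Python rather than the Java classes the real framework uses, and every name in it (`SimpleEmbargoRule`, `visible_parts`, the `embargo_lift_date` key) is hypothetical. It also assumes that access opens on the lift date itself, which may differ from how DSpace treats the boundary.

```python
from datetime import date


class SimpleEmbargoRule:
    """Hypothetical default rule: lock bitstreams until the lift date arrives."""

    def is_embargoed(self, lift_date, today):
        # No lift date means no embargo was set during submission.
        if lift_date is None:
            return False
        # Assumption: the embargo ends on the lift date itself.
        return today < lift_date


def visible_parts(item, rule, today):
    """Return which parts of an item the public may see under the given rule."""
    parts = ["metadata"]  # item metadata is never hidden by the embargo
    if not rule.is_embargoed(item.get("embargo_lift_date"), today):
        parts.append("bitstreams")
    return parts


if __name__ == "__main__":
    thesis = {"embargo_lift_date": date(2010, 9, 1)}
    print(visible_parts(thesis, SimpleEmbargoRule(), date(2010, 3, 1)))
```

A site with different needs would supply its own rule object with the same `is_embargoed` method, which is the sense in which the feature is a framework rather than a fixed implementation.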

Batch metadata editing: The third of the big three features requested was for a facility to enable batch metadata editing. The users who requested this fell into two camps, and had two different requirements. One was for the ability to edit a lot of metadata easily and in bulk, whilst the other was to perform global changes across the repository (e.g. update all records with the author name ‘Stewart Lewis’ to ‘Stuart Lewis’).

Because the former of these could be used to achieve the latter, we chose to implement it. I developed this feature at the University of Auckland, where we are already using it regularly, and the XMLUI interface was developed by Kim Shepherd at the Library Consortium of New Zealand. The batch metadata editing tool is based around the assumption that there are better tools than DSpace for editing large amounts of metadata, so rather than trying to make DSpace provide these features, let’s enable the import and export of large amounts of metadata into these tools. This is achieved through the use of CSV (comma separated values) files. CSV files can be opened by most spreadsheet packages such as Microsoft Excel or OpenOffice. These tools have features such as global find and replace, spelling checkers, and copy and paste, which all help with the editing of the metadata.
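The same CSV file can of course be edited by a script as well as by a spreadsheet. As a minimal sketch of the global change mentioned earlier, the following Python function replaces one exact value in one column using the standard `csv` module. The column names in the usage note (`id`, `dc.title`, `dc.contributor.author`) are assumptions about what an export might contain, not a description of the exact file DSpace produces.

```python
import csv
import io


def replace_in_column(csv_text, column, old, new):
    """Replace the exact value `old` with `new` in one column of a CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column] == old:
            row[column] = new  # only whole-cell matches are changed
        writer.writerow(row)
    return out.getvalue()
```

For example, calling `replace_in_column(exported, "dc.contributor.author", "Stewart Lewis", "Stuart Lewis")` on an exported file would return the edited CSV text, ready to be uploaded back into DSpace.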

Metadata can be exported for whole collections, whole communities, search results, browse results, or for the whole repository. Once changes have been made, the file is uploaded back into DSpace which detects the changes and displays them to the user. If the user confirms that the changes are correct, then they will be made. The batch metadata editing feature can also be used to enable the creation of new metadata-only records.
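Detecting what changed between the exported file and the uploaded one is essentially a row-by-row comparison keyed on the item identifier. The sketch below shows one way that could work; it is not the code DSpace uses, and the `id` column name and the decision to skip unknown rows are assumptions made to keep the example short.

```python
import csv
import io


def detect_changes(original_csv, edited_csv, id_column="id"):
    """Return a list of (item id, column, old value, new value) tuples."""
    original = {
        row[id_column]: row for row in csv.DictReader(io.StringIO(original_csv))
    }
    changes = []
    for row in csv.DictReader(io.StringIO(edited_csv)):
        before = original.get(row[id_column])
        if before is None:
            continue  # rows with unknown ids are ignored in this sketch
        for column, value in row.items():
            if column != id_column and before.get(column) != value:
                changes.append((row[id_column], column, before.get(column), value))
    return changes
```

The resulting list is what would be shown to the user for confirmation before anything is written to the repository.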

Our intention was to ship DSpace 1.6 once these features were completed. However, whilst waiting for this, the DSpace community worked its magic once again, and came up with loads of new features for us to include. This list isn’t exhaustive, but contains some of the other key features that we’ve been able to include:

  • Authority control: A new authority control framework has been included which allows authority sources to be developed for metadata input. For example you may wish to link up author names with a local or national identity database, or link up publications to their ISSNs. In addition to the raw functionality, AJAX lookups are enabled to allow autocomplete functionality to show users matches to the data as they are typing (Larry Stone / Andrea Bollini).
  • Delegated administration: For a long time DSpace has suggested via some options in the user interface that it supports devolved administration of parts of the repository to different users. In some ways this was true, but it was very limited and didn’t include basic options such as delegating the ability to delete items to other users. This has now been included and is fully configurable (Andrea Bollini / Tim Donohue).
  • OpenSearch: Support for OpenSearch, an open XML format for sharing search results (Richard Rodgers).
  • OAI-PMH harvesting support: This isn’t the ability for DSpace to expose its items via OAI-PMH (which it has done since version 1), but is instead a facility that allows DSpace to harvest other repositories and import their data into DSpace. This could be useful if you want to mirror all or parts of another repository (Alexey Maslov).
  • Batch imports and exports: These can now make use of zip files instead of directory hierarchies (Stuart Lewis).
  • Command launcher: A new command launcher has been written to replace all of the old DSpace command line scripts. This means that one script can be used to perform all command line functions. It also works on all platforms, whereas in the past we’ve only shipped scripts for Unix and not for Windows (Stuart Lewis).
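The command launcher idea from the list above, one entry point that dispatches to many commands, can be sketched as follows. This is a generic illustration only: the function `make_launcher` and the command names are hypothetical and are not taken from the DSpace launcher.

```python
def make_launcher(commands):
    """Return a function that dispatches an argument list to a named command."""

    def launch(argv):
        # With no command, or an unknown one, list what is available.
        if not argv or argv[0] not in commands:
            return "Available commands: " + ", ".join(sorted(commands))
        # Otherwise pass the remaining arguments to the chosen command.
        return commands[argv[0]](argv[1:])

    return launch


if __name__ == "__main__":
    launch = make_launcher({
        "import": lambda args: "importing " + " ".join(args),
        "export": lambda args: "exporting " + " ".join(args),
    })
    print(launch(["import", "batch.zip"]))
```

Keeping the dispatch table in one place is what lets a single script serve every platform, since only the thin entry point differs between operating systems.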

In addition to these, there have been literally dozens of other new features, improvements to current features, and bug fixes. We think that you’ll be happy! When you start using these features, remember to say a “Thank you” to the two dozen developers who have worked to bring you these new tools. Also say “Thank you” to the dozens of other users who have provided input to the development of these features, tested them, and provided feedback. DSpace really couldn’t exist without the community around it.

Your biggest question is now probably “When will it be released?”. Later this week we hope to release a final ‘release candidate’ which can be used for some last-minute testing. Assuming this all goes well and no show-stopping bugs are found, we plan to release it during the first week of March. All this is tentative, but we’ll keep you updated.