
The collection is dead! Long live the collection!

“The collection is dead! Long live the collection!”  That summarises my current thoughts and feelings about the collection hierarchy structure in DSpace.

When first installed, DSpace shows its need for a community and collection hierarchy, because without at least one community, containing at least one collection, it is impossible to even submit an item.  Therefore, from day one, DSpace repository managers get used to creating collections, giving them names, and creating a hierarchy.  Having created a first community with its first collection, it seems silly to have just a single community and collection.  So the repository manager creates a larger hierarchy – often mimicking the organisational structure of their institution.  Often this hierarchy extends down to departments, research groups, even individuals.

And the repository manager feels good.  There is structure which will give people a sense of belonging, and more importantly, ownership of ‘their collection’.  This will help them gather content for their repository by helping the individual or research group feel that it is their space.

Very often, of course, it means that a repository has a lot of structure with empty collections.  It also leads to other problems – what about theses?  We want them to appear in their department’s collection, but also to appear in our central library thesis collection for harvesting by a service such as EThOS.  Or what about an article that has been co-authored by people across different departments?  Or departments that move around in the organisation?  This collection structure is starting to feel a bit inflexible and restrictive!  What started off as a useful tool that made us feel ‘organised’ now feels the opposite.

Luckily, help is now at hand!  Since version 1.7 DSpace has included the ‘discovery’ module.  This is nothing ground-breaking as such, just a faceted search feature using Solr, contributed to DSpace by Atmire.  The real beauty and power of faceted search comes from its ability to make ‘virtual collections’.  You want a collection of theses published by the faculty of science?  Sure – just link to the type:thesis + faculty:science faceted search result page.  You want a collection for Prof. S Smith?  No problem, link to the author:s. smith faceted search result page.  Want a collection of the ‘recent’ publications of a department?  Just link to the department:foobar + year:2011 faceted search result page.
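To make the idea of a ‘virtual collection’ link a little more concrete, here is a minimal sketch in Python.  It is not how DSpace Discovery builds its own URLs; it simply assumes a Solr-style search endpoint that accepts one fq (filter query) parameter per facet.  The endpoint address, the function name and the field names (type, faculty, author) are all made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical Solr-style endpoint; a real installation will have its own search URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/search/select"


def virtual_collection_url(facets, base_url=SOLR_SELECT_URL):
    """Build a search URL that narrows the results with one filter query per facet."""
    # Start from "match everything" and let the filter queries do the narrowing.
    params = [("q", "*:*")]
    for field, value in facets.items():
        # Quote the value so that a phrase such as "s. smith" is matched as a whole.
        # (Assumes the value itself contains no double quotes.)
        params.append(("fq", f'{field}:"{value}"'))
    return base_url + "?" + urlencode(params)


# The field names below are illustrative, not a real DSpace schema.
theses_in_science = virtual_collection_url({"type": "thesis", "faculty": "science"})
prof_smith = virtual_collection_url({"author": "s. smith"})

print(theses_in_science)
print(prof_smith)
```

Each ‘collection’ here is nothing more than a saved link, so adding, renaming or retiring one never touches the items themselves.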

Facets give us the ability to create ‘views’ over the data, based on properties (metadata) of items.  Maybe this is why Google and other search engines are more popular than http://www.dmoz.org/?  We like to have our own collections defined instantly by a search, not be forced to traverse a hierarchy dictated by others.  Of course this does rely on quality (consistent / present / correct) metadata to ensure that items all appear in their virtual collections.  To conclude – sometimes it feels like “The collection is dead!”.  We have better ways to create structure over the repository rather than through it.
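The dependence on metadata quality can be shown with a toy example.  The sketch below treats a facet ‘view’ as a simple filter over in-memory records; the records, the field names and the view function are hypothetical and have nothing to do with DSpace’s actual data model.

```python
# Hypothetical item records; the field names are illustrative only.
items = [
    {"title": "Thesis A", "type": "thesis", "faculty": "science", "year": 2011},
    {"title": "Article B", "type": "article", "faculty": "science", "year": 2011},
    {"title": "Thesis C", "type": "thesis", "year": 2010},  # faculty is missing
]


def view(records, **facets):
    """Return the titles of records whose metadata matches every requested facet."""
    return [
        record["title"]
        for record in records
        if all(record.get(field) == value for field, value in facets.items())
    ]


science_theses = view(items, type="thesis", faculty="science")
all_theses = view(items, type="thesis")

print(science_theses)
print(all_theses)
```

‘Thesis C’ shows up among all theses but silently drops out of the science theses view, purely because nobody recorded its faculty.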

But wait!  I cry “Long live the collection!”.

Whilst in our main ‘institutional repository’ at The University of Auckland Library (http://researchspace.auckland.ac.nz/) we have been rationalising the number of collections we have and removing most of the organisational structure, we have been making use of DSpace collections in another very useful way…

We’re halfway through a project known internally as the SuperIndex.  Not (just!) because it is ‘super’, but because we are creating a super-index of many of our disparate bibliographic and digital special collections.  We have many databases of collections all over the place, and the SuperIndex project aims to bring them all together into a single system.  This will make management of the collections more consistent, while reducing the number of systems to maintain.  DSpace is our chosen central management system.

This is where the collection structure is becoming very useful.  Each collection of items really is a collection.  An item in a database about fisheries in New Zealand will not (or is unlikely to!) appear in any of our other special collections.  This collection structure makes it easier to manage each collection separately.  We can run curation tasks on a collection, or control who has rights to edit a collection.  The management of this repository is spread much more widely than that of the institutional repository, which is administered by just one team.  We will have staff in many areas editing the items.  It also allows us to create individual websites for each collection, each with its own URL structure and branding – the end user does not know that the item they are viewing is actually managed in a DSpace somewhere, and that the DSpace contains thousands of other items in different collections.

So the ‘collection’ is starting to become less useful in the standard institutional repository of research outputs (which is, in a way, a single collection) but is having a new life for us in managing what could be seen as more traditional ‘collections’ in a single DSpace repository.

I’d be interested to hear what you think are the strengths and weaknesses, or reasons for and against forced collection hierarchies in DSpace.
